python - Pandas: group by index value, then calculate quantile?
I have a DataFrame indexed on a month column (set using df = df.set_index('month'), in case that's relevant):
                org_code  ratio_cost
    month
    2010-08-01      1847    8.685939
    2010-08-01      1848    7.883951
    2010-08-01      1849    6.798465
    2010-08-01      1850    7.352603
    2010-09-01      1847    8.778501

I want to add a new column called quantile that assigns a quantile value to each row, based on how its ratio_cost compares with the other rows for the same month.
So the example above might look like this:
                org_code  ratio_cost  quantile
    month
    2010-08-01      1847    8.685939     100
    2010-08-01      1848    7.883951      66.6
    2010-08-01      1849    6.798465       0
    2010-08-01      1850    7.352603      33.3
    2010-09-01      1847    8.778501     100

How can I do this? I've tried this:
    df['quantile'] = df.groupby('month')['ratio_cost'].rank(pct=True)

but I get KeyError: 'month'.
Update: I can reproduce the bug.
Here is a CSV file: http://pastebin.com/raw/6xbjvel0
And here is the code to reproduce the error:
    import pandas as pd

    df = pd.read_csv('temp.csv')
    df.month = pd.to_datetime(df.month, unit='s')
    df = df.set_index('month')
    df['percentile'] = df.groupby(df.index)['ratio_cost'].rank(pct=True)
    print(df['percentile'])

I'm using pandas 0.17.1 on OS X.
You have to call sort_index before rank:
    import pandas as pd

    df = pd.read_csv('http://pastebin.com/raw/6xbjvel0')
    df.month = pd.to_datetime(df.month, unit='s')
    df = df.set_index('month')
    # Sort the index first so rows of the same month are adjacent
    df = df.sort_index()
    df['percentile'] = df.groupby(df.index)['ratio_cost'].rank(pct=True)
    print(df['percentile'].head())

    month
    2010-08-01    0.2500
    2010-08-01    0.6875
    2010-08-01    0.6250
    2010-08-01    0.9375
    2010-08-01    0.7500
    Name: percentile, dtype: float64
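For anyone who wants to try this without downloading the CSV, here is a minimal sketch that rebuilds the five sample rows from the question by hand (so the frame is an assumption, not the real data). It groups with level='month' instead of 'month'. That is one way to refer to the index rather than a column, which is probably why the original groupby('month') call raised the KeyError. The sort_index call is kept as suggested above. Nothing in this thread shows whether it is still needed on pandas versions other than 0.17.1.

```python
import pandas as pd

# Sample rows copied by hand from the question; the real data is in the CSV.
df = pd.DataFrame(
    {
        "month": pd.to_datetime(
            ["2010-08-01", "2010-08-01", "2010-08-01", "2010-08-01", "2010-09-01"]
        ),
        "org_code": [1847, 1848, 1849, 1850, 1847],
        "ratio_cost": [8.685939, 7.883951, 6.798465, 7.352603, 8.778501],
    }
).set_index("month")

# Sort the index first, as recommended in the answer.
df = df.sort_index()

# 'month' is the index here, not a column, so group by index level.
df["percentile"] = df.groupby(level="month")["ratio_cost"].rank(pct=True)

print(df)
```

Note that rank(pct=True) returns each row's rank divided by the size of its group, so the four August rows get 0.25, 0.5, 0.75 and 1.0 rather than the 0, 33.3, 66.6 and 100 scale sketched in the question.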