python - Pandas: group by index value, then calculate quantile?


I have a DataFrame indexed on the month column (set using df = df.set_index('month'), in case that's relevant):

            org_code  ratio_cost
month
2010-08-01      1847    8.685939
2010-08-01      1848    7.883951
2010-08-01      1849    6.798465
2010-08-01      1850    7.352603
2010-09-01      1847    8.778501

I want to add a new column called quantile that assigns a quantile value to each row, based on where its ratio_cost falls among the rows for the same month.

So the example above might look like this:

            org_code  ratio_cost  quantile
month
2010-08-01      1847    8.685939     100
2010-08-01      1848    7.883951     66.6
2010-08-01      1849    6.798465     0
2010-08-01      1850    7.352603     33.3
2010-09-01      1847    8.778501     100

How can I do this? I've tried this:

df['quantile'] = df.groupby('month')['ratio_cost'].rank(pct=True)

but I get KeyError: 'month'.

Update: I can reproduce the bug.

Here is a CSV file: http://pastebin.com/raw/6xbjvel0

And here is the code to reproduce the error:

import pandas as pd

df = pd.read_csv('temp.csv')
df.month = pd.to_datetime(df.month, unit='s')
df = df.set_index('month')
# Rank ratio_cost within each month, as a fraction of the group size
df['percentile'] = df.groupby(df.index)['ratio_cost'].rank(pct=True)
print(df['percentile'])

I'm using pandas 0.17.1 on OS X.

You have to call sort_index before rank:

import pandas as pd

df = pd.read_csv('http://pastebin.com/raw/6xbjvel0')

df.month = pd.to_datetime(df.month, unit='s')
df = df.set_index('month')

# Sort by the month index before grouping and ranking
df = df.sort_index()

df['percentile'] = df.groupby(df.index)['ratio_cost'].rank(pct=True)
print(df['percentile'].head())

month
2010-08-01    0.2500
2010-08-01    0.6875
2010-08-01    0.6250
2010-08-01    0.9375
2010-08-01    0.7500
Name: percentile, dtype: float64
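Note that rank(pct=True) returns fractions in (0, 1], not the 0 to 100 values shown in the question's example. Here is a minimal sketch of one way to get that scaling, using the five sample rows from the question rather than the CSV. The (rank - 1) / (n - 1) formula and the choice to give a single-row month the value 100 are assumptions inferred from the example table, not something pandas does for you. The sketch also groups with level='month', which refers to the index level by name instead of looking for a column.

```python
import pandas as pd

# Sample rows copied from the question; 'month' becomes the index.
df = pd.DataFrame(
    {
        "month": pd.to_datetime(
            ["2010-08-01", "2010-08-01", "2010-08-01", "2010-08-01", "2010-09-01"]
        ),
        "org_code": [1847, 1848, 1849, 1850, 1847],
        "ratio_cost": [8.685939, 7.883951, 6.798465, 7.352603, 8.778501],
    }
).set_index("month").sort_index()

# Group on the index level by name rather than on a column.
grouped = df.groupby(level="month")["ratio_cost"]

# Fractional rank within each month: rank divided by group size.
pct_rank = grouped.rank(pct=True)

# Assumed rescaling to match the question's example table:
# the lowest value in a month maps to 0 and the highest to 100.
rank = grouped.rank()              # 1..n within each month
size = grouped.transform("count")  # n, repeated for every row of the month
scaled = (rank - 1) / (size - 1) * 100

# A month with a single row gives 0/0 (NaN); the example shows 100 there.
df["quantile"] = scaled.where(size > 1, 100.0)

print(df)
```

With this scaling the August rows come out at roughly 100, 66.7, 0 and 33.3, and the lone September row gets 100. If you only need the fractional rank, pct_rank is the same result the answer above computes.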
