python - Pandas: group by index value, then calculate quantile?
I have a dataframe indexed on the month column (set using df = df.set_index('month'), in case that's relevant):
                org_code  ratio_cost
    month
    2010-08-01      1847    8.685939
    2010-08-01      1848    7.883951
    2010-08-01      1849    6.798465
    2010-08-01      1850    7.352603
    2010-09-01      1847    8.778501
I want to add a new column called quantile, and assign a quantile value to each row, based on the value of its ratio_cost within that month.
So the example above might look like this:
                org_code  ratio_cost  quantile
    month
    2010-08-01      1847    8.685939     100
    2010-08-01      1848    7.883951      66.6
    2010-08-01      1849    6.798465       0
    2010-08-01      1850    7.352603      33.3
    2010-09-01      1847    8.778501     100
How can I do this? I've tried this:
    df['quantile'] = df.groupby('month')['ratio_cost'].rank(pct=True)
but I get KeyError: 'month'.
Update: I can reproduce the bug.
Here is my CSV file: http://pastebin.com/raw/6xbjvel0
And here is the code to reproduce the error:
    import pandas as pd

    df = pd.read_csv('temp.csv')
    df.month = pd.to_datetime(df.month, unit='s')
    df = df.set_index('month')
    df['percentile'] = df.groupby(df.index)['ratio_cost'].rank(pct=True)
    print(df['percentile'])
I'm using pandas 0.17.1 on OS X.
You have to call sort_index before rank:
    import pandas as pd

    df = pd.read_csv('http://pastebin.com/raw/6xbjvel0')
    df.month = pd.to_datetime(df.month, unit='s')
    df = df.set_index('month')
    # Sort the index first so that rows of the same month are contiguous.
    df = df.sort_index()
    df['percentile'] = df.groupby(df.index)['ratio_cost'].rank(pct=True)
    print(df['percentile'].head())

    month
    2010-08-01    0.2500
    2010-08-01    0.6875
    2010-08-01    0.6250
    2010-08-01    0.9375
    2010-08-01    0.7500
    Name: percentile, dtype: float64
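As a side note, the KeyError in the question most likely comes from month being the index rather than a column: in pandas versions before 0.20, a string passed to groupby is only looked up among the columns. Grouping with level='month' (or with df.index, as above) avoids that. Also note that rank(pct=True) returns fractions between 0 and 1 where the smallest value in a group never gets 0, so it will not reproduce the 0 to 100 scale shown in the question. Below is a minimal sketch of one way to get that scale, using the five sample rows from the question. Treating a month with a single row as 100 is an assumption taken from the expected output, and the intermediate names rank and count are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question; 'month' is the index, not a column.
df = pd.DataFrame(
    {
        "month": pd.to_datetime(["2010-08-01"] * 4 + ["2010-09-01"]),
        "org_code": [1847, 1848, 1849, 1850, 1847],
        "ratio_cost": [8.685939, 7.883951, 6.798465, 7.352603, 8.778501],
    }
).set_index("month").sort_index()

# Group on the index level by name instead of passing 'month' as a column key.
grouped = df.groupby(level="month")["ratio_cost"]
rank = grouped.rank()               # 1..n within each month
count = grouped.transform("count")  # n, repeated on every row of the month

# Rescale so the smallest value in a month maps to 0 and the largest to 100.
# Assumption: a month with only one row gets 100, as in the question's example.
quantile = ((rank - 1) / (count - 1) * 100).where(count > 1, 100.0)

# Assign by position to sidestep alignment on the duplicated month index.
df["quantile"] = quantile.values

print(df)
```

If the plain 0 to 1 percentile rank is enough, df.groupby(level='month')['ratio_cost'].rank(pct=True) on the sorted frame should give the same result as grouping by df.index.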