csv - How to read index data as string with Python pandas?
I'm trying to read a CSV file into a DataFrame with pandas, and I want to read the index column as strings. However, if the index values contain only digits, pandas treats the data as integers. How can I read them as strings?
To be specific, my code is as follows.
sample.csv
uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30
The code
import pandas as pd
df = pd.read_csv('sample.csv', index_col="uid", dtype=float)
print(df.index.values)
The result
>>> [1 2 3]
But I was hoping for this result:
>>> ['01', '02', '03']
There is one additional condition: the columns other than the index hold numeric values, and there are so many of them that I can't list them by specific column names.
Pass the dtype param to specify the dtype:
In [159]:
import pandas as pd
import io
t="""uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""
df = pd.read_csv(io.StringIO(t), dtype={'uid':str})
df.set_index('uid', inplace=True)
df.index

Out[159]:
Index(['01', '02', '03'], dtype='object', name='uid')
So in your case the following should work:
df = pd.read_csv('sample.csv', dtype={'uid':str})
df.set_index('uid', inplace=True)
There is still an outstanding bug here where the dtype param is ignored on columns that are to be treated as the index, so the following doesn't work:
df = pd.read_csv('sample.csv', dtype={'uid':str}, index_col='uid')
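Since the behaviour of dtype combined with index_col may depend on the pandas version you have installed, one way to stay safe is to read the column as an ordinary string column, set the index afterwards, and check the result. The following is only a minimal sketch using the sample data from the question; load_with_str_index is a hypothetical helper name, not something pandas provides.

```python
import io

import pandas as pd

SAMPLE = """uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""


def load_with_str_index(source, index_col):
    """Hypothetical helper: return a frame indexed by `index_col` as strings.

    The column is read as a regular string column and only then promoted
    to the index, so nothing here relies on read_csv honouring dtype for
    index columns.
    """
    df = pd.read_csv(source, dtype={index_col: str})
    return df.set_index(index_col)


df = load_with_str_index(io.StringIO(SAMPLE), "uid")

# The leading zeros survive only if the values were never parsed as integers.
assert all(isinstance(value, str) for value in df.index)
print(df.index.tolist())
```

The assert is there as a cheap guard: if the index ever comes back as integers, the script fails immediately instead of silently dropping the leading zeros.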
You can do this dynamically if you can assume the first column is the index column:
In [171]:
t="""uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""
cols = pd.read_csv(io.StringIO(t), nrows=1).columns.tolist()
index_col_name = cols[0]
dtypes = dict(zip(cols[1:], [float]* len(cols[1:])))
dtypes[index_col_name] = str
df = pd.read_csv(io.StringIO(t), dtype=dtypes)
df.set_index('uid', inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 01 to 03
Data columns (total 3 columns):
f1    3 non-null float64
f2    3 non-null float64
f3    3 non-null float64
dtypes: float64(3)
memory usage: 96.0+ bytes

In [172]:
df.index

Out[172]:
Index(['01', '02', '03'], dtype='object', name='uid')
Here we read just the header row to get the column names:
cols = pd.read_csv(io.StringIO(t), nrows=1).columns.tolist()
We then generate a dict of the column names with the desired dtypes:
index_col_name = cols[0]
dtypes = dict(zip(cols[1:], [float]* len(cols[1:])))
dtypes[index_col_name] = str
We get the index name, assuming it's the first entry, then create a dict from the rest of the cols with float as the desired dtype, and add the index col with its type specified as str. We can then pass this dict as the dtype param to read_csv.
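The steps above could be wrapped into a single function along these lines. This is a sketch rather than a definitive implementation: read_first_col_as_str_index and its make_source argument are hypothetical names, it assumes every column other than the first really is numeric, and it reads the header with nrows=0 instead of nrows=1, since only the column names are needed.

```python
import io

import pandas as pd

SAMPLE = """uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""


def read_first_col_as_str_index(make_source):
    """Hypothetical helper: first column becomes a string index, rest float.

    `make_source` is a zero-argument callable returning a fresh path or
    file-like object, because the data has to be read twice.
    """
    # First pass: parse only the header line to learn the column names.
    cols = pd.read_csv(make_source(), nrows=0).columns.tolist()
    index_col_name = cols[0]

    # Every non-index column is read as float; the index column as str.
    dtypes = {name: float for name in cols[1:]}
    dtypes[index_col_name] = str

    # Second pass: read the data with explicit dtypes, then set the index.
    df = pd.read_csv(make_source(), dtype=dtypes)
    return df.set_index(index_col_name)


df = read_first_col_as_str_index(lambda: io.StringIO(SAMPLE))
print(df.index.tolist())
print(df.dtypes)
```

For a file on disk you could pass something like lambda: 'sample.csv', since read_csv accepts a path as well as a file-like object.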