pandas - Read rows in text file based on different column headers using Python -


i have text file contains data 2 tables 2 column headers. want read each line , save lines or rows in 2 files each table. want lines after (number amount) columns header in 1 file , (code volume dim) column header in other. or, want separate data similar table.

first few lines of file. number of rows in tables may vary.

    number amount     10 34     23 65     54 07      code volume dim     1 56 34     23 57 565      number amount     40 674     73 2365      code volume dim     341 3456 6534     23 0957 908565      number amount     210 4534     2343 4565 

same pattern repeats.

if text number same in file, , text code too, can use read_csv , filter columns rows subset contains , isnull, notnull:

import pandas pd import numpy np import io  temp=u""" number amount     10 34     23 65     54 07      code volume dim     1 56 34     23 57 565      number amount     40 674     73 2365      code volume dim     341 3456 6534     23 0957 908565      number amount     210 4534     2343 4565""" 
#after testing replace io.stringio(temp) filename df = pd.read_csv(io.stringio(temp), sep="\s+", index_col=none, skip_blank_lines=true, names=['a', 'b', 'c']) print df                b       c 0   number  amount     nan 1       10      34     nan 2       23      65     nan 3       54      07     nan 4     code  volume     dim 5        1      56      34 6       23      57     565 7   number  amount     nan 8       40     674     nan 9       73    2365     nan 10    code  volume     dim 11     341    3456    6534 12      23    0957  908565 13  number  amount     nan 14     210    4534     nan 15    2343    4565     nan 
df = df[~((df.a.str.contains('number')) | (df.a.str.contains('code')))]  df1 = df[df.c.isnull()] df1 = df1[['a', 'b']].reset_index(drop=true) df1.columns = ['number','amount']  #if numbers integers df1 = df1.astype(int) print df1    number  amount 0      10      34 1      23      65 2      54       7 3      40     674 4      73    2365 5     210    4534 6    2343    4565  df2 = df[df.c.notnull()].reset_index(drop=true) df2.columns = ['code','volume', 'dim'] #if numbers integers df2 = df2.astype(int) print df2    code  volume     dim 0     1      56      34 1    23      57     565 2   341    3456    6534 3    23     957  908565 

Comments

Popular posts from this blog

Hatching array of circles in AutoCAD using c# -

ios - UITEXTFIELD InputView Uipicker not working in swift -