pandas - Read rows in text file based on different column headers using Python -
i have text file contains data 2 tables 2 column headers. want read each line , save lines or rows in 2 files each table. want lines after (number amount) columns header in 1 file , (code volume dim) column header in other. or, want separate data similar table.
first few lines of file. number of rows in tables may vary.
number amount 10 34 23 65 54 07 code volume dim 1 56 34 23 57 565 number amount 40 674 73 2365 code volume dim 341 3456 6534 23 0957 908565 number amount 210 4534 2343 4565
same pattern repeats.
if text number
same in file, , text code
too, can use read_csv
, filter columns rows subset contains
, isnull
, notnull
:
import pandas pd import numpy np import io temp=u""" number amount 10 34 23 65 54 07 code volume dim 1 56 34 23 57 565 number amount 40 674 73 2365 code volume dim 341 3456 6534 23 0957 908565 number amount 210 4534 2343 4565"""
#after testing replace io.stringio(temp) filename df = pd.read_csv(io.stringio(temp), sep="\s+", index_col=none, skip_blank_lines=true, names=['a', 'b', 'c']) print df b c 0 number amount nan 1 10 34 nan 2 23 65 nan 3 54 07 nan 4 code volume dim 5 1 56 34 6 23 57 565 7 number amount nan 8 40 674 nan 9 73 2365 nan 10 code volume dim 11 341 3456 6534 12 23 0957 908565 13 number amount nan 14 210 4534 nan 15 2343 4565 nan
df = df[~((df.a.str.contains('number')) | (df.a.str.contains('code')))] df1 = df[df.c.isnull()] df1 = df1[['a', 'b']].reset_index(drop=true) df1.columns = ['number','amount'] #if numbers integers df1 = df1.astype(int) print df1 number amount 0 10 34 1 23 65 2 54 7 3 40 674 4 73 2365 5 210 4534 6 2343 4565 df2 = df[df.c.notnull()].reset_index(drop=true) df2.columns = ['code','volume', 'dim'] #if numbers integers df2 = df2.astype(int) print df2 code volume dim 0 1 56 34 1 23 57 565 2 341 3456 6534 3 23 957 908565
Comments
Post a Comment