bash - Removing rows from one file which do not match another

- June 15, 2012

I am looking for an efficient way to delete the rows in file1 that do not exist in file2, in bash:

file1.txt:

    file1 <- 'probeset_id  sample1 sample2 sample3
    ax-2           100     200    180
    ax-1           90      180    267
    ax-3           80      890    124'
    file1 <- read.table(text=file1, header=TRUE)
    write.table(file1, "file1.txt", col.names=TRUE, quote=FALSE, row.names=FALSE)

file2.txt:

    file2 <- 'probeset_id
    ax-1
    ax-2'
    file2 <- read.table(text=file2, header=TRUE)
    write.table(file2, "file2.txt", col.names=FALSE, quote=FALSE, row.names=FALSE)

The expected output:

    out <- 'probeset_id  sample1 sample2 sample3
    ax-1           90      180    267
    ax-2           100     200    180'
    out <- read.table(text=out, header=TRUE)
    write.table(out, "out.txt", col.names=TRUE, quote=FALSE, row.names=FALSE)

An additional problem is that file2 is not sorted in the same order as file1. I am trying to use:

head -n 1 file1.txt ; grep -f file2.txt file1.txt

However, it is taking a long time. Any ideas on how to do this more efficiently (the real files are quite big)?

awk would be of great use in this case:

    awk 'NR==FNR{line[$1]++; next} $1 in line'

Example:

    $ awk 'NR==FNR{line[$1]++; next} $1 in line' file2 file1
    probeset_id  sample1 sample2 sample3
    ax-2           100     200    180
    ax-1           90      180    267

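What does it do?

NR==FNR{line[$1]++; next} records the first column of each line of file2 as a key in the associative array line, then skips to the next record.
NR==FNR is true only while awk is reading the first file in the list, which is file2 here.
- NR is the number of records read so far, across all input files.
- FNR is the number of records read so far in the current file.
$1 in line checks whether column 1 of the current file1 record was saved in line. If it was, awk takes its default action of printing the current record.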
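
The explanation above can be tried end to end with the sample data from the question. The following is only a sketch under a few assumptions: the files are written with single-space separators into a scratch directory (workdir is a name introduced here), file2.txt has no header line (as col.names=FALSE in the question implies), and the array is renamed keep for readability. Because file2.txt has no header, the extra FNR == 1 test is added so the header of file1.txt is still printed.

```shell
#!/usr/bin/env bash

# Scratch directory (hypothetical name) so no existing files are overwritten.
workdir=$(mktemp -d)

# file1.txt: the data table from the question (header plus three rows).
cat > "$workdir/file1.txt" <<'EOF'
probeset_id sample1 sample2 sample3
ax-2 100 200 180
ax-1 90 180 267
ax-3 80 890 124
EOF

# file2.txt: the ids to keep, one per line, without a header.
cat > "$workdir/file2.txt" <<'EOF'
ax-1
ax-2
EOF

# While reading the first file (NR == FNR), remember each id as an array key.
# While reading the second file, print the header line (FNR == 1) and every
# row whose first column is one of the remembered keys.
awk 'NR == FNR { keep[$1]; next }
     FNR == 1 || ($1 in keep)' "$workdir/file2.txt" "$workdir/file1.txt" > "$workdir/out.txt"

cat "$workdir/out.txt"
```

The rows come out in the order they appear in file1.txt, not in the order of file2.txt. Each id costs one hash lookup per row of file1.txt, so the run time should grow roughly with the combined size of the two files.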
