bash - Removing rows from one file which do not match another


I am looking for an efficient way to delete the rows in file1 that do not exist in file2, using bash:

file1.txt:

file1 <- 'probeset_id  sample1 sample2 sample3
ax-2           100     200    180
ax-1           90      180    267
ax-3           80      890    124'
file1 <- read.table(text=file1, header=TRUE)
write.table(file1, "file1.txt", col.names=TRUE, quote=FALSE, row.names=FALSE)

file2.txt:

file2 <- 'probeset_id
ax-1
ax-2'
file2 <- read.table(text=file2, header=TRUE)
write.table(file2, "file2.txt", col.names=FALSE, quote=FALSE, row.names=FALSE)

The expected output:

out <- 'probeset_id  sample1 sample2 sample3
ax-1           90      180    267
ax-2           100     200    180'
out <- read.table(text=out, header=TRUE)
write.table(out, "out.txt", col.names=TRUE, quote=FALSE, row.names=FALSE)

An additional problem is that file2 is not sorted in the same order as file1. I am trying to use:

head -n 1 file1.txt ; grep -f file2.txt file1.txt 

However, it is taking a long time. Any ideas on how to do this more efficiently (the real files are quite big)?

awk would be of great use in this case:

 awk 'NR==FNR{line[$1]++; next}  $1 in line'

Example

$  awk 'NR==FNR{line[$1]++; next}  $1 in line' file2 file1
probeset_id  sample1 sample2 sample3
ax-2           100     200    180
ax-1           90      180    267

What does it do?

  • NR==FNR{line[$1]++; next} saves the first column of every line in file2 as a key in the associative array line, then skips to the next record.

  • NR==FNR is true only for the first file in the list, file2.

    • NR is the number of records read so far, across all files.
    • FNR is the number of records read so far in the current file.
  • $1 in line checks whether column 1 of the current file1 record was saved in line; if it was, awk takes the default action of printing the current record.
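
The steps above can be tried end to end with the small script below. It is only a sketch under a few assumptions: the sample files are recreated by hand in a temporary directory, file2.txt holds IDs only (no header line), and the array name keep is arbitrary. Because file2.txt has no header here, the extra FNR==1 condition is added so that the header of file1.txt is always printed.

```shell
#!/bin/sh
# Work in a throwaway directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data mirroring the question (recreated by hand for illustration).
cat > file1.txt <<'EOF'
probeset_id sample1 sample2 sample3
ax-2 100 200 180
ax-1 90 180 267
ax-3 80 890 124
EOF

# Assumption: file2.txt contains only the IDs to keep, one per line.
cat > file2.txt <<'EOF'
ax-1
ax-2
EOF

# First file (file2.txt): NR==FNR holds, so record each ID as a key in "keep".
# Second file (file1.txt): print the header (FNR==1) and every row whose
# first field is a key in "keep". Rows come out in file1.txt order.
awk 'NR==FNR { keep[$1]; next }
     FNR==1 || ($1 in keep)' file2.txt file1.txt > out.txt

cat out.txt
```

Each file is read once and every lookup is a hash lookup on the first field, which is why this tends to scale better than matching every pattern against every line. Note that the rows are printed in the order of file1.txt, not the order of file2.txt.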
