bash - Removing rows from one file which do not match another
I am looking for an efficient way to delete rows in file1 that do not exist in file2, in bash.

file1.txt:

    file1 <- 'probeset_id sample1 sample2 sample3
    ax-2 100 200 180
    ax-1 90 180 267
    ax-3 80 890 124'
    file1 <- read.table(text=file1, header=T)
    write.table(file1, "file1.txt", col.names=T, quote=F, row.names=F)

file2.txt:

    file2 <- 'probeset_id
    ax-1
    ax-2
    '
    file2 <- read.table(text=file2, header=T)
    write.table(file2, "file2.txt", col.names=F, quote=F, row.names=F)

The expected output:

    out <- 'probeset_id sample1 sample2 sample3
    ax-1 90 180 267
    ax-2 100 200 180'
    out <- read.table(text=out, header=T)
    write.table(out, "out.txt", col.names=T, quote=F, row.names=F)

An additional problem is that file2 is not sorted in the same order as file1. I am trying to use:

    head -n 1 file1.txt ; grep -f file2.txt file1.txt

However, it is taking a long time. Any ideas on how to perform this in a more efficient way (the real files are quite big)?
awk is of great use in this case:

    awk 'NR==FNR{line[$1]++; next} $1 in line' file2 file1

Example

    $ awk 'NR==FNR{line[$1]++; next} $1 in line' file2 file1
    probeset_id sample1 sample2 sample3
    ax-2 100 200 180
    ax-1 90 180 267

What does it do?

- NR==FNR{line[$1]++; next} saves the first column of each line in file2 as a key in the associative array line. NR==FNR is true only while reading the first file in the list, file2. NR is the number of records read so far across all files. FNR is the number of records read in the current file.
- $1 in line checks whether column 1 of the current file1 record is saved in line. If true, awk takes the default action of printing the current record.
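For anyone who wants to try this without R, here is a minimal sketch that recreates the sample data with here-documents and runs the same two-pass idea. It assumes file2.txt has no header line, which is what the question's R code (col.names=F) would write; because of that, the extra FNR==1 condition is added to keep the header of file1.txt. The array name keep and the scratch directory are illustrative choices, not part of the original answer.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from the question.
cat > file1.txt <<'EOF'
probeset_id sample1 sample2 sample3
ax-2 100 200 180
ax-1 90 180 267
ax-3 80 890 124
EOF

# Assumption: file2.txt holds only the IDs, with no header line.
cat > file2.txt <<'EOF'
ax-1
ax-2
EOF

# Pass 1 (file2.txt): NR==FNR is true, so remember each ID as an array key.
# Pass 2 (file1.txt): print the header (FNR==1) and every row whose ID was remembered.
awk 'NR==FNR {keep[$1]; next} FNR==1 || $1 in keep' file2.txt file1.txt > out.txt

cat out.txt
```

Each file is read once and every row of file1.txt costs a single hash lookup, which is why this tends to scale better than matching every pattern against every line. Rows are printed in the order they appear in file1.txt, not in the order of file2.txt.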