
I want to match the numbers in the first file against the second column of the second file and write the matching lines to a separate output file. What is wrong with my code?

I have a list of numbers in a file IDS.txt

10028615
1003
10096344
10100
10107393
10113978
10163178
118747520

I have a second file called src1src22.txt

From src:'1'    To src:'22'
CHEMBL3549542   118747520
CHEMBL548732    44526300
CHEMBL1189709   11740251
CHEMBL405440    44297517
CHEMBL310280    10335685

Expected newoutput.txt

CHEMBL3549542   118747520

I have written this code

while read line; do cat src1src22.txt | grep -i -w "$line"  >> newoutput.txt done<IDS.txt
KHAN irfan
  • *what is wrong with the code* What output do you get, and what output are you expecting? – Jens Mar 30 '17 at 13:31
  • Kindly look at the expected output. I do not get any output. – KHAN irfan Mar 30 '17 at 13:37
  • Whenever I see this I ask whether bash is a requirement: even though it is possible to do this in bash, bash is not efficient at text processing. As your platform is Linux, you definitely have Python, which suits this task much better. – Drako Mar 30 '17 at 13:50
  • Repeatedly looping over `grep` is horribly inefficient. Joining two files is a common task; the canonical duplicate is http://stackoverflow.com/questions/13272717/inner-join-on-two-text-files – tripleee Mar 30 '17 at 13:50

3 Answers


Your command line works, except that you're missing a semicolon before done:

while read line; do grep -i -w "$line" src1src22.txt; done < IDS.txt >> newoutput.txt
fzd
  • You should still add the missing quotes, and probably move the output redirection outside the loop. – tripleee Mar 30 '17 at 13:55

I have found a more efficient way to perform the task. Instead of a loop, try the command below. The -f option reads the patterns from the file named right after it (IDS.txt) and searches for them in the next file (src1src22.txt). This reduces the chance of the invalid character length problem that can occur with grep, and it avoids the loop, which slows the process down.

 grep -iw -f IDS.txt src1src22.txt >> newoutput.txt
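
As a hedged sketch of the same idea, the script below runs a pattern-file grep against a scratch copy of the question's sample data. The temporary directory, the tab separator between the columns, and the choice of -F (fixed strings) and > (truncate instead of append) are assumptions made for illustration; they are not part of the original command.

```shell
#!/bin/sh
# Scratch copies of the question's sample data (a subset of both files).
# Assumption: the two columns of src1src22.txt are separated by a tab.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' 10028615 1003 10100 118747520 > IDS.txt
printf '%s\t%s\n' \
    "From src:'1'" "To src:'22'" \
    CHEMBL3549542 118747520 \
    CHEMBL548732 44526300 \
    CHEMBL310280 10335685 > src1src22.txt

# -F: treat each ID as a fixed string, not a regular expression
# -w: match whole words only, so 1003 cannot match inside a longer number
# -f: read the patterns, one per line, from IDS.txt
# ">" truncates newoutput.txt, so reruns do not pile up duplicate lines.
grep -F -w -f IDS.txt src1src22.txt > newoutput.txt

cat newoutput.txt
```

Note that grep looks for the ID anywhere on the line, not only in the second column, so this is only equivalent to a column match as long as the IDs cannot also appear as a whole word in the first column.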
KHAN irfan

Try this:

awk 'NR==FNR{a[$2]=$1;next} $1 in a{print a[$1],$0}' f2 f1
CHEMBL3549542 118747520

Where f2 is src1src22.txt and f1 is IDS.txt.
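
A possible variant, sketched here under assumptions rather than taken from the answer, reads IDS.txt first and then prints each src1src22.txt line unchanged when its second field is one of the stored IDs. The array name ids, the temporary directory, and the tab-separated sample data are all illustrative choices.

```shell
#!/bin/sh
# Scratch copies of the question's sample data (a subset of both files).
# Assumption: the two columns of src1src22.txt are separated by a tab.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' 10028615 1003 10100 118747520 > IDS.txt
printf '%s\t%s\n' \
    "From src:'1'" "To src:'22'" \
    CHEMBL3549542 118747520 \
    CHEMBL548732 44526300 \
    CHEMBL310280 10335685 > src1src22.txt

# First file (NR == FNR): remember every non-empty ID as an array key.
# Second file: print the line unchanged when its 2nd field is a stored ID.
awk 'NR == FNR { if (NF) ids[$1]; next } $2 in ids' IDS.txt src1src22.txt > newoutput.txt

cat newoutput.txt
```

Because the comparison is made on the second field only, an ID that happens to appear in the first column is not reported, and the header line is skipped since its second field is not a number from IDS.txt. With the sample data above, only the CHEMBL3549542 line should end up in newoutput.txt.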

VIPIN KUMAR