How to find words from one file in another file?

linux shell awk grep text-manipulation

13,114

Solution 1

You can use grep -f:

grep -Ff "first-file" "second-file"

Or, to match whole words only:

grep -w -Ff "first-file" "second-file"
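
The difference between the two commands can be sketched with a couple of throwaway files. The file contents below are made up for illustration; only the grep options come from the answer.

```shell
# Hypothetical sample data in a temporary directory
tmp=$(mktemp -d)
printf '%s\n' the cat > "$tmp/first-file"
printf '%s\n' 'theatre tickets' 'a category' 'the cat sat' > "$tmp/second-file"

# Fixed strings: "the" also matches inside "theatre", "cat" inside "category"
substring_hits=$(grep -Ff "$tmp/first-file" "$tmp/second-file")

# Whole words: only lines where "the" or "cat" stands alone as a word
word_hits=$(grep -w -Ff "$tmp/first-file" "$tmp/second-file")

printf 'substring matches:\n%s\n\nwhole-word matches:\n%s\n' \
  "$substring_hits" "$word_hits"
rm -rf "$tmp"
```

With this data the first command prints all three lines, while the second prints only "the cat sat".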

UPDATE: As per the comments, to print each word from file1 only once if it appears as the first field of a line in file2:

awk 'FNR==NR{a[$1]; next} ($1 in a){delete a[$1]; print $1}' file1 file2
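
Note that the awk command above compares only the first field of each line in file2. If the words may occur anywhere on a line, one possible alternative is to let grep print just the matched words and then remove duplicates. This is a sketch with made-up sample data; -o and -w are not part of POSIX grep, though GNU and BSD grep both support them.

```shell
# Hypothetical sample data in a temporary directory
tmp=$(mktemp -d)
printf '%s\n' cat dog fox emu > "$tmp/file1"
printf '%s\n' 'the cat saw a dog' 'cat and dog again' 'nothing here' > "$tmp/file2"

# -o prints only the matched text, one match per line;
# sort -u then collapses repeated words into a single entry
found=$(grep -owFf "$tmp/file1" "$tmp/file2" | sort -u)

printf '%s\n' "$found"
rm -rf "$tmp"
```

Here "cat" and "dog" are each listed once, and "fox" and "emu" are not listed because they never occur in file2.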

Solution 2

Use grep like this:

grep -f firstfile secondfile

SECOND OPTION

Thanks to Ed Morton for pointing out that the words in the file "reserved" are treated as patterns (regular expressions). If that is an issue, which it may or may not be, the OP could use something like this, which compares plain strings instead of patterns:

File "reserved"

cat
dog
fox

and file "text"

The cat jumped over the lazy
fox but didn't land on the
moon at all.
However it did land on the dog!!!

Awk script is like this:

awk 'BEGIN{i=0} FNR==NR{res[i++]=$1; next} {for(j=0;j<i;j++) if(index($0,res[j])) {print; break}}' reserved text

with output:

The cat jumped over the lazy
fox but didn't land on the
However it did land on the dog!!!
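
To see why treating the words as patterns can matter, here is a small sketch using the "a.r" example from the comments. The file contents are invented for the demonstration; the file names mirror the ones used above.

```shell
# Hypothetical sample data in a temporary directory
tmp=$(mktemp -d)
printf '%s\n' 'a.r' > "$tmp/reserved"
printf '%s\n' 'the theatre' 'a.r literally' > "$tmp/text"

# As a regular expression, "." matches any character, so "a.r" matches "atr" in "theatre"
regex_hits=$(grep -f "$tmp/reserved" "$tmp/text")

# With -F the word is a literal string, so only the line containing "a.r" matches
fixed_hits=$(grep -Ff "$tmp/reserved" "$tmp/text")

printf 'as patterns:\n%s\n\nas fixed strings:\n%s\n' "$regex_hits" "$fixed_hits"
rm -rf "$tmp"
```

If the reserved words contain only letters and digits, the two commands behave the same; the difference appears only when a word contains regex metacharacters.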

THIRD OPTION

Alternatively, it can be done quite simply, but more slowly, in bash:

while IFS= read -r r; do grep -- "$r" secondfile; done < firstfile
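
If the goal is a yes/no answer per word rather than the matching lines, the same loop can be adapted roughly as follows. This is a sketch with invented sample data; the use of -w (whole words) and -F (fixed strings) is an assumption about what is wanted, and -w is not in POSIX grep although GNU and BSD grep support it.

```shell
# Hypothetical sample data in a temporary directory
tmp=$(mktemp -d)
printf '%s\n' cat emu > "$tmp/firstfile"
printf '%s\n' 'the cat sat' 'on the mat' > "$tmp/secondfile"

report=$(
  while IFS= read -r word; do
    # Skip blank lines: an empty fixed string would match every line
    [ -n "$word" ] || continue
    # -q suppresses output; only the exit status is used
    if grep -qwF -- "$word" "$tmp/secondfile"; then
      printf 'found: %s\n' "$word"
    else
      printf 'missing: %s\n' "$word"
    fi
  done < "$tmp/firstfile"
)

printf '%s\n' "$report"
rm -rf "$tmp"
```

This runs grep once per word, so with 150 words it scans the second file 150 times; the single grep -f call is usually the faster choice.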

Author by

ocslegna

Updated on July 25, 2022

Comments

ocslegna almost 2 years
In one text file, I have 150 words. I have another text file, which has about 100,000 lines.

How can I check for each of the words belonging to the first file whether it is in the second or not?

I thought about using grep, but I could not find out how to use it to read each of the words in the original text.

Is there any way to do this using awk? Or another solution?

I tried with this shell script, but it matches almost every line:
```
#!/usr/bin/env sh
cat words.txt | while read line; do  
    if grep -F "$FILENAME" text.txt
    then
        echo "Found $line"
    fi
done
```
Another way I found is:
```
fgrep -w -o -f "words.txt" "text.txt"
```
hek2mgl over 10 years

Cool! Didn't know that! I was about to suggest something like: grep -E $(cat search | tr '\n' '|') text :)
ocslegna over 10 years

Thank you @anubhava! Your answer was helpful.
Ed Morton over 10 years

This is looking for strings, so that's good, but it will match "the" to "theatre". Is that desirable?
Ed Morton over 10 years

This is looking for regexps, so it will match both "the" and "a.r" to "theatre". Is that desirable?
anubhava over 10 years

Yes, the -w option can be added to make sure only complete words are matched (if so desired).
ocslegna over 10 years

With fgrep -w -o -f "first-file" "second-file", all the words that were found are returned, but they are repeated. How do I show them only once?
anubhava over 10 years

So you want to show a matching line from the second file only the first time it matches?
ocslegna over 10 years

I want to see if the words of text1 are present in the second.
anubhava over 10 years

Right, but I just want to understand the output you need. So just a list of the words from text1 that are present in the second file, right?
ocslegna over 10 years

@anubhava Exactly. In text1 I have 150 reserved words (e.g. red hat), and in the second file, ****.sql, I have 100,000 lines. I just want to know whether the words from file1 are present in the second.
ocslegna over 10 years

@anubhava Thank you! Works <:-]
ocslegna over 10 years

@anubhava Do you know why this works on a Fedora server but not on a Red Hat server?
anubhava over 10 years

Are the files exactly the same on both servers? (Check with the cat -vte file command.)
ocslegna over 10 years

Yes, it's the same: $ on both servers.
anubhava over 10 years

It could be due to different awk versions, I guess. Is it not showing any output on red hat?
ocslegna over 10 years

@anubhava In the beginning, no. I copied and edited the same .sql file and wrote a new one. I think it didn't work at first because the .sql file had been stored on an FTP server and another server before Red Hat. For now, it is working.
ocslegna over 10 years

@anubhava Thanks for the time ^^,!
kvantour almost 5 years

Be aware that a direct invocation as either egrep or fgrep is deprecated, but is provided to allow historical applications that rely on them to run unmodified. (source man grep)
anubhava almost 5 years

Good point, @kvantour. I updated this 5-year-old answer to use grep -Ff instead of fgrep.