How to find words from one file in another file?

13,114

Solution 1

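You can use grep -f, which reads its patterns from a file. Adding -F treats each pattern as a fixed string rather than a regular expression: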

grep -Ff "first-file" "second-file"

Or, to match whole words only:

grep -w -Ff "first-file" "second-file"
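
To see what -w changes, here is a small self-contained sketch. The scratch files words.txt and text.txt and their contents are made up for illustration; they are not from the question.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' the cat > "$dir/words.txt"
printf '%s\n' 'a night at the theatre' 'concatenate two files' 'the cat sat down' > "$dir/text.txt"

# -F: each line of words.txt is a fixed string, so substrings match too
# ("cat" is found inside "concatenate").
substring_hits=$(grep -c -Ff "$dir/words.txt" "$dir/text.txt")

# -w: a match must be a whole word, so "concatenate two files" drops out.
word_hits=$(grep -c -w -Ff "$dir/words.txt" "$dir/text.txt")

echo "lines matched as substrings: $substring_hits"
echo "lines matched as whole words: $word_hits"
rm -rf "$dir"
```

With this sample data the first count covers all three lines, while the second skips the line where "cat" only appears inside "concatenate".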

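UPDATE: As per the comments, to print each word of file1 that occurs in file2 only once (note that this compares only the first field of each line in file2):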

awk 'FNR==NR{a[$1]; next} ($1 in a){delete a[$1]; print $1}' file1 file2
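
If every word on a line should be considered rather than only the first field, one possible alternative is to let grep print just the matched words with -o and then deduplicate them. This is a minimal sketch, assuming a grep that supports -o (GNU and BSD grep do; it is not required by POSIX) and made-up sample files:

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' select update merge > "$dir/file1"
printf '%s\n' 'select id from t;' 'select name from t;' 'update t set id = 1;' > "$dir/file2"

# -o prints only the matched text, one match per line; sort -u keeps each word once.
grep -o -w -Ff "$dir/file1" "$dir/file2" | sort -u > "$dir/found"
found=$(cat "$dir/found")
printf 'found:\n%s\n' "$found"

# Words of file1 that never occur in file2: the lines of file1 that are
# not exactly equal (-x) to any line of the "found" list.
missing=$(grep -v -x -Ff "$dir/found" "$dir/file1")
printf 'missing:\n%s\n' "$missing"
rm -rf "$dir"
```

The second grep is optional; it is only there because a list of the words that were not found is often the next thing wanted.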

Solution 2

Use grep like this:

grep -f firstfile secondfile

SECOND OPTION

Thank you to Ed Morton for pointing out that the words in the file "reserved" are treated as patterns (regular expressions). If that is an issue, and it may or may not be, the OP can use something like this, which matches literal strings instead of patterns:

File "reserved"

cat
dog
fox

and file "text"

The cat jumped over the lazy
fox but didn't land on the
moon at all.
However it did land on the dog!!!

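The awk script below stores the reserved words, then prints each line of "text" that contains at least one of them (the break stops a line with several reserved words from being printed more than once):

awk 'FNR==NR{res[i++]=$1;next}{for(j=0;j<i;j++)if(index($0,res[j])){print;break}}' reserved text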

with output:

The cat jumped over the lazy
fox but didn't land on the
However it did land on the dog!!!
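
The difference between patterns and fixed strings that motivated this option can also be shown with grep alone. The sketch below uses made-up file contents, with the a.r example borrowed from the comments:

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'a.r' > "$dir/reserved"
printf '%s\n' 'the air is cold' 'version a.r is out' > "$dir/text"

# Without -F, "a.r" is a regular expression: "." matches any character,
# so "air" in the first line matches as well.
regex_lines=$(grep -c -f "$dir/reserved" "$dir/text")

# With -F, "a.r" is a literal string and matches only the second line.
fixed_lines=$(grep -c -Ff "$dir/reserved" "$dir/text")

echo "regex: $regex_lines lines, fixed: $fixed_lines lines"
rm -rf "$dir"
```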

THIRD OPTION

Alternatively, it can be done quite simply, but more slowly, with a bash loop that runs grep once per word:

while IFS= read -r r; do grep -- "$r" secondfile; done < firstfile
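
A variant of the same loop can report, for each word, whether it occurs at all, which is closer to what the question asks. This is only a sketch: the sample files are invented, and the choice of -w and -F (whole words, literal strings) is an assumption about what is wanted.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' cat dog owl > "$dir/firstfile"
printf '%s\n' 'The cat jumped over the lazy' 'However it did land on the dog!!!' > "$dir/secondfile"

found=0
missing=0
while IFS= read -r word; do
    # Skip blank lines: an empty fixed string would match every line.
    [ -n "$word" ] || continue
    # -q prints nothing and stops at the first match; only the exit status is used.
    if grep -q -w -F -- "$word" "$dir/secondfile"; then
        echo "found: $word"
        found=$((found + 1))
    else
        echo "not found: $word"
        missing=$((missing + 1))
    fi
done < "$dir/firstfile"

echo "$found found, $missing not found"
rm -rf "$dir"
```

This still starts one grep process per word, so for 150 words it is fine, but a single grep -f call remains the faster option for long word lists.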
ocslegna
Author by

ocslegna

Computer Science Student at the University of Buenos Aires. Lecturer in CS & Programming events. I love to read and go gradually learning new subjects and topics on computer science and everything related to it. I enjoy continually being attached to computer advances every day and be able to collaborate as much as possible with the community. Currently, I'm programming in .NET along with Python and also I have some experience in C++.

Updated on July 25, 2022

Comments

  • ocslegna
    ocslegna almost 2 years

    In one text file, I have 150 words. I have another text file, which has about 100,000 lines.

    How can I check for each of the words belonging to the first file whether it is in the second or not?

    I thought about using grep, but I could not find out how to use it to read each of the words in the original text.

    Is there any way to do this using awk? Or another solution?

    I tried with this shell script, but it matches almost every line:

    #!/usr/bin/env sh
    cat words.txt | while read line; do  
        if grep -F "$FILENAME" text.txt
        then
            echo "Found $line"
        fi
    done
    

    Another way I found is:

    fgrep -w -o -f "words.txt" "text.txt"
    
  • hek2mgl
    hek2mgl over 10 years
    Cool! Didn't know that! I was about to suggest something like: grep -E $(cat search | tr '\n' '|') text :)
  • ocslegna
    ocslegna over 10 years
    Thank you @anubhava! Your answer was helpful.
  • Ed Morton
    Ed Morton over 10 years
    This is looking for strings, so that's good, but it will match "the" to "theatre" - is that desirable?
  • Ed Morton
    Ed Morton over 10 years
    This is looking for regexps, and so will match both "the" and "a.r" to "theatre" - is that desirable?
  • anubhava
    anubhava over 10 years
    Yes, the -w option can be added to make sure only complete words are matched (if so desired).
  • ocslegna
    ocslegna over 10 years
    With fgrep -w -o -f "first-file" "second-file", all the words that were found are returned, but they are repeated. How do I show each one only once?
  • anubhava
    anubhava over 10 years
    So you want to show a matching line from the second file only the first time it appears?
  • ocslegna
    ocslegna over 10 years
    I want to see if the words of text1 are present in the second.
  • anubhava
    anubhava over 10 years
    Right, but I just want to understand the output you need. So just a list of the words from text1 that are present in the second file, right?
  • ocslegna
    ocslegna over 10 years
    @anubhava Exactly. In text1 I have 150 reserved words (Red Hat, for example), and in the second file, ****.sql, I have 100,000 lines. All I want to know is whether the words from file1 are present in the second.
  • ocslegna
    ocslegna over 10 years
    @anubhava Thank you! Works <:-]
  • ocslegna
    ocslegna over 10 years
    @anubhava Do you know why this works on a Fedora server but not on a Red Hat server?
  • anubhava
    anubhava over 10 years
    Are the files exactly the same on both servers? (check with the cat -vte file command)
  • ocslegna
    ocslegna over 10 years
    Yes, it's the same: $ on both servers
  • anubhava
    anubhava over 10 years
    It could be due to different awk versions, I guess. Is it not showing any output on red hat?
  • ocslegna
    ocslegna over 10 years
    @anubhava In the beginning, no. I copied and edited the same .sql file and wrote a new one. I think it didn't work at the beginning because the .sql file had been stored on an FTP server and on another server before it reached the Red Hat one. For the moment, it is working.
  • ocslegna
    ocslegna over 10 years
    @anubhava Thanks for the time ^^,!
  • kvantour
    kvantour almost 5 years
    Be aware that a direct invocation as either egrep or fgrep is deprecated, but is provided to allow historical applications that rely on them to run unmodified. (source man grep)
  • anubhava
    anubhava almost 5 years
    Good point @kvantour, I updated this 5-year-old answer to use grep -Ff instead of fgrep.