How to find words from one file in another file?
Solution 1
You can use grep -f:
grep -Ff "first-file" "second-file"
OR else to match full words:
grep -w -Ff "first-file" "second-file"
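To see how these variants differ, here is a minimal sketch on made-up sample data. The file names words.txt and text.txt and their contents are assumptions for illustration only, not taken from the question. It counts the matching lines for plain -f (regex patterns), -Ff (fixed strings) and -w -Ff (fixed strings, whole words only):

```shell
# Hypothetical sample data in a temporary directory (assumption, not from the question).
tmpdir=$(mktemp -d)
printf '%s\n' 'the' 'a.r' > "$tmpdir/words.txt"
printf '%s\n' 'the cat' 'theatre' 'air' 'a.r' > "$tmpdir/text.txt"

# Patterns are regular expressions: "a.r" also matches "air" and the "atr" in "theatre".
regex_count=$(grep -c -f "$tmpdir/words.txt" "$tmpdir/text.txt")

# Patterns are fixed strings: "a.r" only matches a literal "a.r",
# but "the" still matches inside "theatre".
fixed_count=$(grep -c -Ff "$tmpdir/words.txt" "$tmpdir/text.txt")

# Fixed strings that must also match as whole words: "theatre" no longer counts.
word_count=$(grep -c -w -Ff "$tmpdir/words.txt" "$tmpdir/text.txt")

echo "regex: $regex_count, fixed: $fixed_count, whole words: $word_count"
rm -rf "$tmpdir"
```

On this sample the counts shrink as the matching gets stricter, which is usually what you want when the first file holds plain words rather than patterns.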
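UPDATE: As per the comments, to print each matching word only once (this compares the first field of each line in file2 against the words in file1):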
awk 'FNR==NR{a[$1]; next} ($1 in a){delete a[$1]; print $1}' file1 file2
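Note that the awk command above only looks at the first field of each line in the second file. As a rough sketch, assuming the words may appear anywhere on a line, two possible variants are shown below: an awk loop over every whitespace-separated field, and grep -o piped through sort -u. The sample files "reserved" and "text" are made up for illustration, and both variants are suggestions rather than part of the original answer:

```shell
# Hypothetical sample data in a temporary directory (assumption for illustration).
tmpdir=$(mktemp -d)
printf '%s\n' cat dog fox > "$tmpdir/reserved"
printf '%s\n' 'The cat jumped over the lazy' \
              "fox but didn't land on the" \
              'moon at all.' \
              'However it did land on the dog!!!' > "$tmpdir/text"

# Command from the answer: only the first field of each line is checked.
first_field=$(awk 'FNR==NR{a[$1]; next} ($1 in a){delete a[$1]; print $1}' \
  "$tmpdir/reserved" "$tmpdir/text")

# Variant: check every whitespace-separated field, print each word once.
# "dog!!!" is a different field from "dog", so it is not reported.
any_field=$(awk 'FNR==NR{a[$1]; next}
  {for (i = 1; i <= NF; i++) if ($i in a) {delete a[$i]; print $i}}' \
  "$tmpdir/reserved" "$tmpdir/text")

# Variant: let grep print only the matched words, then remove duplicates.
# -w treats "!" as a word boundary, so "dog" is found here.
grep_unique=$(grep -owFf "$tmpdir/reserved" "$tmpdir/text" | sort -u)

printf 'first field:\n%s\nany field:\n%s\ngrep + sort -u:\n%s\n' \
  "$first_field" "$any_field" "$grep_unique"
rm -rf "$tmpdir"
```

Which variant fits depends on how words are delimited in the second file; for text with punctuation attached to words, the grep -ow form is likely the more forgiving one.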
Solution 2
Use grep like this:
grep -f firstfile secondfile
SECOND OPTION
Thank you to Ed Morton for pointing out that the words in the file "reserved" are treated as patterns. If that is an issue - it may or may not be - the OP can maybe use something like this which doesn't use patterns:
File "reserved"
cat
dog
fox
and file "text"
The cat jumped over the lazy
fox but didn't land on the
moon at all.
However it did land on the dog!!!
Awk script is like this:
awk 'BEGIN{i=0}FNR==NR{res[i++]=$1;next}{for(j=0;j<i;j++)if(index($0,res[j]))print $0}' reserved text
with output:
The cat jumped over the lazy
fox but didn't land on the
However it did land on the dog!!!
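One thing to be aware of: the script above prints a line once for every reserved word it contains, so a line with two reserved words appears twice. A minimal sketch of a variant that stops at the first hit is shown below. The sample data is made up, and the break-based variant is a suggestion, not part of the original answer:

```shell
# Hypothetical sample data in a temporary directory (assumption for illustration).
tmpdir=$(mktemp -d)
printf '%s\n' cat dog > "$tmpdir/reserved"
printf '%s\n' 'cat and dog' 'no match here' > "$tmpdir/text"

# Script from the answer: prints the line once per reserved word found in it.
original=$(awk 'BEGIN{i=0}FNR==NR{res[i++]=$1;next}{for(j=0;j<i;j++)if(index($0,res[j]))print $0}' \
  "$tmpdir/reserved" "$tmpdir/text")

# Variant: leave the loop after the first matching word, so each line prints once.
once=$(awk 'BEGIN{i=0}FNR==NR{res[i++]=$1;next}{for(j=0;j<i;j++)if(index($0,res[j])){print $0;break}}' \
  "$tmpdir/reserved" "$tmpdir/text")

printf 'original:\n%s\nvariant:\n%s\n' "$original" "$once"
rm -rf "$tmpdir"
```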
THIRD OPTION
Alternatively, it can be done quite simply, but more slowly in bash:
while IFS= read -r r; do grep -- "$r" secondfile; done < firstfile
Updated on July 25, 2022
Comments
-
ocslegna almost 2 years ago
In one text file, I have 150 words. I have another text file, which has about 100,000 lines.
How can I check for each of the words belonging to the first file whether it is in the second or not?
I thought about using grep, but I could not find out how to use it to read each of the words in the original text. Is there any way to do this using awk? Or another solution?
I tried with this shell script, but it matches almost every line:
#!/usr/bin/env sh
cat words.txt | while read line; do
  if grep -F "$FILENAME" text.txt
  then
    echo "Se encontró $line"
  fi
done
Another way I found is:
fgrep -w -o -f "words.txt" "text.txt"
-
hek2mgl over 10 years ago
Cool! Didn't know that! I was about to suggest something like: grep -E $(cat search | tr '\n' '|') text :)
-
ocslegna over 10 years ago
Thank you @anubhava! Your answer was helpful.
-
Ed Morton over 10 years ago
This is looking for strings so that's good but will match "the" to "theatre" - is that desirable?
-
Ed Morton over 10 years ago
This is looking for regexps and so will match both "the" and "a.r" to "theatre" - is that desirable?
-
anubhava over 10 years ago
Yes, the -w option can be added to make sure a complete word is matched (if so desired).
-
ocslegna over 10 years ago
With: fgrep -w -o -f "first-file" "second-file"
it returns all the words that were found, but they are repeated. How do I show them only once?
-
anubhava over 10 years ago
So you want to show a matching line from the second file only the first time it matches?
-
ocslegna over 10 years ago
I want to see if the words of text1 are present in the second.
-
anubhava over 10 years ago
Right, but I just want to understand the output you need. So just a list of the words from text1 that are present in the second, right?
-
ocslegna over 10 years ago
@anubhava Exactly. In text1 I have 150 reserved words (i.e. Red Hat) and in the second file, ****.sql, I have 100,000 lines. All I want to know is whether the words from file1 are present in the second.
-
ocslegna over 10 years ago
@anubhava Thank you! Works <:-]
-
ocslegna over 10 years ago
@anubhava Do you know why this works on a Fedora server but not on a Red Hat server?
-
anubhava over 10 years ago
Are the files exactly the same on both servers? (Check with the cat -vte file command.)
-
ocslegna over 10 years ago
Yes, it's the same: $ on both servers.
-
anubhava over 10 years ago
It could be due to different awk versions, I guess. Is it not showing any output on Red Hat?
-
ocslegna over 10 years ago
@anubhava In the beginning, no. I copied and edited the same .sql file and wrote a new one. I think it didn't work at the beginning because the .sql file had been stored on an FTP server and on another server before Red Hat. Now, for the moment, it is working.
-
ocslegna over 10 years ago
@anubhava Thanks for the time ^^,!
-
kvantour almost 5 years ago
Be aware that a direct invocation as either egrep or fgrep is deprecated, but is provided to allow historical applications that rely on them to run unmodified. (source: man grep)
-
anubhava almost 5 years ago
Good point @kvantour, I updated the answer to use grep -Ff instead of fgrep in this 5 year old answer.