grep: exactly match patterns from a file, searching only in the first column
Solution 1
You probably want the -w flag. From man grep:
-w, --word-regexp
Select only those lines containing matches that form whole
words. The test is that the matching substring must either be
at the beginning of the line, or preceded by a non-word
constituent character. Similarly, it must be either at the end
of the line or followed by a non-word constituent character.
Word-constituent characters are letters, digits, and the
underscore.
For example:
grep -wFf patfile file
prints:
denovo1 xxx yyyy oggugu ddddd
denovo22 hhhh yyyy kkkk iiii
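A minimal sketch of the example above, assuming GNU grep and a POSIX shell. It writes the question's sample lines into a scratch directory, plus one hypothetical extra line (other denovo1 zzz) in which denovo1 sits in the second column:

```shell
# Scratch directory for the sample files (removed at the end).
dir=$(mktemp -d)

# Sample data from the question, plus a hypothetical last line where
# denovo1 appears in the second column instead of the first.
printf '%s\n' \
  'denovo1 xxx yyyy oggugu ddddd' \
  'denovo11 ggg hhhh bbbb gggg' \
  'denovo22 hhhh yyyy kkkk iiii' \
  'denovo2 yyyyy rrrr fffff jjjj' \
  'denovo33 hhh yyy eeeee fffff' \
  'other denovo1 zzz' > "$dir/file"
printf '%s\n' denovo1 denovo3 denovo22 > "$dir/patfile"

# -F: fixed strings, -f: read patterns from patfile, -w: whole words only.
out=$(grep -wFf "$dir/patfile" "$dir/file")
printf '%s\n' "$out"

rm -r "$dir"
```

With this data, -w keeps denovo11 and denovo33 out, but it also prints the hypothetical other denovo1 zzz line: -w only checks word boundaries and does not care which column the match is in.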
To restrict matching to the first column, you need to modify the entries in the pattern file to add a start-of-line anchor (^). You can also use the \b word anchor instead of the -w command-line switch, e.g. in patfile:
^denovo1\b
^denovo3\b
^denovo22\b
Then running
grep -f patfile file
prints:
denovo1 xxx yyyy oggugu ddddd
denovo22 hhhh yyyy kkkk iiii
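A sketch of the anchored version, assuming GNU grep (\b is a GNU extension in basic regular expressions) and the same sample data, again including the hypothetical other denovo1 zzz line:

```shell
dir=$(mktemp -d)

printf '%s\n' \
  'denovo1 xxx yyyy oggugu ddddd' \
  'denovo11 ggg hhhh bbbb gggg' \
  'denovo22 hhhh yyyy kkkk iiii' \
  'denovo2 yyyyy rrrr fffff jjjj' \
  'denovo33 hhh yyy eeeee fffff' \
  'other denovo1 zzz' > "$dir/file"

# printf '%s' writes the single-quoted patterns literally, backslashes included.
printf '%s\n' '^denovo1\b' '^denovo3\b' '^denovo22\b' > "$dir/patfile"

# No -F here: the patterns are regular expressions.
out=$(grep -f "$dir/patfile" "$dir/file")
printf '%s\n' "$out"

rm -r "$dir"
```

Only the denovo1 and denovo22 lines are printed: ^ rejects the line where denovo1 is in the second column, and \b rejects denovo11 and denovo33.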
Note that you must drop the -F switch if the pattern file contains regular expressions instead of simple fixed strings.
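To see why -F has to go, this small sketch (assuming GNU grep) runs the same anchored pattern file with and without -F:

```shell
dir=$(mktemp -d)
printf '%s\n' 'denovo1 xxx yyyy oggugu ddddd' 'denovo22 hhhh yyyy kkkk iiii' > "$dir/file"
printf '%s\n' '^denovo1\b' '^denovo22\b' > "$dir/patfile"

# With -F, "^denovo1\b" is a literal string (caret, backslash and all), so
# nothing matches. grep then exits with status 1; "|| true" keeps that from
# aborting a script run under "set -e".
fixed=$(grep -Ff "$dir/patfile" "$dir/file" || true)

# Without -F the same lines are regular expressions and both data lines match.
regex=$(grep -f "$dir/patfile" "$dir/file")

printf 'with -F: [%s]\n' "$fixed"
printf 'without -F:\n%s\n' "$regex"

rm -r "$dir"
```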
Solution 2
You can also use awk:
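awk 'NR==FNR{a[$1]; next} $1 in a' pattern_file big_file
While awk reads pattern_file (NR==FNR), each pattern is stored as an array key; each line of big_file is then printed if its first field is one of those keys. Testing membership with in also avoids printing the blank lines of big_file, which a comparison like $1==a[$1] would do, because a missing key compares equal to an empty first field.
Output: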
denovo1 xxx yyyy oggugu ddddd
denovo22 hhhh yyyy kkkk iiii
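A sketch of the first-field lookup on the question's data plus the hypothetical other denovo1 zzz line. It uses the common awk idiom of storing keys with a[$1]; next and testing membership with $1 in a:

```shell
dir=$(mktemp -d)
printf '%s\n' \
  'denovo1 xxx yyyy oggugu ddddd' \
  'denovo11 ggg hhhh bbbb gggg' \
  'denovo22 hhhh yyyy kkkk iiii' \
  'denovo2 yyyyy rrrr fffff jjjj' \
  'denovo33 hhh yyy eeeee fffff' \
  'other denovo1 zzz' > "$dir/big_file"
printf '%s\n' denovo1 denovo3 denovo22 > "$dir/pattern_file"

# NR==FNR holds only while reading the first file (pattern_file):
# record each pattern as an array key and skip to the next input line.
# For big_file, "$1 in a" prints the line when its first field is a key.
out=$(awk 'NR==FNR { a[$1]; next } $1 in a' "$dir/pattern_file" "$dir/big_file")
printf '%s\n' "$out"

rm -r "$dir"
```

Only the denovo1 and denovo22 lines are printed, since the line starting with other has a first field that is not in the array. One caveat of the NR==FNR test: if pattern_file is empty, it stays true while big_file is read, so nothing is printed.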
Author: Francesca de Filippis
Updated on September 18, 2022
Comments
-
Francesca de Filippis almost 2 years
I have a big file like this:
denovo1 xxx yyyy oggugu ddddd
denovo11 ggg hhhh bbbb gggg
denovo22 hhhh yyyy kkkk iiii
denovo2 yyyyy rrrr fffff jjjj
denovo33 hhh yyy eeeee fffff
Then my pattern file is:
denovo1
denovo3
denovo22
I'm trying to use fgrep to extract only the lines that exactly match a pattern in my file (so I want denovo1 but not denovo11). I tried to use -x for the exact match, but then I got an empty file. I tried:
fgrep -x --file="pattern" bigfile.txt > clusters.blast.uniq
Is there a way to make grep search only in the first column?
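A sketch illustrating why -x produced an empty file, assuming GNU grep and single-space-separated columns: in fixed-string mode (grep -F, which is what fgrep stands for), -x only reports a line if a pattern matches the entire line. Cutting out the first column first shows that -x itself does exact matching:

```shell
dir=$(mktemp -d)
printf '%s\n' \
  'denovo1 xxx yyyy oggugu ddddd' \
  'denovo11 ggg hhhh bbbb gggg' \
  'denovo22 hhhh yyyy kkkk iiii' \
  'denovo2 yyyyy rrrr fffff jjjj' \
  'denovo33 hhh yyy eeeee fffff' > "$dir/bigfile.txt"
printf '%s\n' denovo1 denovo3 denovo22 > "$dir/pattern"

# Whole-line matching: no line consists of just a pattern, so nothing matches.
# grep exits with status 1 in that case; "|| true" keeps "set -e" scripts alive.
whole=$(grep -xFf "$dir/pattern" "$dir/bigfile.txt" || true)
printf 'with -x: [%s]\n' "$whole"

# Applied to the first column alone, -x matches exactly: denovo11 is excluded.
keys=$(cut -d' ' -f1 "$dir/bigfile.txt" | grep -xFf "$dir/pattern")
printf '%s\n' "$keys"

rm -r "$dir"
```

The first command prints nothing; the second prints denovo1 and denovo22 but not denovo11. Note that the cut pipeline outputs only the matching first fields, not the full lines.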
-
Costas over 9 years: To match only in the first column, add ^ to the start of the pattern, like ^denovo1, which anchors it to the beginning of the line. To search for denovo1 without matching denovo11, you can change the pattern to denovo1\b or denovo1\>, or add the -w option, which restricts matches to whole words.
-
Francesca de Filippis over 9 years: Should I add the \b at the end of each line in my pattern file?
-
Costas over 9 years: Yes, you can choose any of the variants. It is easy to do with
sed --in-place 's/$/\\b/' pattern.file
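A sketch of the same sed idea that writes to a new file instead of editing pattern.file in place, and also adds the ^ anchor mentioned earlier. The name anchored.pat is hypothetical, and \b assumes GNU grep:

```shell
dir=$(mktemp -d)
printf '%s\n' denovo1 denovo3 denovo22 > "$dir/pattern.file"
printf '%s\n' \
  'denovo1 xxx yyyy oggugu ddddd' \
  'denovo11 ggg hhhh bbbb gggg' \
  'denovo22 hhhh yyyy kkkk iiii' > "$dir/bigfile.txt"

# s/^/^/ prefixes a start-of-line anchor; s/$/\\b/ appends a literal \b.
sed 's/^/^/; s/$/\\b/' "$dir/pattern.file" > "$dir/anchored.pat"
pats=$(cat "$dir/anchored.pat")
printf '%s\n' "$pats"    # ^denovo1\b, ^denovo3\b, ^denovo22\b (one per line)

# Use the generated regular expressions (no -F).
out=$(grep -f "$dir/anchored.pat" "$dir/bigfile.txt")
printf '%s\n' "$out"

rm -r "$dir"
```

Patterns containing regular-expression metacharacters such as . or * would need escaping before this transformation is safe.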
-
smw over 9 years: +1, that's a nice alternative, since it achieves the column restriction without modifying the pattern file.