grep pattern exactly matching from file and search only in first column

Solution 1

You probably want the -w flag. From man grep:

   -w, --word-regexp
          Select  only  those  lines  containing  matches  that form whole
          words.  The test is that the matching substring must  either  be
          at  the  beginning  of  the  line,  or  preceded  by  a non-word
          constituent character.  Similarly, it must be either at the  end
          of  the  line  or  followed by a non-word constituent character.
          Word-constituent  characters  are  letters,  digits,   and   the
          underscore.

For example:

grep -wFf patfile file
denovo1 xxx yyyy oggugu ddddd
denovo22 hhhh yyyy kkkk iiii

Note that -w still matches the word anywhere on the line. To enforce matching only in the first column, modify the entries in the pattern file to add a line anchor (^). You can also use the \b word-boundary anchor (a GNU extension) instead of the command-line -w switch, e.g. in patfile:

^denovo1\b
^denovo3\b
^denovo22\b

then

grep -f patfile file
denovo1 xxx yyyy oggugu ddddd
denovo22 hhhh yyyy kkkk iiii

Note that you must drop the -F switch when the pattern file contains regular expressions instead of simple fixed strings; with -F, the ^ and \b would be treated as literal characters.
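
Rather than editing the pattern file by hand, the anchors could be generated from the plain ID list. The following is a minimal sketch, not a definitive recipe: it assumes GNU grep and sed, IDs that contain no regex metacharacters, and a hypothetical output name patfile.anchored. The sample data is copied from the question.

```shell
# Recreate the question's sample data in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'denovo1 xxx yyyy oggugu ddddd' \
  'denovo11 ggg hhhh bbbb gggg' \
  'denovo22 hhhh yyyy kkkk iiii' \
  'denovo2 yyyyy rrrr fffff jjjj' \
  'denovo33 hhh yyy eeeee fffff' > "$tmpdir/file"
printf '%s\n' denovo1 denovo3 denovo22 > "$tmpdir/patfile"

# Wrap every ID as ^ID\b; patfile.anchored is a hypothetical name.
# Assumes the IDs contain no regex metacharacters.
sed 's/.*/^&\\b/' "$tmpdir/patfile" > "$tmpdir/patfile.anchored"

# No -F here: the pattern file now holds regular expressions.
anchored_matches=$(grep -f "$tmpdir/patfile.anchored" "$tmpdir/file")
printf '%s\n' "$anchored_matches"
```

Writing to a separate file leaves the original ID list untouched, so it can still be used with -F or with awk.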

Solution 2

You can use awk too:

awk 'NR==FNR{a[$0]; next} $1 in a' pattern_file big_file

output:

denovo1 xxx yyyy oggugu ddddd
denovo22 hhhh yyyy kkkk iiii
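
The difference between whole-word matching and first-column matching only shows up when an ID also appears in a later column. The sketch below illustrates this with a membership-test form of the awk command; the third data line (denovo9 with denovo1 in column 2) is a hypothetical addition that is not in the question's data.

```shell
# Scratch data; the last line is hypothetical and puts an ID in column 2.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'denovo1 xxx yyyy oggugu ddddd' \
  'denovo11 ggg hhhh bbbb gggg' \
  'denovo9 denovo1 yyyy kkkk iiii' > "$tmpdir/big_file"
printf '%s\n' denovo1 > "$tmpdir/pattern_file"

# Whole-word match anywhere on the line: also picks up the denovo9 line.
word_matches=$(grep -wFf "$tmpdir/pattern_file" "$tmpdir/big_file")

# First file: store each ID as an array key. Second file: print lines
# whose first field is one of those keys.
column_matches=$(awk 'NR==FNR { ids[$0]; next } $1 in ids' \
  "$tmpdir/pattern_file" "$tmpdir/big_file")

printf 'grep -wF:\n%s\n\nawk:\n%s\n' "$word_matches" "$column_matches"
```

Because awk compares the whole first field as a string, no anchors or escaping are needed in the pattern file.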

Author: Francesca de Filippis

Updated on September 18, 2022

Comments

  • Francesca de Filippis almost 2 years ago

    I have a bigfile like this:

    denovo1 xxx yyyy oggugu ddddd
    denovo11 ggg hhhh bbbb gggg
    denovo22 hhhh yyyy kkkk iiii
    denovo2 yyyyy rrrr fffff jjjj
    denovo33 hhh yyy eeeee fffff
    

    then my pattern file is:

    denovo1
    denovo3
    denovo22
    

    I'm trying to use fgrep in order to extract only the lines exactly matching the pattern in my file (so I want denovo1 but not denovo11). I tried to use -x for the exact match, but then I got an empty file. I tried:

    fgrep -x --file="pattern" bigfile.txt > clusters.blast.uniq
    

    Is there a way to make grep search only in the first column?

    • Costas over 9 years ago
      To match only the first column (word), prefix the pattern with ^, as in ^denovo1, which anchors it to the start of the line. To match denovo1 but not denovo11, you can write the pattern as denovo1\b or denovo1\>, or add the -w option, which restricts matches to whole words.
    • Francesca de Filippis over 9 years ago
      Should I add the \b in my pattern file at the end of each line?
    • Costas over 9 years ago
      Yes, you can choose any of the variants. It is easy to do with sed --in-place 's/$/\\b/' pattern.file
  • smw over 9 years ago
    +1 that's a nice alternative since it achieves the column restriction without modification to the pattern file