Search in specific column for pattern and output entire line


Solution 1

The simplest approach would probably be awk:

awk -F'|' '$4~/^5/' file

The -F'|' sets the field separator to |. The $4~/^5/ will be true if the 4th field starts with 5. The default action for awk when something evaluates to true is to print the current line, so the script above will print what you want.
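
To try this against the sample rows from the question, a minimal sketch follows. The temporary file made with mktemp is only there for illustration, and the explicit { print } is the default action written out, not a change in behaviour.

```shell
# Build a throwaway input file from the rows in the question
# (the temp file is an assumption for this demo; use your own file instead).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
100|20151010|K|5001
695|20151010|K|1010
309|20151010|R|5005
410|20151010|K|5001
107|20151010|K|1062
652|20151010|K|5001
EOF

# Split on "|" and print lines whose 4th field starts with 5.
awk_out=$(awk -F'|' '$4 ~ /^5/ { print }' "$tmp")
printf '%s\n' "$awk_out"

rm -f "$tmp"
```

This should print the four rows whose last column is 5001 or 5005 and skip the 1010 and 1062 rows.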

Other choices are:

  • Perl

    perl -F'\|' -ane 'print if $F[3]=~/^5/' file
    

    Same idea. The -a switch makes perl split each input line into fields on the pattern given by -F and store them in the array @F. The | is escaped because -F takes a regular expression. We then print the line if the 4th element of the array, $F[3] (arrays start counting at 0), starts with a 5.

  • grep

    grep -E '^([^|]*\|){3}5' file
    

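    Starting at the beginning of the line (^), the regex matches three repetitions of "a run of non-| characters followed by a |", that is, the first three fields, and then a 5 as the first character of the 4th field.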

  • GNU or BSD sed

    sed -En '/^([^|]*\|){3}5/p' file
    

    The -E turns on extended regular expressions and the -n suppresses normal output. The regex is the same as in the grep above, including the leading ^ that pins the match to the 4th field, and the p at the end makes sed print only the lines that match it.
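
The leading ^ in the grep pattern matters as soon as the input has more than four columns. A small sketch, using a made-up five-column row that is not part of the question's data, shows the difference between the anchored and unanchored forms:

```shell
# Hypothetical five-column row: field 4 starts with 1, field 5 starts with 5.
row='100|20151010|K|1010|5001'

# Anchored: the three "field|" groups must begin at the start of the line,
# so the character tested is the first character of field 4. No match here.
anchored=$(printf '%s\n' "$row" | grep -cE '^([^|]*\|){3}5' || true)

# Unanchored: the match may begin at a later field, so field 5 triggers it.
unanchored=$(printf '%s\n' "$row" | grep -cE '([^|]*\|){3}5' || true)

echo "anchored=$anchored unanchored=$unanchored"
```

grep -c prints the number of matching lines; the || true only keeps the zero-match case from counting as a failure in strict shells.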

Solution 2

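This prints every line that contains |5 followed by no further | before the end of the line, in other words every line whose last field starts with 5. It works here because the 4th column is also the last one: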

grep '|5[^|]*$' <in >out
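
Because this pattern is tied to the end of the line rather than to a field count, it is worth checking how it behaves if a column is ever appended. A short sketch, where the second row with its extra trailing field is invented for illustration:

```shell
# Row from the question: the last field starts with 5, so it matches.
hit=$(printf '%s\n' '100|20151010|K|5001' | grep -c '|5[^|]*$' || true)

# Hypothetical row with a fifth column: field 4 still starts with 5,
# but it is no longer the last field, so the pattern does not match.
miss=$(printf '%s\n' '100|20151010|K|5001|x' | grep -c '|5[^|]*$' || true)

echo "hit=$hit miss=$miss"
```

If the layout might grow, the field-based awk test or the anchored regex is the safer choice.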

Author: Kit Goodman

Updated on September 18, 2022

Comments

  • Kit Goodman
    Kit Goodman over 1 year

    I'm working in HDFS and am trying to get the entire line where the 4th column starts with the number 5:

    100|20151010|K|5001
    695|20151010|K|1010
    309|20151010|R|5005
    410|20151010|K|5001
    107|20151010|K|1062
    652|20151010|K|5001
    

    The expected output is:

    100|20151010|K|5001
    309|20151010|R|5005
    410|20151010|K|5001
    652|20151010|K|5001
    
  • terdon
    terdon over 8 years
    @mikeserv thanks, greater portability is always a good thing but where is that documented? I tried it and it does indeed work on GNU sed but the -E isn't mentioned in either man or info. It does activate ERE, right?
  • mikeserv
    mikeserv over 8 years
    It's not, except in the source. That happens a lot with open source stuff: people submit a patch to do the same thing something already does because they're used to doing it with a different letter, but then don't care to write a new SYNOPSIS. Anyway, it's worked for a long time, and -Extended regexp is slated for the next POSIX version too, so you might as well just get used to it. Plus, -r doesn't make any sense. And yeah, it's a synonym: they both do exactly the same thing. Almost the same, anyway; I think with -r you can switch back with -re ... -Ge ... or something, but who would?