Find only the matched pattern in a CSV file

9,341

Solution 1

With GNU grep which supports -o option to print only matched string, each on its own line

$ grep -oE '\b35=[^,]+' ip.csv 
35=A
35=BCD
35=EF
35=G
  • \b is word boundary, so that 900035 won't match
  • [^,]+ to match one or more non, characters
  • assumes the values do not contain ,


With awk

$ awk -F, '{ for(i=1;i<=NF;i++){if($i~/^35=/) print $i} }' ip.csv 
35=A
35=BCD
35=EF
35=G
  • -F, set , as input field separator
  • for(i=1;i<=NF;i++) iterate over all fields
  • if($i~/^35=/) if field starts with 35=
    • print $i print that field

Similar with perl

perl -F, -lane 'foreach (@F){print if /^35=/}' ip.csv 

Solution 2

Using tr to replace all commas with newlines, and then grep to get all lines that start with the string 35=:

$  tr ',' '\n' <data.in | grep '^35='
35=A
35=BCD
35=EF
35=G

Solution 3

With perl:

$ perl -lne 'print for /(\b35=[^,]+)/g' filename
35=A
35=BCD
35=EF
35=G

or perhaps more generally/robustly using the Text::CSV module

$ perl -MText::CSV -lne '
  BEGIN{$p = Text::CSV->new()} 
  print for grep { /^35=/ } $p->fields(), $p->parse($_)
' filename
35=A
35=BCD
35=EF
35=G

Solution 4

Perl lookarounds with grep work really well.

grep -oP '(?<=35\=).*?(?=,)'

This returns the exact information minus the 35= bit

grep -oP '(?<=35\=).*?(?=,)' file.csv will return this

A
BCD
G
EF

Share:
9,341

Related videos on Youtube

user102299
Author by

user102299

Updated on September 18, 2022

Comments

  • user102299
    user102299 over 1 year

    I am trying to print only the matched pattern in a CSV file. Example: all the columns value starting with 35=its value. Thanks.

    CSV file:

    35=A,D=35,C=129,ff=136
    D=35,35=BCD,C=129,ff=136
    900035=G,D=35,C=129,ff=136
    35=EF,D=35,C=129,ff=136,35=G
    36=o,D=35,k=1
    

    Output:

    35=A
    35=BCD
    35=EF
    35=G
    

    The command I used did not work:

    sed -n '/35=[A-Z]*?/ s/.*\(35=[A-Z]*?\).*/\1/p' filename
    
    • Арсений Черенков
      Арсений Черенков almost 7 years
      try to remove space before s in sed expression. it might be difficult to catch both 35in third line.
    • Admin
      Admin almost 7 years
      Use the following: sed -e 'y/,/\n/; /^35=/P; D' filename
  • user102299
    user102299 almost 7 years
    Thanks a lot. In Solaries the grep option o is not available. used the awk for the solution.
  • Aaron
    Aaron almost 7 years
    Note that \b isn't the best choice of boundary ; it would for instance match 35=value from somethingsomething, 42.35=value, somethingelse. If the grep versions supports -Perl regex and the input contains no spaces, I'd use a lookbehind to assert that we match from the start of a field. If there can be extra spaces this would require variable-length lookbehinds, which aren't implemented in any regex flavour available to grep AFAIK.
  • Sundeep
    Sundeep almost 7 years
    @Aaron agreed.. steeldriver's solution based on csv module is most robust... for variable length lookbehinds \K can be used stackoverflow.com/documentation/regex/639/…
  • Aaron
    Aaron almost 7 years
    @Sundeep thanks for the \K trick, I knew about the meta-character but had never thought about its ability to emulate basic variable-length lookbehinds :)