Find only the matched pattern in a CSV file

text-processing awk sed perl

9,341

Solution 1

With GNU grep which supports -o option to print only matched string, each on its own line

$ grep -oE '\b35=[^,]+' ip.csv 
35=A
35=BCD
35=EF
35=G

\b is word boundary, so that 900035 won't match
[^,]+ to match one or more non, characters
assumes the values do not contain ,

With awk

$ awk -F, '{ for(i=1;i<=NF;i++){if($i~/^35=/) print $i} }' ip.csv 
35=A
35=BCD
35=EF
35=G

-F, set , as input field separator
for(i=1;i<=NF;i++) iterate over all fields
if($i~/^35=/) if field starts with 35=
- print $i print that field

Similar with perl

perl -F, -lane 'foreach (@F){print if /^35=/}' ip.csv

Solution 2

Using tr to replace all commas with newlines, and then grep to get all lines that start with the string 35=:

$  tr ',' '\n' <data.in | grep '^35='
35=A
35=BCD
35=EF
35=G

Solution 3

With perl:

$ perl -lne 'print for /(\b35=[^,]+)/g' filename
35=A
35=BCD
35=EF
35=G

or perhaps more generally/robustly using the Text::CSV module

$ perl -MText::CSV -lne '
  BEGIN{$p = Text::CSV->new()} 
  print for grep { /^35=/ } $p->fields(), $p->parse($_)
' filename
35=A
35=BCD
35=EF
35=G

Solution 4

Perl lookarounds with grep work really well.

grep -oP '(?<=35\=).*?(?=,)'

This returns the exact information minus the 35= bit

grep -oP '(?<=35\=).*?(?=,)' file.csv will return this

A
BCD
G
EF

View more solutions

9,341

user102299

Updated on September 18, 2022

Comments

user102299 over 1 year
I am trying to print only the matched pattern in a CSV file. Example: all the columns value starting with 35=its value. Thanks.

CSV file:
```
35=A,D=35,C=129,ff=136
D=35,35=BCD,C=129,ff=136
900035=G,D=35,C=129,ff=136
35=EF,D=35,C=129,ff=136,35=G
36=o,D=35,k=1
```
Output:
```
35=A
35=BCD
35=EF
35=G
```
The command I used did not work:
```
sed -n '/35=[A-Z]*?/ s/.*$35=[A-Z]*?$.*/\1/p' filename
```
- Арсений Черенков almost 7 years
  
  try to remove space before s in sed expression. it might be difficult to catch both 35in third line.
- Admin almost 7 years
  
  Use the following: sed -e 'y/,/\n/; /^35=/P; D' filename
user102299 almost 7 years

Thanks a lot. In Solaries the grep option o is not available. used the awk for the solution.
Aaron almost 7 years

Note that \b isn't the best choice of boundary ; it would for instance match 35=value from somethingsomething, 42.35=value, somethingelse. If the grep versions supports -Perl regex and the input contains no spaces, I'd use a lookbehind to assert that we match from the start of a field. If there can be extra spaces this would require variable-length lookbehinds, which aren't implemented in any regex flavour available to grep AFAIK.
Sundeep almost 7 years

@Aaron agreed.. steeldriver's solution based on csv module is most robust... for variable length lookbehinds \K can be used stackoverflow.com/documentation/regex/639/…
Aaron almost 7 years

@Sundeep thanks for the \K trick, I knew about the meta-character but had never thought about its ability to emulate basic variable-length lookbehinds :)