Find only the matched pattern in a CSV file
9,341
Solution 1
With GNU grep
which supports -o
option to print only matched string, each on its own line
$ grep -oE '\b35=[^,]+' ip.csv
35=A
35=BCD
35=EF
35=G
-
\b
is word boundary, so that900035
won't match -
[^,]+
to match one or more non,
characters - assumes the values do not contain
,
With awk
$ awk -F, '{ for(i=1;i<=NF;i++){if($i~/^35=/) print $i} }' ip.csv
35=A
35=BCD
35=EF
35=G
-
-F,
set,
as input field separator -
for(i=1;i<=NF;i++)
iterate over all fields -
if($i~/^35=/)
if field starts with35=
-
print $i
print that field
-
Similar with perl
perl -F, -lane 'foreach (@F){print if /^35=/}' ip.csv
Solution 2
Using tr
to replace all commas with newlines, and then grep
to get all lines that start with the string 35=
:
$ tr ',' '\n' <data.in | grep '^35='
35=A
35=BCD
35=EF
35=G
Solution 3
With perl:
$ perl -lne 'print for /(\b35=[^,]+)/g' filename
35=A
35=BCD
35=EF
35=G
or perhaps more generally/robustly using the Text::CSV module
$ perl -MText::CSV -lne '
BEGIN{$p = Text::CSV->new()}
print for grep { /^35=/ } $p->fields(), $p->parse($_)
' filename
35=A
35=BCD
35=EF
35=G
Solution 4
Perl lookarounds with grep work really well.
grep -oP '(?<=35\=).*?(?=,)'
This returns the exact information minus the 35=
bit
grep -oP '(?<=35\=).*?(?=,)' file.csv
will return this
A
BCD
G
EF
Related videos on Youtube
Author by
user102299
Updated on September 18, 2022Comments
-
user102299 over 1 year
I am trying to print only the matched pattern in a CSV file. Example: all the columns value starting with
35=its value
. Thanks.CSV file:
35=A,D=35,C=129,ff=136 D=35,35=BCD,C=129,ff=136 900035=G,D=35,C=129,ff=136 35=EF,D=35,C=129,ff=136,35=G 36=o,D=35,k=1
Output:
35=A 35=BCD 35=EF 35=G
The command I used did not work:
sed -n '/35=[A-Z]*?/ s/.*\(35=[A-Z]*?\).*/\1/p' filename
-
Арсений Черенков almost 7 yearstry to remove space before s in sed expression. it might be difficult to catch both 35in third line.
-
Admin almost 7 yearsUse the following:
sed -e 'y/,/\n/; /^35=/P; D' filename
-
-
user102299 almost 7 yearsThanks a lot. In Solaries the grep option o is not available. used the awk for the solution.
-
Aaron almost 7 yearsNote that
\b
isn't the best choice of boundary ; it would for instance match35=value
fromsomethingsomething, 42.35=value, somethingelse
. If thegrep
versions supports-P
erl regex and the input contains no spaces, I'd use a lookbehind to assert that we match from the start of a field. If there can be extra spaces this would require variable-length lookbehinds, which aren't implemented in any regex flavour available to grep AFAIK. -
Sundeep almost 7 years@Aaron agreed.. steeldriver's solution based on csv module is most robust... for variable length lookbehinds
\K
can be used stackoverflow.com/documentation/regex/639/… -
Aaron almost 7 years@Sundeep thanks for the
\K
trick, I knew about the meta-character but had never thought about its ability to emulate basic variable-length lookbehinds :)