Count number of columns in a pipe-delimited file

Solution 1

awk -F\| '{print NF}'

gives the correct result.
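
As a usage sketch, the one-liner can be wrapped so that it either reports a count for every line or stops after the first line. The function names below are hypothetical, and the sketch assumes no field contains a pipe character:

```shell
# Hypothetical helpers wrapping the awk one-liner above.

# Print the number of pipe-separated fields on every line of the file in $1.
count_fields_per_line() {
    awk -F'|' '{ print NF }' "$1"
}

# Print the field count of the first line only, then stop reading.
count_fields_first_line() {
    awk -F'|' 'NR == 1 { print NF; exit }' "$1"
}
```

Calling `count_fields_first_line data.psv` (where `data.psv` is a placeholder file name) is enough when every row is known to have the same number of columns; the per-line variant helps spot ragged rows.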

Solution 2

Pure Unix solution (without awk/Perl):

$ cat  /tmp/x1
1|2|3|34
4534|23442|1121|334434

$ head -1 /tmp/x1 | tr "|" "\012" | wc -l
4

Perl solution - 1-liner:

$ perl5.8 -naF'\|' -e 'print scalar(@F)."\n";exit;' /tmp/x1
4

BUT!!!! IMPORTANT!!!

Every one of these solutions, as well as those in other answers, does NOT work 100% of the time!

Namely, they all break on a REAL "pipe-separated" file, where a pipe can be a valid character inside a quoted field, the way real CSV files work.

E.g.

$ cat /tmp/x2
"0|1"|2|3|34
4534|23442|1121|334434
$ perl5.8 -naF'\|' -e 'print scalar(@F)."\n";exit;' /tmp/x2
5   <----- BROKEN!!! There are only 4 fields, first field is "0|1"

To fix that, a proper CSV (or delimited file) parser should be used, such as one in Perl:

$ perl5.8 -MText::CSV_XS \
-ne '$csv=Text::CSV_XS->new({sep_char => "|"});  $csv->parse($_);
my @fields = $csv->fields(); print scalar(@fields); print "\n"; exit;' /tmp/x2

This prints the correct value:

4

Note that patching an awk or sed solution with a convoluted regex won't work easily: on top of quoted PSV fields that contain pipes, the spec also allows quote characters inside a field. That does NOT lend itself to a nice regex solution.
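
Where a CSV module is not available, one possible workaround is to scan each line character by character and track whether the scan is inside quotes, instead of relying on a regex. The following is a minimal sketch in POSIX awk, not a full parser: it assumes fields never contain embedded newlines and that a quote inside a quoted field is escaped by doubling it. The function name `count_psv_fields` is hypothetical.

```shell
# Hypothetical helper: count fields on each line of the pipe-separated file in $1,
# ignoring pipes that appear inside double-quoted fields.
# Assumption: no field contains an embedded newline.
count_psv_fields() {
    awk '{
        n = 1            # an empty line is counted as one empty field
        in_quotes = 0    # 1 while the scan is inside a quoted field
        len = length($0)
        for (i = 1; i <= len; i++) {
            c = substr($0, i, 1)
            if (c == "\"")
                in_quotes = !in_quotes   # a doubled "" toggles twice, so the state is unchanged
            else if (c == "|" && !in_quotes)
                n++                      # only unquoted pipes separate fields
        }
        print n
    }' "$1"
}
```

Run against the /tmp/x2 sample above, `count_psv_fields /tmp/x2` reports 4 for the first line, since the pipe in "0|1" is inside quotes. For anything beyond simple inputs, a real parser such as Text::CSV_XS remains the safer choice.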

Author: Maulzey

Updated on July 09, 2022

Comments

  • Maulzey, almost 2 years ago

    I have a pipe | delimited file.

    File:

    106232145|"medicare"|"medicare,medicaid"|789
    

    I would like to count the number of fields in each line. I tried the code below.

    Code:

    awk -F '|' '{print NF-1}'
    

    This returns 5 instead of 4, because awk treats "medicare|medicaid" as two different fields instead of one.