Ignore delimiter present inside quotes

11,119

You are doing a couple of things wrong here:

  1. You are using the shell to parse text.

    While this is possible, it is very inefficient. It is slow, hard to write, hard to read and very hard to do properly. The shell just isn't designed for this sort of thing.

  2. You are trying to parse a csv file without a csv parser.

    CSV is not a simple format. You can have fields that contain the delimiter as you do here. You can also have fields spanning multiple lines. Attempting to parse arbitrary CSV data with simple pattern matching is very, very complicated and extremely hard to get right.

The bad, hacky solution is to do something like this:

$ sed 's/","/"|"/g' file.csv | 
    while IFS='|' read -r Key Name Address1 Address2 City \
     State Country SwiftCode Nid Chips Aba IsSwitching; do 
        echo "From Key : $Key"; echo "From Name : $Name"; 
    done
From Key : "ID0054XX"
From Name : "PT. SUMUT"
From Key : "ID00037687"
From Name : "PAN INDONESIA, PT."

That will replace all "," with "|" and then use | as the delimiter. Of course, that will fail if any of your fields can contain |.

The good, clean approach is to use a proper scripting language, not the shell, and a csv parser. For example, in Perl1:

$ cat file.csv | perl -MText::CSV -le '
    $csv = Text::CSV->new({binary=>1}); 
    while ($row = $csv->getline(STDIN)){ my ($Key, $Name, $Address1, $Address2, $City, $State, $Country, $SwiftCode, $Nid, $Chips, $Aba, $IsSwitching) = @$row;
print "From Key: $Key\nFrom Name: $Name";}' 
From Key: ID0054XX
From Name: PT. SUMUT
From Key: ID00037687
From Name: PAN INDONESIA, PT.
    

Or, as a script:

#!/usr/bin/perl -l
use strict;
use warnings;
use Text::CSV;

open(my $fh, "file.csv");
my $csv = Text::CSV->new({binary=>1}); 
while (my $row = $csv->getline($fh)){
    my (
            $Key, $Name, $Address1, $Address2, $City,
            $State, $Country, $SwiftCode, $Nid, $Chips,
            $Aba, $IsSwitching
         ) = @$row;
    print "From Key: $Key\nFrom Name: $Name";
}

Note that you'll have to install the Text::CSV module first (cpanm Text::CSV) and you might want to install cpanm (package cpanminus on most distributions)

Alternatively, in Python 3:

#!/usr/bin/env python3

import csv
with open('file.csv', newline='') as csvfile:
    linereader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in linereader:
        print("From Key: %s\nFrom Name: %s" % (row[0], row[1]))
    

Saving the Python code above as a script and executing it on your file will print:

$ foo.py
From Key: ID0054XX
From Name: PT. SUMUT
From Key: ID00037687
From Name: PAN INDONESIA, PT.
    

1Yes, I am aware that that is a UUoC, but it's simpler to write as a one liner this way.

Share:
11,119

Related videos on Youtube

Ashish K
Author by

Ashish K

It's '0' days since my last syntax error

Updated on September 18, 2022

Comments

  • Ashish K
    Ashish K over 1 year

    I have a a .csv file as follows:

    "ID0054XX","PT. SUMUT","18 JL.BONJOL","SUMATERA UTARA, NORTH","MEDAN","","ID9856","PDSUIDSAXXX","","","","Y"
    "ID00037687","PAN INDONESIA, PT.","JALAN JENDERAL, SUDIRMAN, SENAYAN","","INDIA","","ID566543","PINBIDJAXXX","","0601","","Y"
    

    I have a script that assigns each of the comma separated value to a unique variable using , as the delimiter.

    The portion of the script is as under :

    IFS=,
    
    [ ! -f $INPUT ] && { echo "$INPUT file not found"; exit 99; }
    
    while read Key  Name    Address1        Address2        City    State   Country SwiftCode       Nid     Chips   Aba     IsSwitching
    do
              echo "-------------------------------------------------------------------"
    
         echo "From Key : $Key"
    
        echo "-------------------------------------------------------------------"
              echo "-------------------------------------------------------------------"
    
         echo "From Name : $Name"
    

    What it does is separate the values having comma inside the quotes against my desired output of uniquely separating each value to their respective variables.

    I tried replacing the comma like IFS=[","] but no luck. Any suggestion/ help is really appreciated.

    • RomanPerekhrest
      RomanPerekhrest about 7 years
      can you post how should look the final expected output?
    • Marty McGowan
      Marty McGowan about 7 years
      Ron Manis (RDB: Unix Relational Database Managment) was adamant about the TAB. "There will never be a TAB character in a data field". So, the TSV format is the guaranteed safe version: "sed 's/^"//; s/"$//; s/","/<TAB>/g' file.csv | tee file.tsv where <TAB> is typed [CTRL]-V{tab}
  • Ashish K
    Ashish K about 7 years
    I had to use sed with -i flag to make changes inline. @terdon thankyou so much for this.
  • terdon
    terdon about 7 years
    @AshishK why make them inline? My (really bad suggestion which you shouldn't use) doesn't require it to be inline and making it inline might break the file for other applications.
  • Ashish K
    Ashish K about 7 years
    The reason why I had to do it using sed is because i was not supposed to use perl and python here in the problem. I had already thought about this solution of replacing , outside quotes with some other character and then adding that character as the delimiter. However when using sed directly, it was opening the file and deleting everything in it. So had to use inline
  • terdon
    terdon about 7 years
    Yes, because you must have tried to do sed 'sed command' file > file instead of sed 'sed command' file | while.
  • Ashish K
    Ashish K about 7 years
    Yes! That was it.. I didn't notice that :) Thankyou.