Ignore delimiter present inside quotes
You are doing a couple of things wrong here:
-
You are using the shell to parse text.
While this is possible, it is very inefficient. It is slow, hard to write, hard to read and very hard to do properly. The shell just isn't designed for this sort of thing.
-
You are trying to parse a csv file without a csv parser.
CSV is not a simple format. You can have fields that contain the delimiter as you do here. You can also have fields spanning multiple lines. Attempting to parse arbitrary CSV data with simple pattern matching is very, very complicated and extremely hard to get right.
The bad, hacky solution is to do something like this:
$ sed 's/","/"|"/g' file.csv |
while IFS='|' read -r Key Name Address1 Address2 City \
State Country SwiftCode Nid Chips Aba IsSwitching; do
echo "From Key : $Key"; echo "From Name : $Name";
done
From Key : "ID0054XX"
From Name : "PT. SUMUT"
From Key : "ID00037687"
From Name : "PAN INDONESIA, PT."
That will replace all ","
with "|"
and then use |
as the delimiter. Of course, that will fail if any of your fields can contain |
.
The good, clean approach is to use a proper scripting language, not the shell, and a csv parser. For example, in Perl1:
$ cat file.csv | perl -MText::CSV -le '
$csv = Text::CSV->new({binary=>1});
while ($row = $csv->getline(STDIN)){ my ($Key, $Name, $Address1, $Address2, $City, $State, $Country, $SwiftCode, $Nid, $Chips, $Aba, $IsSwitching) = @$row;
print "From Key: $Key\nFrom Name: $Name";}'
From Key: ID0054XX
From Name: PT. SUMUT
From Key: ID00037687
From Name: PAN INDONESIA, PT.
Or, as a script:
#!/usr/bin/perl -l
use strict;
use warnings;
use Text::CSV;
open(my $fh, "file.csv");
my $csv = Text::CSV->new({binary=>1});
while (my $row = $csv->getline($fh)){
my (
$Key, $Name, $Address1, $Address2, $City,
$State, $Country, $SwiftCode, $Nid, $Chips,
$Aba, $IsSwitching
) = @$row;
print "From Key: $Key\nFrom Name: $Name";
}
Note that you'll have to install the Text::CSV
module first (cpanm Text::CSV
) and you might want to install cpanm
(package cpanminus
on most distributions)
Alternatively, in Python 3:
#!/usr/bin/env python3
import csv
with open('file.csv', newline='') as csvfile:
linereader = csv.reader(csvfile, delimiter=',', quotechar='"')
for row in linereader:
print("From Key: %s\nFrom Name: %s" % (row[0], row[1]))
Saving the Python code above as a script and executing it on your file will print:
$ foo.py
From Key: ID0054XX
From Name: PT. SUMUT
From Key: ID00037687
From Name: PAN INDONESIA, PT.
1Yes, I am aware that that is a UUoC, but it's simpler to write as a one liner this way.
Related videos on Youtube
Comments
-
Ashish K over 1 year
I have a a
.csv
file as follows:"ID0054XX","PT. SUMUT","18 JL.BONJOL","SUMATERA UTARA, NORTH","MEDAN","","ID9856","PDSUIDSAXXX","","","","Y" "ID00037687","PAN INDONESIA, PT.","JALAN JENDERAL, SUDIRMAN, SENAYAN","","INDIA","","ID566543","PINBIDJAXXX","","0601","","Y"
I have a script that assigns each of the comma separated value to a unique variable using
,
as the delimiter.The portion of the script is as under :
IFS=, [ ! -f $INPUT ] && { echo "$INPUT file not found"; exit 99; } while read Key Name Address1 Address2 City State Country SwiftCode Nid Chips Aba IsSwitching do echo "-------------------------------------------------------------------" echo "From Key : $Key" echo "-------------------------------------------------------------------" echo "-------------------------------------------------------------------" echo "From Name : $Name"
What it does is separate the values having comma inside the quotes against my desired output of uniquely separating each value to their respective variables.
I tried replacing the comma like
IFS=[","]
but no luck. Any suggestion/ help is really appreciated.-
RomanPerekhrest about 7 yearscan you post how should look the final expected output?
-
Marty McGowan about 7 yearsRon Manis (RDB: Unix Relational Database Managment) was adamant about the TAB. "There will never be a TAB character in a data field". So, the TSV format is the guaranteed safe version: "sed 's/^"//; s/"$//; s/","/<TAB>/g' file.csv | tee file.tsv where <TAB> is typed [CTRL]-V{tab}
-
-
Ashish K about 7 yearsI had to use sed with
-i
flag to make changes inline. @terdon thankyou so much for this. -
terdon about 7 years@AshishK why make them inline? My (really bad suggestion which you shouldn't use) doesn't require it to be inline and making it inline might break the file for other applications.
-
Ashish K about 7 yearsThe reason why I had to do it using
sed
is because i was not supposed to use perl and python here in the problem. I had already thought about this solution of replacing,
outside quotes with some other character and then adding that character as the delimiter. However when usingsed
directly, it was opening the file and deleting everything in it. So had to use inline -
terdon about 7 yearsYes, because you must have tried to do
sed 'sed command' file > file
instead ofsed 'sed command' file | while
. -
Ashish K about 7 yearsYes! That was it.. I didn't notice that :) Thankyou.