Escaping commas inside double quotes as field separator in awk
Solution 1
Your second example almost works. You are just missing a comma as the output separator (OFS=",") and double quotes around the newly calculated date. This works:
BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS="," }
NR==1 {
    for (i=1; i<=NF; i++) {
        f[$i] = i
    }
}
NR>1 {
    begSecs = mktime( gensub(/[":-]/," ","g",$(f["DateTime"])) )
    endSecs = begSecs + $(f["TotalDuration"])
    $(f["CallEndTime"]) = "\"" strftime("%Y-%m-%d %H:%M:%S", endSecs) "\""
}
{print}
This works for the examples given. But there is more to CSV than awk can handle; as other answers have already recommended, use a tool that understands the CSV format correctly.
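One concrete limit of the FPAT approach above: neither branch of the pattern knows about RFC 4180 escaped quotes, where a literal quote inside a quoted field is written as "". The sketch below illustrates this with Python's re module (the quoted branch is put first, since Python's re takes the leftmost alternative rather than the POSIX longest match that gawk's FPAT uses, so this is an approximation of the splitting, not an exact gawk emulation), compared against the csv module, which implements the escaping rules:

```python
import csv
import io
import re

# Approximation of the two FPAT branches; quoted branch first because
# Python's re picks the leftmost matching alternative.
FPAT = r'("[^"]+")|([^,]*)'

# RFC 4180 escapes a quote inside a quoted field by doubling it ("").
line = '1,"say ""hi"", then leave",3'

# Regex splitting: the "" escape is not handled, so the middle field
# is broken into several pieces.
regex_fields = [m.group(0) for m in re.finditer(FPAT, line) if m.group(0)]

# The csv module parses the same record correctly.
csv_fields = next(csv.reader(io.StringIO(line)))

print(regex_fields)  # ['1', '"say "', '"hi"', '", then leave"', '3']
print(csv_fields)    # ['1', 'say "hi", then leave', '3']
```

So the FPAT recipe is fine for the sample data in the question, but breaks as soon as a field contains an escaped quote.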
Solution 2
I would not use awk to parse CSV files; it is better to use dedicated tools, e.g. Python's csv module:
#!/usr/bin/env python3
import csv, shutil
from tempfile import NamedTemporaryFile
from datetime import datetime
from datetime import timedelta

tempfile = NamedTemporaryFile(mode='w', delete=False)

with open('input.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    writer = csv.DictWriter(tempfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row['CallEndTime'] = datetime.strptime(row['DateTime'], '%Y-%m-%d %H:%M:%S') + timedelta(seconds=int(row['TotalDuration']))
        writer.writerow(row)

shutil.move(tempfile.name, 'output.csv')
output.csv:
DateTime,Dealers,Locations,CallEndTime,TotalDuration
2018-12-27 12:19:14,"Dealer1,Dealer2,Dealer3",Gujarat,2018-12-27 12:20:21,67
2018-12-27 12:19:14,"Dealer1,Dealer2","Gujarat,Vadodara",2018-12-27 12:20:54,100
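The date arithmetic in the script is plain datetime plus timedelta; here is a minimal check against the first output row above:

```python
from datetime import datetime, timedelta

# First input row: the call starts at 12:19:14 and lasts 67 seconds.
start = datetime.strptime('2018-12-27 12:19:14', '%Y-%m-%d %H:%M:%S')
end = start + timedelta(seconds=int('67'))

# str() of a datetime yields the same 'YYYY-MM-DD HH:MM:SS' layout,
# which is why the script can write the value back without formatting.
print(str(end))  # 2018-12-27 12:20:21
```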
Solution 3
With csvkit >= 1.0.4 (the current development version at the time of writing) you can use csvsql:
csvsql --query '
update input
set CallEndTime = datetime(DateTime,"+"||TotalDuration||" seconds");
' input.csv
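csvsql loads the CSV into an in-memory SQLite database and runs the query there, so the datetime() expression can be checked directly with Python's sqlite3 module (the values below are just the sample data from the question):

```python
import sqlite3

conn = sqlite3.connect(':memory:')

# The same expression csvsql evaluates: '+' || TotalDuration || ' seconds'
# builds a SQLite date modifier string such as '+100 seconds'.
result = conn.execute(
    "SELECT datetime(?, '+' || ? || ' seconds')",
    ('2018-12-27 12:19:14', 100),
).fetchone()[0]

print(result)  # 2018-12-27 12:20:54
```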
mittu
Updated on September 18, 2022

Comments
-
mittu over 1 year
I need to add two fields to a CSV file. The separator of CSV fields is the comma, and some fields are inside double quotes. The problem is that, inside the double-quoted fields, it is possible to also find a comma. How to split that with awk? The fields result from a mongo export, and the position of these fields might change. Sample input csv:
DateTime,Dealers,Locations,CallEndTime,TotalDuration
"2018-12-27 12:19:14","Dealer1,Dealer2,Dealer3","Gujarat",,67,,
"2018-12-27 12:19:14","Dealer1,Dealer2","Gujarat,Vadodara",,100,
Sample output csv:
DateTime,Dealers,Locations,CallEndTime,TotalDuration
"2019-01-07 11:35:42","Dealer1,Dealer2,Dealer3","Gujarat","2019-01-07 11:36:51",69,,
"2018-12-27 12:19:14","Dealer1,Dealer2","Gujarat,Vadodara","2018-12-27 12:19:14,78",
Awk code:
BEGIN { FS=OFS="," }
NR==1 {
    for (i=1; i<=NF; i++) {
        f[$i] = i
    }
}
NR>1 {
    begSecs = mktime( gensub( /[":-]/, " ", "g", $(f["DateTime"]) ) )
    endSecs = begSecs + $(f["TotalDuration"])
    $(f["CallEndTime"]) = strftime("%Y-%m-%d %H:%M:%S", endSecs)
}
{print}
I don't want the commas inside the double quotes to be considered as FS. I have seen that this can be done using FPAT, but I haven't got any clue about how to use it here:
BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" }
NR==1 {
    for (i=1; i<=NF; i++) {
        f[$i] = i
    }
}
NR>1 {
    begSecs = mktime( gensub(/[":-]/," ","g",$(f["DateTime"])) )
    endSecs = begSecs + $(f["TotalDuration"])
    $(f["CallEndTime"]) = strftime("%Y-%m-%d %H:%M:%S", endSecs)
}
{print}
-
Marek Zakrzewski over 5 years: Have you seen gnu.org/software/gawk/manual/html_node/…
-
mittu over 5 years: I have seen that, but I want all the values to be comma-separated; only then can I view it in a csv file. I have no idea how to do that.
-
aborruso over 5 years: Could you add the csv output you want?
-
mittu over 5 years: I tried that, but the commas inside double quotes are also getting considered; I have no idea why. I tested by using OFS="|", and the commas inside double quotes also get replaced by OFS.
-
done over 5 years: @mittu What version of awk are you using? If GNU awk, call awk --version. It works for me with version 4.1.4.
-
done over 5 years: Hmmm, it should work. (1) Check that the " characters are actually the ASCII double quote, 0x22 (decimal 34), in the source text file. (2) Are you placing the awk code inside a text file that starts with #!/bin/awk -f, called awkfile, and then using the command awk -f awkfile textfile? @mittu
-
mittu over 5 years: 2) Yes, I am using it like that: I have a t.awk file which I call from a .sh file, using awk -f t.awk textfile. 1) Sorry, I don't know how to check that. @Issac
-
done over 5 years: Try grep -P '\x22' file; grep should highlight in red all the '"'. Or try cat inf11 | tr -d '\042', which should remove all the '"' in the file.
-
mittu over 5 years: I can't remove the '"' because that will make more fields in the csv; there are many comma-separated fields inside the double quotes.
-
done over 5 years: That is a temporary test to "look" at the results on the command line, not to permanently remove anything. Looking at the output of both commands you will be able to find out whether the double quotes are actually correct. I don't know what other problem prevents you from using the script provided. In any case: good luck.