Escaping commas inside double quotes as field separator in awk


Solution 1

Your second example almost works. You are only missing the comma as the output separator (OFS=",") and the double quotes around the newly calculated date. This works:

# A field is either a run of non-commas or a double-quoted string;
# OFS restores the comma between fields on output.
BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS="," }
# Map each header name to its field position.
NR==1 {
        for (i=1; i<=NF; i++) {
            f[$i] = i
        }
      }
# CallEndTime = DateTime + TotalDuration seconds, re-quoted.
NR>1  {
        begSecs = mktime( gensub(/[":-]/," ","g",$(f["DateTime"])) )
        endSecs = begSecs + $(f["TotalDuration"])
        $(f["CallEndTime"]) = "\"" strftime("%Y-%m-%d %H:%M:%S", endSecs) "\""
      }
{print}

That handles the examples given. But csv has more edge cases than awk can comfortably deal with. As other answers have already recommended, use a tool that understands the csv format properly.
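For comparison, Python's standard csv module applies those quoting rules out of the box. A minimal sketch, using a line from the sample input:

```python
import csv
import io

# One input line whose quoted field contains embedded commas.
line = '"2018-12-27 12:19:14","Dealer1,Dealer2,Dealer3","Gujarat",,67'

# csv.reader keeps the quoted field together instead of splitting
# on the commas inside the double quotes.
fields = next(csv.reader(io.StringIO(line)))
print(fields)
# ['2018-12-27 12:19:14', 'Dealer1,Dealer2,Dealer3', 'Gujarat', '', '67']
```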

Solution 2

I would not use awk to parse csv files; it is better to use dedicated tools, e.g. Python's csv module:

#!/usr/bin/env python3
import csv, shutil
from tempfile import NamedTemporaryFile
from datetime import datetime, timedelta

# Write to a temp file first, then replace the output in one move.
tempfile = NamedTemporaryFile(mode='w', delete=False, newline='')

with open('input.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    writer = csv.DictWriter(tempfile, fieldnames=reader.fieldnames)
    writer.writeheader()

    # CallEndTime = DateTime + TotalDuration seconds.
    for row in reader:
        row['CallEndTime'] = datetime.strptime(row['DateTime'], '%Y-%m-%d %H:%M:%S') \
                             + timedelta(seconds=int(row['TotalDuration']))
        writer.writerow(row)

tempfile.close()
shutil.move(tempfile.name, 'output.csv')

output.csv:

DateTime,Dealers,Locations,CallEndTime,TotalDuration
2018-12-27 12:19:14,"Dealer1,Dealer2,Dealer3",Gujarat,2018-12-27 12:20:21,67
2018-12-27 12:19:14,"Dealer1,Dealer2","Gujarat,Vadodara",2018-12-27 12:20:54,100
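The CallEndTime values above are plain timedelta arithmetic; the second row, for example, adds its TotalDuration of 100 seconds to the start time:

```python
from datetime import datetime, timedelta

# Recompute the second row's CallEndTime: DateTime + TotalDuration seconds.
start = datetime.strptime('2018-12-27 12:19:14', '%Y-%m-%d %H:%M:%S')
end = start + timedelta(seconds=100)
print(end)
# 2018-12-27 12:20:54
```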

Solution 3

With csvkit >= 1.0.4 (current development version) you can use csvsql:

csvsql --query '
    update input
    set CallEndTime = datetime(DateTime,"+"||TotalDuration||" seconds");
' input.csv
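csvsql executes the query with SQLite, so the datetime() expression can be checked on its own with Python's built-in sqlite3 module (a sketch of the expression only; csvsql itself takes care of loading the csv into a table):

```python
import sqlite3

# Evaluate the same SQLite expression csvsql would run for the first row:
# the start time shifted forward by TotalDuration seconds.
conn = sqlite3.connect(':memory:')
row = conn.execute(
    "select datetime(?, '+' || ? || ' seconds')",
    ('2018-12-27 12:19:14', 67),
).fetchone()
print(row[0])
# 2018-12-27 12:20:21
```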

Author: mittu

Updated on September 18, 2022

Comments

  • mittu over 1 year

    I need to add two fields to a csv file. The field separator is the comma, and some fields are enclosed in double quotes. The problem is that a comma can also appear inside a double-quoted field. How do I split that with awk?
    The fields come from a mongo export. The position of these fields might change.

    Sample input csv:

     DateTime,Dealers,Locations,CallEndTime,TotalDuration
    "2018-12-27 12:19:14","Dealer1,Dealer2,Dealer3","Gujarat",,67,,
    "2018-12-27 12:19:14","Dealer1,Dealer2","Gujarat,Vadodara",,100,
    

    Sample output csv:

     DateTime,Dealers,Locations,CallEndTime,TotalDuration
    "2019-01-07 11:35:42","Dealer1,Dealer2,Dealer3","Gujarat","2019-01-07 11:36:51",69,,
    "2018-12-27 12:19:14","Dealer1,Dealer2","Gujarat,Vadodara","2018-12-27 12:19:14,78",
    

    Awk code:

    BEGIN { FS=OFS="," }
    NR==1 {
            for (i=1; i<=NF; i++) {
                f[$i] = i
            }
          }
    NR>1  {
            begSecs = mktime( gensub( /[":-]/, " ", "g", $(f["DateTime"]) ) )
            endSecs = begSecs + $(f["TotalDuration"])
            $(f["CallEndTime"]) = strftime("%Y-%m-%d %H:%M:%S", endSecs)
    }
    {print}
    

    I don't want the commas inside double quotes to be treated as FS. I have seen that this can be done using FPAT, but I have no clue how to use it here:

    BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"}
    NR==1 {
            for (i=1; i<=NF; i++) {
                f[$i] = i
            }
          }
    NR>1  {
            begSecs = mktime( gensub(/[":-]/," ","g",$(f["DateTime"])) )
            endSecs = begSecs + $(f["TotalDuration"])
            $(f["CallEndTime"]) = strftime("%Y-%m-%d %H:%M:%S", endSecs)
          }
    {print}
    
    • Marek Zakrzewski over 5 years
    • mittu over 5 years
      I have seen that, but I want all the values to be comma separated; only then can I view it as a csv file. I have no idea how to do that.
    • aborruso over 5 years
      Could you add the csv output you want?
  • mittu over 5 years
    I tried that, but the commas inside double quotes are also getting considered. I have no idea why; I tested using OFS="|", and the commas inside double quotes were also replaced by OFS.
  • done over 5 years
    @mittu What version of awk are you using? If it is GNU awk, run awk --version. It works for me with version 4.1.4.
  • done over 5 years
    Hmmm, it should work. (1) Check that the " characters are actually ASCII 0x22 inside the source text file. (2) Are you placing the awk code inside a text file that starts with #!/bin/awk -f, called awkfile, and then using the command awk -f awkfile textfile? @mittu
  • mittu over 5 years
    (2) Yes, I am using it like that: I have a t.awk file which I call from a .sh file, using awk -f t.awk textfile. (1) Sorry, I don't know how to check that. @Issac
  • done over 5 years
    Try grep -P '\x22' file; grep should highlight in red all the '"' characters. Or try cat inf11 | tr -d '\042', which should remove all the '"' in the file.
  • mittu over 5 years
    I can't remove the " characters because that would create more fields in the csv; there are many comma-separated fields inside the double quotes.
  • done over 5 years
    That is a temporary test to "look" at the results on the command line, not to permanently remove anything. Looking at the output of both commands, you will be able to tell whether the double quotes are actually correct. I don't know what other problem prevents you from using the script provided. In any case: good luck.