Find and replace with awk

6,517

Solution 1

With sed assuming you only have 5 columns and the column which should be quoted is column4 in your case:

sed -E 's/"//g; s/^(([^|]*\|){3})(.*)(\|[^|]*)$/\1"\3"\4/' infile

This will work even if no quotes entered by user, if entered any quotes by user or pipes.

Here is an awk solution too as requested:

$ awk -F'|' '{ for (i=1; i<NF; i++){if (i>3)gsub("\"",""); 
  printf (i!=(NF-1))?$i"|"((i==3)?"\"":""):$i"\"|"}; print $NF}' infile

col1|col2|col3|"col4"|col5
test|test_f|21/03/2017|"|||||USER RIGHTa anything here"|123

Solution 2

Using csvkit:

$ csvformat -d '|' -D '|' file.csv
col1|col2|col3|col4|col5
test|test_f|21/03/2017|"|||||USER RIGHTa anything here"|123

The -d and -D flags define the delimiter used in the input and output respectively.

Share:
6,517

Related videos on Youtube

Admin
Author by

Admin

Updated on September 18, 2022

Comments

  • Admin
    Admin over 1 year

    I am trying to grep column 4 with pipe delimiter (|) and replace all double quotes within it and then enclose that string with double quotes again.

    sample File:

    col1|col2|col3|col4|col5|col6|col7|col8|col9
    value1|value2|value3|"|||||value"4|value5|value6|"||value"7|value8|value9
    

    Problem is column 4 is where user can write anything, even a pipe, which is my delimiter and it breaks my process.

    E.g. column 4 may be

    "|||||value"4
    

    Now I am trying to write command which goes to column 4, replace all double quotes then enclose it with double quote again so I can treat it as single string to make my process work.

    Required Output for column 4 is

    "|||||value4"
    

    Similarly I have do the same for column 7, required Output for column 7 is

    "||value7"
    

    Final output should look like this:

    col1|col2|col3|col4|col5|col6|col7|col8|col9
    value1|value2|value3|"|||||value4"|value5|value6|"||value7"|value8|value9
    

    please suggest.

    • terdon
      terdon about 6 years
      What operating system are you using? Please mention that in the question instead of using tags for it (OS tags should be used when the question is about the OS, not when you're simply using that OS). And you have tagged with both Linux and Solaris, two very different systems with different versions of the standard tools.
    • Admin
      Admin about 6 years
      I have to implement it on both Linux and Solaris OS using awk cmd.
    • Angel Todorov
      Angel Todorov about 6 years
      For properly formatted CSV, you should double any existing double quotes, then enclose in double quotes: test|test_f|21/03/2017|"""|||||USER ""RIGHTa anything here"|123
    • Angel Todorov
      Angel Todorov about 6 years
      And you should be fixing the process that creates the file in the first place, instead of repairing the damage.
  • αғsнιη
    αғsнιη about 6 years
    hmm, still I cannot recognize how double quote moved out of string!
  • Kusalananda
    Kusalananda about 6 years
    @αғsнιη Double quotes are the default quoting characters for CSV files. The whole field does not need to be quoted, but the delimiters that are part of the data needs to be. CSVkit's csvformat is able to read the partially quoted field and outputs it fully quoted. One may also choose to escape the delimiters in the field with -U 3 -P '\' (for backslash escapes, -U 3 means "no quotes").
  • Kusalananda
    Kusalananda about 6 years
    @αғsнιη If the user was using a CSV-aware parser in his workflow, this transformation would not be needed.
  • αғsнιη
    αғsнιη about 6 years
    OK, thanks, now in OP's question, user fortunately entered two double quotes "|||||USER "RIGHTa anything here, what if user only input |||||USER RIGHTa anything here or "|||||"USER "RIGHTa anything here or some varied string? does this tool csvformat can print the same and result with "|||||USER RIGHTa anything here"? Where OP is asked to remove quotes and quote whole column 4 again.
  • Kusalananda
    Kusalananda about 6 years
    @αғsнιη No. In those cases, the quoting is broken, and if the program that writes the data to the CSV file does not insert any other form of quoting characters around the string (at least around the delimiters), csvformat will not be able to handle the data. You could probably solve that by parsing each row backwards with awk or something to "peel off" the last column.
  • αғsнιη
    αғsнιη about 6 years
    thank you, so this solution only will work if user input quotes and with even numbers of quotes, correct? or only 2 quotes?
  • Kusalananda
    Kusalananda about 6 years
    @αғsнιη It will work as long as the delimiter characters in the data field are quoted with double quotes and as long as the double quotes are paired up properly.
  • αғsнιη
    αғsнιη about 6 years
    column4 is your column(NF-1) already!
  • Admin
    Admin about 6 years
    @αғsнιη-Thanks this works like a charm. Ran into another problem where I have 2 faulty columns and count from end of the line fails(NF-1). col1|col2|col3|col4|col5|col6|col7|col8|col9 test|test_f|21/03/2017|"|||||USER "RIGHTa anything "here|123|erer|""||User Comments|test|test23 Here is problem with col4 : "|||||USER "RIGHTa anything "here and col7 : ""||User Comments
  • αғsнιη
    αғsнιη about 6 years
    still you didn't edit into your question with these, since I cannot recognize where is the next line starts.
  • Admin
    Admin about 6 years
    I have edited my question, thanks for looking into it.
  • αғsнιη
    αғsнιη about 6 years
    I should say that it's not possible to fix your broken file with this way since the broken column is in middle of the line, you need go and fix the process creating this file first.