Find and replace with awk
Solution 1
With sed
assuming you only have 5 columns and the column which should be quoted is column4 in your case:
sed -E 's/"//g; s/^(([^|]*\|){3})(.*)(\|[^|]*)$/\1"\3"\4/' infile
This will work even if no quotes entered by user, if entered any quotes by user or pipes.
Here is an awk
solution too as requested:
$ awk -F'|' '{ for (i=1; i<NF; i++){if (i>3)gsub("\"","");
printf (i!=(NF-1))?$i"|"((i==3)?"\"":""):$i"\"|"}; print $NF}' infile
col1|col2|col3|"col4"|col5
test|test_f|21/03/2017|"|||||USER RIGHTa anything here"|123
Solution 2
Using csvkit
:
$ csvformat -d '|' -D '|' file.csv
col1|col2|col3|col4|col5
test|test_f|21/03/2017|"|||||USER RIGHTa anything here"|123
The -d
and -D
flags define the delimiter used in the input and output respectively.
Related videos on Youtube
Admin
Updated on September 18, 2022Comments
-
Admin over 1 year
I am trying to grep column 4 with pipe delimiter (
|
) and replace all double quotes within it and then enclose that string with double quotes again.sample File:
col1|col2|col3|col4|col5|col6|col7|col8|col9 value1|value2|value3|"|||||value"4|value5|value6|"||value"7|value8|value9
Problem is column 4 is where user can write anything, even a pipe, which is my delimiter and it breaks my process.
E.g. column 4 may be
"|||||value"4
Now I am trying to write command which goes to column 4, replace all double quotes then enclose it with double quote again so I can treat it as single string to make my process work.
Required Output for column 4 is
"|||||value4"
Similarly I have do the same for column 7, required Output for column 7 is
"||value7"
Final output should look like this:
col1|col2|col3|col4|col5|col6|col7|col8|col9 value1|value2|value3|"|||||value4"|value5|value6|"||value7"|value8|value9
please suggest.
-
terdon about 6 yearsWhat operating system are you using? Please mention that in the question instead of using tags for it (OS tags should be used when the question is about the OS, not when you're simply using that OS). And you have tagged with both Linux and Solaris, two very different systems with different versions of the standard tools.
-
Admin about 6 yearsI have to implement it on both Linux and Solaris OS using awk cmd.
-
Angel Todorov about 6 yearsFor properly formatted CSV, you should double any existing double quotes, then enclose in double quotes:
test|test_f|21/03/2017|"""|||||USER ""RIGHTa anything here"|123
-
Angel Todorov about 6 yearsAnd you should be fixing the process that creates the file in the first place, instead of repairing the damage.
-
-
αғsнιη about 6 yearshmm, still I cannot recognize how double quote moved out of string!
-
Kusalananda about 6 years@αғsнιη Double quotes are the default quoting characters for CSV files. The whole field does not need to be quoted, but the delimiters that are part of the data needs to be. CSVkit's
csvformat
is able to read the partially quoted field and outputs it fully quoted. One may also choose to escape the delimiters in the field with-U 3 -P '\'
(for backslash escapes,-U 3
means "no quotes"). -
Kusalananda about 6 years@αғsнιη If the user was using a CSV-aware parser in his workflow, this transformation would not be needed.
-
αғsнιη about 6 yearsOK, thanks, now in OP's question, user fortunately entered two double quotes
"|||||USER "RIGHTa anything here
, what if user only input|||||USER RIGHTa anything here
or"|||||"USER "RIGHTa anything here
or some varied string? does this toolcsvformat
can print the same and result with"|||||USER RIGHTa anything here"
? Where OP is asked to remove quotes and quote whole column 4 again. -
Kusalananda about 6 years@αғsнιη No. In those cases, the quoting is broken, and if the program that writes the data to the CSV file does not insert any other form of quoting characters around the string (at least around the delimiters),
csvformat
will not be able to handle the data. You could probably solve that by parsing each row backwards withawk
or something to "peel off" the last column. -
αғsнιη about 6 yearsthank you, so this solution only will work if user input quotes and with even numbers of quotes, correct? or only 2 quotes?
-
Kusalananda about 6 years@αғsнιη It will work as long as the delimiter characters in the data field are quoted with double quotes and as long as the double quotes are paired up properly.
-
αғsнιη about 6 yearscolumn4 is your column(NF-1) already!
-
Admin about 6 years@αғsнιη-Thanks this works like a charm. Ran into another problem where I have 2 faulty columns and count from end of the line fails(NF-1). col1|col2|col3|col4|col5|col6|col7|col8|col9 test|test_f|21/03/2017|"|||||USER "RIGHTa anything "here|123|erer|""||User Comments|test|test23 Here is problem with col4 : "|||||USER "RIGHTa anything "here and col7 : ""||User Comments
-
αғsнιη about 6 yearsstill you didn't edit into your question with these, since I cannot recognize where is the next line starts.
-
Admin about 6 yearsI have edited my question, thanks for looking into it.
-
αғsнιη about 6 yearsI should say that it's not possible to fix your broken file with this way since the broken column is in middle of the line, you need go and fix the process creating this file first.