Converting CSV to TSV
Solution 1
Python
Add to file named csv2tab
, and make it executable
touch csv2tab && chmod u+x csv2tab
Add to it
#!/usr/bin/env python
import csv, sys
csv.writer(sys.stdout, dialect='excel-tab').writerows(csv.reader(sys.stdin))
Test runs
$ echo 'A,,C,"D,E,F","G",I,"K,L,M",Z' | ./csv2tab
A C D,E,F G I K,L,M Z
$ ./csv2tab < data.csv > data.tsv && head data.tsv
1A C D,E,F G I K,L,M Z
2A C D,E,F G I K,L,M Z
3A C D,E,F G I K,L,M Z
Solution 2
Using csvkit
(Python), for example:
$ csvformat -T in.csv > out.txt
Does streaming, with correct CSV and TSV quoting and escaping
It's in apt and other package managers
Solution 3
For fun, sed
.
sed -E 's/("([^"]*)")?,/\2\t/g' file
If your sed
doesn't support -E
, try with -r
. If your sed
doesn't support \t
for a literal tab, try putting a literal tab (in many shells, ctrl-v tab) or in Bash, use a $'...'
C-style string (in which case the backslash in \2
needs to be doubled). If you want to keep the quotes, use \1
instead of \2
(in which case the inner pair of parentheses is useless, and can be removed).
If your sed
doesn't support either -E
or -r
, try
sed 's/\("\([^"]*\)"\)\?,/\2\t/g' file
again possibly with the tweaks suggested above if \t
is not supported.
For additional fun, here's the same thing with the Bash "here-string" syntax, just to demonstrate what it looks like. Notice how the literal backslashes we want sed
to receive are now doubled:
sed $'s/\\("\\([^"]*\\)"\\)\\?,/\\2\t/g' file
This makes no attempt to handle escaped double quotes inside double quotes; some CSV dialects support this by doubling the quoted double quote (sic).
Solution 4
One option might be perl's Text::CSV module e.g.
perl -MText::CSV -lne 'BEGIN { $csv = Text::CSV->new() }
print join "\t", $csv->fields() if $csv->parse($_)
' somefile
to demonstrate
echo 'A,,C,"D,E,F","G",I,"K,L,M",Z' |
perl -MText::CSV -lne 'BEGIN { $csv = Text::CSV->new() }
print join "\t", $csv->fields() if $csv->parse($_)
'
A C D,E,F G I K,L,M Z
Solution 5
I authored an open-source CSV to TSV converter that handles the transformations described. It's quite fast, may be worth a look if there's an on-going need to convert large CSV files. Tool is part of eBay's TSV utilities toolkit (csv2tsv documentation here). Default options suffice for the input described:
$ csv2tsv file.csv > file.tsv
A consideration when converting CSV to TSV is handling of field and record delimiters (comma and newline) in the data. CSV uses an escape syntax. If the goal is to use the output with Unix tools like cut
, awk
, etc., the output needs to be free of escapes. Most solutions listed here produce CSV style escapes when delimiters are in the data. csv2tsv
is differentiated from other solutions in that it produces TSV without escapes. See the documentation for details.
To see what a particular solution does, convert a CSV containing commas, tabs, quotes, and newlines in the data. For example:
$ echo $'Line,Field1,Field2\n1,"Comma: |,|","Quote: |""|"\n"2","TAB: |\t|","Newline: |\n|"' | <conversion-script-or-command>
Solutions generating escapes will put double quotes around the fields containing quotes, newlines, or tabs.
Related videos on Youtube
Admin
Updated on September 18, 2022Comments
-
Admin over 1 year
I have a number of large CSV files and would like them in TSV (tab separated format). The complication is that there are commas in the fields of the CSV file, eg:
A,,C,"D,E,F","G",I,"K,L,M",Z
Expected output:
A C D,E,F G I K,L,M Z
(where whitespace in between are 'hard' tabs)
I have Perl, Python, and coreutils installed on this server.
-
George Vasiliou about 7 yearsI think that i tried about 100 different sed scripts to achieve this one but all my attempts failed. This is awesome.
-
Pont about 7 yearsI think the true thermonuclear flyswatter would be writing a Java utility to do it via LibreOffice's UNO API :).
-
agc about 6 yearsWhen the input data
'A,,C,"D,E,F","G",I,"K,L,M","Z,A"'
is input to this answer, then the"Z,A"
is incorrectly replaced withZ A
, rather than the correctZ,A
. -
kev about 4 yearsi get the following error when I try your solution:
No protocol specified import-im6.q16: unable to open X server :0' @ error/import.c/ImportImageCommand/358. csv2tab.sh: line 3: syntax error near unexpected token sys.stdout,' csv2tab.sh: line 3: csv.writer(sys.stdout, dialect='excel-tab').writerows(csv.reader(sys.stdin))'
-
OneCricketeer about 4 years@kev This code doesn't require an X Server, so seems you may have some other issue
-
Ricardo Mayerhofer almost 4 yearsCommand line I used to fix the quoting: csvformat -U 3 -Q "" -T in.csv > out.tsv
-
JJoao almost 4 yearsobfuscated but very cool!
-
nisetama almost 4 yearsThis version also works if a line ends with a double-quoted field:
echo '1337,"test input"'|sed -r 's/""/\v/g;s/("([^"]+)")?,/\2\t/g;s/"([^"]+)"$/\1/;s/\v/"/g'
(the dollar sign is not required, but it's used to clarify the intent of the command). Both your and my version fail when a field consists of 4 double quotes (i.e. a single double quote in escaped form):"next field is a single escaped double quote",""""
. -
Mr. Lance E Sloan over 3 yearsI didn't see anything in the original question about disabling headers. When I decided to use
mlr
to convert a CSV file (with headers) to an equivalent TSV file (keeping headers, naturally), I just used:mlr --c2t cat input.csv > output.tsv
The results were perfect. -
nstatam over 2 yearsyour command works with my sed version (4.4). Unfortunatly with another Linux the sed version is 4.2.2 and don't work with the same data, even after replace
-E
by-r
. Do you have a clue how to fix that ? Thanks -
tripleee over 2 years@nstatam If you get an error message for both
-E
and-r
, you will need to refactor to a POSIX BRE regex (see updated answer now). If you don't, I'm guessing you transcribed the script incorrectly. Did you try with a literal tab instead of the symbolic sequence\t
? Which Linux distro exactly is this? -
tom over 2 yearsGreat answer, but note that it doesn't strip the double quotes from the final field if there is no trailing comma (e.g.
"a","b"
becomesa<TAB>"b"
). This can be fixed by appending a comma at the start, then removing the corresponding trailing tab afterward:sed -E 's/$/,/; s/("([^"]*)")?,/\2\t/g; s/\t$//'