How can I convert tab delimited data to comma delimited data?


Solution 1

#!/usr/bin/awk -f

BEGIN { FS = "\t"; OFS = "," }
{
    for(i = 1; i <= NF; i++) {
        if ($i + 0 == $i) { $i = "=" $i }
        else gsub(/"/, "\"\"", $i);
        $i = "\"" $i "\""
    }
    print
}

Assuming you name this convert.awk, you can call it with either

ec2-describe-snapshots -H --hide-tags | awk -f convert.awk > snapshots.csv

or (after adding execute permissions, chmod a+x convert.awk)

ec2-describe-snapshots -H --hide-tags | ./convert.awk > snapshots.csv

This makes a new column for each tab, which keeps the description column together (unless it contains tabs) but produces an empty column wherever there are consecutive tabs (though that is how your sample output looks, so maybe you actually do want that). If you want to split on all whitespace instead (this collapses extra tabs within the table but puts each word of the description in its own column), take out the FS = "\t"; statement.
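
To sanity-check the script without the ec2 tool, you can pipe in a hand-made tab-separated line (the data below is made up for illustration):

printf 'SnapshotId\tOwnerId\tDescription\nsnap-00b66464\t5098939\tmy backup\n' | awk -f convert.awk

which should print something like

"SnapshotId","OwnerId","Description"
"snap-00b66464","=5098939","my backup"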

For future generations: if you don't need the quotes or the = prefixes, and none of your fields contain embedded whitespace, you can make it a one-liner:

awk -v OFS=, '{$1=$1;print}'
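
Note that without an explicit field separator this splits on any run of whitespace, so a multi-word description ends up in several columns; as Paul_Pedant points out in the comments, you can force tab-only splitting by adding -F'\t':

ec2-describe-snapshots -H --hide-tags | awk -F'\t' -v OFS=, '{$1=$1; print}' > snapshots.csv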

Solution 2

Here's a Perl solution. This might be possible with sed/awk, but testing for the numeric part would likely make it pretty ugly.

ec2-describe-snapshots -H --hide-tags | \
perl -e 'use Scalar::Util qw(looks_like_number);
         while (defined($line = <STDIN>)) {
             chomp($line);
             print(join(",", map { "\"" . (looks_like_number($_) ? "=$_" :
                                           do {s/"/""/g; $_}) . "\"" }
             split(/\t/, $line)) . "\n");
         }' \
> snapshots.csv
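
As a quick sanity check without the ec2 tool, the same Perl body can be fed a made-up tab-separated line:

printf 'snap-00b66464\tvol-b99a38d0\t2012-01-05\t5098939\t160\tmy backup\n' | \
perl -e 'use Scalar::Util qw(looks_like_number);
         while (defined($line = <STDIN>)) {
             chomp($line);
             print(join(",", map { "\"" . (looks_like_number($_) ? "=$_" :
                                           do {s/"/""/g; $_}) . "\"" }
                        split(/\t/, $line)) . "\n");
         }'

which should print "snap-00b66464","vol-b99a38d0","2012-01-05","=5098939","=160","my backup".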

Solution 3

If you're just lazy like me and want to do it all on one command line without writing a script, here's how I'd do it.

ec2-describe-snapshots -H --hide-tags | sed -e 's/^I/","/g' | sed -e 's/^/"/' | sed -e 's/$/"/'> snapshots.csv

The ^I is a literal tab character; you can type it at the shell by pressing Ctrl+V and then Tab (Ctrl+I).

The first sed swaps all the tabs for ",". The second sed inserts a " at the beginning of each line, and the last sed inserts a closing " at the end of each line.
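
As noted in the comments, the three passes can also be folded into a single sed call (the ^I is again a literal tab typed with Ctrl+V):

ec2-describe-snapshots -H --hide-tags | sed -e 's/^I/","/g' -e 's/^/"/' -e 's/$/"/' > snapshots.csv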

Solution 4

Another Perl solution:

#!/usr/bin/perl -wln
use strict;

my($n,$s);
chomp();
for $s ( split(/\t/,$_) )
{
    $s = '='.$s if ($s =~ /^\d+$/);
    $n.= '"'.$s.'",';
}
$n =~ s/(.*),/$1/;
print $n;

Invoke it with:

ec2-describe-snapshots -H --hide-tags | /var/tmp/script.pl > output.txt
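
Assuming the script is saved as /var/tmp/script.pl as above (and made executable with chmod +x), a quick test on a hand-made line could look like this:

printf 'snap-00b66464\tvol-b99a38d0\t5098939\tmy backup\n' | /var/tmp/script.pl

which should print "snap-00b66464","vol-b99a38d0","=5098939","my backup".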

Solution 5

sed is the most useful Linux utility I have ever encountered.

sed 's/\t/","/g' TabSeparatedValues.txt > CommaSeparatedValues.csv
sed -i 's/.*/"&"/' CommaSeparatedValues.csv

The first command replaces every tab on each line with ",". The second command inserts a quote at the beginning and end of each line, so that each value ends up surrounded by quotes, which allows commas to be part of a value.
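
The two steps can also be applied directly to the ec2 output in one pipeline; note that \t in the pattern is a GNU sed extension, so other seds may need a literal tab typed with Ctrl+V instead:

ec2-describe-snapshots -H --hide-tags | sed 's/\t/","/g; s/.*/"&"/' > snapshots.csv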


Comments

  • cwd
    cwd over 1 year

    I'm requesting a list of ec2 snapshots via amazon's ec2 command line tool:

    ec2-describe-snapshots -H --hide-tags > snapshots.csv
    

    The data looks something like this:

    SnapshotId      VolumeId        StartTime   OwnerId         VolumeSize  Description
    snap-00b66464   vol-b99a38d0    2012-01-05  5098939         160         my backup
    

    How can I intercept the data before redirecting it to snapshots.csv and do the following things:

    • replace "tabs" with commas
    • encapsulate values with quotations
    • if a value is all numbers, prefix it with an = so that excel will treat it as text - for example OwnerId should be "=5098939" (this one is not necessary if it cannot be done inline and would instead require a script file or function)

    desired output:

    "SnapshotId","VolumeId","StartTime","OwnerId","VolumeSize","Description"
    "snap-00b66464","vol-b99a38d0","2012-01-05","=5098939","=160","my backup"
    
    • Ignacio Vazquez-Abrams
      Ignacio Vazquez-Abrams over 12 years
      This is where someone tells you to import using tabs. Or they would, if Excel wasn't on crack.
    • cwd
      cwd over 12 years
      Yeah, I'm trying to help Excel out a little bit since it doesn't seem to be doing so hot on its own. Also, having a CSV file that can just be opened instead of having to use the import menu command is always nice. I already tried changing the extension to ".tsv" with no luck.
    • phemmer
      phemmer over 12 years
      I think your desired output is a bit off. You have a lot of empty fields in there (the empty quotes).
  • phemmer
    phemmer over 12 years
    Nice clean solution. Thought it would end up a lot uglier than that, but then I'm not an awk person :-)
  • cwd
    cwd over 12 years
    So do I save this into a file such as ./convert.sh, chmod +x, and then pipe the input into it so that it will print the output? I'm getting an error: /usr/bin/awk: syntax error at source line 1 context is >>> . <<< /convert.sh.
  • Kevin
    Kevin over 12 years
    @cwd You can save it in a file, I'd suggest convert.awk to indicate it's an awk script and not a bash one. I updated the post with the full command line, and note that I added a -f flag I had forgotten to the first line (that tells it to interpret the file as commands).
  • Stylex
    Stylex over 12 years
    How did you get the ctrl + v i to show up like that?
  • jw013
    jw013 over 12 years
    @burhan The syntax is <kbd>text</kbd>.
  • Arcege
    Arcege over 12 years
    Or in one line: sed -e 's/^I/","/g' -e 's/.*/"&"/' or even shorter sed -e 's/^I/","/g;s/.*/"&"/'.
  • phemmer
    phemmer over 12 years
    Scalar::Util isn't an outside module; it comes with standard Perl.
  • Jim
    Jim over 12 years
    True. Apologies for poorly wording my intended comment. Thank you for the correction.
  • Paul_Pedant
    Paul_Pedant about 4 years
    The one-liner version treats any whitespace as a field separator, not just tabs. Needs a -F'\t' before the -v.