How can I convert tab delimited data to comma delimited data?


Solution 1

#!/usr/bin/awk -f

BEGIN { FS = "\t"; OFS = "," }
{
    for(i = 1; i <= NF; i++) {
        if ($i + 0 == $i) { $i = "=" $i }
        else gsub(/"/, "\"\"", $i);
        $i = "\"" $i "\""
    }
    print
}

Assuming you name this convert.awk, you can call it with either

ec2-describe-snapshots -H --hide-tags | awk -f convert.awk > snapshots.csv

or (after adding execute permissions, chmod a+x convert.awk)

ec2-describe-snapshots -H --hide-tags | ./convert.awk > snapshots.csv

This makes a new column for each tab, which keeps the description column together (unless it contains tabs) but produces an empty column wherever there are consecutive tabs (though that is how your sample output looks, so maybe you actually do want that). If you want to split on all whitespace instead (this collapses extra tabs within the table but puts each word of the description in its own column), take out the FS = "\t"; statement.
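
To sanity-check the script without the ec2 tool, you can pipe in a hand-made tab-separated line (the data below is made up for illustration):

printf 'SnapshotId\tOwnerId\tDescription\nsnap-00b66464\t5098939\tmy backup\n' | awk -f convert.awk

which should print something like

"SnapshotId","OwnerId","Description"
"snap-00b66464","=5098939","my backup"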

For future generations: if you don't need the quotes or the = prefixes, and none of your fields contain embedded whitespace, you can make it a one-liner:

awk -v OFS=, '{$1=$1;print}'
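
Note that without an explicit field separator this splits on any run of whitespace, so a multi-word description ends up in several columns; as Paul_Pedant points out in the comments, you can force tab-only splitting by adding -F'\t':

ec2-describe-snapshots -H --hide-tags | awk -F'\t' -v OFS=, '{$1=$1; print}' > snapshots.csv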

Solution 2

Here's a Perl solution. This might be possible with sed/awk, but testing for the numeric part would likely make it pretty ugly.

ec2-describe-snapshots -H --hide-tags | \
perl -e 'use Scalar::Util qw(looks_like_number);
         while (defined($line = <STDIN>)) {
             chomp($line);
             print(join(",", map { "\"" . (looks_like_number($_) ? "=$_" :
                                           do {s/"/""/g; $_}) . "\"" }
             split(/\t/, $line)) . "\n");
         }' \
> snapshots.csv
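
As a quick sanity check without the ec2 tool, the same Perl body can be fed a made-up tab-separated line:

printf 'snap-00b66464\tvol-b99a38d0\t2012-01-05\t5098939\t160\tmy backup\n' | \
perl -e 'use Scalar::Util qw(looks_like_number);
         while (defined($line = <STDIN>)) {
             chomp($line);
             print(join(",", map { "\"" . (looks_like_number($_) ? "=$_" :
                                           do {s/"/""/g; $_}) . "\"" }
                        split(/\t/, $line)) . "\n");
         }'

which should print "snap-00b66464","vol-b99a38d0","2012-01-05","=5098939","=160","my backup".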

Solution 3

If you're just lazy like me and want to do it all on one command line without writing a script, here's how I'd do it.

ec2-describe-snapshots -H --hide-tags | sed -e 's/^I/","/g' | sed -e 's/^/"/' | sed -e 's/$/"/'> snapshots.csv

The ^I is a literal tab character; you can type it at the shell by pressing Ctrl+V and then Tab (Ctrl+I).

The first sed swaps all the tabs for ",". The second sed inserts a " at the beginning of each line, and the last sed inserts a closing " at the end of each line.
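
As noted in the comments, the three passes can also be folded into a single sed call (the ^I is again a literal tab typed with Ctrl+V):

ec2-describe-snapshots -H --hide-tags | sed -e 's/^I/","/g' -e 's/^/"/' -e 's/$/"/' > snapshots.csv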

Solution 4

Another Perl solution:

#!/usr/bin/perl -wln
use strict;

my($n,$s);
chomp();
for $s ( split(/\t/,$_) )
{
    $s = '='.$s if ($s =~ /^\d+$/);
    $n.= '"'.$s.'",';
}
$n =~ s/(.*),/$1/;
print $n;

Invoke it with:

ec2-describe-snapshots -H --hide-tags | /var/tmp/script.pl > output.txt
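
Assuming the script is saved as /var/tmp/script.pl as above (and made executable with chmod +x), a quick test on a hand-made line could look like this:

printf 'snap-00b66464\tvol-b99a38d0\t5098939\tmy backup\n' | /var/tmp/script.pl

which should print "snap-00b66464","vol-b99a38d0","=5098939","my backup".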

Solution 5

sed is the most useful Linux utility I have ever encountered.

sed 's/\t/","/g' TabSeparatedValues.txt > CommaSeparatedValues.csv
sed -i 's/.*/"&"/' CommaSeparatedValues.csv

The first command replaces every tab on each line with ",". The second command inserts a quote at the beginning and end of each line, so that each value ends up surrounded by quotes, which allows commas to be part of a value.
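
The two steps can also be applied directly to the ec2 output in one pipeline; note that \t in the pattern is a GNU sed extension, so other seds may need a literal tab typed with Ctrl+V instead:

ec2-describe-snapshots -H --hide-tags | sed 's/\t/","/g; s/.*/"&"/' > snapshots.csv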


Comments

  • cwd
    cwd over 1 year

    I'm requesting a list of ec2 snapshots via amazon's ec2 command line tool:

    ec2-describe-snapshots -H --hide-tags > snapshots.csv
    

    The data looks something like this:

    SnapshotId      VolumeId        StartTime   OwnerId         VolumeSize  Description
    snap-00b66464   vol-b99a38d0    2012-01-05  5098939         160         my backup
    

    How can I intercept the data before redirecting it to snapshots.csv and do the following things:

    • replace "tabs" with commas
    • encapsulate values with quotations
    • if a value is all numbers, prefix it with an = so that excel will treat it as text - for example OwnerId should be "=5098939" (this one is not necessary if it cannot be done inline and would instead require a script file or function)

    desired output:

    "SnapshotId","VolumeId","StartTime","OwnerId","VolumeSize","Description"
    "snap-00b66464","vol-b99a38d0","2012-01-05","=5098939","=160","my backup"
    
    • Ignacio Vazquez-Abrams
      Ignacio Vazquez-Abrams over 12 years
      This is where someone tells you to import using tabs. Or they would, if Excel wasn't on crack.
    • cwd
      cwd over 12 years
      Yeah, I'm trying to help Excel out a little bit since it doesn't seem to be doing so hot on its own. Also, having a CSV file that can just be opened instead of having to use the import menu command is always nice. I already tried changing the extension to ".tsv" with no luck.
    • phemmer
      phemmer over 12 years
      I think your desired output is a bit off. You have a lot of empty fields in there (the empty quotes).
  • phemmer
    phemmer over 12 years
    Nice clean solution. Thought it would end up a lot uglier than that, but then I'm not an awk person :-)
  • cwd
    cwd over 12 years
    So do I save this into a file such as ./convert.sh, chmod +x, and then pipe the input into it so that it will print the output? I'm getting an error: /usr/bin/awk: syntax error at source line 1 context is >>> . <<< /convert.sh.
  • Kevin
    Kevin over 12 years
    @cwd You can save it in a file, I'd suggest convert.awk to indicate it's an awk script and not a bash one. I updated the post with the full command line, and note that I added a -f flag I had forgotten to the first line (that tells it to interpret the file as commands).
  • Stylex
    Stylex over 12 years
    How did you get the ctrl + v i to show up like that?
  • jw013
    jw013 over 12 years
    @burhan The syntax is <kbd>text</kbd>.
  • Arcege
    Arcege over 12 years
    Or in one line: sed -e 's/^I/","/g' -e 's/.*/"&"/' or even shorter sed -e 's/^I/","/g;s/.*/"&"/'.
  • phemmer
    phemmer over 12 years
    Scalar::Util isn't an outside module; it comes with standard Perl.
  • Jim
    Jim over 12 years
    True. Apologies for poorly wording my intended comment. Thank you for the correction.
  • Paul_Pedant
    Paul_Pedant about 4 years
    The one-liner version treats any whitespace as a field separator, not just tabs. Needs a -F'\t' before the -v.