How do I convert a csv to a binary file with a bash command?

Solution 1

I have a csv file which is just a simple comma-separated list of numbers. I want to convert this csv file into a binary file [...]

I was moving along the lines of big endian 32-bit float to keep things simple.

Not sure how to do it in pure bash (I actually doubt that it is doable, since converting floats to their binary representation is not a standard conversion there).

But here it is with a simple Perl one-liner:

$ cat example1.csv
1.0
2.1
3.2
4.3

$ cat example1.csv | perl -ne 'print pack("f>*", split(/\s*,\s*/))' > example1.bin

$ hexdump -C < example1.bin
00000000  3f 80 00 00 40 06 66 66  40 4c cc cd 40 89 99 9a  |?...@.ff@L..@...|
00000010

It uses Perl's pack function with f to convert floats to binary, and > to store them as BE. (I have also added the split in case there are multiple numbers per CSV line.)
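For comparison, here is a rough Python sketch of the same conversion using the standard struct module. The helper name csv_to_be_floats is made up for illustration; it assumes the input holds only numbers separated by commas and/or newlines.

```python
import struct

def csv_to_be_floats(text):
    """Pack every comma- or newline-separated number as a big-endian 32-bit float."""
    fields = text.replace("\n", ",").split(",")
    values = [float(f) for f in fields if f.strip()]
    # ">" selects big-endian byte order, "f" is a 32-bit IEEE 754 float
    return struct.pack(">%df" % len(values), *values)

data = csv_to_be_floats("1.0\n2.1\n3.2\n4.3\n")
print(data.hex())  # same bytes as the hexdump above
```

Writing the result with open('example1.bin', 'wb') would give a file comparable to the one produced by the Perl command.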

P.S. The command to convert integers to 16-bit shorts with native endianness:

perl -ne 'print pack("s*", split(/\s*,\s*/))'

Use "s>*" for BE, or "s<*" for LE, instead of the "s*".
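The same byte-order choice can be sketched in Python with struct, where "h" is a signed 16-bit short and the prefix picks the byte order. The function name pack_shorts is hypothetical, and the values are assumed to already fit in 16 bits.

```python
import struct

def pack_shorts(values, byte_order="<"):
    """Pack integers as signed 16-bit shorts.

    byte_order: "<" little-endian, ">" big-endian, "=" native.
    """
    return struct.pack("%s%dh" % (byte_order, len(values)), *values)

le = pack_shorts([1, 2, 258], "<")
be = pack_shorts([1, 2, 258], ">")
print(le.hex())  # 010002000201
print(be.hex())  # 000100020102
```

Values outside the range -32768..32767 make struct.pack raise struct.error, so any scaling has to happen before packing.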

P.P.S. If it is audio data, you can also check the sox tool. Haven't used it in ages, but IIRC it could convert anything PCM-like from literally any format to any format, while also applying effects.

Solution 2

I would recommend Python over bash. For this particular task, it's simpler/saner IMO.

#!/usr/bin/env python

import array

with open('input.csv', 'rt') as f:
    text = f.read()
    entries = text.split(',')
    values = [int(x) for x in entries]
    # scale here: if your input ranges over [-100, 100] then
    #   you may need to translate/scale it into [-2^15, 2^15-1] for
    #   signed 16-bit PCM
    # e.g.:
    #   values = [int(val * scale) for val in values]

with open('output.pcm', 'wb') as out:
    pcm_vals = array.array('h', values) # 16-bit signed
    pcm_vals.tofile(out)
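The scaling step mentioned in the comments could look roughly like this. It is only a sketch: scale_to_int16 is a made-up helper, and it assumes a plain linear mapping from a known input range onto the full signed 16-bit range is what you want.

```python
def scale_to_int16(values, in_min, in_max):
    """Linearly map values from [in_min, in_max] onto [-32768, 32767]."""
    span = in_max - in_min
    out = []
    for v in values:
        unit = (v - in_min) / span            # position in 0.0 .. 1.0
        sample = round(unit * 65535) - 32768  # shift onto the signed range
        # clamp in case an input falls outside the declared range
        out.append(max(-32768, min(32767, sample)))
    return out

scaled = scale_to_int16([-100, 0, 100], -100, 100)
print(scaled)
```

The resulting list can be passed to array.array('h', ...) exactly like values in the script above.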

You could also use Python's wave module instead of just writing raw PCM.
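A minimal sketch of the wave variant, assuming mono audio, 16-bit samples and a 44100 Hz sample rate (none of which are stated in the question). The function name write_wav and the file name output.wav are placeholders.

```python
import array
import sys
import wave

def write_wav(path, samples, sample_rate=44100):
    """Write signed 16-bit integer samples as a mono WAV file."""
    pcm = array.array("h", samples)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores 16-bit PCM samples little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)          # mono (assumption)
        wav.setsampwidth(2)          # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())

write_wav("output.wav", [0, 1000, -1000])
```

A WAV file carries its own header, so an audio editor should be able to open it without being told the encoding or byte order.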

Here's how the example above works:

$ echo 1,2,3,4,5,6,7 > input.csv
$ ./so_pcm.py
$ xxd output.pcm
0000000: 0100 0200 0300 0400 0500 0600 0700       ..............

xxd shows the binary values. It used my machine's native endianness (little).
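If the byte order must not depend on the machine, one option is to swap the array's bytes when the native order differs from the one you want. This is a sketch only; to_pcm_bytes is a hypothetical helper built on the same array module.

```python
import array
import sys

def to_pcm_bytes(values, big_endian=False):
    """Return signed 16-bit PCM bytes in an explicit byte order."""
    pcm = array.array("h", values)
    # swap only when the machine's native order is not the requested one
    if (sys.byteorder == "big") != big_endian:
        pcm.byteswap()
    return pcm.tobytes()

print(to_pcm_bytes([1, 2, 3]).hex())                   # 010002000300
print(to_pcm_bytes([1, 2, 3], big_endian=True).hex())  # 000100020003
```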

Author: JVE999

Updated on June 08, 2022

Comments

  • JVE999
    JVE999 almost 2 years

    I have a csv file which is just a simple comma-separated list of numbers. I want to convert this csv file into a binary file (just a sequence of bytes, with each interpreted number being a number from the csv file).

    The reason I am doing this is to be able to import audio data from a spreadsheet of values. In my import (I am using audacity), I have a few formats to choose from for the binary file:

    Encoding:
    Signed 8, 24, 16, or 32 bit PCM
    Unsigned 8 bit PCM
    32 bit or 64 bit float
    U-Law
    A-Law
    GSM 6.10
    12, 16, or 24 bit DWVW
    VOX ADPCM
    
    Byte Order:
    No endianness
    Big endian
    Little endian
    

    I was moving along the lines of big endian 32-bit float to keep things simple. I wanted to keep things as simple as possible, so I was thinking bash would be the optimal tool.

  • Andreas Louv
    Andreas Louv almost 8 years
    Useless use of cat: perl -pe '$_ = pack("f>*", split(/\s*,\s*/))' example1.csv > example1.bin
  • Dummy00001
    Dummy00001 almost 8 years
    @andlr, Yes, useless. But more readable: input file - in front, output file - at the end.
  • Andreas Louv
    Andreas Louv almost 8 years
    You can just use <input.txt perl -pe 'code' >output.txt
  • Dummy00001
    Dummy00001 almost 8 years
    @andlrc, that syntax is just crazy. +1