How do I convert a csv to a binary file with a bash command?
Solution 1
I have a csv file which is just a simple comma-separated list of numbers. I want to convert this csv file into a binary file [...] I was moving along the lines of big endian 32-bit float to keep things simple.
I'm not sure how to do it in pure bash (I actually doubt it is doable, since converting a float to its binary form is not a standard conversion there).
But here it is with a simple Perl one-liner:
$ cat example1.csv
1.0
2.1
3.2
4.3
$ cat example1.csv | perl -ne 'print pack("f>*", split(/\s*,\s*/))' > example1.bin
$ hexdump -C < example1.bin
00000000  3f 80 00 00 40 06 66 66  40 4c cc cd 40 89 99 9a  |?...@.ff@L..@...|
00000010
It uses Perl's pack function with f to convert floats to binary, and > to make them big-endian (BE). (I have also added the split in case there are multiple numbers per CSV line.)
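For readers who prefer Python, the same packing can be sketched with the standard struct module, where the format character f means a 32-bit float and the > prefix selects big-endian byte order. This is only an illustrative equivalent of the Perl one-liner, not part of the original answer; the helper name csv_to_be_floats is made up for this example.

```python
import struct

def csv_to_be_floats(text):
    """Pack comma/whitespace-separated numbers as big-endian 32-bit floats."""
    # Treat commas and any whitespace (including newlines) as separators.
    tokens = text.replace(",", " ").split()
    values = [float(tok) for tok in tokens]
    # ">" = big-endian, "f" = 32-bit IEEE 754 float, repeated once per value.
    return struct.pack(">%df" % len(values), *values)

data = csv_to_be_floats("1.0\n2.1\n3.2\n4.3\n")
print(data.hex())  # 3f80000040066666404ccccd4089999a
```

The printed bytes match the hexdump output shown for example1.bin.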
P.S. The command to convert integers to 16-bit shorts with native endianness:
perl -ne 'print pack("s*", split(/\s*,\s*/))'
Use "s>*" for BE, or "s<*" for LE, instead of "s*".
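The effect of the byte-order modifiers can be illustrated in Python, whose struct module uses the same > and < prefixes (with h for a signed 16-bit short). This is a small sketch for comparison only; the sample values are arbitrary.

```python
import struct
import sys

values = [1, 2, 3]

big = struct.pack(">%dh" % len(values), *values)     # big-endian shorts
little = struct.pack("<%dh" % len(values), *values)  # little-endian shorts
native = struct.pack("=%dh" % len(values), *values)  # native byte order

print(big.hex())     # 000100020003
print(little.hex())  # 010002000300
# Native order matches one of the two, depending on the machine.
print(native == (little if sys.byteorder == "little" else big))  # True
```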
P.P.S. If it is audio data, you can also check out the sox tool. I haven't used it in ages, but IIRC it could convert anything PCM-like from literally any format to any format, while also applying effects.
Solution 2
I would recommend Python over bash. For this particular task, it's simpler/saner IMO.
#!/usr/bin/env python
import array

with open('input.csv', 'rt') as f:
    text = f.read()
    entries = text.split(',')
    values = [int(x) for x in entries]

# apply scaling here: if your input goes from [-100, 100] then
# you may need to scale it into [-2^15, 2^15-1] for
# signed 16-bit PCM
# e.g.:
# values = [int(val * scale) for val in values]

with open('output.pcm', 'wb') as out:
    pcm_vals = array.array('h', values)  # 16-bit signed
    pcm_vals.tofile(out)
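The scaling step mentioned in the comments could look roughly like the sketch below. It assumes the input values lie in a known symmetric range (the bound 100 is taken from the comment's example) and maps them onto the signed 16-bit range that the 'h' typecode accepts. The function name scale_to_int16 and the clamping behaviour are illustrative choices, not part of the original answer.

```python
def scale_to_int16(values, max_abs=100):
    """Linearly scale values from [-max_abs, max_abs] to signed 16-bit integers."""
    scale = 32767 / max_abs
    scaled = []
    for val in values:
        sample = int(round(val * scale))
        # Clamp so out-of-range inputs cannot overflow array.array('h').
        sample = max(-32768, min(32767, sample))
        scaled.append(sample)
    return scaled

print(scale_to_int16([-100, 0, 100, 200]))  # [-32767, 0, 32767, 32767]
```

The returned list can be passed to array.array('h', ...) in place of values.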
You could also use Python's wave module instead of just writing raw PCM.
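A minimal sketch of the wave variant is shown below. The sample rate (44100 Hz), mono channel layout, file name and function name are assumptions made for illustration; the original answer does not specify them. WAV stores 16-bit samples little-endian, so the samples are packed with an explicit < prefix.

```python
import struct
import wave

def write_wav(path, samples, sample_rate=44100):
    """Write signed 16-bit integer samples to a mono WAV file."""
    frames = struct.pack("<%dh" % len(samples), *samples)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono (assumed)
        wav.setsampwidth(2)            # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)  # assumed sample rate
        wav.writeframes(frames)

write_wav("output.wav", [1, 2, 3, 4, 5, 6, 7])
```

Unlike raw PCM, the resulting file carries its own header, so the importing program should not need to be told the encoding or byte order.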
Here's how the example above works:
$ echo 1,2,3,4,5,6,7 > input.csv
$ ./so_pcm.py
$ xxd output.pcm
0000000: 0100 0200 0300 0400 0500 0600 0700 ..............
xxd shows the binary values. The script used my machine's native endianness (little-endian).
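Because array.tofile writes in the machine's native byte order, the output differs between little- and big-endian hosts. If a specific order is needed (for example, big-endian to match an import setting), one option is to swap bytes conditionally, as sketched here; the helper name to_bytes_with_order is hypothetical.

```python
import array
import sys

def to_bytes_with_order(values, byteorder="big"):
    """Return signed 16-bit samples as bytes in the requested byte order."""
    samples = array.array("h", values)
    # array stores values in native order; swap only if that differs.
    if sys.byteorder != byteorder:
        samples.byteswap()
    return samples.tobytes()

print(to_bytes_with_order([1, 2, 3], "big").hex())     # 000100020003
print(to_bytes_with_order([1, 2, 3], "little").hex())  # 010002000300
```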
JVE999
Updated on June 08, 2022
Comments
-
JVE999 almost 2 years
I have a csv file which is just a simple comma-separated list of numbers. I want to convert this csv file into a binary file (just a sequence of bytes, with each interpreted number being a number from the csv file).
The reason I am doing this is to be able to import audio data from a spreadsheet of values. In my import (I am using Audacity), I have a few formats to choose from for the binary file:
Encoding: Signed 8, 24, 16, or 32 bit PCM; Unsigned 8 bit PCM; 32 bit or 64 bit float; U-Law; A-Law; GSM 6.10; 12, 16, or 24 bit DWVW; VOX ADPCM
Byte Order: No endianness; Big endian; Little endian
I was moving along the lines of big endian 32-bit float to keep things simple. I wanted to keep things as simple as possible, so I was thinking bash would be the optimal tool.
-
Andreas Louv almost 8 years
Useless use of cat:
perl -pe '$_ = pack("f>*", split(/\s*,\s*/))' example1.csv > example1.bin
-
Dummy00001 almost 8 years
@andlr, Yes, useless. But more readable: input file - in front, output file - at the end.
-
Andreas Louv almost 8 years
You can just use
<input.txt perl -pe 'code' >output.txt
-
Dummy00001 almost 8 years
@andlrc, that syntax is just crazy. +1