Loop through binary data chunks from stdin in Bash
bash can't hold binary data in its variables. Processing text with shell loops is already bad enough; processing binary data that way would be even worse. The shell is the tool to run other tools.
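To make the limitation concrete, here is a minimal sketch showing what happens to a NUL byte captured by command substitution. The function name count_captured_bytes and the three-byte sample are made up for illustration; recent bash versions also print a warning on stderr when they drop the NUL.

```bash
# Capture three bytes (a, NUL, b) into a variable, then count what survived.
# count_captured_bytes is a hypothetical helper, not part of any tool.
count_captured_bytes() {
    data=$(printf 'a\0b')          # the NUL byte is discarded here
    printf '%s' "$data" | wc -c | tr -d ' '
}

count_captured_bytes   # prints 2: only "a" and "b" made it into the variable
```

Any byte-exact processing therefore has to stay in pipes or files rather than in variables.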
Also note that the read built-in command reads characters, not bytes.
Also, dd does one read system call per block, so a dd bs=77 count=1 won't necessarily read 77 bytes, especially if stdin is a pipe (the GNU implementation of dd has iflag=fullblock for that).
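The short-read behaviour can be observed with a writer that delivers its data in two bursts. This is only a sketch: the 8-byte block size, the sample data and the function name bytes_in_first_block are made up for the demo, and iflag=fullblock assumes GNU dd.

```bash
# Write 3 bytes, pause, then write 5 more, and let dd copy a single 8-byte block.
# Extra dd operands (e.g. the GNU-specific iflag=fullblock) are passed via "$@".
bytes_in_first_block() {
    { printf 'abc'; sleep 1; printf 'defgh'; } |
        dd bs=8 count=1 "$@" 2>/dev/null | wc -c | tr -d ' '
}

bytes_in_first_block                   # typically fewer than 8: one short read
bytes_in_first_block iflag=fullblock   # 8: dd keeps reading until the block is full
```

The first result depends on timing, since it reflects whatever the pipe held when dd issued its one read.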
Here, you want to use a data processing programming language like perl:
perl -ne 'BEGIN{$/=\77}
print "Do something with the 77 byte long <$_> record\n"'
With GNU awk:
LC_ALL=C awk -vRS='.{,77}' '{print "the record is in <" RT ">"}'
If you want to use a shell, your best option is probably zsh, which is the only one that can store binary data in its variables:
while LC_ALL=C IFS= read -ru0 -k77 record; do
print -r -- "you may only call builtins with $record
anyway since you can't pass NUL bytes in arguments
to an external command"
done
If all you want to do is pass each chunk as stdin to a new invocation of some command, then you can use GNU split and its --filter option:
split -b 77 --filter='some command'
--filter starts a new shell to evaluate some command for each chunk. To save a fork, unless your sh already does that optimisation by itself, you can do:
split -b 77 --filter='exec some command'
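As a small illustration of how split --filter hands each chunk to a fresh command on stdin, here is a sketch assuming GNU split. wc -c stands in for some command, and the 3-byte chunk size, the sample input and the function name chunk_sizes are made up for the demo.

```bash
# Split stdin into 3-byte chunks and run the filter once per chunk.
# Assumes GNU split; "wc -c" stands in for "some command".
chunk_sizes() {
    split -b 3 --filter='exec wc -c'
}

printf 'abcdefgh' | chunk_sizes   # prints 3, 3 and 2, one per line
```

The last chunk is simply shorter, and no extra empty invocation happens when the input ends on a chunk boundary.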
Using dd, you could parse its stderr output to detect the end of input. You'd need the GNU-specific iflag=fullblock as well:
while
  {
    report=$({
      LC_ALL=C dd bs=77 iflag=fullblock count=1 2>&3 |
        some command >&4 3>&- 4>&-
    } 3>&1)
  } 4>&1
  [ "${report%%+*}" -eq 1 ]
do
  : nothing
done
If the input size is a multiple of 77 though, some command will be run an extra time with an empty input.
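The loop's exit test relies on the first number of dd's report, which counts the full blocks read. A minimal sketch of that parsing step follows; the function name full_blocks_read is hypothetical, and the sample reports are typed by hand in the format GNU dd prints under LC_ALL=C.

```bash
# Extract the number of full blocks from a dd report such as "1+0 records in".
# ${1%%+*} removes the longest suffix that starts at a "+" sign.
full_blocks_read() {
    printf '%s\n' "${1%%+*}"
}

full_blocks_read '1+0 records in'   # prints 1: a full 77-byte block, keep looping
full_blocks_read '0+1 records in'   # prints 0: a final partial block, stop
full_blocks_read '0+0 records in'   # prints 0: nothing left to read, stop
```

If dd prints an error message instead of a report, the extracted value is not a number and the [ ... -eq 1 ] test fails with an "integer expression expected" message.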
reith
Updated on September 18, 2022
Comments
-
reith over 1 year
I'm looking for something like while IFS= read -r -n $length str; do ... done, but for binary data. Is it possible to do this using dd or other tools? Is there some technique that lets these tools detect when the pipe that stdin is read from has been closed, and terminate the loop? Currently I encode and decode the binary data and use read, but it's so slow (base64 | while read -r -n77 str; do echo $str | base64 -d; ... done).
-
Admin about 10 years
bash doesn't support storing binary data in its variables. You'll need a shell that does, like zsh, or better, use a programming language like perl.
-
Admin about 10 years
@StephaneChazelas I think bash can store raw data in its variables, except for the NUL byte. Actually, the problem is detecting when the pipe was closed. I can read that data with dd or any tool that is able to do magic with binary data. If I could use something like select() or poll() directly in Bash, I could use dd to solve the problem.
-
-
Stéphane Chazelas about 10 years
@xin, you can call perl from bash. bash is not part of coreutils. What about gawk?
-
phemmer about 10 years
Your second paragraph is incorrect. It is true that the read built-in in bash reads characters, not bytes. But the read system call does read bytes. Thus dd bs=77 count=1 will read 77 bytes, not 77 characters.
-
reith about 10 years
@StephaneChazelas Thanks for updating. The split --filter trick was very helpful, although I will have to change my code since it expects a command. Are you sure your dd solution works? I've fixed the test error ([ -eq]) but it only does one iteration.
-
reith about 10 years
@StephaneChazelas line 8: [: : integer expression expected
-
Stéphane Chazelas about 10 years
@xin, we store the stderr of dd into that $report, so we can parse the 1+0 records in ... Possibly you got an error message as well. What happens when you run that dd command manually?
-
reith about 10 years
@StephaneChazelas Your dd loop sometimes invokes some command once, sometimes twice and sometimes more, for the same arguments and the same input. By running dd manually I get some command called with the first block on stdin.