Loop through binary data chunks from stdin in Bash

bash can't hold arbitrary binary data in its variables: in particular, it can't store NUL bytes. Processing text with shell loops is already bad enough; processing binary data that way would be even worse. The shell is the tool to run other tools.
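A quick way to see the NUL limitation, as a sketch that assumes a bash executable is on $PATH: a command substitution that produces the 3 bytes a, NUL, b leaves only 2 characters in the variable. bash 4.4 and later also print a warning about the ignored null byte; the sketch discards it.

```shell
#!/bin/sh
# Sketch: show that bash drops NUL bytes when storing data in a variable.
# Assumes a bash executable is on $PATH.
# printf "a\0b" emits 3 bytes: 'a', NUL, 'b'; ${#v} is what bash kept.
# stderr is discarded to hide bash's "ignored null byte" warning.
nul_demo_len=$(bash -c 'v=$(printf "a\0b"); printf %s "${#v}"' 2>/dev/null)
echo "bytes written: 3, characters stored by bash: $nul_demo_len"
```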

Also note that the read built-in command reads characters, not bytes.

Also, dd does one read() system call per block, so dd bs=77 count=1 won't necessarily read 77 bytes, especially if stdin is a pipe (the GNU implementation of dd has iflag=fullblock for that).
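The short-read behaviour can be reproduced with a writer that pauses between two writes, so that only "abc" is in the pipe when dd issues its read(). This is a sketch assuming GNU dd; writer is a hypothetical helper name, and the first count depends on scheduling, though with a one-second pause it is typically 3.

```shell
#!/bin/sh
# Sketch: compare plain dd with GNU dd's iflag=fullblock on a pipe.
# writer (hypothetical helper) writes 3 bytes, pauses, then 3 more.
writer() { printf abc; sleep 1; printf def; }

# Plain dd: a single read() for the single block returns what the pipe held.
plain=$(writer | dd bs=77 count=1 2>/dev/null | wc -c | tr -d ' ')

# GNU dd with iflag=fullblock: keeps reading until 77 bytes or EOF.
full=$(writer | dd bs=77 count=1 iflag=fullblock 2>/dev/null | wc -c | tr -d ' ')

echo "without fullblock: $plain bytes, with fullblock: $full bytes"
```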

Here, you'd want a language designed for data processing, such as perl or GNU awk.

In perl:

perl -ne 'BEGIN{$/=\77}
  print "Do something with the 77 byte long <$_> record\n"'

With GNU awk:

LC_ALL=C awk -vRS='.{,77}' '{print "the record is in <" RT ">"}'

If you want to use a shell, your best option is probably zsh, which is the only shell that can store binary data (including NUL bytes) in its variables:

while LC_ALL=C IFS= read -ru0 -k77 record; do
  print -r -- "you may only call builtins with $record
    anyway since you can't pass NUL bytes in arguments
    to an external command"
done

If all you want to do is pass each chunk as stdin to a new invocation of some command, then you can use GNU split and its --filter option:

split -b 77 --filter='some command'

--filter starts a new shell to evaluate some command for each chunk. Unless your sh already execs the last command by itself, you can save a fork per chunk with:

split -b 77 --filter='exec some command'
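A small sketch of the --filter approach, assuming GNU coreutils split; chunk_sizes is a hypothetical helper, and the filter simply counts the bytes it receives on stdin, so each line of output is the size of one chunk.

```shell
#!/bin/sh
# Sketch: GNU split runs the --filter command once per 77-byte chunk of
# stdin, with the chunk on that command's stdin. chunk_sizes is a
# hypothetical helper name.
chunk_sizes() {
  split -b 77 --filter='exec wc -c'
}

# 200 bytes -> chunks of 77, 77 and 46 bytes.
head -c 200 /dev/zero | chunk_sizes
```

Because each chunk goes straight from split to a new process, no chunk ever has to be stored in a shell variable, so NUL bytes are not a problem.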

Using dd, you could parse its stderr output to detect the end of input. You'd also need the GNU-specific iflag=fullblock:

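while
  {
    # dd's stderr (its "records in/out" report) goes to fd 3, which the
    # command substitution captures into $report; the chunk itself goes
    # down the pipe to some command, whose stdout is restored from fd 4.
    report=$({
      LC_ALL=C dd bs=77 iflag=fullblock count=1 2>&3 |
        some command >&4 3>&- 4>&-
    } 3>&1)
  } 4>&1
  # "1+0 records in" means a full 77-byte block was read; anything else
  # ("0+1" for a short last block, "0+0" at EOF) ends the loop.
  [ "${report%%+*}" -eq 1 ]
do
  : nothing
done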

If the input size is a multiple of 77, though, some command will be run one extra time with empty input.
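If that extra run is a problem, one workaround (a different technique from the stderr parsing above, offered only as a sketch) is to have GNU dd write each chunk to a temporary file and stop as soon as a read leaves that file empty. for_each_chunk is a hypothetical helper name; the sketch assumes GNU dd and mktemp, and does not clean up the temporary file if interrupted by a signal.

```shell
#!/bin/sh
# Sketch (not the original answer's method): run a command once per
# SIZE-byte chunk of stdin, with the chunk on the command's stdin.
# Assumes GNU dd (iflag=fullblock) and mktemp.
# Usage: for_each_chunk SIZE CMD [ARGS...]
# The body is a subshell so size and tmp don't leak into the caller.
for_each_chunk() (
  size=$1
  shift
  tmp=$(mktemp) || exit 1
  # Each pass reads up to $size bytes into $tmp (dd truncates it first);
  # an empty file means EOF, so the command never sees an empty chunk.
  while dd bs="$size" count=1 iflag=fullblock of="$tmp" 2>/dev/null &&
        [ -s "$tmp" ]
  do
    "$@" < "$tmp"
  done
  rm -f -- "$tmp"
)

# 154 = 2 * 77: the command runs exactly twice.
head -c 154 /dev/zero | for_each_chunk 77 wc -c
```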

Author: reith

Updated on September 18, 2022

Comments

  • reith
    reith over 1 year

    I'm looking for something like while IFS= read -r -n $length str; do ... done, but for binary data. Is it possible to do this using dd or other tools? Is there some technique that lets these tools detect when the pipe that stdin is read from has been closed, so the loop can terminate?

    Currently I base64-encode the binary data, read it with read, and decode each chunk, but it's very slow (base64 | while read -r -n77 str; do echo $str | base64 -d; ... done).

    • Admin
      Admin about 10 years
      bash doesn't support storing binary data in its variables. You'll need a shell that does like zsh, or better use a programming language like perl.
    • Admin
      Admin about 10 years
      @StephaneChazelas I think bash can store raw data in its variables, except for the NUL byte. The actual problem is detecting when the pipe has been closed. I can read the data with dd or any other tool that handles binary data. If I could use something like select() or poll() directly in Bash, I could use dd to solve the problem.
  • Stéphane Chazelas
    Stéphane Chazelas about 10 years
    @xin, you can call perl from bash. bash is not part of coreutils. What about gawk?
  • phemmer
    phemmer about 10 years
    Your second paragraph is incorrect. It is true the read built-in in bash reads characters, not bytes. But the read system call does read bytes. Thus dd bs=77 count=1 will read 77 bytes, not 77 characters.
  • reith
    reith about 10 years
    @StephaneChazelas Thanks for updating. The split --filter trick was very helpful, although I'll have to change my code since --filter expects a command. Are you sure your dd solution works? I've fixed a test error ([ -eq]) but it only iterates once.
  • reith
    reith about 10 years
    @StephaneChazelas line 8: [: : integer expression expected
  • Stéphane Chazelas
    Stéphane Chazelas about 10 years
    @xin, we store dd's stderr in $report so we can parse the "1+0 records in" line. Possibly you got an error message as well. What happens when you run that dd command manually?
  • reith
    reith about 10 years
    @StephaneChazelas Your dd loop sometimes invokes some command once, sometimes twice and sometimes more, with the same arguments and the same input. Running dd manually, I get some command called with the first block on its stdin.