How can I handle raw binary data in a bash pipe?

Solution 1

Your loop adds a line break to everything it writes, in place of whatever separator ($IFS) read used to split up the input. Instead of breaking the stream into lines, just take the whole thing and pass it along. You can reduce the entire bit of code in the question to this:

 cat - > "$file"

You don't need the truncate bit: the redirection truncates the file, and cat writes the whole STDIN stream out to it.
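As a rough sketch, the whole function from the question could then look something like this. The name save_stdin and the error message are made up for illustration, and the existence check simply follows the question's description. It uses >| rather than > on the assumption that noclobber might be set.

```shell
# save_stdin is a hypothetical name; it does not come from the question.
save_stdin() {
    # Refuse to write unless the target already exists as a regular file.
    if [ ! -f "$1" ]; then
        printf 'no such file: %s\n' "$1" >&2
        return 1
    fi
    # cat copies stdin byte for byte; >| truncates even when noclobber is set.
    cat >| "$1"
}
```

It would be called at the end of a pipe, for example `some_command | save_stdin out.bin`, where both names are placeholders.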

Edit: If you are using zsh, you can just use > $file in place of the cat. That redirects to the file and truncates it, and if anything is hanging around waiting for something to accept STDIN, it gets read at that point. I think you can do something like this in bash, but you would have to set some special mode.

Solution 2

To read a text file literally, don't use plain read, which processes the input in two ways:

  • read interprets \ as an escape character; use read -r to turn this off.
  • read splits into words on characters in $IFS; set IFS to an empty string to turn this off.

The usual idiom to process a text file line by line is

while IFS= read -r line; do …

For an explanation of this idiom, see "Why is while IFS= read used so often, instead of IFS=; while read..?".
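A small sketch of the difference, using a made-up input line with leading blanks and a backslash (the variable names are only for illustration):

```shell
# One input line with surrounding blanks and a literal backslash.
input='  a\tb  '

# Plain read: strips leading/trailing IFS whitespace and eats the backslash.
plain=$(printf '%s\n' "$input" | { read line; printf '%s' "$line"; })

# IFS= read -r: keeps the line exactly as it was written.
literal=$(printf '%s\n' "$input" | { IFS= read -r line; printf '%s' "$line"; })

printf '[%s]\n' "$plain" "$literal"
# prints:
# [atb]
# [  a\tb  ]
```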

To write a string literally, don't just use plain echo, which processes the string in two ways:

  • On some shells, echo processes backslash escapes. (On bash, it depends whether the xpg_echo option is set.)
  • A few strings are treated as options, e.g. -n or -e (the exact set depends on the shell).

A portable way of printing a string literally is with printf. (There's no better way in bash, unless you know your input doesn't look like an option to echo.) Use the first form to print the exact string, and the second form if you want to add a newline.

printf %s "$line"
printf '%s\n' "$line"
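To illustrate the option problem, here is a sketch with a line that happens to be -n. What echo does with it depends on the shell; in bash it is taken as an option and nothing is printed.

```shell
line='-n'

# echo may treat the string as an option; bash prints nothing here.
echo "$line"

# printf treats the string purely as data and prints it followed by a newline.
printf '%s\n' "$line"
```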

This is only suitable for processing text, because:

  • Most shells will choke on null characters in the input.
  • When you've read the last line, you have no way to know if there was a newline at the end or not. (Some older shells may have bigger trouble if the input doesn't end with a newline.)
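A quick way to see the null-character problem is to count bytes on each side. This is only a sketch with three made-up input bytes; the exact count after read depends on the shell (bash drops the null character).

```shell
# Three bytes of input: 'a', NUL, 'b', with no trailing newline.
via_cat=$(printf 'a\000b' | cat | wc -c | tr -d ' ')

# read returns non-zero here because the input has no final newline,
# but it still fills $line; the NUL byte does not survive the trip.
via_read=$(printf 'a\000b' | { IFS= read -r line || :; printf '%s' "$line"; } | wc -c | tr -d ' ')

printf 'cat: %s bytes, read: %s bytes\n' "$via_cat" "$via_read"
```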

You can't process binary data in the shell, but modern versions of utilities on most unices can cope with arbitrary data. To pass all input through to the output, use cat. Going on a tangent, echo -n '' is a complicated and non-portable way of doing nothing; echo -n would be just as good (or not depending on the shell), and : is simpler and fully portable.

: >| "$file"
cat >>"$file"

or, simpler,

cat >|"$file"

In a script, you usually don't need to use >| since noclobber is off by default.
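For anyone who does turn noclobber on, a sketch of how > and >| differ (the scratch file from mktemp and the variable names are just for illustration):

```shell
file=$(mktemp)      # scratch file; mktemp creates it, so it already exists
set -o noclobber

# With noclobber on, plain > refuses to overwrite an existing file.
if { printf 'first\n' > "$file"; } 2>/dev/null; then
    refused=no
else
    refused=yes
fi

# >| overrides noclobber and truncates as usual.
printf 'second\n' >| "$file"
set +o noclobber

printf 'plain > refused: %s; file now holds: %s\n' "$refused" "$(cat "$file")"
rm -f "$file"
```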

Author by

David Souther

Software Craftsman. SpaceXer. Xoogler. Strong types, weakly enforced. Kitties go Meow! RPG Polyglot. Vi, i3.

Updated on September 18, 2022

Comments

  • David Souther
    David Souther over 1 year

    I have a bash function that takes a file as a parameter, verifies the file exists, then writes anything coming off stdin to the file. The naive solution works fine for text, but I am having problems with arbitrary binary data.

    echo -n '' >| "$file" #Truncate the file
    while read lines
    do  # Is there a better way to do this? I would like one...
        echo $lines >> "$file"
    done
    
  • David Souther
    David Souther almost 13 years
    I couldn't get the stdin redirect example to work, but changing the cat example to >| (I have noclobber set) works like a charm. Thanks for making my day ^.^
  • rozcietrzewiacz
    rozcietrzewiacz almost 13 years
    +1 for the cat-less version. Always avoid useless cats ;)
  • tcoolspy
    tcoolspy almost 13 years
    @rozcietrzewiacz: True, except it was an afterthought and I was wrong. This might not be a useless use of cat. The only thing you might be able to do is > $file. This only works as the first thing that looks for stdin in the parent shell script. Basically all of David's code can be reduced to a single character, but I think the cat - is more elegant and less trouble-prone because it's understood on sight.
  • Michael Mrozek
    Michael Mrozek almost 13 years
    Sometimes I string four or five cats together, just to annoy UUOC fanatics
  • tcoolspy
    tcoolspy almost 13 years
    @MichaelMrozek: Sometimes I name my data files cat just so people who insist on using it unnecessarily have to do mental gymnastics to read the code. Named pipes are also good targets.
  • rozcietrzewiacz
    rozcietrzewiacz almost 13 years
    @Michael I didn't know people get so dogmatic about it :D Hmm, maybe I did show symptoms of this disease. Caleb's example seemed very simple and thus appealed to me. As it was wrong - well, then it seems a good use of cat after all. I love the way one can learn such stuff here.
  • David Souther
    David Souther almost 13 years
    @rozcietrzewiacz Cats are just fine. In many cases, I think they make code more readable. By putting the cat first in the pipe, then the grep, I see very clearly what is being worked on. It agrees with our natural language, as well: cat file | grep foo is subject-verb, whereas grep foo file is verb file (and grep foo < file is obtuse).
  • David Souther
    David Souther almost 13 years
    @caleb Yes. To both :)
  • David Souther
    David Souther almost 13 years
    thanks for pointing out xpg_echo, that's actually a problem I was having somewhere else in my code and didn't even realize. Re noclobber, I am in the habit of turning it on in my bashrc.
  • tcoolspy
    tcoolspy almost 13 years
    @DavidSouther: I disagree with your analysis of natural English language. In English, commands often start with verbs. In natural language grep foo file could be "Go look for bar in the third drawer of my file cabinet." and grep foo < file could be "Find the paper labeled bar in this stack of papers." The subject is an implied "you", which is left off since you are speaking to the shell.
  • David Souther
    David Souther almost 13 years
    @Caleb I'd buy that. It comes down to how the programmer thinks about the problem, and how the maintainer expects to read the solution.
  • tcoolspy
    tcoolspy almost 13 years
    @DavidSouther: True. Any syntax you are not accustomed to using is going to be more difficult to understand than one you are, I just wouldn't say command args < data is inherently obtuse.