How can I handle raw binary data in a bash pipe?
Solution 1
Your approach adds a line break after everything it writes, in place of whatever separator ($IFS) read used to split up the input. Instead of breaking the input up into lines, just take the whole thing and pass it along. You can reduce the entire bit of code in the question to this:
cat - > "$file"
You don't need the truncate bit; this will truncate the file and write the whole STDIN stream out to it.
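To see the difference, here is a rough sketch that feeds the same bytes through the loop from the question and through cat. The scratch directory and file names are made up for the demonstration, and mktemp -d is assumed to be available.

```shell
# Hypothetical scratch files for the demonstration.
tmpdir=$(mktemp -d)
src="$tmpdir/input.bin"
via_loop="$tmpdir/loop.out"
via_cat="$tmpdir/cat.out"

# Sample input: a NUL byte, a backslash, runs of spaces, no trailing newline.
printf 'a\000b\\c   d\n  e' > "$src"

# The loop from the question: read splits and unescapes, echo adds newlines.
: > "$via_loop"
while read lines
do
    echo $lines >> "$via_loop"
done < "$src"

# The replacement: cat copies stdin byte for byte.
cat - > "$via_cat" < "$src"

cmp -s "$src" "$via_loop" && loop_status=identical || loop_status=differs
cmp -s "$src" "$via_cat" && cat_status=identical || cat_status=differs
echo "loop: $loop_status, cat: $cat_status"
rm -rf "$tmpdir"
```

The loop output differs from the input while the cat output matches it. The exact damage done by the loop depends on the shell, so only the comparison result is shown.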
Edit: If you are using zsh, you can just use > $file in place of the cat. You are redirecting to a file and truncating it, but if there is anything hanging out there waiting for something to accept STDIN, it will get read at that point. I think you can do something like this with bash, but you would have to set some special mode.
Solution 2
To read a text file literally, don't use plain read, which processes the input in two ways:
- read interprets \ as an escape character; use read -r to turn this off.
- read splits into words on characters in $IFS; set IFS to an empty string to turn this off.
The usual idiom to process a text file line by line is
while IFS= read -r line; do …
For an explanation of this idiom, see "Why is while IFS= read used so often, instead of IFS=; while read..?".
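As a small illustration of those two effects, the sketch below reads the same line once with plain read and once with IFS= read -r. The sample string is made up; the brackets only make the surrounding whitespace visible.

```shell
# A line with leading spaces, a backslash sequence and trailing spaces.
sample='  two\tcols  '

# Plain read: trims IFS whitespace and treats the backslash as an escape.
plain=$(printf '%s\n' "$sample" | { read line; printf '[%s]' "$line"; })

# IFS= read -r: keeps the line exactly as it was (minus the newline).
literal=$(printf '%s\n' "$sample" | { IFS= read -r line; printf '[%s]' "$line"; })

printf 'plain read:   %s\n' "$plain"     # [twotcols]
printf 'IFS= read -r: %s\n' "$literal"   # [  two\tcols  ]
```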
To write a string literally, don't just use plain echo, which processes the string in two ways:
- On some shells, echo processes backslash escapes. (On bash, it depends whether the xpg_echo option is set.)
- A few strings are treated as options, e.g. -n or -e (the exact set depends on the shell).
A portable way of printing a string literally is with printf. (There's no better way in bash, unless you know your input doesn't look like an option to echo.) Use the first form to print the exact string, and the second form if you want to add a newline.
printf %s "$line"
printf '%s\n' "$line"
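A quick sketch of why this matters, using two made-up strings that echo may mangle. What echo prints for them varies between shells and options, so its result is only displayed, not relied on.

```shell
# Hypothetical inputs that plain echo may mangle.
option_like='-n'
backslashed='a\nb'

# printf with a %s format prints its argument verbatim.
out_option=$(printf '%s' "$option_like")
out_backslash=$(printf '%s' "$backslashed")
printf 'printf gives: [%s] [%s]\n' "$out_option" "$out_backslash"

# echo's result depends on the shell and its options, so it is only displayed.
printf 'echo gives:   [%s] [%s]\n' "$(echo "$option_like")" "$(echo "$backslashed")"
```

The printf line shows [-n] and [a\nb] unchanged; the echo line will typically show an empty first field, and the second field depends on whether the shell's echo expands backslash escapes.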
This is only suitable for processing text, because:
- Most shells will choke on null characters in the input.
- When you've read the last line, you have no way to know if there was a newline at the end or not. (Some older shells may have bigger trouble if the input doesn't end with a newline.)
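Both limitations can be seen by pushing a few bytes through the read/printf idiom and comparing the result with the input. This is only a sketch: the scratch directory is hypothetical, and what exactly happens to the null byte and the unterminated last line varies between shells, which is why only the comparison is reported.

```shell
tmpdir=$(mktemp -d)

# Input with a NUL byte inside the first line and no newline after the last.
printf 'one\000two\nlast' > "$tmpdir/in"

# The text-processing idiom: read each line literally, print it back.
while IFS= read -r line; do
    printf '%s\n' "$line"
done < "$tmpdir/in" > "$tmpdir/out"

# Dump both files so the differences are visible byte by byte.
od -c "$tmpdir/in"
od -c "$tmpdir/out"

cmp -s "$tmpdir/in" "$tmpdir/out" && roundtrip=identical || roundtrip=differs
echo "round trip: $roundtrip"
rm -rf "$tmpdir"
```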
You can't process binary data in the shell, but modern versions of utilities on most unices can cope with arbitrary data. To pass all input through to the output, use cat. Going on a tangent, echo -n '' is a complicated and non-portable way of doing nothing; echo -n would be just as good (or not, depending on the shell), and : is simpler and fully portable.
: >| "$file"
cat >>"$file"
or, simpler,
cat >|"$file"
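Put together, the function described in the question might look roughly like the sketch below. The name save_stdin, the error message and the test payload are assumptions for illustration, not the asker's actual code.

```shell
# Hypothetical helper: save stdin to an existing file, byte for byte.
save_stdin() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "save_stdin: $file does not exist" >&2
        return 1
    fi
    cat >| "$file"    # truncates the file, then copies stdin unchanged
}

tmpdir=$(mktemp -d)
touch "$tmpdir/target"

# Payload: NUL, a control byte, a high byte, a backslash, a space, a blank line.
printf '\000\001\377\\ \n\n' > "$tmpdir/payload"

save_stdin "$tmpdir/target" < "$tmpdir/payload"
cmp -s "$tmpdir/payload" "$tmpdir/target" && saved=identical || saved=differs

save_stdin "$tmpdir/missing" < "$tmpdir/payload" 2>/dev/null \
    && missing=written || missing=rejected

echo "existing file: $saved, missing file: $missing"
rm -rf "$tmpdir"
```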
In a script, you usually don't need to use >| since noclobber is off by default.
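For anyone who does turn noclobber on, the difference between > and >| can be checked with something like the following. The file name is hypothetical, and the subshell around the first attempt is just a precaution so a failed redirection cannot abort the script.

```shell
tmpdir=$(mktemp -d)
file="$tmpdir/target"      # hypothetical existing file
echo 'old contents' > "$file"

set -o noclobber
# Plain > refuses to overwrite an existing file while noclobber is on.
if (printf 'new' > "$file") 2>/dev/null; then plain=overwrote; else plain=refused; fi
# >| overrides noclobber and truncates the file before writing.
if printf 'new' >| "$file"; then forced=overwrote; else forced=refused; fi
set +o noclobber

contents=$(cat "$file")
echo "plain >: $plain, >|: $forced, file now holds: $contents"
rm -rf "$tmpdir"
```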
David Souther
Updated on September 18, 2022
Comments
-
David Souther over 1 year
I have a bash function that takes a file as a parameter, verifies the file exists, then writes anything coming off stdin to the file. The naive solution works fine for text, but I am having problems with arbitrary binary data.
echo -n '' >| "$file" #Truncate the file
while read lines
do
    # Is there a better way to do this? I would like one...
    echo $lines >> "$file"
done
-
David Souther almost 13 years
I couldn't get the stdin redirect example to work, but changing the cat example to >| (I have noclobber set) works like a charm. Thanks for making my day ^.^
-
rozcietrzewiacz almost 13 years
+1 for the cat-less version. Always avoid useless cats ;)
-
tcoolspy almost 13 years
@rozcietrzewiacz: True, except it was an afterthought and I was wrong. This might not be a useless use of cat. The only thing you might be able to do is > $file. This only works as the first thing that looks for stdin in the parent shell script. Basically all of David's code can be reduced to a single character, but I think the cat - is more elegant and less trouble-prone because it's understood on sight.
-
Michael Mrozek almost 13 years
Sometimes I string four or five cats together, just to annoy UUOC fanatics
-
tcoolspy almost 13 years
@MichaelMrozek: Sometimes I name my data files cat just so people who insist on using it necessarily have to do mental gymnastics to read the code. Named pipes are also good targets.
-
rozcietrzewiacz almost 13 years
@Michael I didn't know people do get so dogmatic about it :D Hmm, maybe I did get symptoms of this disease. Caleb's example seemed very simple and thus appealed to me. As it was wrong, well, then it seems a good use of cat after all. I love the way one can learn such stuff here.
-
David Souther almost 13 years
@rozcietrzewiacz Cats are just fine. In many cases, I think they make code more readable. By putting the cat first in the pipe, then the grep, I see very clearly what is being worked on. It agrees with our natural language, as well: cat file | grep foo is subject-verb, whereas grep foo file is verb file (and grep foo < file is obtuse).
-
David Souther almost 13 years
@caleb Yes. To both :)
-
David Souther almost 13 years
Thanks for pointing out xpg_echo, that's actually a problem I was having somewhere else in my code and didn't even realize. Re noclobber, I am in the habit of turning it on in my bashrc.
-
tcoolspy almost 13 years
@DavidSouther: I disagree with your analysis of natural English language. In English, commands often start with verbs. In natural language grep foo file could be "Go look for bar in the third drawer of my file cabinet." and grep foo < file could be "Find the paper labeled bar in this stack of papers." The subject is an implied "you" which is left off since you are speaking to the shell.
-
David Souther almost 13 years
@Caleb I'd buy that. It comes down to how the programmer thinks about the problem, and how the maintainer expects to read the solution.
-
tcoolspy almost 13 years
@DavidSouther: True. Any syntax you are not accustomed to using is going to be more difficult to understand than one you are, I just wouldn't say command args < data is inherently obtuse.