How do I use null bytes in Bash?


Bash uses C-style strings internally, which are terminated by null bytes. This means that a Bash string (such as the value of a variable, or an argument to a command) can never actually contain a null byte. For example, this mini-script:

foobar=$'foo\0bar'    # foobar='foo' + null byte + 'bar'
echo "${#foobar}"     # print length of $foobar

prints 3, because $foobar is really just 'foo': Bash treats the null byte as the end of the string, so the bar that follows never becomes part of the value.

Similarly, echo $'foo\0bar' just prints foo, because the argument that echo receives is only foo; the \0bar part never reaches it.
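The truncation can be checked by comparing the length Bash reports with the number of bytes that actually leave the shell. This is a minimal sketch; the variable names var_len and piped_bytes are only illustrative, and the arithmetic expansion is there to strip the padding that some wc implementations add.

```shell
#!/usr/bin/env bash
# The $'...' string is cut at the null byte when it is assigned.
foobar=$'foo\0bar'

# Length as seen by Bash itself.
var_len=${#foobar}

# Number of bytes written when the value is printed through a pipe.
# $(( ... )) normalizes wc output that may be padded with spaces.
piped_bytes=$(( $(printf '%s' "$foobar" | wc -c) ))

echo "length in Bash: $var_len"
echo "bytes written:  $piped_bytes"
```

Both numbers should agree, which suggests the bar part is gone from the variable itself rather than merely hidden when printing.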

As you can see, the \0 sequence is very misleading in a $'...'-style string: it looks like a null byte inside the string, but it doesn't end up working that way. In your first example, your read command has -d $'\0'. This works, but only because -d '' also works! (Older versions of the Bash documentation did not mention this explicitly, though more recent versions state that an empty delim makes read stop at a NUL character. I suppose it works for the same reason as above: '' is the empty string, so its terminating null byte comes immediately. -d delim is documented as using "The first character of delim", and apparently that holds even when the "first character" is the string's own terminator.)
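To see that the two spellings of the delimiter behave identically, the same null-separated input can be read once with each. This is a sketch with made-up sample data ('one', 'two words') and illustrative array names.

```shell
#!/usr/bin/env bash
# In Bash, $'\0' collapses to the empty string before read ever sees it.
nul_arg=$'\0'
arg_len=${#nul_arg}

# Read null-delimited input using -d ''.
with_empty=()
while IFS= read -r -d '' item; do
  with_empty+=("$item")
done < <(printf 'one\0two words\0')

# Read the same input using -d $'\0'.
with_escape=()
while IFS= read -r -d $'\0' item; do
  with_escape+=("$item")
done < <(printf 'one\0two words\0')

echo "length of \$'\\0': $arg_len"
echo "items with -d '': ${#with_empty[@]}"
echo "items with -d \$'\\0': ${#with_escape[@]}"
```

Since both loops receive the same (empty) delimiter argument, they should produce the same list of items.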

But as you know from your find example, it is possible for a command to print out a null byte, and for that byte to be piped to another command that reads it as input. No part of that relies on storing a null byte in a string inside Bash. The only problem with your second example is that we can't use $'\0' in an argument to a command; echo "$file"$'\0' could happily print the null byte at the end, if only it knew that you wanted it to.
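Counting bytes makes the difference visible. The sketch below uses a hypothetical filename, demo.txt, and compares the echo form from the question with printf (discussed just below), whose format string can emit a null byte without any argument having to contain one.

```shell
#!/usr/bin/env bash
# Hypothetical filename, 8 characters long.
file='demo.txt'

# The $'\0' vanishes while the argument is built, so echo writes only the name.
echo_bytes=$(( $(echo -n "$file"$'\0' | wc -c) ))

# printf interprets \0 in its format string and writes a real null byte.
printf_bytes=$(( $(printf '%s\0' "$file" | wc -c) ))

echo "echo wrote $echo_bytes bytes, printf wrote $printf_bytes bytes"
```

The one-byte difference is the null terminator that only the printf version sends down the pipe.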

So instead of using echo, you can use printf, which supports the same sorts of escape sequences as $'...'-style strings. That way, you can print a null byte without having to have a null byte inside a string. That would look like this:

for file in * ; do printf '%s\0' "$file" ; done \
  | while IFS= read -r -d '' ; do echo "$REPLY" ; done

or simply this:

printf '%s\0' * \
  | while IFS= read -r -d '' ; do echo "$REPLY" ; done
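The benefit of the null separator shows up with awkward filenames. The following sketch creates a throwaway directory (via mktemp, with made-up filenames, one of which contains a newline) and collects the names into an array; it assumes a system where such filenames are permitted.

```shell
#!/usr/bin/env bash
# Scratch directory with three files; one name contains a newline.
tmpdir=$(mktemp -d)
touch "$tmpdir/plain" "$tmpdir/with space" "$tmpdir/with"$'\n'"newline"

# Read the null-separated names back into an array.
names=()
while IFS= read -r -d '' name; do
  names+=("$name")
done < <(cd "$tmpdir" && printf '%s\0' *)

# For comparison: how many lines the same names occupy when newline-separated.
line_count=$(( $(cd "$tmpdir" && printf '%s\n' * | wc -l) ))

rm -rf "$tmpdir"

name_count=${#names[@]}
echo "files found with null separator: $name_count"
echo "lines seen with newline separator: $line_count"
```

A newline-based reader would see one more "file" than actually exists, while the null-delimited loop recovers each name intact.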

(Note: echo actually also has an -e flag that would let it process \0 and print a null byte; but then it would also try to process any special sequences in your filename. So the printf approach is more robust.)
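The robustness point can be illustrated with a name that happens to contain a backslash sequence. This is a sketch using a hypothetical name, a\tb, and Bash's builtin echo with default settings.

```shell
#!/usr/bin/env bash
# Hypothetical filename containing a literal backslash followed by t.
name='a\tb'

# echo -e interprets \t inside the data and turns it into a tab.
via_echo=$(echo -e "$name")

# printf only interprets escapes in the format string; %s passes data through.
via_printf=$(printf '%s' "$name")

if [ "$via_printf" = "$name" ]; then echo "printf preserved the name"; fi
if [ "$via_echo" != "$name" ]; then echo "echo -e altered the name"; fi
```

Keeping the escape sequences in the format string and the data in the arguments is what lets printf add a null byte without touching the filename.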


Incidentally, there are some shells that do allow null bytes inside strings. Your example works fine in Zsh, for example (assuming default settings). However, regardless of your shell, Unix-like operating systems don't provide a way to include null bytes inside arguments to programs (since program arguments are passed as C-style strings), so there will always be some limitations. (Your example can work in Zsh only because echo is a shell builtin, so Zsh can invoke it without relying on the OS support for invoking other programs. If you used command echo instead of echo, so that it bypassed the builtin and used the standalone echo program on the $PATH, you'd see the same behavior in Zsh as in Bash.)


Author: ruakh


Updated on September 18, 2022

Comments

  • ruakh
    ruakh almost 2 years

    I've read that, since file-paths in Bash can contain any character except the null byte (zero-valued byte, $'\0'), it's best to use the null byte as a separator. For example, if the output of find will be sent to another program, it's recommended to use the -print0 option (for versions of find that have it).

    But although something like this works fine (printing file-paths separated by newlines — don't worry, this is just a demonstration, I'm not actually doing it in real scripts):

    find -print0 \
      | while IFS= read -r -d $'\0' ; do echo "$REPLY" ; done
    

    something like this does not work:

    for file in * ; do echo -n "$file"$'\0' ; done \
      | while IFS= read -r -d $'\0' ; do echo "$REPLY" ; done
    

    When I try just the for-loop part, I find that it just prints all the filenames together, without the null byte in between.

    Why is this? What's going on?

  • CMCDragonkai
    CMCDragonkai over 7 years
    Why is IFS set to nothing if -d '' already means to delimit on \0? I found an explanation here: stackoverflow.com/questions/8677546/…