Using/reading/writing the null and escape characters

Solution 1

Printing the null character

In many recent shells you can write the null character with the dollar-single-quote form $'\0', the hexadecimal form \x00, the Unicode forms \u0000 or \U00000000, or, as you tried, in octal: '\0'. The point is that the command has to understand what to do with backslash-escaped characters. With echo one usually needs to add the -e option, and with printf one uses the %b conversion.

Let's check if it works:

$ echo -ne '\0'
$

So it produces nothing visible, just like echo -ne ''. Similarly:

$ printf '%b' '\0'
$

Let's add some characters around it (I will stick with printf '%b' from now on since it is more robust, but echo -ne has a similar effect):

$ printf '%b' a'\0'b
ab

Only two characters were printed, where did the null go?

$ printf '%b' a'\0'b | wc -c
3

Let's compare it with a''b:

$ printf '%b' a''b | wc -c
2

One last check that we really print a null character before trying to create a file: let's pass the printed value to a command that will complain about it, like xargs:

$ printf '%b' a'\0'b | xargs echo
xargs: Warning: a NUL character occurred in the input.  It cannot be 
passed through in the argument list.  Did you mean to use the --null option?
a

Notice how only a was printed at the end. Of course xargs -0 works fine:

$ printf '%b' a'\0'b | xargs -0 echo
a b
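
Instead of inferring the null from wc -c, one can also look at the bytes directly. The following is only a sketch: dump_hex is a helper name made up for this example, it assumes od is installed, and it uses the format-string form printf 'a\0b', which should emit the same three bytes as printf '%b' a'\0'b above.

```shell
# dump_hex is a hypothetical helper: print stdin as space-separated hex bytes.
dump_hex() {
    od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

printf 'a\0b' | dump_hex    # 61 00 62
echo
printf 'ab' | dump_hex      # 61 62
echo
```

The 00 in the middle of the first dump is the null byte that the terminal did not show.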

Creating the file with null?

Now let's try to create a file with a null character in its name:

$ touch $'\0'
touch: cannot touch ‘’: No such file or directory
$ mkdir $'\0'
mkdir: cannot create directory ‘’: No such file or directory

# let's try another approach - using printf in command substitution:
$ touch "$(printf '%b' '\0')"
touch: cannot touch ‘’: No such file or directory
$ mkdir "$(printf '%b' '\0')"
mkdir: cannot create directory ‘’: No such file or directory

The result is exactly the same as with touch ''; it seems the null is just ignored altogether. What if we skip the double quotes around the command substitution?

$ touch $(printf '%b' '\0')
touch: missing file operand
Try 'touch --help' for more information.
$ mkdir $(printf '%b' '\0')
mkdir: missing operand
Try 'mkdir --help' for more information.

This is the same situation as running touch/mkdir without any arguments at all. We get yet another result if we surround the null with text:

$ touch "$(printf '%b' a'\0'b)"
$ ls
a   # in zsh
ab  # in bash

One can also try to redirect standard output to $'\0', but all one gets is a different kind of error.
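
To make the ab result from bash a bit more concrete, here is a small sketch that counts bytes before and after command substitution. It assumes bash (or a shell that, like bash, drops null bytes in command substitution); the variable names are made up for the example, and zsh would give a different count, as the ls output above suggests.

```shell
# Bytes written straight into the pipe: a, NUL, b.
direct=$(printf 'a\0b' | wc -c | tr -d ' ')

# The same bytes captured by command substitution. Recent bash versions
# print a warning about the ignored null byte on stderr, silenced here.
{ captured=$(printf 'a\0b'); } 2>/dev/null
after=$(printf '%s' "$captured" | wc -c | tr -d ' ')

echo "direct: $direct bytes, after command substitution: $after bytes"
```

In bash this should report 3 bytes for the direct pipe and 2 after the substitution, which is why touch ends up creating ab.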

Solution 2

Single/double quotes vs. backslash: single quotes and backslashes are equivalent in quoting power. It's a lot more convenient to use single quotes to quote a long string with spaces, tabs, newlines, ()[]*$><?|{}~&;"`^!# and probably other characters I'm forgetting. But you could achieve exactly equivalent results with just backslashes (beware of the overloading of backslashes within backticks (`...`), though).

Double quotes are unique, though. $ expands inside double-quotes, but not single. "$foo" expands foo, but protects the expanded result from word-splitting and glob expansion.
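
A small sketch of that difference, assuming bash or another POSIX-style shell (zsh does not split unquoted expansions by default). count_args is a helper invented for this example; it just reports how many arguments it received.

```shell
foo='two words'

# count_args is a hypothetical helper: print the number of arguments.
count_args() { echo "$#"; }

count_args $foo      # 2: the unquoted expansion is split on whitespace
count_args "$foo"    # 1: double quotes expand foo but keep it as one word
count_args '$foo'    # 1: single quotes pass the literal text $foo
```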

http://mywiki.wooledge.org/BashFAQ might be a good place to start. The bash manual doesn't spend much time on how to use all the features it describes, just how they work individually.


It's literally not possible to pass a string containing a zero-byte as a command line argument, or to a system call. The ABI (application binary interface) that specifies exactly how data is passed between processes and the kernel uses C strings for everything (except binary data), including command line arguments and file / path arguments to system calls. C strings are character arrays where the end-of-string is marked by a zero-byte. There is no way to "escape" a zero byte to indicate that it's not the end of a string.

Any attempt to do something like touch $'foo\0bar' would just result in touch seeing its argument list as

argv[0] = "/bin/touch";
argv[1] = "foo";

Even if the bytes sitting in memory were "foo\0bar\0", the first \0 marks the end of the string. Actually, "foo\0bar\0" wouldn't even make it as far as the new process's argv: it wouldn't make it out of the argv array passed to the execve(2) system call that ran touch.

And even if you wrote a C or perl program with character arrays / strings containing null bytes, passing them to a system call like open(2) would cause the same interpretation of the string by kernel. System calls that need to handle arbitrary data, like read(2) and write(2), take a length argument as well as a pointer to the buffer.
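
The flip side is that data, as opposed to names and arguments, can contain null bytes just fine, because it travels through length-based calls like write(2). A minimal sketch, assuming mktemp is available (it is common but not guaranteed everywhere); the variable names are made up for the example.

```shell
# Write 7 bytes, one of them a NUL, into a temporary file and count them.
tmpfile=$(mktemp)
printf 'foo\0bar' > "$tmpfile"
nbytes=$(wc -c < "$tmpfile" | tr -d ' ')
echo "$nbytes"    # 7
rm -f "$tmpfile"
```

The file's contents keep the null byte; only the file's name could not contain one.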


It's not even possible to do much of anything with null bytes with bash. As jimmij points out, the bash syntax for writing a string literal with escape-sequence processing is $'string', but writing a \0 in your string literal acts as a string terminator inside bash. I guess this means bash stores strings internally as C strings, not with an explicit length.

str=$'foo\0bar'
echo "${#str}"   # 3, showing that bash isn't even storing it in a variable.
echo "$str" | wc -c   # 4. wouldn't work even if ${#str} did: echo's cmdline would eat it
wc -c <<< $'foo\0bar'   # 4 (includes a newline)

So we can't use this syntax to send a null byte anywhere. We'd have to use tr or something.
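
For illustration, one way the tr route might look: translate newlines into null bytes inside a pipeline, so the null never has to live in a shell variable or argument. This is a sketch with made-up variable names, not the only way to do it.

```shell
# Turn newline-terminated records into NUL-terminated ones and count the NULs.
nul_count=$(printf 'foo\nbar\n' | tr '\n' '\0' | tr -cd '\0' | wc -c | tr -d ' ')
echo "$nul_count"    # 2

# Translating back gives the original lines again.
round_trip=$(printf 'foo\nbar\n' | tr '\n' '\0' | tr '\0' '\n')
printf '%s\n' "$round_trip"
```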


However, bash printf does have some limited support for \0. The %b conversion processes backslash escapes, including \0. And printf already processes such escapes in its format string.

  • printf '\0' prints a zero byte. Pipe it into hexdump -C to confirm.
  • printf '%s\0%s' foo bar | hexdump -C writes foo.bar (where . is a NUL byte) to stdout. Note that \0 inside single or double quotes doesn't expand on its own; only $'\0' quoting would expand it before printf, which would make it act as a terminator.
  • printf '%b' 'foo\0bar' does the same thing in a more roundabout way.
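
As a sketch of where this is useful: printf reuses its format for every argument, so '%s\0' emits NUL-terminated records that a consumer such as xargs -0 can split safely even when items contain spaces. This assumes an xargs that supports -0 (GNU and BSD xargs do); the item strings and the variable name listing are made up for the example.

```shell
# Each argument is followed by a NUL; xargs -0 splits on NUL, not whitespace.
listing=$(printf '%s\0' 'first item' 'second item' | xargs -0 -n 1 echo)
printf '%s\n' "$listing"
# first item
# second item
```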

Solution 3

As you know, $var will lead to the variable being interpreted. The reasons why the different options work vary:

  • the escape ( \$var ): do NOT interpret the very next character as shell functional character. BUT in some cases: give a special meaning (e.g. \n for newline in some contexts)
  • the single quotes ( '$var' ): everything in single quotes will strictly be nothing but the very string they contain
  • the separation of $ ( "$"var ): a single $ will not be interpreted, by placing it in double quotes it is separated from the var part and no interpretation happens
  • the double quotes ( "$var" ): these still allow the variable var to be interpreted, so mkdir "$var" DOES NOT WORK AND IS NOT THE SAME AS THE OTHERS! Please check again! However, anything contained inside the quotes will be treated as a single string. This is especially useful when you have special characters in file names, e.g. when creating a file with a space in its name: touch "a b" -> single file a b created/updated, touch a b -> two files a and b created/updated.
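
The four cases can be checked side by side with a quick sketch; the variable value hello is just a placeholder for the example.

```shell
var=hello

printf '%s\n' \$var      # $var   (the backslash escapes the $)
printf '%s\n' '$var'     # $var   (single quotes: everything is literal)
printf '%s\n' "$"var     # $var   (the quoted $ is not followed by a name)
printf '%s\n' "$var"     # hello  (double quotes still expand the variable)
```

Only the last form expands the variable, which is why it is the odd one out in the list above.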

Other special operators are: redirections and 'heres' > >> < << <<<, process operators & |, boolean operators || &&, and command separators ; and grouping with parentheses ( ), sometimes - but then separated or as first character - the - for stdin or options of commands. Also there is the test command [ and the quotes we already used ' ", as well as recalling previous commands with an exclamation mark ! or comments with a hash #, and the wildcards asterisk * and question mark ? for multiple and single characters. Also note that the current and parent dir are . and .., while the home is set at ~/. I.e. the characters ; & | > < - [ \ ' " ( ) # * ! ? . ~ ^ { }, `, newline, space, tab (and the other blank characters in single-byte locales) should be looked at twice but not all of them are "dangerous" on the same level. I hope I forgot none as there are quite a bunch of them.

Solution 4

In filenames, '/' is prohibited because it is a directory separator. That is the only reason. And if you hand-edit a filesystem, you might even be able to create a file with '/' in the name (not recommended, as you won't be able to do much with it).

The NUL character cannot be used as part of the filename, because the relevant system calls use C language string-passing conventions, and NUL is the terminator for such a string. So it cannot be interpreted as part of the name.

Note that creating a file called \0 is not the same as creating one containing a NUL - the former is a filename containing the two characters '\' and '0'.
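
A short sketch of that last point, run in a throwaway directory and assuming mktemp -d is available; the variable names are made up for the example.

```shell
# Work in a temporary directory so nothing else is touched.
workdir=$(mktemp -d)

name='\0'                  # two ordinary characters: a backslash and a zero
echo "${#name}"            # 2

mkdir "$workdir/$name"
created=no
[ -d "$workdir/$name" ] && created=yes
echo "directory created: $created"

rm -r "$workdir"
```

The directory is created without complaint because its name contains no NUL byte at all.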

Author: Mohamed Medhat Sallam

Linux for me is a to enjoy life. Not just a kernel. echo "Thank you Linus trovalds"

Updated on September 18, 2022

Comments

  • Mohamed Medhat Sallam
    Mohamed Medhat Sallam almost 2 years

    I know that a filename in Linux has no restriction whatsoever except for two characters '/' and '\0'. I know that '/' is prohibited because it is a directory separator but is there any other reason ?

    Also on my terminal I can create a file or a directory with the name \0. So I wonder how to write the null character correctly because obviously it shouldn't allow me to have a filename with null in it

    mkdir '\0' will create a directory named \0

    One more question, If I want to include $ in my filename, I can use the backslash

    mkdir \$myfile will create a directory named $myfile

    However, I can do the same if I surround the dollar sign with single quotes and double quotes

    mkdir \$myfile is the same as mkdir '$'myfile is the same as mkdir "$"myfile is the same as mkdir '$myfile' is the same as mkdir "$myfile"

    So my question is, Are the single and double quotes a substitution for the escape backslash character?

    Also what other characters need escaping in bash besides $, (space) and backslash ?

    • Admin
      Admin almost 9 years
      mkdir '\0' creates a two-character file; this is not the NUL character you were looking for ('\0' is from C).
  • Admin
    Admin almost 9 years
    Using a hex viewer (like xxd) really helps. For example: printf '%b' a'\0'b|xxd. Of course, the simpler and almost always present od could also do the job: printf '%b' a'\0'b|od -An -tx1
  • Admin
    Admin almost 9 years
    The problem in bash with touch "$(printf '%b' a'\0'b)" is the command substitution. This: echo "$(printf '%b' a'\0'b)"|xxd prints 6162, because the null is removed by the command substitution. By contrast, this: printf '%b' a'\0'b|xxd does print the null: 610062. zsh, on the other hand, behaves like C code here: a string ends where a null occurs.
  • Stéphane Chazelas
    Stéphane Chazelas almost 9 years
    Note that from your list, . and - are not special in that quoting them won't make any difference (except maybe in things like [[ "-e" = x ]]). Quotes can also be used to escape shell language keywords (like "for" or \for to run the for command as opposed to start a for loop) or aliases (\rm to make sure to run the rm command in case there's a rm alias defined).
  • Stéphane Chazelas
    Stéphane Chazelas almost 9 years
    While it's not possible to include 0 bytes in the list of arguments and env vars passed to the execve() system call, there's nothing preventing builtin commands or shell functions from supporting them in their arguments. zsh does, and so does it support passing NUL bytes on its command lines. It also has a mkdir builtin, but obviously passing NUL bytes won't help since the mkdir() system call also takes a NUL delimited string as argument (actually zmodload zsh/files; mkdir $'\0' currently exhibits a bug there).
  • Stéphane Chazelas
    Stéphane Chazelas almost 9 years
    Note that it's possible to pass a NUL byte in a system call, just not for those syscall arguments that are pointers to NUL terminated strings. Example: write(fd, buf, 12) (here the size argument (12) indicates the extent of the (binary as opposed to text) string). Which is why printf '\0\0' does manage to print those 2 NUL bytes (it does a write(1, (char[]) {'\0','\0'}, 2)).
  • Peter Cordes
    Peter Cordes almost 9 years
    @StéphaneChazelas: thanks for the edit of shell special characters. I clarified the bit about sys calls, since your 2nd comment is probably helpful to anyone seeing this concept for the first time.
  • Stéphane Chazelas
    Stéphane Chazelas about 6 years
    @Paul, when zsh passes arguments to its builtins or functions, the kernel is not involved. The shell is free not to use NUL-delimited strings there.
  • Peter Cordes
    Peter Cordes about 6 years
    @PaulStelian: ZSH must use explicit-length strings internally, like C++-style std::string instead of C-style char*. i.e. a buffer which "knows" its own length, and thus doesn't require a special terminator / sentinel byte that can't appear inside strings.
  • Paul Stelian
    Paul Stelian about 6 years
    @PeterCordes That may be true but the strings must still pass the C interface when launching a process, and that one is unforgiving with the NUL character; exec (all flavors) cannot understand buffers which know their length.
  • ErikE
    ErikE almost 4 years
    I think you will have better results with printf '%s\0%s' a b. To just print the null character, use printf '\0'.
  • ErikE
    ErikE almost 4 years
    Note that printf '\0' is superior to printf '%b' '\0' and printf '%s\0%s' foo bar is superior to printf '%b' 'foo\0bar'.
  • Peter Cordes
    Peter Cordes almost 4 years
    @ErikE: Thanks, I hadn't thought of / realized that printf would process backslash escapes in the format string itself. Fixed.
  • ErikE
    ErikE almost 4 years
    @PeterCordes Cool! Thanks for hexdump -C as I was just realizing recently that I needed a tool to examine character values easily, and didn't know this command yet in unix.
  • Admin
    Admin about 2 years
    $'\0' only works in zsh, and even there can't be passed to external commands (in arguments or environment variables) embedded in file names, etc as the system calls involved (execve(), open()...) take input as NUL-delimited strings.