Using/reading/writing the null and escape characters
Solution 1
Printing the null character
In many recent shells you can write the null character in dollar-single-quote format ($'\0'), hexadecimal format (\x00), Unicode format (\u0000 or \U00000000), or, just as you tried, octal ('\0'). The point is that the command has to understand what to do with backslash-escaped characters. For example, echo usually needs the -e option, and printf needs the %b conversion.
Let's check if it works:
$ echo -ne '\0'
$
So it appears to produce nothing, just like echo -ne ''. Similarly:
$ printf '%b' '\0'
$
Let's add some characters around it (I will stick with printf '%b' from now on as it is more robust, but echo -ne has a similar effect):
$ printf '%b' a'\0'b
ab
Only two characters were printed, where did the null go?
$ printf '%b' a'\0'b | wc -c
3
Let's compare it with a''b:
$ printf '%b' a''b | wc -c
2
One last check that we really print a null character before trying to create a file: let's pass the printed value to a command that will complain about it, like xargs:
$ printf '%b' a'\0'b | xargs echo
xargs: Warning: a NUL character occurred in the input. It cannot be
passed through in the argument list. Did you mean to use the --null option?
a
Notice how only a was printed at the end. Of course xargs -0 works fine:
$ printf '%b' a'\0'b | xargs -0 echo
a b
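To see the NUL byte directly rather than infer it from a byte count, the output can be dumped as hex. This is a minimal sketch that assumes a POSIX od is available; the variable name bytes is only for illustration.

```shell
# Dump what printf emits as hex bytes; tr strips od's spacing.
bytes=$(printf '%b' 'a\0b' | od -An -tx1 | tr -d ' \n')
echo "$bytes"   # 610062: 'a', NUL, 'b'
```

The middle 00 is the null character that the terminal silently swallowed.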
Creating the file with null?
Now let's try to create a file with a null character in its name:
$ touch $'\0'
touch: cannot touch ‘’: No such file or directory
$ mkdir $'\0'
mkdir: cannot create directory ‘’: No such file or directory
# let's try another approach - using printf in command substitution:
$ touch "$(printf '%b' '\0')"
touch: cannot touch ‘’: No such file or directory
$ mkdir "$(printf '%b' '\0')"
mkdir: cannot create directory ‘’: No such file or directory
The result is exactly the same as with touch '', so it seems the null is just ignored altogether. What if we skip the double quotes around the command substitution?
$ touch $(printf '%b' '\0')
touch: missing file operand
Try 'touch --help' for more information.
$ mkdir $(printf '%b' '\0')
mkdir: missing operand
Try 'mkdir --help' for more information.
This is the same situation as touch/mkdir without any arguments at all. We get yet another result if we surround the null with text:
$ touch "$(printf '%b' a'\0'b)"
$ ls
a # in zsh
ab # in bash
One can also try to redirect standard output to $'\0', but all one gets is a different kind of error.
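The a versus ab difference comes from how each shell treats a NUL in command substitution output. Here is a small sketch of the bash side; recent bash versions also print a warning on stderr, which is silenced here, and the variable name joined is illustrative.

```shell
# Command substitution cannot keep the NUL: bash drops it and joins the rest.
{ joined=$(printf '%b' 'a\0b'); } 2>/dev/null
echo "$joined"      # ab in bash
echo "${#joined}"   # 2 in bash
```

So by the time touch runs, the argument no longer contains a null at all.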
Solution 2
Single/double quotes vs. backslash: single quotes and backslashes are equivalent in quoting power. It's a lot more convenient to use single quotes to quote a long string with spaces, tabs, newlines, ()[]*$><?|{}~&;"`^!# and probably other characters I'm forgetting. But you could achieve exactly equivalent results with just backslashes (beware of the overloading of backslashes within backticks (`...`) though).
Double quotes are unique, though. $ expands inside double quotes, but not inside single quotes. "$foo" expands foo, but protects the expanded result from word-splitting and glob expansion.
http://mywiki.wooledge.org/BashFAQ might be a good place to start. The bash manual doesn't spend much time on how to use all the features it describes, just how they work individually.
It's literally not possible to pass a string containing a zero-byte as a command line argument, or to a system call. The ABI (application binary interface) that specifies exactly how data is passed between processes and the kernel uses C strings for everything (except binary data), including command line arguments and file / path arguments to system calls. C strings are character arrays where the end-of-string is marked by a zero-byte. There is no way to "escape" a zero byte to indicate that it's not the end of a string.
Any attempt to do something like touch $'foo\0bar' would just result in touch seeing its argument list as
argv[0] = "/bin/touch";
argv[1] = "foo";
Even though sitting in memory, argv[1] = "foo\0bar\0", the first \0 marks the end of the string. Actually, "foo\0bar\0" wouldn't make it as far as the new process's argv. It wouldn't make it out of the argv array in the execve(2) system call that ran touch.
And even if you wrote a C or Perl program with character arrays / strings containing null bytes, passing them to a system call like open(2) would cause the same interpretation of the string by the kernel. System calls that need to handle arbitrary data, like read(2) and write(2), take a length argument as well as a pointer to the buffer.
It's not even possible to do much of anything with null bytes in bash. As jimmij points out, the bash syntax for writing a string literal with escape-sequence processing is $'string', but writing a \0 in your string literal acts as a string terminator inside bash. I guess this means bash stores strings internally as C strings, not with an explicit length.
str=$'foo\0bar'
echo "${#str}" # 3, showing that bash isn't even storing it in a variable.
echo "$str" | wc -c # 4. wouldn't work even if ${#str} did: echo's cmdline would eat it
wc -c <<< $'foo\0bar' # 4 (includes a newline)
So we can't use this syntax to send a null byte anywhere. We'd have to use tr or something.
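One way to do that, as a sketch: write a placeholder byte and let tr convert it to NUL inside the pipeline, so the NUL never has to pass through a shell string. The newline placeholder and the variable name hex are arbitrary choices for illustration.

```shell
# Turn a placeholder byte (here a newline) into NUL with tr, then inspect the bytes.
hex=$(printf 'foo\nbar' | tr '\n' '\0' | od -An -tx1 | tr -d ' \n')
echo "$hex"   # 666f6f00626172: "foo", NUL, "bar"
```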
However, bash printf does have some limited support for \0. The %b conversion processes backslash escapes, including \0. And printf already processes such escapes in its format string.
- printf '\0' prints a zero byte. Pipe it into hexdump -C to confirm.
- printf '%s\0%s' foo bar | hexdump -C writes foo.bar (where . is a NUL byte) to stdout. Note that \0 inside single or double quotes doesn't expand on its own; only $'\0' quoting would expand it before printf, which would make it act as a terminator.
- printf '%b' 'foo\0bar' does the same thing, but in a more complicated way.
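A quick way to check that the two printf forms in the list agree is to count the bytes each one emits. This is a sketch only; the variable names n_fmt and n_b are made up for the example.

```shell
# Both forms should emit 7 bytes: "foo", one NUL, "bar".
n_fmt=$(printf '%s\0%s' foo bar | wc -c | tr -d '[:space:]')
n_b=$(printf '%b' 'foo\0bar' | wc -c | tr -d '[:space:]')
echo "$n_fmt $n_b"   # 7 7
```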
Solution 3
As you know, $var will lead to the variable being interpreted. The reasons why the different options work vary:
- the escape (\$var): do NOT interpret the very next character as a shell functional character. BUT in some cases it gives a special meaning (e.g. \n for newline in some contexts)
- the single quotes ('$var'): everything in single quotes is strictly nothing but the very string they contain
- the separation of $ ("$"var): a single $ will not be interpreted; by placing it in double quotes it is separated from the var part and no interpretation happens
- the double quotes ("$var"): these actually allow the variable var to be interpreted: mkdir "$var" DOES NOT WORK AND IS NOT THE SAME AS THE OTHERS! Please check again! However, anything contained inside the quotes will be treated as a single string. This is especially useful when you have special characters in file names, e.g. creating a file with a space in its name: touch "a b" -> single file a b created/updated, touch a b -> two files a and b created/updated.
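The four quoting forms in the list can be compared side by side without creating any directories. In this sketch the value expanded and the variable name out are placeholders chosen for the demonstration.

```shell
var=expanded
# The first three arguments stay literal; only "$var" is expanded.
out=$(printf '%s\n' \$var '$var' "$"var "$var")
printf '%s\n' "$out"
# $var
# $var
# $var
# expanded
```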
Other special operators are: redirections and 'heres' > >> < << <<<, process operators & |, boolean operators || &&, command separators ;, and grouping with parentheses ( ); sometimes, but then separated or as the first character, the - for stdin or options of commands. Also there is the test command [ and the quotes we already used ' ", as well as recalling previous commands with an exclamation mark ! or comments with a hash #, and the wildcards asterisk * and question mark ? for multiple and single characters. Also note that the current and parent dirs are . and .., while the home is set at ~/. I.e. the characters ; & | > < - [ \ ' " ( ) # * ! ? . ~ ^ { }, `, newline, space, tab (and the other blank characters in single-byte locales) should be looked at twice, but not all of them are "dangerous" on the same level. I hope I forgot none, as there are quite a bunch of them.
Solution 4
In filenames, '/' is prohibited because it is a directory separator. That is the only reason. And if you hand-edit a filesystem, you might even be able to create a file with '/' in the name (not recommended, as you won't be able to do much with it).
The NUL character cannot be used as part of the filename, because the relevant system calls use C language string-passing conventions, and NUL is the terminator for such a string. So it cannot be interpreted as part of the name.
Note that creating a file called \0 is not the same as creating one containing a NUL: the former is a filename containing the two characters '\' and '0'.
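This can be checked by creating such a directory and dumping the bytes of its name. The sketch below works in a scratch directory from mktemp (assumed to be available); the names dir, path, name and hex are illustrative.

```shell
# Hypothetical scratch directory so nothing is created in the current one.
dir=$(mktemp -d)
mkdir "$dir"/'\0'
# Pick up the only entry and strip the directory part.
for path in "$dir"/*; do name=${path##*/}; done
hex=$(printf '%s' "$name" | od -An -tx1 | tr -d ' \n')
echo "${#name} $hex"   # 2 5c30: a backslash (5c) and the digit 0 (30)
rm -r "$dir"
```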
Mohamed Medhat Sallam
Updated on September 18, 2022
Comments
-
Mohamed Medhat Sallam almost 2 years
I know that a filename in Linux has no restriction whatsoever except for two characters, '/' and '\0'. I know that '/' is prohibited because it is a directory separator, but is there any other reason?
Also, on my terminal I can create a file or a directory with the name \0. So I wonder how to write the null character correctly, because obviously it shouldn't allow me to have a filename with null in it.
mkdir '\0' will create a directory named \0
One more question: if I want to include $ in my filename, I can use the backslash.
mkdir \$myfile will create a directory named $myfile
However, I can do the same if I surround the dollar sign with single quotes or double quotes:
mkdir \$myfile is the same as mkdir '$'myfile is the same as mkdir "$"myfile is the same as mkdir '$myfile' is the same as mkdir "$myfile"
So my question is: are the single and double quotes a substitution for the escape backslash character?
Also, what other characters need escaping in bash besides $, (space) and backslash?
-
Admin almost 9 years
mkdir '\0' creates a file with a two-character name; this is not the NUL character you were looking for ('\0' is from C).
-
Admin almost 9 years
Using a hex viewer (like xxd) really helps. For example: printf '%b' a'\0'b|xxd. Of course, the simpler and almost always present od could also do the job: printf '%b' a'\0'b|od -An -tx1
-
Admin almost 9 years
The problem in bash with touch "$(printf '%b' a'\0'b)" is the 'command substitution'. This echo "$(printf '%b' a'\0'b)"|xxd prints 6162; the null is removed by the 'command substitution'. Instead, this: printf '%b' a'\0'b|xxd does print the null: 610062. zsh, by contrast, acts like C code: a string ends where a null exists.
-
Stéphane Chazelas almost 9 years
Note that from your list, . and - are not special in that quoting them won't make any difference (except maybe in things like [[ "-e" = x ]]). Quotes can also be used to escape shell language keywords (like "for" or \for to run the for command as opposed to starting a for loop) or aliases (\rm to make sure to run the rm command in case there's an rm alias defined).
-
Stéphane Chazelas almost 9 years
While it's not possible to include 0 bytes in the list of arguments and env vars passed to the execve() system call, there's nothing preventing builtin commands or shell functions from supporting them in their arguments. zsh does, and so it does support passing NUL bytes on its command lines. It also has a mkdir builtin, but obviously passing NUL bytes won't help since the mkdir() system call also takes a NUL-delimited string as argument (actually zmodload zsh/files; mkdir $'\0' currently exhibits a bug there).
-
Stéphane Chazelas almost 9 years
Note that it's possible to pass a NUL byte in a system call, just not for those syscall arguments that are pointers to NUL-terminated strings. Example: write(fd, buf, 12) (here the size argument (12) indicates the extent of the (binary as opposed to text) string). Which is why printf '\0\0' does manage to print those 2 NUL bytes (it does a write(1, (char[]) {'\0','\0'}, 2)).
-
Peter Cordes almost 9 years
@StéphaneChazelas: thanks for the edit of shell special characters. I clarified the bit about sys calls, since your 2nd comment is probably helpful to anyone seeing this concept for the first time.
-
Stéphane Chazelas about 6 years
@Paul, when zsh passes arguments to its builtins or functions, the kernel is not involved. The shell is free not to use NUL-delimited strings there.
-
Peter Cordes about 6 years
@PaulStelian: ZSH must use explicit-length strings internally, like C++-style std::string instead of C-style char*, i.e. a buffer which "knows" its own length, and thus doesn't require a special terminator / sentinel byte that can't appear inside strings.
-
Paul Stelian about 6 years
@PeterCordes That may be true, but the strings must still pass through the C interface when launching a process, and that one is unforgiving with the NUL character; exec (all flavors) cannot understand buffers which know their length.
-
ErikE almost 4 years
I think you will have better results with printf '%s\0%s' a b. To just print the null character, use printf '\0'.
-
ErikE almost 4 years
Note that printf '\0' is superior to printf '%b' '\0' and printf '%s\0%s' foo bar is superior to printf '%b' 'foo\0bar'.
-
Peter Cordes almost 4 years
@ErikE: Thanks, I hadn't thought of / realized that printf would process backslash escapes in the format string itself. Fixed.
-
ErikE almost 4 years
@PeterCordes Cool! Thanks for hexdump -C, as I was just realizing recently that I needed a tool to examine character values easily, and didn't know this command yet in unix.
-
Admin about 2 years
$'\0' only works in zsh, and even there it can't be passed to external commands (in arguments or environment variables), embedded in file names, etc., as the system calls involved (execve(), open()...) take input as NUL-delimited strings.