What characters are required to be escaped in command line arguments?


Solution 1

The following characters have special meaning to the shell itself in some contexts and may need to be escaped in arguments:

Character  Unicode                      Name                 Usage
`          U+0060 (Grave Accent)        Backtick             Command substitution
~          U+007E                       Tilde                Tilde expansion
!          U+0021                       Exclamation mark     History expansion
#          U+0023 (Number Sign)         Hash                 Comments
$          U+0024                       Dollar sign          Parameter expansion
&          U+0026                       Ampersand            Background commands
*          U+002A                       Asterisk             Filename expansion and globbing
(          U+0028                       Left Parenthesis     Subshells
)          U+0029                       Right Parenthesis    Subshells
(tab)      U+0009                       Tab                  Word splitting (whitespace)
{          U+007B (Left Curly Bracket)  Left brace           Brace expansion
[          U+005B                       Left Square Bracket  Filename expansion and globbing
|          U+007C (Vertical Line)       Vertical bar         Pipelines
\          U+005C (Reverse Solidus)     Backslash            Escape character
;          U+003B                       Semicolon            Separating commands
'          U+0027 (Apostrophe)          Single quote         String quoting
"          U+0022 (Quotation Mark)      Double quote         String quoting with interpolation
(newline)  U+000A (Line Feed)           Newline              Line break
<          U+003C                       Less than            Input redirection
>          U+003E                       Greater than         Output redirection
?          U+003F                       Question mark        Filename expansion and globbing
(space)    U+0020                       Space                Word splitting¹ (whitespace)

Some of these characters are used for more things, and in more places, than the single usage listed here.
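The effect of several of these characters is easiest to see by counting the arguments a command actually receives. A minimal sketch, using a hypothetical show_args function (not part of bash) that prints the argument count followed by each argument in angle brackets:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the argument count, then each argument in <...>.
show_args() {
  printf '%d:' "$#"
  printf ' <%s>' "$@"
  printf '\n'
}

show_args a b          # 2: <a> <b>       the space splits words
show_args a\ b         # 1: <a b>         the backslash escapes the space
show_args 'a b'        # 1: <a b>         so does quoting
show_args a\;b         # 1: <a;b>         an unescaped ; would end the command
show_args \$HOME \~    # 2: <$HOME> <~>   no parameter or tilde expansion
```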


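There are a few corner cases that are explicitly optional:

- ! only needs escaping while history expansion is enabled. It can be turned off with set +H, and it is already off by default in non-interactive shells, such as when running a script.
- { can be disabled with set +B.
- * and ? can be disabled with set -f (set -o noglob).
- = (U+003D, Equals sign) also needs escaping if set -k (set -o keyword) is enabled.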
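In bash, filename expansion and brace expansion can be switched off with set -f and set +B, after which * and { pass through literally. A small sketch, run in a scratch directory created with mktemp so that the glob has something to match:

```bash
#!/usr/bin/env bash
# Work in a scratch directory with two matching files.
cd "$(mktemp -d)" || exit 1
touch one.txt two.txt

echo *.txt      # globbing on: one.txt two.txt
set -f          # same as set -o noglob
echo *.txt      # *.txt, passed literally
set +f

echo {a,b}      # brace expansion on: a b
set +B          # turn brace expansion off
echo {a,b}      # {a,b}, passed literally
set -B
```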


Escaping a newline requires quoting; a backslash won't do the job, because a backslash followed by a newline is removed as a line continuation. Any other characters listed in IFS will need similar handling. You don't need to escape ] or }, but you do need to escape ) because it's an operator.
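The difference between a backslash-newline and a quoted newline can be shown with the same kind of hypothetical show_args helper (an illustration, not a bash builtin). $'...' is bash's ANSI-C quoting, an alternative way to write the newline:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the argument count, then each argument in <...>.
show_args() { printf '%d:' "$#"; printf ' <%s>' "$@"; printf '\n'; }

show_args a\
b                  # 1: <ab>   the backslash-newline pair is deleted, not escaped

show_args 'a
b'                 # one argument containing a real newline

show_args $'a\nb'  # the same argument, written with bash's $'...' quoting
```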

Some of these characters need escaping in fewer contexts than others. For example, a#b is fine, because # only starts a comment at the beginning of a word, but in a #b everything from #b onwards is a comment, while > would need escaping in both positions. It doesn't hurt to escape them all conservatively anyway, and it's easier than remembering the fine distinctions.
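A short sketch of the # and > cases, again with a hypothetical show_args helper that prints the argument count and each argument:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the argument count, then each argument in <...>.
show_args() { printf '%d:' "$#"; printf ' <%s>' "$@"; printf '\n'; }

show_args a#b      # 1: <a#b>     mid-word, the hash is literal
show_args a #b     # 1: <a>       at the start of a word it begins a comment
show_args a \#b    # 2: <a> <#b>
show_args a\>b     # 1: <a>b>     unescaped, > would redirect output to a file b
```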

If your command name itself is a shell keyword (if, for, do), you'll need to escape or quote it too. The only interesting one of those is in, because it's not obvious that it's always a keyword. You don't need to do that for keywords used as arguments, only when you've (foolishly!) named a command after one of them. Shell operators such as ( and & always need quoting wherever they appear.
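To illustrate, the sketch below creates a hypothetical executable literally named if in a temporary directory and puts that directory on PATH. Keywords in argument position need no quoting, while in command position a quoted or backslashed if is no longer recognised as the keyword, so the shell looks it up as an ordinary command:

```bash
#!/usr/bin/env bash
# Keywords in argument position are ordinary words.
echo if for do in            # prints: if for do in

# Hypothetical command named "if" in a temporary directory added to PATH.
bindir=$(mktemp -d)
printf '#!/bin/sh\necho "the if command ran"\n' > "$bindir/if"
chmod +x "$bindir/if"
PATH="$bindir:$PATH"

\if                          # quoted, so not the keyword: runs $bindir/if
'if'                         # same effect
```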


¹ Stéphane has noted that any other single-byte blank character in your locale also needs escaping. In most common, sensible locales, at least those based on C or UTF-8, the only single-byte blanks are the whitespace characters above. In some ISO-8859-1 locales, including those on Solaris, the BSDs, and OS X, U+00A0 NO-BREAK SPACE is considered blank (I think incorrectly). If you're dealing with an arbitrary unknown locale, it could include just about anything, including letters, so good luck.

Conceivably, a single byte considered blank could appear within a multi-byte character that isn't blank, and you'd have no way to escape it other than putting the whole thing in quotes. This isn't a theoretical concern: in the ISO-8859-1 locales above, the A0 byte that is considered blank appears within multi-byte characters such as UTF-8-encoded "à" (C3 A0). To handle those characters safely you need to quote them, as in "à". This behaviour depends on the locale configuration of the environment running the script, not the one where you wrote it.
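The A0 byte inside "à" can be seen by dumping its bytes with od. This sketch assumes the script file itself is saved as UTF-8; the echo line shows the quoted form recommended above (the word is taken from the Voilà example in the comments):

```bash
#!/usr/bin/env bash
# Assumes this file is saved as UTF-8.
# Dump the bytes of "à": the second byte, 0xA0, is NO-BREAK SPACE in ISO-8859-1.
printf '%s' 'à' | od -An -tx1     # shows the bytes c3 a0

# Quoting the word keeps it together even in a locale where 0xA0 is a blank.
echo "Voilà"
```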

I think this behaviour is broken multiple ways, but we have to play the hand we're dealt. If you're working with any non-self-synchronising multibyte character set, the safest thing would be to quote everything. If you're in UTF-8 or C, you're safe (for the moment).

Solution 2

In GNU Parallel this is tested and used extensively:

$a =~ s/[\002-\011\013-\032\\\#\?\`\(\)\{\}\[\]\^\*\<\=\>\~\|\; \"\!\$\&\'\202-\377]/\\$&/go;
# quote a newline as '<newline>' (a backslash-newline would be a line continuation)
$a =~ s/[\n]/'\n'/go;

It is tested in bash, dash, ash, ksh, zsh, and fish. Some of the characters do not need quoting in some shells (or shell versions), but the above works in all tested shells.
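The same backslash idea can be sketched in pure bash. The shell_escape function below is hypothetical and is not GNU Parallel's code: rather than listing the special characters, it escapes everything outside a small whitelist of characters that are never special, and it wraps a newline in single quotes like the Perl code does. It assumes a UTF-8 or C locale (see the locale caveat in Solution 1).

```bash
#!/usr/bin/env bash
# Hypothetical helper: backslash-escape every character outside a whitelist.
# A newline is wrapped in single quotes instead, because a backslash-newline
# is a line continuation that the shell would simply delete.
shell_escape() {
  local s=$1 out='' c i
  local safe='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_./,:@%+-'
  if [[ -z $s ]]; then
    printf "''\n"              # an empty argument must still be visible
    return
  fi
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    if [[ $c == $'\n' ]]; then
      out+="'"$'\n'"'"
    elif [[ $safe == *"$c"* ]]; then
      out+=$c                  # whitelisted: copy as-is
    else
      out+="\\$c"              # anything else: prefix a backslash
    fi
  done
  printf '%s\n' "$out"
}

shell_escape 'a b;$HOME*'      # a\ b\;\$HOME\*
```

The whitelist errs on the side of escaping too much, which, as Solution 1 notes, is harmless.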

If you simply want a string quoted, you can pipe it into parallel --shellquote:

printf "&*\t*!" | parallel --shellquote
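If bash is the only target, bash can do this itself without an external tool: printf '%q' produces a form that bash reads back as a single word (bash 4.4 and later also offer the ${var@Q} expansion). The output may use bash's own $'...' syntax, so unlike the GNU Parallel approach it is not meant for dash or fish. A minimal sketch that round-trips the same input through eval:

```bash
#!/usr/bin/env bash
s=$'&*\t*!'                  # the same characters as the parallel example
q=$(printf '%q' "$s")        # bash-specific quoted form, possibly using $'...'
printf '%s\n' "$q"

# Round trip: let the shell re-parse the quoted form and compare.
eval "set -- $q"
[ "$1" = "$s" ] && echo "round trip OK"
```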

Solution 3

For a lightweight escaping solution in Perl, I follow the principle of single quotes: a Bash string in single quotes can contain any character except the single quote itself.

My code:

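use strict;
use warnings;

# Characters that are special to bash. The $ is backslashed so that Perl does
# not interpolate its $& match variable into the pattern.
my $bash_reserved_characters_re = qr([ !"#\$&'()*;<>?\[\\`{|~\t\n]);

while (<>) {
    if (/$bash_reserved_characters_re/) {
        # Single-quote the line; each embedded ' becomes '"'"' (s///r needs Perl 5.14+).
        my $quoted = s/'/'"'"'/gr;
        print "'${quoted}'";
    } else {
        print $_;    # nothing special: print the line unchanged
    }
}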
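The same single-quote principle can be sketched in pure bash with parameter expansion. sq_quote is a hypothetical name; like the Perl script, it turns each embedded ' into '"'"' (close the quote, add a double-quoted ', reopen), but it always quotes, even when nothing needs it:

```bash
#!/usr/bin/env bash
# Hypothetical helper: wrap $1 in single quotes so the shell reads it back literally.
sq_quote() {
  local sq="'" rep="'\"'\"'"     # rep holds the 5 characters '"'"'
  local out
  out=${1//"$sq"/"$rep"}         # replace every ' with '"'"'
  printf "'%s'\n" "$out"
}

sq_quote "ab'c"         # 'ab'"'"'c'
sq_quote 'a b $HOME'    # 'a b $HOME'
```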

Example run 1:

$ echo -n "abc" | perl escape_bash_special_chars.pl
abc

Example run 2:

echo "abc" | perl escape_bash_special_chars.pl
'abc
'

Example run 3:

echo -n 'ab^c' | perl escape_bash_special_chars.pl
ab^c

Example run 4:

echo -n 'ab~c' | perl escape_bash_special_chars.pl
'ab~c'

Example run 5:

echo -n "ab'c" | perl escape_bash_special_chars.pl
'ab'"'"'c'

echo 'ab'"'"'c'
ab'c

Author: Tim

Updated on September 18, 2022

Comments

  • Tim
    Tim over 1 year

    In Bash, when specifying command line arguments to a command, what characters are required to be escaped?

    Are they limited to the metacharacters of Bash: space, tab, |, &, ;, (, ), <, and >?

    • Jeff Schaller
      Jeff Schaller about 8 years
      Don't forget (possible) filename globbing with * and ?
    • Tim
      Tim about 8 years
      Thanks. Could you exhaustively list the kinds of characters which need to be escaped in cmd line args?
    • Wildcard
      Wildcard about 8 years
      The list is good to have, but the most important thing to understand about quoting, is: Everything between single quotes is passed literally and without word splitting. No exceptions. (This means there is no way whatsoever to embed a single quote within single quotes, by the way, but that's easy to work around.)
  • Stéphane Chazelas
    Stéphane Chazelas about 8 years
    Other blanks in your locale would need escaping as well (except, currently, the multi-byte ones, because of a bug)
  • Stéphane Chazelas
    Stéphane Chazelas about 8 years
    You only need to escape ! when csh history expansion is enabled, typically not in scripts. [ ! -f a ] or find . ! -name... are fine. That's covered by your tighter limits section but maybe worth mentioning explicitly.
  • Stéphane Chazelas
    Stéphane Chazelas about 8 years
    Note that there are contexts where other characters need quoting like: hash[foo"]"]=, ${var-foo"}"}, [[ "!" = b ]], [[ a = "]]" ]], the regexp operators for [[ x =~ ".+[" ]]. Other keywords than { (if, while, for...) would need to be quoted so they're not recognised as such...
  • Michael Homer
    Michael Homer about 8 years
    To the extent that those are command-line arguments at all, the interpretation is up to the command in question (just like ]), so I'm not listing them. I don't think any keyword needs quoting in argument position.
  • Stéphane Chazelas
    Stéphane Chazelas about 8 years
    If you mean in arguments other than the first (zeroth)? Then yes.
  • Michael Homer
    Michael Homer about 8 years
    I assumed the question was about arguments "to a command" in the shell-syntax sense, but I suppose you're right that argv[0] is strictly an argument too. If Tim wants to edit to clarify that point I'll update the list, but otherwise I'll let the assumption stand.
  • Tim
    Tim about 8 years
    Thanks. I didn't explicitly mention the zeroth argument, neither did I explicitly realize that. But I think Stephane is correct, and I agree.
  • Admin
    Admin about 8 years
    A dash - (used for options), *?[+@ (pathname expansions), job control %, the builtins : and ., and the control characters that make up white space: In the POSIX locale, white space consists of one or more <blank> ( <space> and <tab> characters), <newline>, <carriage-return>, <form-feed>, and <vertical-tab> characters. (A filename with any of those characters needs to be quoted).
  • Michael Homer
    Michael Homer about 8 years
    Quoting builtins, dashes, or % doesn't do anything.
  • Admin
    Admin about 8 years
    A dash in a filename (touch -- -awe) needs ls ./-awe at the very least. You may not call that "quoting", but it is: escaping troubling characters. The builtins : and . need quoting as argv[0] if an alias exists (alias .='source nothing.sh'): then \. will actually execute the builtin (not the alias). Maybe you are right about job control. But I hope that pathname expansions will not raise any complaint.
  • Admin
    Admin about 8 years
    @StéphaneChazelas White space in UNICODE is hardly only space and tab It include, at least, hex 0x09, 0x0A, 0x0B, 0x0C, 0x0D, and (which may raise some debate but it is a single byte whitespace in cp-1252 at least) 0x85 (Horizontal Ellipsis).
  • Stéphane Chazelas
    Stéphane Chazelas about 8 years
    @BinaryZebra, we're talking of [[:blank:]] (which in the C locale is TAB and space), not [[:space:]], the FF, CR, VT... don't need quoting (except maybe for CR on some Microsoft ports of bash). In UTF-8, all the non-ascii characters are multi-byte and so fall into that current bug of bash. But for instance on latin1 locales on Solaris, 0xa0, the non-breaking space is a [[:blank:]], so needs quoting (even though the whole point of that character should be that it doesn't break...)
  • Stéphane Chazelas
    Stéphane Chazelas about 8 years
    @MichaelHomer, on Solaris in iso-8859-1 locales, 0xa0 is a blank. LC_ALL=en_GB.ISO8859-1 bash -c $'printf "%s," a\xa0b' outputs a,b, there.
  • Michael Homer
    Michael Homer about 8 years
    @StéphaneChazelas: Not just there - it seems to be the case on BSDs and OS X too. That seems clearly wrong.
  • Stéphane Chazelas
    Stéphane Chazelas about 8 years
    @MichaelHomer, I agree it's not desirable. But that seems to be what POSIX requires. The same applies to xargs, and would probably apply to the grammar of awk or bc, for instance. I had said at the time I would bring it up on the Austin Group mailing list, but never got round to doing it. I'll try and give it a go.
  • Stéphane Chazelas
    Stéphane Chazelas about 8 years
    You hinted at it already, but that means that you need to quote the à in echo Voilà | iconv -f utf-8 in a script, as, if called in a latin1 locale on those systems, the 0xa0 byte in that à character would be taken as a token separator. IOW, the script is parsed based on the locale of the user, not its author's, which sounds wrong to me.
  • Admin
    Admin about 8 years
    @StéphaneChazelas What you wrote was "Other blanks in your locale". That is not limited in any way to C. In fact it is written to include whatever your locale is. A locale may include anything it sees fit in the blank category, of which you give an example. In the broader view of what a language may interpret as blanks, the Unicode white space list serves as the most extreme example. That's why I presented it.
  • Admin
    Admin about 8 years
    @StéphaneChazelas In bash -c $'printf "%s," a\xa0t' you are obviously asking for a byte (\x..), which bash correctly gives. Even in a utf8 locale, this: LC_ALL=en_US.utf8 bash -c $'printf "%s," a\xa0t' will produce a byte with value 0xA0 (which renders as a broken character, which it is in a utf-8 locale). The additional language effects that byte might have depend on the locale description. Which looks broken on the Solaris you describe.
  • Michael Homer
    Michael Homer about 8 years
    @BinaryZebra: "blank" has a specific meaning of word separators (see e.g. ISO C99, incorporated by reference into POSIX), which form feeds &c don't meet. An uncontrolled locale could include such characters as blanks, incorrectly, but that's far from the most extreme case - everything could be a blank in that locale. Which, really, is the problem with the POSIX tokenisation requirement.
  • Admin
    Admin about 8 years
    @StéphaneChazelas According to this: A [:blank:] in Unicode is [\p{Zs}\t]. A \p{Zs} (or \p{Space_Separator}) is : a whitespace character that is invisible, but does take up space. Similar to this list from EM to HAIR space.
  • Admin
    Admin about 8 years
    @MichaelHomer The problem is that you try to look at the issue only from the programming POV (point of view). Yes, it is desirable to have a clear list of [[:blank:]] characters, which the default "C" locale does very well (it only means 0x20 and 0x09), but if you are to embrace languages and language definitions, the list may (and in fact does) drift (with all the security issues you may want to add to such change). But expecting a locale of en_US.utf8 (or most others) to strictly fit the (very limited) view of only 0x20 0x09 for [[:blank:]] is simply wrong.
  • Stéphane Chazelas
    Stéphane Chazelas about 8 years
    @BinaryZebra, nobody said en_US.utf8 blanks only had 0x9 and 0x20, just that its only single-byte blanks were 0x9 and 0x20, as the only single-byte characters in UTF-8 are the ASCII ones and no other ASCII character fits the definition of blank. Of course, you could construct a malicious locale where a or " itself is a blank, but then you can't expect much to work with that. See the discussions around CVE-2014-0475 at the time.
  • Stéphane Chazelas
    Stéphane Chazelas about 8 years
    @BinaryZebra, in bash -c $'printf "%s," a\xa0t', it's my shell, not bash that expands the $'\xa0', bash sees a nbsp character in between the a and t which it treats as a token delimiter when called in a latin1 locale on Solaris/OS/X...
  • Admin
    Admin about 8 years
    @StéphaneChazelas This dual talk about bytes and code points is wrong. A character is whichever encoding it is, whether it is one byte, two bytes or 10 bytes it is deeply irrelevant (as long as it is a valid character in whichever encoding is used). A nbsp is a character. That it happens to be a single byte 0xA0 in iso-8859-1 should be of no real importance. That, as is today, the only one byte encoded blanks in utf-8 are 0x09 and 0x20 must have no real meaning. That may change tomorrow. As soon as some other encoding is named (or used) the characters have been actually "converted".
  • Michael Homer
    Michael Homer about 8 years
    Bash uses isblank at the byte level; bytes are deeply relevant. In any case, comments are not discussion forums, so let's stick to relevant and material improvements to the answer.
  • Jari Turkia
    Jari Turkia over 6 years
    Yes, valid point that. My view is that most people will land on this page, because they have a problem to solve. Not because this makes an interesting academic debate. That's why I'd like to offer solutions and discuss the merits of them, even while being slightly off-topic.
  • Jari Turkia
    Jari Turkia over 6 years
    My code is just an implementation of Michael Homer's answer. I didn't intend to bring any more information than he did.
  • Mog0
    Mog0 about 6 years
    How have I not heard of parallel before...
  • Ole Tange
    Ole Tange about 6 years
    @TomH It will be appreciated if you can spend 5 minutes thinking of how we could have reached you.
  • Mog0
    Mog0 about 6 years
    I think it's a progression problem. Most people don't need or understand parallel until they have progressed through some complexity stages, by which time they have come across xargs, nohup and stuff like that. Also, I don't see many people using parallel to solve problems on Stack Exchange or when I google for solutions to bash problems.