Escape a variable for use as content of another script

Solution 1

TL;DR: skip to the conclusion.

While several shells/tools have builtin quoting operators, some of which have already been mentioned in other answers, I'd like to stress here that many are unsafe to use, depending on:

  • what is being quoted
  • the context in which the quoted string is used
  • the locale in which the quoted output is generated
  • the locale in which that generated quoted output is later used

Several things to consider:

  • in some contexts, it's important the empty string be represented as '' or "". For instance, if it's to be used in sh -c "cmd $quoted_output" it matters if we want what was quoted to be passed as one argument to cmd. In sh -c "var=$quoted_output; ...", it doesn't matter whether the empty string is represented as '', "" or as the empty string.

    The $var:q operator of zsh represents the empty string as the empty string, not '', "" nor $''.

    The ${var@Q} operator of bash (itself copied from mksh which behaves differently in this regard), represents an empty $var as '', but an unset $var as the empty string:

    $ empty_var= bash -c 'printf "<%s>\n" "${empty_var@Q}" "${unset_var@Q}"'
    <''>
    <>
    $ empty_var= mksh -c 'printf "<%s>\n" "${empty_var@Q}" "${unset_var@Q}"'
    <''>
    <''>
    $ empty_var= zsh -c 'printf "<%s>\n" "${empty_var:q}" "${unset_var:q}"'
    <>
    <>
    
  • some of those quoting operators will use a combination of '...', \, "..." or $'...'. The syntax of the latter varies between shells and between versions of a given shell. So for those operators that do use it or can use it depending on the input, it's important that the result be used in the same shell (and same version thereof). That applies at least to:

    • the printf %q of GNU printf, bash, ksh93, zsh
    • zsh's $var:q, ${(q)var}, ${(q+)var}, ${(qqqq)var},
    • mksh's ${var@Q}
    • bash's ${var@Q},
    • the typeset/declare/export -p output of ksh93, mksh, zsh
    • the alias/set output of bash, ksh93, mksh, zsh
    • the xtrace output of ksh93, mksh, zsh

    In any case $'...' is not (yet¹) a standard sh quoting operator, and beware that non-Bourne-like shells such as rc, es, akanga, fish have completely different quoting syntax. There is simply no way to quote a string in a way that is compatible with every shell in existence (though see this other Q&A for some ways to work around it).

  • some shells decode their input as characters before interpreting the code in it, some don't, and some do it sometimes, and sometimes not.

    Some shells (like bash) also make their syntax conditional on the locale. For instance, token delimiters in the syntax are the characters considered as blanks in the locale in yash and bash (though in bash, that only works for single-byte ones). Some shells also rely on the locale's character classification to decide what characters are valid in a variable name. So for instance Stéphane=1 could be interpreted as an assignment in one locale, or as the invocation of the Stéphane=1 command in another.

    The sequence of bytes 0xa3 0x5c represents the £\ string in the ISO-8859-1 (aka latin1) character set, the α character in BIG5, or an invalid sequence of bytes in UTF-8. \ happens to be a special character in the shell syntax, including within "..." and $'...'. ` is also a (dangerous) character whose encoding can be found in the encoding of other characters in some locales.

    Byte 0xa0 is the non-breaking-space character in a great number of single-byte character sets and that character is considered as blank in some locales on some systems, and as such as a token delimiter in the syntax of bash or yash there.

    That byte is also found in the UTF-8 encoding of thousands of characters including many alphabetical ones (like à, encoded as 0xc3 0xa0).

    I'm not aware of any charset in use in any locale of any ASCII-based systems that have characters whose encoding contains the encoding of ' though.

    Some shell quoting operators output $'\u00e9' or $'\u[e9]' for the é character, for instance. When that output is later used, it is expanded, depending on the shell and on the locale in effect when the code is interpreted or run, either to the character's UTF-8 encoding or to its encoding in the locale's charset (with variation in behaviour if the locale doesn't have that character).

    So, it's not only important that the resulting string be used in the same shell and shell version, but also that it be used in the same locale (at least for those shells that do some character encoding/decoding). And even then, several shells (including bash) have or have had bugs in that regard.

    Any quoting operator that uses $'...', "...", or backslash for quoting or that leaves some non-ASCII characters unquoted is potentially unsafe.

    Or in other words, only the ones that use '...' are safe in that regard. That leaves:

    • zsh's ${(qq)var} operator
    • The alias output of dash/bash/bosh (at least current versions).
    • The export -p of dash/bosh (at least current versions).
    • the set output of dash (at least current versions).

    Though of those only the first is documented and committed to always use single quotes (though note the caveat about rcquotes below).

    Also note that yash can't cope with data that can't be decoded in the locale's charset, so there's no way to pass arbitrary data to that shell (at least in the current version).

    Ironically, the output of the locale utility has that very problem (as it's required to use "..." to output implied settings), and it's typically intended to be used as input code in a locale that is different from the one where locale was invoked (to restore the locale).

  • The NUL character (0 byte) cannot occur in an environment variable or in arguments of a command that is executed by way of the execve() system call (that's a limitation of that system call that takes those env and arguments strings as C-style NUL-delimited strings). Except in zsh, NUL cannot be found in shell variables or builtin arguments or more generally shell code either.

    A 0 byte however can be read and written alright from/to a file or pipe or any I/O mechanism.

    In zsh it can be stored in a variable, read and written, passed as argument to builtins like in any modern programming language (such as python or perl).

    But bear in mind that if you quote a NUL with any method that leaves it as-is (as opposed to $'\0', $'\x0', $'\u0000', $'\C@' for instance), regardless of how it is quoted, the result cannot be passed in an argument or env var to an executed command, and no other shell will be able to make use of that NUL character.

    That's worth bearing in mind if you take external input in zsh, as in IFS= read -r var. If a NUL byte is included in the line read from stdin, $var and ${(qq)var} will contain it, which may restrict what you can do with it.

    That's one case where using the $'...' form of quoting can be preferable (if the other caveats associated with that form of quoting (see above) can be addressed).

  • If the resulting quoted text is to be used in shell code located inside backticks, beware that there's an extra layer of backslash interpretation. Always use $(...) in place of `...`.

  • Some characters are only special in some context. For instance = is special in the words that precede the command name (as in a=1 cmd arg), but not after² (as in cmd a=1), though there are some special cases in some shells for commands like export, readonly...

    ~ is special in some contexts and not others.

    Not all quoting operators will quote those.

    Some characters are special in some shells but not in others, or only when some option is enabled...

    Even digits are special in some contexts. For instance, sh -c "echo ${quoted_text}>file" would not write the text to file if that text were 2 and it were emitted bare rather than as '2', because 2>file is parsed as a redirection of stderr.

  • in zsh, the rcquotes option affects how single-quoted strings are interpreted (and generated by its quoting operators). When enabled, a single quote can be represented in a single-quoted string with '' like in the rc shell. For instance, "foo'bar" can also be written 'foo''bar'.

    So it's important that the quoted string generated when rcquotes is enabled be only interpreted by zsh instances that also have rcquotes enabled.

    A ${(qq)var} produced by a zsh with or without rcquotes should be safe to use in zsh -o rcquotes, but note that in zsh -o rcquotes, concatenating single-quoted strings results in a single quote being inserted between them.

    $ quoted_text="'*'"
    $ zsh -o rcquotes -c "echo $quoted_text$quoted_text"
    *'*
    

    same as:

    $ rc -c "echo $quoted_text$quoted_text"
    *'*
    

    You can work around it by inserting "" in between the two:

    $ zsh -o rcquotes -c "echo $quoted_text\"\"$quoted_text"
    **
    

    While in rc and derivatives (where "..." is not a quoting operator, '...' being the only kind of quotes, hence the need to be able to insert ' within them), you'd use ^:

    $ rc -c "echo $quoted_text^$quoted_text"
    **
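
Two of the context-dependent points in the list above (the representation of the empty string, and a digit placed directly before >) can be checked with a short sketch in plain sh. The count_args helper and the scratch directory are assumptions made for illustration only; they are not part of the answer.

```shell
# Hypothetical helper: report how many arguments a child sh sees once the
# spliced-in text has been parsed as shell code.
count_args() {
  sh -c "set -- $1; echo \$#"
}

count_args ""     # empty string emitted as nothing: the argument vanishes
count_args "''"   # empty string emitted as '': one (empty) argument

# A bare digit directly before > is taken as a file descriptor number.
dir=$(mktemp -d)                 # scratch directory (assumes mktemp exists)
sh -c "echo 2>$dir/unquoted"     # parsed as: echo, with stderr redirected
sh -c "echo '2'>$dir/quoted"     # parsed as: echo 2, with stdout redirected
wc -c "$dir/unquoted" "$dir/quoted"
```

The first call should report 0 arguments and the second 1, and only the file written through the quoted form should end up with any content.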
    

In conclusion

The only quoting method that is safe (if we limit ourselves to Bourne-like shells, disregard yash, `...` and rogue locales, and assume the data doesn't contain NUL characters) is to single-quote everything (even the empty string, even characters you'd imagine never to be a problem) and to represent the single quote character itself as \' or "'" outside of the single quotes, as was the initial intent in your question.

To do that you can use:

  • zsh's ${(qq)var} operator (or "${(qq@)array}" for an array), assuming the rcquotes option is not enabled.

  • a function like:

    shquote() {
      LC_ALL=C awk -v q="'" '
        BEGIN{
          for (i=1; i<ARGC; i++) {
            gsub(q, q "\\" q q, ARGV[i])
            printf "%s ", q ARGV[i] q
          }
          print ""
        }' "$@"
    }
    

    or

    shquote() {
      perl -le "print join ' ', map {q(') . s/'/'\\\\''/gr . q(')} @ARGV" -- "$@"
    }
    
  • ksh93/zsh/bash/mksh:

    quoted_text=\'${1//\'/\'\\\'\'}\'
    

    (don't double-quote the expansion and don't use it outside of scalar variable assignments, or you'll run into compatibility problems between different versions of bash (see description of compat41 option))
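
As a rough illustration of how the single-quote-everything approach fits the original goal (one script generating another), here is a minimal sketch in plain sh. It uses a sed-based helper rather than the awk or perl functions above; shquote_one, the sample value and the scratch file are all hypothetical names chosen for the example, and the caveats already listed (NUL bytes, backticks, exotic locales, non-Bourne shells) still apply.

```shell
# Hypothetical helper (not from the answers): single-quote one string with sed.
shquote_one() {
  # Turn every ' into '\'' and wrap the whole value in single quotes.
  # LC_ALL=C makes sed work on bytes, whatever the caller's locale is.
  printf '%s\n' "$1" |
    LC_ALL=C sed "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
}

foo="bar'baz \$(id) \`id\` *"
script=$(mktemp)                 # scratch file for the generated script

# Generate a two-line script; the value is embedded as a string literal.
{
  printf 'qux=%s\n' "$(shquote_one "$foo")"
  printf '%s\n' 'printf "%s\n" "$qux"'
} > "$script"

cat "$script"                    # inspect the generated code
sh "$script"                     # should print the original value unchanged
```

Because the closing quote is added after the last line of input, trailing newlines in the value stay inside the quotes and survive the command substitution.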


¹ The POSIX specification of $'...' was initially targeted for Issue 8 of the Single UNIX Specification, expected to be released in 2021 at the earliest, but it looks like it's not going to make it (consensus on a resolution was not reached in time). So, we'll probably have to wait at least another decade before $'...' is added to the standard.

² except when the -k (keyword) option of the Bourne shell and some of its derivatives is enabled

Solution 2

Bash has a parameter expansion option for exactly this case:

${parameter@Q} The expansion is a string that is the value of parameter quoted in a format that can be reused as input.

So in this case:

foo_esc="${foo@Q}"

This is supported in Bash 4.4 and up. There are several options for other forms of expansion as well, and for specifically generating complete assignment statements (@A).
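
Applied to the question's qux assignment, this could look like the sketch below. It assumes a bash of version 4.4 or later is on PATH; the expansion is performed by that inner bash, so the calling shell can be any sh. As Solution 1 points out, the quoting style that @Q picks depends on the value, so the generated line is best consumed by the same shell.

```shell
foo="bar'baz"

# ${1@Q} is expanded by the inner bash, not by the calling shell.
line=$(bash -c 'printf "qux=%s\n" "${1@Q}"' bash "$foo")

printf '%s\n' "$line"   # the generated assignment
eval "$line"            # re-parse it in the current shell
printf '%s\n' "$qux"    # should match the original value of foo
```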

Solution 3

Bash provides a printf builtin with %q format specifier, which performs shell escaping for you, even in older (<4.0) versions of Bash:

printf '[%q]\n' "Ne'er do well"
# Prints [Ne\'er\ do\ well]

printf '[%q]\n' 'Sneaky injection $( whoami ) `ls /root`'
# Prints [Sneaky\ injection\ \$\(\ whoami\ \)\ \`ls\ /root\`]

This trick can also be used to return arrays of data from a function:

function getData()
{
  printf '%q ' "He'll say hi" 'or `whoami`' 'and then $( byebye )'
}

declare -a DATA="( $( getData ) )"
printf 'DATA: [%q]\n' "${DATA[@]}"
# Prints:
# DATA: [He\'ll\ say\ hi]
# DATA: [or\ \`whoami\`]
# DATA: [and\ then\ \$\(\ byebye\ \)]

Note that the Bash printf builtin is different than the printf utility which comes bundled with most Unix-like operating systems. If, for some reason, the printf command invokes the utility instead of the builtin, you can always execute builtin printf instead.
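
The backslash-escaped form that %q produces here parses back to the same value as the single-quoted form, which a quick round trip can confirm. This is only a sketch: it assumes bash is on PATH, and it uses a value made of printable ASCII, since for other input %q may switch to $'...' quoting, which not every sh understands.

```shell
foo="Ne'er do well"

# Let bash's builtin printf produce the escaped text.
escaped=$(bash -c 'printf %q "$1"' bash "$foo")
printf '%s\n' "$escaped"

# Re-parse the escaped text as shell code and compare with the original.
eval "roundtrip=$escaped"
[ "$roundtrip" = "$foo" ] && echo 'same value'
```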

Solution 4

I guess I didn't RTFM. It can be done like so:

q_mid=\'\\\'\'
foo_esc="'${foo//\'/$q_mid}'"

Then echo "$foo_esc" gives the expected 'bar'\''baz'


How I'm actually using it is with a function:

function esc_var {
    local mid_q=\'\\\'\'
    printf '%s' "'${1//\'/$mid_q}'"
}

...

foo_esc="`esc_var "$foo"`"

Modifying this to use the printf built-in from Dejay's solution:

function esc_vars {
    printf ' %q' "$@" | cut -b 2-
}

To heed Stéphane's warnings about incompatibilities between different versions of bash, regarding single quotes inside double-quoted expansions, the bullet-proof function becomes:

esc_vars() {
    local fmt
    fmt='%s'
    local v
    while [ $# -gt 0 ]; do
        v=\'${1//\'/\'\\\'\'}\'
        printf "$fmt" "$v"
        fmt=' %s'
        shift
    done
}

Solution 5

There are several solutions to quote a var value:

  1. alias
    In most shells where alias is available (except csh, tcsh and probably other csh-like shells):

    $ alias qux=bar\'baz
    $ alias qux
    qux='bar'\''baz'
    

    Yes, this works in many sh-like shells like dash or ash.

  2. set
    Also in most shells (again, not csh):

    $ qux=bar\'baz
    $ set | grep '^qux='
    qux='bar'\''baz'
    
  3. typeset
    In some shells (ksh, bash and zsh at least):

    $ qux=bar\'baz
    $ typeset -p qux
    typeset qux='bar'\''baz'             # this is zsh, quoting style may
                                         # be different for other shells.
    
  4. export
    First do:

    export qux=bar\'baz
    

    Then use:
    export -p | grep 'qux='
    export -p qux

  5. quote
    echo "${qux@Q}"
    echo "${(qq)qux}" # from one to four q's may be used.

Author: Walf

Updated on September 18, 2022

Comments

  • Walf
    Walf over 1 year

    This question is not about how to write a properly escaped string literal. I couldn't find any related question that isn't about how to escape variables for direct consumption within a script or by other programs.

    My goal is to enable a script to generate other scripts. This is because the tasks in the generated scripts will run anywhere from 0 to n times on another machine, and the data from which they are generated may change before they're run (again), so doing the operations directly, over a network will not work.

    Given a known variable that may contain special characters such as single quotes, I need to write that out as a fully escaped string literal, e.g. a variable foo containing bar'baz should appear in the generated script as:

    qux='bar'\''baz'
    

    which would be written by appending "qux=$foo_esc" to the other lines of script. I did it using Perl like this:

    foo_esc="'`perl -pe 's/('\'')/\\1\\\\\\1\\1/g' <<<"$foo"`'"
    

    but this seems like overkill.

    I have had no success in doing it with bash alone. I have tried many variations of these:

    foo_esc="'${file//\'/\'\\\'\'}'"
    foo_esc="'${file//\'/'\\''}'"
    

    but either extra slashes appear in the output (when I do echo "$foo"), or they cause a syntax error (expecting further input if done from the shell).

  • Walf
    Walf almost 7 years
    Neat, but only have 4.2 which gives bad substitution.
  • JdeBP
    JdeBP almost 7 years
    The Z shell equivalent is "${foo:q}".
  • Walf
    Walf over 5 years
    I'm not sure how that helps if what I'd need printed would be 'Ne'\''er do well', etc., i.e. quotes included in the output.
  • Dejay Clayton
    Dejay Clayton over 5 years
    @Walf I think you're not understanding that the two forms are equivalent, and both are perfectly as safe as each other. E.g. [[ 'Ne'\''er do well' == Ne\'er\ do\ well ]] && echo 'equivalent!' will echo equivalent!
  • Walf
    Walf over 5 years
    I did miss that :P however I prefer the quoted form as it's easier to read in a syntax-highlighting viewer/editor.
  • Dejay Clayton
    Dejay Clayton over 5 years
    @Walf it seems like your approach is pretty dangerous, considering that in your example Perl, passing a value like 'hello' results in the incorrect value ''\''hello'', which has an unnecessary leading empty string (the first two single quotes), and an inappropriate trailing single quote.
  • Walf
    Walf over 5 years
    Not dangerous, produces correct (if slightly more verbose) output of ''\''hello'\''', I just forgot the g flag in my example, and I'm not using Perl.
  • Dejay Clayton
    Dejay Clayton over 5 years
    @Walf but it is dangerous, try passing your function this input: $'hello\nthere'. It will end badly for your script if end-user values ever make it to your script, including derived end-user values, such as those that might be contained in source code repositories, etc.
  • Walf
    Walf over 5 years
    I don't get what you mean, how would a raw dollar sign begin an assignment? I get filenames piped from scripts, so the input is escaped already (foo=\$\'hello\\nthere\'; mid_q=\'\\\'\'; foo_esc="$(printf '%s' "'${foo//\'/$mid_q}'" <<<"$foo")"; echo "$foo_esc") prints '$'\''hello\nthere'\''' and I got the same result from the perl version.
  • Walf
    Walf over 5 years
    (mid_q=\'\\\'\'; while IFS= read -r -d '' file; do printf '%s' "'${file//\'/$mid_q}'"; done < <(find . -maxdepth 1 -type f -print0)) worked perfectly even on your test input (except my filename had an actual newline in it).
  • Dejay Clayton
    Dejay Clayton over 5 years
    @Walf, for clarification, $'escape-these-chars' is the ANSI-C quoting feature of Bash that causes all characters within the specified string to be escaped. Thus, to easily create a string literal that contains a newline within the filename (e.g. $'first-line\nsecond-line'), use \n within this construct.
  • Dejay Clayton
    Dejay Clayton over 5 years
    @Walf your approach would probably work fine as long as you don't have secondary scripts parsing your first scripts line-by-line.
  • Walf
    Walf over 5 years
    I'm not parsing, I'm generating. I wouldn't be able to write a BASH parser in BASH.
  • Steven Shaw
    Steven Shaw almost 5 years
    @JdeBP that Z shell equivalent doesn't work. Any other ideas for zsh?
  • Steven Shaw
    Steven Shaw almost 5 years
    I found the answer: "${(@qq)foo}"
  • jw013
    jw013 over 4 years
    The alias approach is clever and seems like it is specified by POSIX. For maximum portability I think this is the way to go. I believe the suggestions involving grep with export or set may break on variables containing embedded newlines.
  • benathon
    benathon almost 4 years
    The function should be esc_var and not esc_vars I know it's a small edit, and of course I would edit it myself, but I can't do single character edits
  • Walf
    Walf almost 4 years
    @portforwardpodcast The plural on the latter version is deliberate because the "$@" is used to expand and escape all arguments passed, unlike the former which only escapes the first argument and drops any others. Your comment did prompt me to check it and it was not separating them properly, so thanks.
  • vijay
    vijay over 3 years
    In perl, I use $q = "\x27"; to avoid the double-quote/single-quote/backslash construct you used.
  • Stéphane Chazelas
    Stéphane Chazelas over 3 years
    @LL3. Thanks. I've added a note about it. Note that that option has been described as "obsolete" in the Korn shell for at least the past 25 years and not included in the POSIX specification of sh.
  • Stéphane Chazelas
    Stéphane Chazelas over 3 years
    Note that ${var@Q} was actually copied from mksh, and is not among the safest to use. See my answer for details.
  • AdminBee
    AdminBee over 3 years
    Perhaps you could add an example on how to apply it to the OPs code (your answer ended up in my "low quality review" queue because it was so short ...)
  • Kusalananda
    Kusalananda over 3 years
    This seems irrelevant to the question at hand, which does not mention PHP at all.
  • seyed
    seyed over 2 years
    For arrays like $@ you can use this ${*@Q}