Escape a variable for use as content of another script
Solution 1
TL;DR: skip to the conclusion.
While several shells/tools have builtin quoting operators some of which have already been mentioned in a few answers, I'd like to stress here that many are unsafe to use depending on:
- what is being quoted
- context in which the quoted string is used.
- the locale in which the quoted output is generated
- the locale in which that generated quoted output is later used.
Several things to consider:
- In some contexts, it's important that the empty string be represented as '' or "". For instance, if it's to be used in sh -c "cmd $quoted_output", it matters if we want what was quoted to be passed as one argument to cmd. In sh -c "var=$quoted_output; ...", it doesn't matter whether the empty string is represented as '', "" or as the empty string.
The $var:q operator of zsh represents the empty string as the empty string, not '', "" nor $''.
The ${var@Q} operator of bash (itself copied from mksh, which behaves differently in this regard) represents an empty $var as '', but an unset $var as the empty string:
$ empty_var= bash -c 'printf "<%s>\n" "${empty_var@Q}" "${unset_var@Q}"'
<''>
<>
$ empty_var= mksh -c 'printf "<%s>\n" "${empty_var@Q}" "${unset_var@Q}"'
<''>
<''>
$ empty_var= zsh -c 'printf "<%s>\n" "${empty_var:q}" "${unset_var:q}"'
<>
<>
- Some of those quoting operators will use a combination of '...', \, "..." or $'...'. The syntax of the latter varies between shells and between versions of a given shell. So for those operators that do use it or can use it depending on the input, it's important that the result be used in the same shell (and same version thereof). That applies at least to:
  - the printf %q of GNU printf, bash, ksh93, zsh
  - zsh's $var:q, ${(q)var}, ${(q+)var}, ${(qqqq)var}
  - mksh's ${var@Q}
  - bash's ${var@Q}
  - the typeset/declare/export -p output of ksh93, mksh, zsh
  - the alias/set output of bash, ksh93, mksh, zsh
  - the xtrace output of ksh93, mksh, zsh
In any case $'...' is not (yet¹) a standard sh quoting operator, and beware that non-Bourne-like shells such as rc, es, akanga, fish have completely different quoting syntax. There is simply no way to quote a string in a way that is compatible with every shell in existence (though see this other Q&A for some ways to work around it).
- Some shells decode their input as characters before interpreting the code in it, some don't, and some do it sometimes, and sometimes not.
Some shells (like bash) also make their syntax conditional on the locale. For instance, token delimiters in the syntax are the characters considered as blanks in the locale in yash and bash (though in bash, that only works for single-byte ones). Some shells also rely on the locale's character classification to decide what characters are valid in a variable name. So for instance Stéphane=1 could be interpreted as an assignment in one locale, or as the invocation of the Stéphane=1 command in another.
The sequence of bytes 0xa3 0x5c represents the £\ string in the ISO-8859-1 (aka latin1) character set, the α character in BIG5, or an invalid sequence of bytes in UTF-8. \ happens to be a special character in the shell syntax, including within "..." and $'...'. ` is also a (dangerous) character whose encoding can be found in the encoding of other characters in some locales.
Byte 0xa0 is the non-breaking-space character in a great number of single-byte character sets and that character is considered as blank in some locales on some systems, and as such as a token delimiter in the syntax of bash or yash there.
That byte is also found in the UTF-8 encoding of thousands of characters including many alphabetical ones (like à, encoded as 0xc3 0xa0).
I'm not aware of any charset in use in any locale of any ASCII-based systems that have characters whose encoding contains the encoding of ' though.
Some shell quoting operators output $'\u00e9' or $'\u[e9]' for the é character for instance. And that in turn, when used, depending on the shell, and the locale at the time of interpreting or running the code that uses it, will be expanded to its UTF-8 encoding or in the locale's encoding (with variation in behaviour if the locale doesn't have that character).
So, it's not only important that the resulting string be used in the same shell and shell version, but also that it be used in the same locale (at least for those shells that do some character encoding/decoding). And even then, several shells (including bash) have or have had bugs in that regard.
Any quoting operator that uses $'...', "...", or backslash for quoting or that leaves some non-ASCII characters unquoted is potentially unsafe.
Or in other words, only the ones that use '...' are safe in that regard. That leaves:
  - zsh's ${(qq)var} operator
  - the alias output of dash/bash, bosh (at least current versions)
  - the export -p of dash/bosh (at least current versions)
  - the set output of dash (at least current versions)
Though of those only the first is documented and committed to always use single quotes (though note the caveat about rcquotes below).
Also note that yash can't cope with data that can't be decoded in the locale's charset, so there's no way to pass arbitrary data to that shell (at least in the current version).
Ironically, the output of the locale utility has the problem (as it's required to use "..." to output implied settings), and it's typically intended to be used to input code in a locale that is different from that where locale was invoked (to restore the locale).
- The NUL character (0 byte) cannot occur in an environment variable or in arguments of a command that is executed by way of the execve() system call (that's a limitation of that system call that takes those env and arguments strings as C-style NUL-delimited strings). Except in zsh, NUL cannot be found in shell variables or builtin arguments or more generally shell code either.
A 0 byte however can be read and written alright from/to a file or pipe or any I/O mechanism.
In zsh it can be stored in a variable, read and written, passed as argument to builtins like in any modern programming language (such as python or perl).
But bear in mind that if you quote a NUL with any method that leaves it as-is (as opposed to $'\0', $'\x0', $'\u0000', $'\C@' for instance), regardless of how it is quoted, the result cannot be passed in an argument or env var to an executed command, and no other shell will be able to make use of that NUL character.
That's possibly to bear in mind if you take external input in zsh, as in IFS= read -r var. If a NUL byte is included in that line read from stdin, $var and ${(qq)var} will contain it, which may restrict what you can do with it.
That's one case where using the $'...' form of quoting can be preferable (if the other caveats associated with that form of quoting (see above) can be addressed).
- If the resulting quoted text is to be used in shell code located inside backticks, beware that there's an extra layer of backslash interpretation. Always use $(...) in place of `...`.
- Some characters are only special in some contexts. For instance = is special in the words that precede the command name (as in a=1 cmd arg), but not after² (as in cmd a=1), though there are some special cases in some shells for commands like export, readonly... ~ is special in some contexts and not others.
Not all quoting operators will quote those.
Some characters are special in some shells but not in others, or only when some option is enabled...
Even digits are special in some contexts. For instance sh -c "echo ${quoted_text}>file" would not output the quoted text in file if that text was 2 and had not been quoted as '2' for instance, as 2>file would then be taken as a redirection of stderr.
- In zsh, the rcquotes option affects how single-quoted strings are interpreted (and generated by its quoting operators). When enabled, a single quote can be represented in a single-quoted string with '' like in the rc shell. For instance, "foo'bar" can also be written 'foo''bar'.
So it's important that the quoted string generated when rcquotes is enabled be only interpreted by zsh instances that also have rcquotes enabled.
A ${(qq)var} produced by a zsh with or without rcquotes should be safe to use in zsh -o rcquotes, but note that in zsh -o rcquotes, concatenating single-quoted strings would result in a single quote being inserted between them.
$ quoted_text="'*'"
$ zsh -o rcquotes -c "echo $quoted_text$quoted_text"
*'*
same as:
$ rc -c "echo $quoted_text$quoted_text"
*'*
You can work around it by inserting "" in between the two:
$ zsh -o rcquotes -c "echo $quoted_text\"\"$quoted_text"
**
While in rc and derivatives (where "..." is not a quoting operator, '...' being the only kind of quotes, hence the need to be able to insert ' within them), you'd use ^:
$ rc -c "echo $quoted_text^$quoted_text"
**
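Two of the pitfalls listed above (the empty string and the lone digit) can be reproduced with plain sh. This is only a small sketch: count_args is a hypothetical helper introduced here for illustration, and the file names are arbitrary.

```shell
# count_args is a hypothetical helper: it reports how many arguments
# an inner sh sees when $1 is pasted unquoted into its code.
count_args() {
  sh -c "set -- $1; echo \$#"
}

count_args "''"    # a quoted empty string survives as one (empty) argument
count_args ""      # a bare empty string vanishes: no argument at all

# A lone digit directly in front of > is parsed as a file descriptor number.
dir=$(mktemp -d)
( cd "$dir" && sh -c "echo 2>unquoted; echo '2'>quoted" )
unquoted_content=$(cat "$dir/unquoted")   # empty: 2> only redirected stderr
quoted_content=$(cat "$dir/quoted")       # contains 2
rm -rf "$dir"
```

The first call reports one argument and the second none, which is why a quoting operator that renders the empty string as nothing is only safe in assignment-like contexts.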
In conclusion
The only quoting method that is safe (if we limit to Bourne-like shells and disregard yash and `...` or rogue locales, and assume the data doesn't contain NUL characters) is single quoting of everything (even the empty string, even characters you'd imagine never to be a problem), representing the single quote character itself as \' or "'" outside of the single quotes, as was the initial intent in your question.
To do that you can use:
- zsh's ${(qq)var} operator (or "${(qq@)array}" for an array), assuming the rcquotes option is not enabled.
- a function like:
shquote() {
  LC_ALL=C awk -v q="'" '
    BEGIN{
      for (i=1; i<ARGC; i++) {
        gsub(q, q "\\" q q, ARGV[i])
        printf "%s ", q ARGV[i] q
      }
      print ""
    }' "$@"
}
or
shquote() {
  perl -le "print join ' ', map {q(') . s/'/'\\\\''/gr . q(')} @ARGV" -- "$@"
}
- ksh93/zsh/bash/mksh:
quoted_text=\'${1//\'/\'\\\'\'}\'
(don't double-quote the expansion and don't use it outside of scalar variable assignments, or you'll run into compatibility problems between different versions of bash (see description of the compat41 option))
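For shells that have neither awk, perl nor the ${var//pattern/replacement} operator at hand, the same "single-quote everything" rule can be sketched with POSIX parameter expansions only. This is an illustrative sketch, not part of the answer above: sq_quote is a hypothetical name, and it inherits the caveats already listed (no NUL bytes, and trailing newlines are lost if the output is captured with $(...)).

```shell
# sq_quote is a hypothetical name. s, head and out are global variables
# because POSIX sh has no local variables.
sq_quote() {
  s=$1 out=
  while :; do
    case $s in
      *\'*)
        head=${s%%\'*}          # text before the first single quote
        out="$out$head'\\''"    # close quote, escaped quote, reopen quote
        s=${s#*\'}              # text after that single quote
        ;;
      *) break ;;
    esac
  done
  printf "'%s'" "$out$s"
}

# Round trip: the quoted form, evaluated as shell code, gives the value back.
value="bar'baz \$(id) \`id\` *"
quoted=$(sq_quote "$value")
eval "copy=$quoted"
[ "$copy" = "$value" ] && echo "round trip OK"
```

The empty string comes out as '', so the result stays a separate argument when pasted into a command line.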
¹ The POSIX specification of $'...' was initially targeted for Issue 8 of the Single UNIX Specification, expected to be released in 2021 at the earliest, but it looks like it's not going to make it (consensus on a resolution was not reached in time). So, we'll probably have to wait at least another decade before $'...' is added to the standard.
² Except when the -k (keyword) option of the Bourne shell and some of its derivatives is enabled.
Solution 2
Bash has a parameter expansion option for exactly this case:
${parameter@Q}
The expansion is a string that is the value of parameter quoted in a format that can be reused as input.
So in this case:
foo_esc="${foo@Q}"
This is supported in Bash 4.4 and up. There are several options for other forms of expansion as well, and for specifically generating complete assignment statements (@A).
Solution 3
Bash provides a printf builtin with a %q format specifier, which performs shell escaping for you, even in older (<4.0) versions of Bash:
printf '[%q]\n' "Ne'er do well"
# Prints [Ne\'er\ do\ well]
printf '[%q]\n' 'Sneaky injection $( whoami ) `ls /root`'
# Prints [Sneaky\ injection\ \$\(\ whoami\ \)\ \`ls\ /root\`]
This trick can also be used to return arrays of data from a function:
function getData()
{
    printf '%q ' "He'll say hi" 'or `whoami`' 'and then $( byebye )'
}
declare -a DATA="( $( getData ) )"
printf 'DATA: [%q]\n' "${DATA[@]}"
# Prints:
# DATA: [He\'ll\ say\ hi]
# DATA: [or\ \`whoami\`]
# DATA: [and\ then\ \$\(\ byebye\ \)]
Note that the Bash printf builtin is different from the printf utility which comes bundled with most Unix-like operating systems. If, for some reason, the printf command invokes the utility instead of the builtin, you can always execute builtin printf instead.
Solution 4
I guess I didn't RTFM. It can be done like so:
q_mid=\'\\\'\'
foo_esc="'${foo//\'/$q_mid}'"
Then echo "$foo_esc" gives the expected 'bar'\''baz'
How I'm actually using it is with a function:
function esc_var {
    local mid_q=\'\\\'\'
    printf '%s' "'${1//\'/$mid_q}'"
}
...
foo_esc="`esc_var "$foo"`"
Modifying this to use the printf built-in from Dejay's solution:
function esc_vars {
printf ' %q' "$@" | cut -b 2-
}
To heed Stéphane's warnings about incompatibilities between different versions of bash, regarding single quotes inside double-quoted expansions, the bullet-proof function becomes:
esc_vars() {
    local fmt
    fmt='%s'
    local v
    while [ $# -gt 0 ]; do
        v=\'${1//\'/\'\\\'\'}\'
        printf "$fmt" "$v"
        fmt=' %s'
        shift
    done
}
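To tie this back to the original goal (a script that writes other scripts), here is a hedged sketch of the whole round trip. It assumes bash, since ${1//pattern/replacement} is not POSIX sh. The single-argument esc_var below is a simplified variant written for this example, and the temporary file stands in for wherever the generated script would really be stored.

```shell
#!/usr/bin/env bash
# Assumes bash: ${1//pattern/replacement} is not POSIX sh.
# esc_var is a simplified, single-argument variant for illustration.
esc_var() {
  local v
  v=\'${1//\'/\'\\\'\'}\'   # plain scalar assignment, expansion left unquoted
  printf '%s' "$v"
}

foo="bar'baz \$(id) *"
generated=$(mktemp)          # hypothetical location of the generated script

# Write a two-line script: an assignment using the escaped literal,
# then a command that prints the variable.
{
  printf 'qux=%s\n' "$(esc_var "$foo")"
  printf '%s\n' 'printf "%s\n" "$qux"'
} > "$generated"

result=$(sh "$generated")    # the generated script can be run any number of times
rm -f "$generated"
[ "$result" = "$foo" ] && echo "generated script reproduced the value"
```

Because the literal only uses '...' and \', the generated file can be run by a different Bourne-like shell than the one that produced it.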
Solution 5
There are several solutions to quote a var value:
- alias
In most shells (where alias is available) (except csh, tcsh and probably other csh-like shells):
$ alias qux=bar\'baz
$ alias qux
qux='bar'\''baz'
Yes, this works in many sh-like shells like dash or ash.
- set
Also in most shells (again, not csh):
$ qux=bar\'baz
$ set | grep '^qux='
qux='bar'\''baz'
- typeset
In some shells (ksh, bash and zsh at least):
$ qux=bar\'baz
$ typeset -p qux
typeset qux='bar'\''baz'   # this is zsh, quoting style may be different for other shells.
- export
First do:
export qux=bar\'baz
Then use:
ksh: export -p | grep 'qux='
bash: export -p | grep 'qux='
zsh: export -p qux
- quote
bash: echo "${qux@Q}"
zsh: echo "${(qq)qux}"   # from one to four q's may be used.
Walf
Updated on September 18, 2022
Comments
- Walf over 1 year
This question is not about how to write a properly escaped string literal. I couldn't find any related question that isn't about how to escape variables for direct consumption within a script or by other programs.
My goal is to enable a script to generate other scripts. This is because the tasks in the generated scripts will run anywhere from 0 to n times on another machine, and the data from which they are generated may change before they're run (again), so doing the operations directly, over a network will not work.
Given a known variable that may contain special characters such as single quotes, I need to write that out as a fully escaped string literal, e.g. a variable foo containing bar'baz should appear in the generated script as:
qux='bar'\''baz'
which would be written by appending "qux=$foo_esc" to the other lines of script. I did it using Perl like this:
foo_esc="'`perl -pe 's/('\'')/\\1\\\\\\1\\1/g' <<<"$foo"`'"
but this seems like overkill.
I have had no success in doing it with bash alone. I have tried many variations of these:
foo_esc="'${file//\'/\'\\\'\'}'"
foo_esc="'${file//\'/'\\''}'"
but either extra slashes appear in the output (when I do echo "$foo"), or they cause a syntax error (expecting further input if done from the shell).
- Walf almost 7 years
Neat, but only have 4.2 which gives bad substitution.
- JdeBP almost 7 years
The Z shell equivalent is "${foo:q}".
- Walf over 5 years
I'm not sure how that helps if what I'd need printed would be 'Ne'\''er do well', etc., i.e. quotes included in the output.
- Dejay Clayton over 5 years
@Walf I think you're not understanding that the two forms are equivalent, and both are perfectly as safe as each other. E.g. [[ 'Ne'\''er do well' == Ne\'er\ do\ well ]] && echo 'equivalent!' will echo equivalent!
- Walf over 5 years
I did miss that :P however I prefer the quoted form as it's easier to read in a syntax-highlighting viewer/editor.
- Dejay Clayton over 5 years
@Walf it seems like your approach is pretty dangerous, considering that in your example Perl, passing a value like 'hello' results in the incorrect value ''\''hello'', which has an unnecessary leading empty string (the first two single quotes), and an inappropriate trailing single quote.
- Walf over 5 years
Not dangerous, produces correct (if slightly more verbose) output of ''\''hello'\''', I just forgot the g flag in my example, and I'm not using Perl.
- Dejay Clayton over 5 years
@Walf but it is dangerous, try passing your function this input: $'hello\nthere'. It will end badly for your script if end-user values ever make it to your script, including derived end-user values, such as those that might be contained in source code repositories, etc.
- Walf over 5 years
I don't get what you mean, how would a raw dollar sign begin an assignment? I get filenames piped from scripts, so the input is escaped already
(foo=\$\'hello\\nthere\'; mid_q=\'\\\'\'; foo_esc="$(printf '%s' "'${foo//\'/$mid_q}'" <<<"$foo")"; echo "$foo_esc")
prints '$'\''hello\nthere'\''' and I got the same result from the perl version.
- Walf over 5 years
(mid_q=\'\\\'\'; while IFS= read -r -d '' file; do printf '%s' "'${file//\'/$mid_q}'"; done < <(find . -maxdepth 1 -type f -print0))
worked perfectly even on your test input (except my filename had an actual newline in it).
- Dejay Clayton over 5 years
@Walf, for clarification, $'escape-these-chars' is the ANSI-C quoting feature of Bash that causes all characters within the specified string to be escaped. Thus, to easily create a string literal that contains a newline within the filename (e.g. $'first-line\nsecond-line'), use \n within this construct.
- Dejay Clayton over 5 years
@Walf your approach would probably work fine as long as you don't have secondary scripts parsing your first scripts line-by-line.
- Walf over 5 years
I'm not parsing, I'm generating. I wouldn't be able to write a BASH parser in BASH.
- Steven Shaw almost 5 years
@JdeBP that Z shell equivalent doesn't work. Any other ideas for zsh?
- Steven Shaw almost 5 years
I found the answer: "${(@qq)foo}"
- jw013 over 4 years
The alias approach is clever and seems like it is specified by POSIX. For maximum portability I think this is the way to go. I believe the suggestions involving grep with export or set may break on variables containing embedded newlines.
- benathon almost 4 years
The function should be esc_var and not esc_vars. I know it's a small edit, and of course I would edit it myself, but I can't do single character edits.
- Walf almost 4 years
@portforwardpodcast The plural on the latter version is deliberate because the "$@" is used to expand and escape all arguments passed, unlike the former which only escapes the first argument and drops any others. Your comment did prompt me to check it and it was not separating them properly, so thanks.
- vijay over 3 years
In perl, I use $q = "\x27"; to avoid the double-quote/single-quote/backslash construct you used.
- Stéphane Chazelas over 3 years
@LL3. Thanks. I've added a note about it. Note that that option has been described as "obsolete" in the Korn shell for at least the past 25 years and not included in the POSIX specification of sh.
- Stéphane Chazelas over 3 years
Note that ${var@Q} was actually copied from mksh, and is not among the safest to use. See my answer for details.
- AdminBee over 3 years
Perhaps you could add an example on how to apply it to the OP's code (your answer ended up in my "low quality review" queue because it was so short ...)
- Kusalananda over 3 years
This seems irrelevant to the question at hand, which does not mention PHP at all.
- seyed over 2 years
For arrays like $@ you can use this ${*@Q}