Is it dangerous to run echo without quotes?
Solution 1
For the specific case
echo run after_bundle
quoting is not needed, because the arguments to echo are static strings that contain no variable expansions, command substitutions, etc. They are "just two words" (and as Stéphane points out, they are additionally constructed out of the portable character set).
The "danger" comes when you deal with variable data that the shell may expand or interpret. In such cases, care must be taken that the shell does the correct thing and that the result is what's intended.
The following two questions contain relevant information about that:
- Why is printf better than echo?
- Security implications of forgetting to quote a variable in bash/POSIX shells
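As a minimal sketch of the kind of "danger" meant here (the directory, the file names and the message variable are made up for illustration), compare an unquoted expansion with a quoted one when the data happens to contain glob characters:

```shell
# Hypothetical scratch directory with two files for the glob to match
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch file1.txt file2.txt

message='*** LIVE ***'

# Unquoted: the value is split into words and each *** is expanded as a glob
unquoted=$(echo $message)

# Quoted: the value is passed through unchanged
quoted=$(printf '%s\n' "$message")

printf '%s\n' "unquoted: $unquoted" "quoted:   $quoted"
```

In a POSIX-style shell such as bash or dash, the unquoted form prints the file names in place of each ***, while the quoted form prints the message as written.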
echo is sometimes used to "protect" potentially harmful commands in answers on this site. For example, I may show how to remove files or move files to a new destination using
echo rm "${name##*/}.txt"
or
echo mv "$name" "/new_dir/$newname"
This would output the commands on the terminal instead of actually removing or renaming files. The user could then inspect the commands, decide that they look OK, remove the echo, and run again.
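A small sketch of that dry-run pattern, assuming a throwaway directory and made-up file names: the loop only prints the mv commands, and nothing is renamed until the echo is removed.

```shell
# Hypothetical scratch directory with two files to "rename"
workdir=$(mktemp -d)
touch "$workdir/a.log" "$workdir/b.log"

# Dry run: echo prints each mv command instead of executing it
preview=$(
    for name in "$workdir"/*.log; do
        echo mv "$name" "${name%.log}.txt"
    done
)

printf '%s\n' "$preview"
```

After inspecting the output, deleting the word echo from the loop body turns the preview into the real operation.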
Your command echo run after_bundle may be an instruction to the user, or it may be a "commented out" piece of code that is too dangerous to run without knowing the consequences.
When using echo like this, one has to know what the modified command does, and one must guarantee that the modified command actually is safe. It potentially would not be if it contained redirections, which the shell still performs, and it does not work on a pipeline, where only the first command is replaced by echo.
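To illustrate the redirection caveat, here is a sketch with hypothetical file names. The echo was meant as a harmless preview of a sort command, but the shell still opens and truncates the redirection target:

```shell
# Hypothetical input and output files in a scratch directory
workdir=$(mktemp -d)
src=$workdir/input.txt
dst=$workdir/output.txt
printf 'b\na\n' > "$src"
printf 'keep me\n' > "$dst"

# Intended as a preview of:  sort "$src" > "$dst"
# echo replaces only the command; the redirection is still carried out,
# so "$dst" is truncated and receives echo's output.
echo sort "$src" > "$dst"

cat "$dst"
```

The previous contents of the output file are gone, which is exactly what the "protection" was supposed to prevent.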
Solution 2
Just an extra note on top of @Kusalananda's fine answer.
echo run after_bundle
is fine because none of those 3 arguments¹ passed to echo contains characters that are special to the shell.
And (the extra point I want to make here) there's no system locale where those bytes could translate to characters that are special to the shell.
All those characters are in what POSIX calls the portable character set. Those characters should be present and encoded the same in all character sets on a POSIX system².
So that command line will be interpreted the same regardless of the locale.
Now, if we start using characters outside of that portable character set, it's a good idea to quote them even if they are not special to the shell, because in another locale, the bytes that constitute them may be interpreted as different characters that could become special to the shell. Note that this applies whether you're using echo or any other command: the problem is not with echo but with how the shell parses its code.
For instance, in a UTF-8 locale:
echo voilà | iconv -f UTF-8 -t //TRANSLIT
That à is encoded as 0xc3 0xa0. Now, if you have that line of code in a shell script and the shell script is invoked by a user who uses a locale whose charset is not UTF-8, those two bytes could make very different characters.
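To see which bytes a character occupies, the string can be dumped byte by byte. A small sketch, writing the two bytes as octal escapes in the printf format so that the snippet itself stays within the portable character set (the variable name is only illustrative):

```shell
# \303\240 are the octal forms of 0xc3 0xa0, the UTF-8 encoding of à
bytes=$(printf 'voil\303\240' | od -An -tx1)

# Prints the six byte values in hexadecimal, ending in c3 a0
printf '%s\n' "$bytes"
```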
For instance, in a fr_FR.ISO8859-15 locale, a typical French locale using the standard single-byte charset that covers the French language (the same used for most western European languages including English), that 0xc3 byte is interpreted as the Ã character and 0xa0 as the non-breaking space character.
And on a few systems like NetBSD³, that non-breaking space is considered a blank character (isblank() on it returns true, it is matched by [[:blank:]]) and shells like bash therefore treat it as a token delimiter in their syntax.
That means that instead of running echo with $'voil\xc3\xa0' as argument, they run it with $'voil\xc3' as argument, which means it won't print voilà correctly.
It gets a lot worse with Chinese character sets like BIG5, BIG5-HKSCS, GB18030, GBK, which have many characters whose encoding contains the encoding of |, ` or \ (to name the worst). The same goes for that ludicrous SJIS, aka Microsoft Kanji, except that there it's ¥ instead of \, though it is still treated as \ by most tools as it's encoded as 0x5c there.
For instance, if in a zh_CN.gb18030 Chinese locale, you write a script like:
echo 詜 reboot
That script will output 詜 reboot in a locale using GB18030 or GBK, 唰 reboot in a locale using BIG5 or BIG5-HKSCS, but in a C locale using ASCII or a locale using ISO8859-15 or UTF-8, it will cause reboot to be run, because the GB18030 encoding of 詜 is 0xd4 0x7c and 0x7c is the encoding of | in ASCII, so we end up running:
echo �| reboot
(that � representing however the 0xd4 byte is rendered in the locale). Example using the less harmful uname instead of reboot:
$ echo $'echo \u8a5c uname' | iconv -t gb18030 > myscript
$ LC_ALL=zh_CN.gb18030 bash ./myscript | sed -n l
\324| uname$
$ LC_ALL=C bash ./myscript | sed -n l
Linux$
(uname was run).
So my advice would be to quote all strings that contain characters outside of the portable character set.
However, note that since the encodings of \ and ` are found in the encoding of some of those characters, it's better not to use \ or "..." or $'...' (inside which ` and/or \ are still special), but '...' instead to quote characters outside the portable character set.
I'm not aware of any system that has a locale where the charset has any character (other than ' itself of course) whose encoding contains the encoding of ', so those '...' should definitely be the safest.
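The effect of single quotes can be checked without any Chinese locale installed. The sketch below writes the 0xd4 0x7c bytes with octal escapes, once unquoted and once inside '...', and runs both scripts in the C locale. It assumes sh, uname and sed are available. The temporary script and the variable names are illustrative only.

```shell
script=$(mktemp)

# Unquoted: "echo", then the bytes 0xd4 0x7c, then "uname"
printf 'echo \324\174 uname\n' > "$script"
unquoted_out=$(LC_ALL=C sh "$script")

# Single-quoted: the same bytes, now inside '...'
printf "echo '\324\174 uname'\n" > "$script"
quoted_out=$(LC_ALL=C sh "$script" | LC_ALL=C sed -n l)

printf '%s\n' "unquoted: $unquoted_out" "quoted:   $quoted_out"
rm -f "$script"
```

In the unquoted case the 0x7c byte is parsed as a pipe and uname runs. In the quoted case the bytes reach echo as data, and sed -n l shows them as \324| uname$.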
Note that several shells also support a $'\uXXXX' notation to express characters based on their Unicode code point. In shells like zsh and bash, the character is inserted encoded in the locale's charset (though that can cause unexpected behaviours if that charset doesn't have that character). That lets you avoid inserting non-ASCII characters in your shell code.
So above:
echo 'voilà' | iconv -f UTF-8 -t //TRANSLIT
echo '詜 reboot'
Or:
echo $'voil\u00e0'
echo $'\u8a5c reboot'
(with the caveat that it could break the script when run in locales that don't have those characters).
Or better, since \ is also special to echo (or at least some echo implementations, at least the Unix-compliant ones):
printf '%s\n' 'voilà' | iconv -f UTF-8 -t //TRANSLIT
printf '%s\n' '詜 reboot'
(note that \ is also special in the first argument to printf, so non-ASCII characters are also better avoided there in case they may contain the encoding of \).
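A short sketch of why data belongs in the arguments to printf and not in its format string. The sample text is made up, and its backslashes stand in for any data that may contain the 0x5c byte.

```shell
text='C:\temp\new'

# Safe: the data is passed as an argument and printed verbatim
as_argument=$(printf '%s\n' "$text")

# Risky: the data is used as the format, so \t and \n are interpreted
as_format=$(printf "$text")

printf '%s\n' "$as_argument"
printf '%s\n' "$as_format"
```

The first form reproduces the text unchanged. The second turns \t into a tab and \n into a newline.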
Note that you could also do:
'echo' 'voilà' | 'iconv' '-f' 'UTF-8' '-t' '//TRANSLIT'
(that would be overkill but could give you some peace of mind if you're not sure which characters are in the portable character set)
Also make sure never to use the ancient `...` form of command substitution (which introduces another level of backslash processing), but use $(...) instead.
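The extra level of backslash processing can be seen with a minimal sketch (variable names are illustrative): the same printf command yields different results in the two forms of command substitution.

```shell
# With $(...), the command is parsed as written: printf receives a\\b
new_style=$(printf '%s\n' 'a\\b')

# With `...`, each \\ is first reduced to \: printf receives a\b
old_style=`printf '%s\n' 'a\\b'`

printf '%s\n' "$new_style" "$old_style"
```

The $(...) form keeps both backslashes, while the backquote form loses one of them, even though the text sits inside single quotes.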
¹ technically, echo is also passed as an argument to the echo utility (to tell it how it was invoked): it's the argv[0], and argc is 3. In most shells nowadays echo is builtin, though, so that exec() of a /bin/echo file with a list of 3 arguments is simulated by the shell. It's also common to consider the list of arguments as starting with the second one (argv[1] to argv[argc - 1]), as those are the ones the commands mainly act upon.
² a notable exception to that being the ludicrous ja_JP.SJIS locale of FreeBSD systems, whose charset has neither a \ nor a ~ character!
³ note that while many systems (FreeBSD, Solaris, not GNU ones though) consider U+00A0 as a [[:blank:]] in UTF-8 locales, few do in other locales like those using ISO8859-15, possibly to avoid this kind of issue.
Jacob
Updated on September 18, 2022
Comments
-
Matt Parkins about 5 years
I ran into this when I had: target="***LIVE SERVER***"; echo target: $target; and the *** expanded into a folder listing...😬
-
Ferrybig about 6 years
In your first paragraph, you tell us "... of the characters in those 3 arguments passed to echo ...". I only count 2 arguments being passed to the command echo; the arguments I can count are run and after_bundle. Care to explain how you counted and got to 3 arguments?
-
Stéphane Chazelas about 6 years
@ViktorFonic, see edit about the number of arguments (and that the main problem is not with echo). See (exec -a foo /bin/echo --help) on a GNU system and with the GNU shell for how to pass an arbitrary first argument to the /bin/echo utility.
-
Charles Duffy about 6 years
Adding quotes isn't sufficient to know what a shell would do, however -- just as you can't tell that echo rm "first file.txt" "second file.txt" is in any way different from echo rm "first" "file.txt" "second" "file.txt", the output from both being the same. If you want to generate a shell command as output, one must use printf '%q ' rm "first file.txt" "second file.txt"; echo or something equivalent that re-generates syntactic quoting that evaluates to the argv passed.
-
Kusalananda about 6 years
@CharlesDuffy I really hope nobody copy-pastes debugging output and runs it in the shell!
-
Sergiy Kolodyazhnyy about 6 years
@Ferrybig See Stephane's edit, footnote 1. Arguments to a command in the usual C style are an array of arguments, with argv[0] being the executable name itself. Kinda similar to $0 and positional parameters in shells.
-
Charles Duffy about 6 years
Generating shell commands and then piping them to sh is not exactly an uncommon pattern, and seeing people ask "why does foo work when I run it on a command line, but this script that emits that exact string with echo in front of the line doesn't?" happens all the time here. More to the point, debugging output isn't helpful if it hides your bugs -- and if your bugs are related to quoting, then echo won't reveal them.
-
done about 6 years
There are 373 encodings in iconv in which ESC is converted to '. Try (as an example): printf '\x1b' | iconv -f utf8 -t IBM-937 | xxd
-
done about 6 years
There are 173 encodings in which some codepoint (other than ESC) is converted to a '. Try printf '\u2804' | iconv -f utf8 -t BRF | xxd. There are encodings in which a lot of codepoints become '. Around 8695 codepoints in UCS-4 become '. Try printf '\U627' | iconv -cf utf-8 -t UCS-4. Several (37) encodings convert the character 0x127 to a '. Try printf '\U127' | iconv -cf utf8 -t UCS2 | xxd
-
Stéphane Chazelas about 6 years
@isaac, you won't find a POSIX system where the C locale uses ASCII (with ESC as 0x1b) and that has system locales with such charsets as IBM-937, as the encoding of the portable character set is different between ASCII and them. UCS-4 and UTF16 cannot be used as a locale charset; they are not compatible with the POSIX API as they have characters which contain 0 bytes.
-
done about 6 years
@StéphaneChazelas Are you saying that a file could not be exchanged between systems? That was the original claim: One system stores a file, some other opens the file misinterpreting some characters.
-
Stéphane Chazelas about 6 years
@isaac, I'm saying that unless you quote non-ASCII characters with single quotes, the code can be interpreted differently depending on the locale the user is using on the same system (one using ASCII as the C charset; in practice, AFAIK, 99.999% of POSIX systems use ASCII and most of the rest some variant of EBCDIC). And if you don't do that, that can cause security vulnerabilities: think for instance of a script used in a ssh ForcedCommand as in git server deployments, when sshd accepts the LC_* variables and the system has those locales with Chinese charsets.