Is it dangerous to run echo without quotes?

986

Solution 1

For the specific case

echo run after_bundle

quoting is not needed, because the arguments to echo are static strings that contain no variable expansions, command substitutions, etc. They are "just two words" (and, as Stéphane points out, they are additionally constructed out of the portable character set).

The "danger" comes when you deal with variable data that the shell may expand or interpret. In such cases, care must be taken that the shell does the correct thing and that the result is what's intended.
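To make that concrete, here is a minimal sketch in POSIX sh. It assumes `mktemp -d` is available (it is not POSIX, but GNU and BSD systems have it); the variable value and the file names are made up for illustration:

```shell
#!/bin/sh
# Work in a scratch directory so the glob result below is predictable.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
touch a.txt b.txt

# Static words: nothing for the shell to expand, always "run after_bundle".
echo run after_bundle

# Variable data containing a glob character and runs of spaces (hypothetical).
var='*  two   spaces'

# Unquoted: split on whitespace into "*", "two" and "spaces",
# then "*" is replaced by the matching file names.
unquoted=$(echo $var)
printf '%s\n' "$unquoted"     # a.txt b.txt two spaces

# Quoted: passed to echo as a single argument, unchanged.
quoted=$(echo "$var")
printf '%s\n' "$quoted"       # *  two   spaces

cd / && rm -rf "$tmp"
```

Only the unquoted expansion is mangled; the two literal words print the same regardless of what is in the current directory.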

The following two questions contain relevant information about that:


echo is sometimes used to "protect" potentially harmful commands in answers on this site. For example, I may show how to remove files or move files to a new destination using

echo rm "${name##*/}.txt"

or

echo mv "$name" "/new_dir/$newname"

This would output commands on the terminal instead of actually removing or renaming files. The user could then inspect the commands, decide that they look ok, remove the echo and run again.

Your command echo run after_bundle may be an instruction to the user, or it may be a "commented out" piece of code that is too dangerous to run without knowing the consequences.

Using echo like this, one has to know what the modified command does and must guarantee that the modified command actually is safe (it may not be if it contains redirections, and it doesn't work on a pipeline, etc.).
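Two of those pitfalls can be sketched like this (POSIX sh; `mktemp -d` is assumed to be available, and the file names are hypothetical). The printed command loses its quoting, and a redirection is still performed because it is handled by the shell, not by the command:

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

name='first file.txt'    # hypothetical file name containing a space

# Dry run: the command line is printed rather than executed...
preview=$(echo mv "$name" /new_dir/renamed.txt)
printf '%s\n' "$preview"   # mv first file.txt /new_dir/renamed.txt
# ...but the output no longer shows that "first file.txt" was one argument.

# The shell performs the redirection before running echo,
# so log.txt is created (or truncated) even in "dry run" mode.
echo rm "$name" > log.txt
if [ -f log.txt ]; then created=yes; else created=no; fi
printf 'log.txt created: %s\n' "$created"   # log.txt created: yes

cd / && rm -rf "$tmp"
```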

Solution 2

Just an extra note on top of @Kusalananda's fine answer.

echo run after_bundle

is fine because none of those 3 arguments¹ passed to echo contains characters that are special to the shell.

And (the extra point I want to make here) there's no system locale where those bytes could translate to characters that are special to the shell.

All those characters are in what POSIX calls the portable character set. Those characters should be present and encoded the same in all character sets on a POSIX system².

So that command line will be interpreted the same regardless of the locale.

Now, if we start using characters outside of that portable character set, it's a good idea to quote them even if they are not special to the shell, because in another locale the bytes that constitute them may be interpreted as different characters that could be special to the shell. Note that this applies whether you're using echo or any other command: the problem is not with echo but with how the shell parses its code.

For instance, in a UTF-8 locale:

echo voilà | iconv -f UTF-8 -t //TRANSLIT

That à is encoded as 0xc3 0xa0. Now, if you have that line of code in a shell script and the shell script is invoked by a user who uses a locale whose charset is not UTF-8, those two bytes could make very different characters.

For instance, in a fr_FR.ISO8859-15 locale (a typical French locale using the standard single-byte charset that covers the French language, the same one used for most western European languages including English), that 0xc3 byte is interpreted as the Ã character and 0xa0 as the non-breaking space character.
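To see those bytes for yourself, here is a small sketch. It assumes the script file is saved as UTF-8 and that `iconv` accepts the `ISO-8859-15` charset name (GNU libc and GNU libiconv both do). Converting the two bytes from ISO8859-15 to UTF-8 reveals which characters they stand for in that charset:

```shell
#!/bin/sh
# In UTF-8, "à" (U+00E0) is the two bytes 0xc3 0xa0.
utf8_bytes=$(printf '%s' 'à' | od -An -tx1 | tr -d ' \n')
printf '%s\n' "$utf8_bytes"   # c3a0

# Read those same bytes as ISO8859-15 and re-encode them in UTF-8:
# c3 83 is U+00C3 (Ã) and c2 a0 is U+00A0 (non-breaking space).
latin_view=$(printf '\303\240' | iconv -f ISO-8859-15 -t UTF-8 | od -An -tx1 | tr -d ' \n')
printf '%s\n' "$latin_view"   # c383c2a0
```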

And on a few systems like NetBSD³, that non-breaking space is considered a blank character (isblank() returns true for it, and it is matched by [[:blank:]]), so shells like bash treat it as a token delimiter in their syntax.

That means that instead of running echo with $'voil\xc3\xa0' as argument, they run it with $'voil\xc3' as argument, which means it won't print voilà correctly.

It gets a lot worse with Chinese character sets like BIG5, BIG5-HKSCS, GB18030 and GBK, which have many characters whose encoding contains the encoding of |, ` or \ (to name the worst). The same goes for that ludicrous SJIS, aka Microsoft Kanji, except that there the 0x5c byte is ¥ instead of \, yet it is still treated as \ by most tools.

For instance, if in a zh_CN.gb18030 Chinese locale, you write a script like:

echo 詜 reboot

That script will output 詜 reboot in a locale using GB18030 or GBK, and 唰 reboot in a locale using BIG5 or BIG5-HKSCS, but in a C locale using ASCII, or in a locale using ISO8859-15 or UTF-8, it will cause reboot to be run, because the GB18030 encoding of 詜 is 0xd4 0x7c and 0x7c is the encoding of | in ASCII, so we end up running:

 echo �| reboot

(that � representing however the 0xd4 byte is rendered in the locale). Example using the less harmful uname instead of reboot:

$ echo $'echo \u8a5c uname' | iconv -t gb18030 > myscript
$ LC_ALL=zh_CN.gb18030 bash ./myscript | sed -n l
\324| uname$
$ LC_ALL=C bash ./myscript | sed -n l
Linux$

(uname was run).

So my advice would be to quote all strings that contain characters outside of the portable character set.
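A rough way to check whether a string needs that treatment is sketched below. `is_portable` is a hypothetical helper, and it only approximates the portable character set (printable ASCII plus tab and newline, ignoring the other control characters POSIX lists); it says nothing about whether characters such as `*` or `|` are special to the shell:

```shell
#!/bin/sh
# is_portable STRING (hypothetical helper): succeed if every byte of STRING
# is printable ASCII (space to ~), a tab or a newline.
is_portable() {
  # In the C locale, delete every byte of the allowed set;
  # anything left over lies outside it.
  rest=$(printf '%s' "$1" | LC_ALL=C tr -d '\t\n -~')
  [ -z "$rest" ]
}

for s in 'run after_bundle' 'voilà' '詜 reboot'; do
  if is_portable "$s"; then
    printf '%s: portable characters only\n' "$s"
  else
    printf '%s: non-portable characters, single-quote it\n' "$s"
  fi
done
```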

However, note that since the encodings of \ and ` are found in the encoding of some of those characters, it's better not to use \, "..." or $'...' (inside which ` and/or \ are still special), but rather '...' to quote characters outside the portable character set.

I'm not aware of any system that has a locale where the charset has any character (other than ' itself of course) whose encoding contains the encoding of ', so those '...' should definitely be the safest.
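The `uname` experiment above can be repeated with the argument single-quoted, next to the unquoted version for comparison. This sketch assumes `bash` is installed and that `iconv` supports GB18030 (GNU libc and GNU libiconv do), and, like the original experiment, it runs the harmless `uname`, never `reboot`:

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1

# The same one-line script encoded in GB18030, unquoted and single-quoted.
printf '%s\n' 'echo 詜 uname'   | iconv -f UTF-8 -t GB18030 > "$tmp/unquoted"
printf '%s\n' "echo '詜 uname'" | iconv -f UTF-8 -t GB18030 > "$tmp/quoted"

# Run both in the C locale, where the second byte of 詜 (0x7c) is "|".
unquoted_out=$(LC_ALL=C bash "$tmp/unquoted")
quoted_out=$(LC_ALL=C bash "$tmp/quoted" | od -An -tx1 | tr -d ' \n')

printf '%s\n' "$unquoted_out"   # whatever uname prints: uname was run
printf '%s\n' "$quoted_out"     # d47c20756e616d650a: the bytes were only echoed

rm -rf "$tmp"
```

Inside '...', the 0x7c byte is just data, so the quoted script behaves the same whatever the locale.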

Note that several shells also support a $'\uXXXX' notation to express characters by their Unicode code point. In shells like zsh and bash, the character is inserted encoded in the locale's charset (though that can cause unexpected behaviour if the charset doesn't have that character). That lets you avoid inserting non-ASCII characters in your shell code.

So, for the examples above:

echo 'voilà' | iconv -f UTF-8 -t //TRANSLIT
echo '詜 reboot'

Or:

echo $'voil\u00e0'
echo $'\u8a5c reboot'

(with the caveat that it could break the script when run in locales that don't have those characters).

Or better, since \ is also special to echo (or at least some echo implementations, at least the Unix compliant ones):

printf '%s\n' 'voilà' | iconv -f UTF-8 -t //TRANSLIT
printf '%s\n' '詜 reboot'

(note that \ is also special in the first argument to printf, so non-ASCII characters are also better avoided there in case they may contain the encoding of \).
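A short sketch of the difference, using a made-up string that contains backslashes. What echo prints depends on the implementation, while printf with a fixed '%s\n' format prints the argument verbatim; putting the data into the format instead makes printf interpret its backslashes:

```shell
#!/bin/sh
s='C:\new\table'    # hypothetical data containing backslashes

# Implementation-dependent: a Unix-compliant echo (e.g. dash's builtin)
# turns \n and \t into newline and tab; bash's builtin echo does not by default.
echo "$s"

# Fixed format, data as a separate argument: output verbatim everywhere.
out=$(printf '%s\n' "$s")
printf '%s\n' "$out"    # C:\new\table

# Avoid: with the data in the format, \n and \t are interpreted by printf.
bad=$(printf "$s\n")
```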

Note that you could also do:

'echo' 'voilà' | 'iconv' '-f' 'UTF-8' '-t' '//TRANSLIT'

(that would be overkill, but could give you some peace of mind if you're not sure which characters are in the portable character set).

Also make sure never to use the ancient `...` form of command substitution (which introduces another level of backslash processing), but use $(...) instead.
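That extra level of backslash processing is easy to demonstrate. In this sketch, inside `...` the \\ pair is turned into a single \ before the inner command is even parsed, single quotes notwithstanding, whereas $(...) leaves the code alone (printf is used instead of echo so that echo's own backslash handling doesn't interfere):

```shell
#!/bin/sh
old=`printf '%s\n' '\\'`    # inner command actually run: printf '%s\n' '\'
new=$(printf '%s\n' '\\')   # inner command run as written

printf '%s\n' "$old"   # \
printf '%s\n' "$new"   # \\
```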


¹ technically, echo is also passed as an argument to the echo utility (to tell it how it was invoked): it's argv[0], and argc is 3. In most shells nowadays echo is builtin, so that exec() of a /bin/echo file with a list of 3 arguments is simulated by the shell. It's also common to consider the list of arguments as starting with the second one (argv[1] to argv[argc - 1]), as those are the ones the commands mainly act upon.

² a notable exception to that being the ludicrous ja_JP.SJIS locale of FreeBSD systems whose charset has no \ nor ~ character!

³ note that while many systems (FreeBSD, Solaris, not GNU ones though) consider U+00A0 as a [[:blank:]] in UTF-8 locales, few do in other locales like those using ISO8859-15, possibly to avoid this kind of issue.
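To illustrate footnote ¹, the shell's own $0 and positional parameters offer a rough analogy (only an analogy: with sh -c, the word after the code is assigned to $0, and the rest become $1, $2, ...):

```shell
#!/bin/sh
# "echo" plays the part of argv[0]; "run" and "after_bundle" are $1 and $2.
args=$(sh -c 'printf "%s|" "$0" "$@"' echo run after_bundle)
argc=$(sh -c 'echo "$(($# + 1))"' echo run after_bundle)

printf '%s\n' "$args"   # echo|run|after_bundle|
printf '%s\n' "$argc"   # 3
```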

Author: Jacob

Updated on September 18, 2022

Comments

  • Jacob
    Jacob almost 2 years

    This is an attempt to test the usage of a VIRTUAL column with a function that increments the value in a column.

    I am using a function which would return the last two digits of the current year, concatenated with a hyphen followed by the next max value from the table column which is defined as a virtual column.

    When I insert records into the table, the insert succeeds. However, when I query the records, I get the error below:

    ORA-00036: maximum number of recursive SQL levels (50) exceeded

    My question is: is it possible to achieve a custom increment of the values using a VIRTUAL column, or is this attempt futile?

    The function below is first compiled with the commented-out part uncommented; once the table has been created, I comment out the first SQL block and use the second one.

    Function

    CREATE OR REPLACE FUNCTION test_func (
       p_empl_id    NUMBER,
       empl_nm      VARCHAR2)
       RETURN VARCHAR2
       DETERMINISTIC
    IS
       return_value  VARCHAR2(32);
    BEGIN
       return_value := NULL;
    
    --    SELECT    TO_CHAR (SYSDATE, 'YY')
    --          || '-'
    --          || LPAD (
    --                TO_CHAR (NVL (MAX (TO_NUMBER (SUBSTR (001, 5))), 0) + 1),
    --                5,
    --                '0') into return_value
    --     FROM dual;    
    
       SELECT    TO_CHAR (SYSDATE, 'YY')
              || '-'
              || LPAD (
                    TO_CHAR (NVL (MAX (TO_NUMBER (SUBSTR (test_col, 5))), 0) + 1),
                    5,
                    '0')
         INTO return_value
         FROM test_table
        WHERE SUBSTR (test_col, 1, 2) = TO_CHAR (SYSDATE, 'YY');
    
       RETURN return_value;
    END;
    /
    

    Table Structure

    CREATE TABLE test_table
    (
       empl_id       NUMBER,
       empl_nm       VARCHAR2 (50),
       monthly_sal   NUMBER (10, 2),
       bonus         NUMBER (10, 2),
       test_col      AS (test_func (empl_id, empl_nm)) VIRTUAL
    );
    

    Insert Statement

    INSERT INTO test_table (empl_id,
                            empl_nm,
                            monthly_sal,
                            bonus)
       WITH data
            AS (SELECT 100 empl_id,
                       'AAA' empl_nm,
                       20000 monthly_sal,
                       3000 bonus
                  FROM DUAL)
       SELECT *
         FROM data;
    

    I have also tried the SQL below using a sequence; however, the sequence is incremented every time I execute a SQL statement against the table:

    SELECT    TO_CHAR (SYSDATE, 'YY')
           || '-'
           || '000'
           || test_virtual_sequence.NEXTVAL
      FROM DUAL;
    
    • krokodilko
      krokodilko about 7 years
      Is this only a test exercise, or do you want to use this function in production code? If the latter, then it is a very bad idea to use such a function; please explain what your real requirement is.
    • Jacob
      Jacob about 7 years
      @krokodilko By no means do I intend to use this in production; as mentioned in the question, this is purely a test case to learn the usages of virtual columns.
    • Matt Parkins
      Matt Parkins about 5 years
      I ran into this when I had: target="***LIVE SERVER***"; echo target: $target; and the *** expanded into a folder listing...😬
  • Jacob
    Jacob about 7 years
    Stupendous. I do have a question: the primary key in my original table is not generated by a sequence, which means it doesn't follow a numeric sequence; at times it jumps. The users would like to see an incrementing numeric sequence based on records. So is there a way to get max + 1 instead of using SUBSTR of the primary key (serial_no)?
  • Jacob
    Jacob about 7 years
    Can the code_control table be a generic one, with an additional column, if we need to use it for multiple tables?
  • APC
    APC about 7 years
    Of course. You would need to pass an additional parameter to get_next_number() or - safer - have a separate version of that function for each table. But you should benchmark with realistic loads and make sure the table has sufficient Interested Transaction slots (INITRANS, MAXTRANS) to cope with concurrent demands.
  • Jacob
    Jacob about 7 years
    Sigh, I would prefer a different version per table, as the generic one looks more complicated. Thanks a lot for the wonderful answer, the insight and the link to your SO answer. Much appreciated.
  • Ferrybig
    Ferrybig about 6 years
    In your first paragraph, you tell us "... of the characters in those 3 arguments passed to echo...", I only count 2 arguments being passed to the command echo, the arguments I can count are run and after_bundle, care to explain how you counted and got to 3 arguments?
  • Stéphane Chazelas
    Stéphane Chazelas about 6 years
    @ViktorFonic, see edit about the number of arguments (and that the main problem is not with echo). See (exec -a foo /bin/echo --help) on a GNU system and with the GNU shell for how to pass an arbitrary first argument to the /bin/echo utility.
  • Charles Duffy
    Charles Duffy about 6 years
    Adding quotes isn't sufficient to know what a shell would do, however -- just as you can't tell that echo rm "first file.txt" "second file.txt" is in any way different from echo rm "first" "file.txt" "second" "file.txt", the output from both being the same. If you want to generate a shell command as output, one must use printf '%q ' rm "first file.txt" "second file.txt"; echo or something equivalent that re-generates syntactic quoting that evaluates to the argv passed.
  • Kusalananda
    Kusalananda about 6 years
    @CharlesDuffy I really hope nobody copy-pastes debugging output and runs it in the shell!
  • Sergiy Kolodyazhnyy
    Sergiy Kolodyazhnyy about 6 years
    @Ferrybig See Stephane's edit, footnote 1. Arguments to a command, in the usual C style, are an array of strings, with argv[0] being the executable name itself. Kinda similar to $0 and the positional parameters in shells.
  • Charles Duffy
    Charles Duffy about 6 years
    Generating shell commands and then piping them to sh is not exactly an uncommon pattern, and seeing people ask "why does foo work when I run it on a command line, but this script that emits that exact string with echo in front of the line doesn't?" happens all the time here. More to the point, debugging output isn't helpful if it hides your bugs -- and if your bugs are related to quoting, then echo won't reveal them.
  • done
    done about 6 years
    There are 373 encodings in iconv in which ESC is converted to '. Try (as an example): printf '\x1b'|iconv -f utf8 -t IBM-937|xxd
  • done
    done about 6 years
    There are 173 encodings in which some code point (other than ESC) is converted to a '. Try printf '\u2804' | iconv -f utf8 -t BRF | xxd. There are encodings in which a lot of code points become '. Around 8695 code points in UCS-4 become '. Try printf '\U627' | iconv -cf utf-8 -t UCS-4. Several (37) encodings convert the character 0x127 to a '. Try printf '\U127' | iconv -cf utf8 -t UCS2 |xxd
  • Stéphane Chazelas
    Stéphane Chazelas about 6 years
    @isaac, you won't find a POSIX system where the C locale uses ASCII (with ESC as 0x1b) and that has system locales with such charsets as IBM-937 as the encoding of the portable character set is different between ASCII and them. UCS-4, UTF16 cannot be used as a locale charset, they are not compatible with the POSIX API as they have characters which contain 0 bytes.
  • done
    done about 6 years
    @StéphaneChazelas Are you saying that a file could not be exchanged between systems? That was the original claim: One system stores a file, some other opens the file misinterpreting some characters.
  • Stéphane Chazelas
    Stéphane Chazelas about 6 years
    @isaac, I'm saying that unless you quote non-ASCII characters with single quotes, the code can be interpreted differently depending on the locale the user is using on the same system (using ASCII as the C charset; in practice, AFAIK, 99.999% of POSIX systems use ASCII and most of the rest some variant of EBCDIC). And if you don't do that, it can cause security vulnerabilities: think, for instance, of a script used in an ssh ForcedCommand, as in git server deployments, when sshd accepts the LC_* variables and the system has those locales with Chinese charsets.