Number of backslashes needed for escaping regex backslash on the command-line

Solution 1

In the unquoted case, the shell turns each \\ pair into one backslash before grep sees the pattern. Four backslashes therefore reach grep as two, which grep reads as one literal backslash. Six backslashes reach grep as three, which grep reads as one literal backslash followed by \c, and \c is the same as c. One additional backslash (five or seven in total) changes nothing, because the shell itself translates the trailing \c to c. Eight backslashes reach grep as four, which grep reads as two literal backslashes, so the pattern no longer matches.
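The unquoted rule can be checked without grep by printing what the shell actually hands to a command. This is a minimal sketch, assuming a POSIX-style shell such as bash; show_arg is a hypothetical helper introduced here for illustration, not something from the question:

```shell
#!/usr/bin/env bash
# show_arg is a hypothetical helper: it prints its first argument
# exactly as the shell passed it, without interpreting backslashes.
show_arg() { printf '%s\n' "$1"; }

# Unquoted: the shell consumes one backslash per escaped character.
four=$(show_arg ab\\\\cd)        # 4 typed, 2 passed on
five=$(show_arg ab\\\\\cd)       # 5 typed, still 2 passed on (\c becomes c)
six=$(show_arg ab\\\\\\cd)       # 6 typed, 3 passed on
eight=$(show_arg ab\\\\\\\\cd)   # 8 typed, 4 passed on

printf '%s\n' "$four" "$five" "$six" "$eight"
```

Each printed line is the pattern grep would receive, so it can be compared directly with the "bash passes to grep" step described above.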

For the example in double quotes, note what follows your second quote from the bash manpage:

The backslash retains its special meaning only when followed by one of the following characters: $, `, ", \, or newline.

I.e. when you give an odd number of backslashes, the sequence ends in \c. In the unquoted case the shell would reduce that to c, but inside double quotes the backslash loses its special meaning before c, so \c is passed to grep unchanged. That is why the range of "possible" backslashes (i.e. those that make up a pattern matching your example file) slides down by one.
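The difference for an odd number of backslashes can be seen side by side. Again a sketch under the same assumptions (a POSIX-style shell; show_arg is a hypothetical helper that only echoes its argument):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first argument verbatim.
show_arg() { printf '%s\n' "$1"; }

# Three backslashes, unquoted: \\ becomes \ and \c becomes c.
odd_unquoted=$(show_arg ab\\\cd)
# Three backslashes, double-quoted: \\ becomes \ but \c is kept as \c.
odd_quoted=$(show_arg "ab\\\cd")
# Four backslashes, double-quoted: two \\ pairs become two backslashes.
even_quoted=$(show_arg "ab\\\\cd")

printf '%s\n' "$odd_unquoted" "$odd_quoted" "$even_quoted"
```

The quoted three-backslash form passes on the same string as the quoted four-backslash form, which is the "slide down by one" described above.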

Solution 2

This link describes bash Quotes and Escaping.

Your question deals with the first three of its sections:

  • Per-character escaping
  • Weak quoting "double quotes"
  • Strong quoting 'single quotes'
  • ANSI C like string quoting
  • I18N/L10N quoting (Internationalization and Localization).
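As an illustration of the first three styles, the same argument can be written in each of them. This is only a sketch, assuming a POSIX-style shell; show_arg is a hypothetical helper, and the variable names are chosen here for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first argument verbatim.
show_arg() { printf '%s\n' "$1"; }

escaped=$(show_arg ab\\cd)     # per-character escaping
weak=$(show_arg "ab\\cd")      # weak quoting (double quotes)
strong=$(show_arg 'ab\cd')     # strong quoting (single quotes)

# All three spellings hand the same five characters to the command.
printf '%s\n' "$escaped" "$weak" "$strong"
```

Only the strong-quoted form is passed on exactly as typed; the other two each lose one backslash to the shell.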

Below is a chart showing the strings as bash passes them on to grep, and how grep further interprets them internally.

Let's first look at echo "#ab\\cd" > file.
In the weak-quoted ("") "#ab\\cd", the \\ is an escaped \, which is written to file as a single literal \. So file contains #ab\cd
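To confirm what ends up in the file, the line can be written and read back. The sketch below swaps echo for printf with a single-quoted string, because the built-in echo of some shells (dash, for example) interprets sequences such as \c itself; the temporary file from mktemp stands in for the file of the question and is an assumption of this example:

```shell
#!/usr/bin/env bash
# Write the sample line with exactly one literal backslash.
tmp=$(mktemp)
printf '%s\n' '#ab\cd' > "$tmp"

# Read it back to see what grep will be searching.
content=$(cat "$tmp")
printf '%s\n' "$content"

rm -f "$tmp"
```

Under bash, echo "#ab\\cd" writes the same single line.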

Now, to your commands: the chart below may help to see what actually goes on with each call. The * marks the calls which match the file contents. It is really just a matter of applying bash's escape rules, as described on the web page, with particular note of daniel kullmann's answer, where he refers to escaping behaviour in a weak-quoting situation.

The backslash retains its special meaning only when followed by one of the following characters: $, `, ", \, or newline.


                            bash passes    grep further
                            to grep        resolves to         
grep -E ab\cd file            abcd           abcd   
grep -E ab\\cd file           ab\cd          abcd  
grep -E ab\\\cd file          ab\cd          abcd
grep -E ab\\\\cd file         ab\\cd         ab\cd    * 
grep -E ab\\\\\cd file        ab\\cd         ab\cd    *
grep -E ab\\\\\\cd file       ab\\\cd        ab\cd    *    
grep -E ab\\\\\\\cd file      ab\\\cd        ab\cd    *
grep -E ab\\\\\\\\cd file     ab\\\\cd       ab\\cd

grep -E "ab\cd" file          ab\cd          abcd
grep -E "ab\\cd" file         ab\cd          abcd
grep -E "ab\\\cd" file        ab\\cd         ab\cd    *
grep -E "ab\\\\cd" file       ab\\cd         ab\cd    *
grep -E "ab\\\\\cd" file      ab\\\cd        ab\cd    *
grep -E "ab\\\\\\cd" file     ab\\\cd        ab\cd    *
grep -E "ab\\\\\\\cd" file    ab\\\\cd       ab\\cd    

grep -E 'ab\cd' file          ab\cd          abcd  
grep -E 'ab\\cd' file         ab\\cd         ab\cd    *
grep -E 'ab\\\cd' file        ab\\\cd        ab\cd    *
grep -E 'ab\\\\cd' file       ab\\\\cd       ab\\cd
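
The right-hand half of the chart (what grep does with the string it receives) can be reproduced by handing grep each pattern in single quotes, so that the shell changes nothing. This is a sketch assuming bash and a grep that supports -E and -q; check is a hypothetical helper, and a temporary file stands in for the file of the question:

```shell
#!/usr/bin/env bash
# Sample file holding the single line "#ab\cd" (one literal backslash).
file=$(mktemp)
trap 'rm -f "$file"' EXIT
printf '%s\n' '#ab\cd' > "$file"

# check PATTERN: hypothetical helper that reports whether grep -E
# matches when it receives PATTERN verbatim. Warnings that some grep
# versions print for \c are discarded.
check() {
    if grep -E -q -- "$1" "$file" 2>/dev/null; then
        echo "match"
    else
        echo "no match"
    fi
}

# Single quotes pass each pattern to grep unchanged.
for pattern in 'ab\cd' 'ab\\cd' 'ab\\\cd' 'ab\\\\cd'; do
    printf '%-10s %s\n' "$pattern" "$(check "$pattern")"
done
```

The two-backslash pattern matches and the four-backslash pattern does not. The results for the patterns containing \c depend on how the grep in use treats a backslash before an ordinary character.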

Author: bnikhil

Freelance software developer, mostly using Java, JavaScript, and Python at the moment.

Updated on September 18, 2022

Comments

  • bnikhil
    bnikhil almost 2 years

    I recently had trouble with some regex on the command-line, and found that for matching a backslash, different numbers of characters can be used. This number depends on the quoting used for the regex (none, single quotes, double quotes). See the following bash session for what I mean:

    echo "#ab\\cd" > file
    grep -E ab\cd file
    grep -E ab\\cd file
    grep -E ab\\\cd file
    grep -E ab\\\\cd file
    #ab\cd
    grep -E ab\\\\\cd file
    #ab\cd
    grep -E ab\\\\\\cd file
    #ab\cd
    grep -E ab\\\\\\\cd file
    #ab\cd
    grep -E ab\\\\\\\\cd file
    grep -E "ab\cd" file
    grep -E "ab\\cd" file
    grep -E "ab\\\cd" file
    #ab\cd
    grep -E "ab\\\\cd" file
    #ab\cd
    grep -E "ab\\\\\cd" file
    #ab\cd
    grep -E "ab\\\\\\cd" file
    #ab\cd
    grep -E "ab\\\\\\\cd" file
    grep -E 'ab\cd' file
    grep -E 'ab\\cd' file
    #ab\cd
    grep -E 'ab\\\cd' file
    #ab\cd
    grep -E 'ab\\\\cd' file
    

    This means that:

    • with no quotes, I can match a backslash with 4-7 actual backslashes
    • with double quotes, I can match a backslash with 3-6 actual backslashes
    • With single quotes, I can match a backslash with 2-3 actual backslashes

    I understand that one extra backslash is ignored by the shell (from the bash man page):

    "A non-quoted backslash (\) is the escape character. It preserves the literal value of the next character that follows"

    This does not apply to the single-quoted examples, because no escaping is done in single quotes.

    And one additional backslash is ignored by the grep command ("\c" is just "c" escaped, but this is just the same as "c", because "c" does not have a special meaning in a regex).

    This explains the behaviour of the example with single quotes, but I don't really understand the other two examples, especially why there is a difference between non-quoted and double-quoted strings.

    Again, a quote from the bash man page:

    "Enclosing characters in double quotes preserves the literal value of all characters within the quotes, with the exception of $, `, \, and, when history expansion is enabled, !."

    I tried the same with GNU awk (e.g. awk /ab\cd/{print} file), with the same results.

    Perl, however, shows different results (using e.g. perl -ne "/ab\\cd/"\&\&print file):

    • with no quotes, I can match a backslash with 4-5 actual backslashes
    • with double quotes, I can match a backslash with 3-4 actual backslashes
    • With single quotes, I can match a backslash with 2 actual backslashes

    Can anyone explain that difference between non-quoted and double-quoted regex strings on the command-line for grep and awk? I'm not that interested in an explanation of Perl's behaviour, since I usually don't use Perl one-liners.

  • Olivier Dulac
    Olivier Dulac about 6 years
    ... and then there are some oddities: for example, printf "\ntest" will insert a newline before "test", even though "\n" should have been translated to "n" by the shell as it is within double quotes (so the expected result for "\ntest" should be "ntest"). We should get into the habit of writing printf "\\ntest" or printf '\ntest', but somehow I see a lot of scripts relying on the oddity instead.