What characters need to be escaped when using the printf command?

Solution 1

In the format argument of printf, only the % and \ characters are special (no, " is not special and \" is unspecified per POSIX).

But there are two important caveats.

  1. In most printf implementations¹, it is the byte values of \ and % that are special. The POSIX specification could even be read as requiring this, since it requires the printf utility to be an interface to the printf(3) C function rather than, say, wprintf(3) (just as it requires %.3s to truncate to 3 bytes, not 3 characters).

    In some character encodings including BIG5 and GB18030, there are hundreds of characters that contain the encoding of backslash, and to escape those for printf, you'd need to insert a \ before each 0x5c byte within the encoding of those characters!

    For instance in BIG5-HKSCS, as used for instance in the zh_HK.big5hkscs (Hong Kong) locale, all of Ěαжふ㘘㙡䓀䨵䪤么佢俞偅傜兝功吒吭园坼垥塿墦声娉娖娫嫹嬞孀尐岤崤幋廄惝愧揊擺暝枯柦槙檝歿汻沔涂淚滜潿瀙瀵焮燡牾狖獦珢珮琵璞疱癧礒稞穀笋箤糭綅縷罡胐胬脪苒茻莍蓋蔌蕚螏螰許豹贕赨跚踊蹾躡鄃酀酅醆鈾鎪閱鞸餐餤駹騱髏髢髿鱋鱭黠﹏𠗫𠰺𣘀𦖭𦰡𧃸𨜏𩄼𪀔 contain byte 0x5c (which is also the encoding of \).

    With most printf implementations, in that locale, printf 'αb' doesn't output αb but byte 0xa3 (the first byte of the encoding of α) followed by the BS character (the expansion of \b).

    $ LC_ALL=zh_HK.big5hkscs luit
    $ locale charmap
    BIG5-HKSCS
    $ printf 'αb' | LC_ALL=C od -tx1 -tc
    0000000  a3  08
            243  \b
    0000002
    

    Best is to avoid using (and even installing / making available) those locales as they cause all sorts of bugs and vulnerabilities of that sort.

  2. Some printf implementations support options, and even those that don't are required to support -- as the option delimiter. So printf -- won't output -- but likely report an error about a missing format argument. So if you can't guarantee your format won't start with -, you have to use the -- option delimiter:

     printf -- "$escaped_format" x y...
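The two notes above can be combined in a minimal sketch. The helper name escape_format and the sample text are hypothetical, not part of the original answer. It assumes a POSIX sh and sed, and uses LC_ALL=C so that sed works on bytes and also doubles any 0x5c byte inside a multi-byte character. Command substitution strips trailing newlines, so this only suits text that does not end in one.

```shell
# Hypothetical helper: double every \ and % byte so that printf
# treats the result as literal text when it is used as a format.
escape_format() {
  printf '%s\n' "$1" | LC_ALL=C sed 's/[\\%]/&&/g'
}

text='-- 100% done, see C:\new --'
escaped_format=$(escape_format "$text")

# The format starts with -, so the -- option delimiter is needed.
result=$(printf -- "$escaped_format")
```

If the data does not have to be part of the format, passing it as an argument to %s avoids the escaping step entirely.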
    

In any case, if you want to print arbitrary strings, you'd use:

printf '%s\n' "$data" # with terminating newline
printf %s "$data"     # without

There's no character that is special in the string passed to %s (though note that with the exception of the printf builtin of zsh, you can't pass the NUL character in any of printf arguments).
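As a small illustration of that point (the variable names and sample strings are made up for this sketch), data containing %, \ and a leading - comes out unchanged when it is passed as an argument to %s. The second command relies on printf reusing the format for each remaining argument.

```shell
# Sample data with characters that would be special in a format.
data='-- 50%% off \n "today" --'
out=$(printf '%s' "$data")

# The format is reused for every argument, one line per argument.
lines=$(printf '%s\n' 'a\tb' '%d')
```

Here out should equal the original data byte for byte, and lines should hold the two arguments verbatim, separated by a newline.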

Note that while the canonical way to enter a literal \ is with \\ and a literal % with %%, on ASCII-based systems, you can also use \134 and \45 and with some printf implementations \x5c, \x25, or \x{5c}, \x{25}, or (even on non-ASCII systems): \u005c, \u0025 or \u{5c}, \u{25}.
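A quick sketch of the portable forms only, assuming an ASCII-based system. The \x and \u variants are left out because they depend on the printf implementation.

```shell
# Canonical escapes in the format argument.
backslash=$(printf '\\')
percent=$(printf '%%')

# Octal escapes for the same bytes on ASCII-based systems.
backslash_octal=$(printf '\134')
percent_octal=$(printf '\45')
```

Each pair of variables should hold the same single character.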


¹ yash's printf builtin being the only exception I am aware of.

Solution 2

From the manual:

$ man printf
...
   printf FORMAT [ARGUMENT]...
...
   FORMAT controls the output as in C printf.  Interpreted sequences are:

This lists several interpreted sequences. The following are those where the character itself needs to be escaped.

   \"     double quote
   \\     backslash
   %%     a single %

I tested these three in bash, and they behaved as expected. As per man bash, this implementation of printf uses the "standard printf(1) format specifications" as above, in addition to a few more that aren't relevant here.


However, other shells such as zsh implement printf slightly differently. Here, the double quote shouldn't be escaped.

$ printf '"'
"   
$ printf '\"'
\"
Author by

Harold Fischer

Updated on September 18, 2022

Comments

  • Harold Fischer
    Harold Fischer almost 2 years

    I want to clarify that I am not talking about how to escape characters on the shell level of interpretation.

    As far as I can tell, only two characters need to be escaped: % and \

    To print a literal %, you must escape it with a preceding %:

    printf '%%'
    

    To print a literal \ you must escape it with a preceding \:

    printf '\\'
    

    Are there any other instances where I would need to escape a character for it to be interpreted literally?

  • Harold Fischer
    Harold Fischer over 5 years
    Yeah, same behavior here on dash and bash. For what it's worth, the dash manual makes no mention of needing to escape ", but maybe I'm not reading between the lines.
  • Sparhawk
    Sparhawk over 5 years
    @HaroldFischer Presumably dash just inherits printf(1) too? I found the zsh manual a bit more opaque, so I didn't quote it here.
  • dave_thompson_085
    dave_thompson_085 over 5 years
    (edited) backslash-dquote is only needed if the format string is in dquotes, which is usually a bad idea, as then you also need to backslash backquote and (most) dollarsign, and may need to quadruple backslash if followed by a printf special. printf is builtin in bash and dash, but like all nonspecial builtins in a POSIX shell must also be present as an 'external' program.
  • Sparhawk
    Sparhawk over 5 years
    @dave_thompson_085, the question does say I want to clarify that I am not talking about how to escape characters on the shell level of interpretation.
  • Kusalananda
    Kusalananda over 4 years
    Assuming this question is about outputting some supplied data: No character has to be escaped in a special way (apart from for the shell) if printf is used properly. For example: printf '%s\n' '\', or printf '%s\n' '%'. Data should not go in the format argument.
  • ilkkachu
    ilkkachu over 4 years
    "POSIX requires it to be an interface to printf(3)..." -- that's a bit funny in that the C printf() doesn't interpret backslash-escapes, but it's the compiler that does. Which doesn't make those ASCII-incompatible charsets less of a problem of course.
  • Stéphane Chazelas
    Stéphane Chazelas over 4 years
    @ilkkachu, yes the idea is that it requires its argument to be interpreted as arrays of bytes as opposed to text strings, like %.3s is meant to truncate to 3 bytes, not 3 characters. POSIX doesn't say in so many words that the format has to be interpreted as an array of bytes though. (and yes the handling of backslash there has nothing to do with the C printf)
  • Harold Fischer
    Harold Fischer over 4 years
    @StéphaneChazelas "Some printf implementations support options, and even those that don't are required to support -- as the option delimiter." printf does not adhere to the XBD Utility Syntax Guidelines. I didn't think POSIX required printf to support the -- end-of-options delimiter.
  • Stéphane Chazelas
    Stéphane Chazelas over 4 years
    @HaroldFischer, see pubs.opengroup.org/onlinepubs/9699919799.2018edition/utilities/… Standard utilities that do not accept options, but that do accept operands, shall recognize "--" as a first argument to be discarded....
  • Seamus
    Seamus about 2 years
    @Kusalananda: What is improper about this?: printf '--- end log entry ---\n'
  • Kusalananda
    Kusalananda about 2 years
    @Seamus Nothing? Unless you want the output to contain the character sequence \n rather than a literal newline. If you want \n in the output, you could do printf '--- end log entry ---%s' '\n'.
  • Seamus
    Seamus about 2 years
    @Kusalananda: Oh, sorry - I should have been more explicit. My printf command (and yours), both throw an error: -bash: printf: --: invalid option
  • Kusalananda
    Kusalananda about 2 years
    @Seamus In bash, yes, and I tested it in zsh where it doesn't error out. So make the problematic part a string that you format with %s: printf '%s\n' '--- end log entry ---'.
  • Seamus
    Seamus about 2 years
    @Kusalananda: Perfect! :)) 'tho I still think it's a bit of an "odd duck". I'll guess that zsh didn't have the legacy to deal with that seems to be the point of item 2. in the answer above??