Search for special characters using grep

Solution 1

grep "[]:/?#@\!\$&'()*+,;=%[]"

Within a bracketed expression, [...], very few characters are "special" (only a small subset: ], - and ^, and the three combinations [=, [: and [.). When including ] in [...], the ] must come first (possibly after a ^). I opted to put the ] first and the [ last for symmetry.

The only other thing to remember is that a single-quoted string cannot include a single quote, so we use double quotes around the expression. Since we use a double-quoted string, the shell will poke around in it for things to expand. For this reason, we escape the $ as \$, which makes the shell give a literal $ to grep, and we escape ! as \! because it triggers history expansion in bash (only in interactive bash shells, though). Be aware that inside double quotes bash keeps the backslash of \!, so grep receives \! and the set then also matches a backslash; the single-quoted variant further down avoids this.

Should you want to include a backslash in the set, you would have to escape it as \\ so that the shell gives a single backslash to grep. Also, if you want to include a backtick `, it too must be escaped as \`, as it otherwise starts a command substitution.

The command above would extract any line that contained at least one of the characters in the bracketed expression.
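As a quick check of how the shell treats \! inside double quotes, here is a minimal sketch (the sample line a\b is made up for illustration). In a POSIX shell or bash the backslash is kept, so the bracket expression ends up containing a backslash as well:

```shell
# What grep actually receives when \! is written inside double quotes.
arg=$(printf '%s' "[\!]")
printf '%s\n' "$arg"            # prints [\!]

# Hypothetical sample line containing a backslash but no "!".
hits=$(printf '%s\n' 'a\b' | grep -c "[\!]")
printf '%s\n' "$hits"           # prints 1: the backslash itself matched
```

If matching a backslash is not acceptable, single quotes sidestep the issue.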


Using a single-quoted string instead of a double-quoted string gets around most of the annoyances with what characters the shell interprets:

grep '[]:/?#@!$&'"'"'()*+,;=%[]'

Here, the only thing to remember, apart from the placement of the ], is that a single-quoted string cannot include a single quote, so instead we use a concatenation of three strings:

  1. '[]:/?#@!$&'
  2. "'"
  3. '()*+,;=%[]'
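To see the concatenated pattern in action, here is a small sketch. The three sample lines and the variable names are invented for illustration; only the pattern itself comes from the answer:

```shell
# Hypothetical sample input: only the last two lines contain a character from the set.
sample='plain text
user@example.com
price 100% off'

# The three-part concatenation from above, stored in a variable for readability.
pattern='[]:/?#@!$&'"'"'()*+,;=%[]'

matched=$(printf '%s\n' "$sample" | grep "$pattern")
printf '%s\n' "$matched"
# prints:
# user@example.com
# price 100% off
```

Keeping the pattern in a variable is optional; passing the concatenated strings directly to grep behaves the same.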

Another approach would be to use the POSIX character class [[:punct:]]. This matches a single character from the set !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~, which is a larger set than what's given in the question (it additionally contains "-.<>\^_`{|}~), but it covers all the "punctuation characters" that POSIX defines.

LC_ALL=C grep '[[:punct:]]'
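The difference between the two sets can be checked with a line that contains only a hyphen, which is punctuation but not in the question's list. This is just a sketch; the sample word and the variable names are made up:

```shell
# Hypothetical sample: the only non-alphabetic character is a hyphen.
line='well-known'

# [[:punct:]] includes "-", so this matches.
if printf '%s\n' "$line" | LC_ALL=C grep -q '[[:punct:]]'; then
    punct_hit=yes
else
    punct_hit=no
fi

# The explicit set from the question has no "-", so this does not match.
if printf '%s\n' "$line" | grep -q '[]:/?#@!$&'"'"'()*+,;=%[]'; then
    set_hit=yes
else
    set_hit=no
fi

echo "punct=$punct_hit explicit=$set_hit"   # prints punct=yes explicit=no
```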

Solution 2

You can use the [:punct:] character class if you don't mind that it also matches other punctuation and special characters:

grep '[[:punct:]]' file

Solution 3

You can use a full regular expression that lists each special character, including the square brackets, as an escaped alternative if you're looking for lines containing at least one of them. A great resource for practicing, learning and checking your regular expressions is regex101.com.

This uses Perl regular expressions, which can be used with GNU grep with the -P option:

grep -P "(\:|\/|\?|\#|\@|\!|\\$|\&|\'|\(|\)|\*|\+|\,|\;|\=|\%|\[|\])"
                            ^^^

Note that you need two backslashes in front of the dollar sign, as it has a special meaning in the shell, and the first backslash will escape it for the shell. (With just one backslash in front, the shell would remove the backslash, grep would see an unescaped dollar sign meaning end of line, and match any input line.)
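The effect of the shell's quote removal can be inspected without grep at all. The sketch below uses a shortened, hypothetical pattern and sample text, and it uses grep -E rather than -P so that it runs without PCRE support; the shell's handling of the backslashes is the same either way:

```shell
# What grep receives for each spelling inside double quotes.
two=$(printf '%s' "(\\$|#)")    # shell reduces \\ to \  -> (\$|#)
one=$(printf '%s' "(\$|#)")     # shell reduces \$ to $  -> ($|#)
printf 'two backslashes: %s\none backslash:   %s\n' "$two" "$one"

# Hypothetical two-line sample; only the second line has a "$".
sample='no specials here
costs $5'
count_two=$(printf '%s\n' "$sample" | grep -c -E "$two")
count_one=$(printf '%s\n' "$sample" | grep -c -E "$one")
printf 'literal dollar: %s line(s), bare dollar: %s line(s)\n' "$count_two" "$count_one"
# prints: literal dollar: 1 line(s), bare dollar: 2 line(s)
```

With a single backslash the pattern contains an end-of-line anchor, so every line is counted.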

If your terminal supports colors, turn on match highlighting as well:

grep --color=auto -P "(\:|\/|\?|\#|\@|\!|\\$|\&|\'|\(|\)|\*|\+|\,|\;|\=|\%|\[|\])"

Here is the explanation of my regex from regex101.com

/(\:|\/|\?|\#|\@|\!|\$|\&|\'|\(|\)|\*|\+|\,|\;|\=|\%|\[|\])/gm
1st Capturing Group (\:|\/|\?|\#|\@|\!|\$|\&|\'|\(|\)|\*|\+|\,|\;|\=|\%|\[|\])
  \: matches the character : literally (case sensitive)
  \/ matches the character / literally (case sensitive)
  \? matches the character ? literally (case sensitive)
  \# matches the character # literally (case sensitive)
  \@ matches the character @ literally (case sensitive)
  \! matches the character ! literally (case sensitive)
  \$ matches the character $ literally (case sensitive)
  \& matches the character & literally (case sensitive)
  \' matches the character ' literally (case sensitive)
  \( matches the character ( literally (case sensitive)
  \) matches the character ) literally (case sensitive)
  \* matches the character * literally (case sensitive)
  \+ matches the character + literally (case sensitive)
  \, matches the character , literally (case sensitive)
  \; matches the character ; literally (case sensitive)
  \= matches the character = literally (case sensitive)
  \% matches the character % literally (case sensitive)
  \[ matches the character [ literally (case sensitive)
  \] matches the character ] literally (case sensitive)

Author: user9371654

Updated on September 18, 2022

Comments

  • user9371654
    user9371654 over 1 year

    I want to search for the lines that contain any of the following characters:

    : / / ? # [ ] @ ! $ & ' ( ) * + , ; = %

  • Kusalananda
    Kusalananda over 5 years
    @ilkkachu I didn't spot the $ in there! Thanks!
  • user9371654
    user9371654 over 5 years
    When I try to execute the command, I get this error bash: !\: event not found.
  • Kusalananda
    Kusalananda over 5 years
    @user9371654 Darn bash! :-) Escape the ! too... Not being a bash user I forgot about that. I will update...
  • Kusalananda
    Kusalananda over 5 years
    The punct character class (not macro) matches !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ in the C locale, which is a slightly larger set of characters than what the user has, but it may be good enough.
  • Kusalananda
    Kusalananda over 5 years
    @ilkkachu No, you can't put a single quote in a single-quoted string like that. You would have to use '...'"'"'...'. I will make a note about it.
  • Kusalananda
    Kusalananda over 5 years
    @ilkkachu Ah, I see. I opted for a double-quoted ' instead.
  • Stéphane Chazelas
    Stéphane Chazelas over 5 years
    "[\!]" expands to [\!] even when history expansion is enabled, so would match on backslash. You'd need single quotes or using \! outside of quotes.
  • Stéphane Chazelas
    Stéphane Chazelas over 5 years
    No, with standard ERE, you can't escape the closing ] with a backslash. Backslash is not special inside bracket expressions. To have a ] inside a bracket expression, it needs to be first: []other], not [ot\]her]. That's different from PCREs, which regex101 describes by default.
  • ilkkachu
    ilkkachu over 5 years
    It would work with pcregrep or GNU grep -P, though. And in a sense, the Perl behaviour is more straightforward: a backslash always makes a special character normal.
  • thebtm
    thebtm over 5 years
    Corrected to -P, sorry about that, I get the -E and -P mixed up.
  • Stéphane Chazelas
    Stéphane Chazelas over 5 years
    Note that it's not only bash; zsh also has that annoying feature inherited from csh. In csh, ! is special inside '...' as well, and also when non-interactive. However, in csh (contrary to bash or zsh), using "\!" would work here (the backslash is removed).
  • lampShadesDrifter
    lampShadesDrifter over 3 years
    I was using cpdf to add bookmarks and getting a Bad bookmark file (syntax) at line 0 error. This post helped me find the bad chars. (Hopefully this comment helps this post get picked up in search for others with this problem).