Grep doesn't seem to find all matches in a file I expect

Solution 1

The reason is that * is a special character in regular expressions: it means zero or more repetitions of the preceding character. To match a literal * you have to escape it with \. So in your examples:

grep -ir "Settings*xml"

would search for a string that contains Setting, followed by zero or more s characters, followed immediately by xml. There is no such string in your file because xml is always preceded by a period (.). And this:

grep -ir "Settings*.xml"

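would search for a string that contains Setting, followed by zero or more s characters, then any single character (an unescaped . matches any character, not just a period), then xml. That is why it matches Settings.xml but not Settings_1.xml or Settings_2.xml: after Settings, the . consumes the _, and the next characters are 1.x rather than xml.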

Your first regex would match something like this:

Settingssxml
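
To check this reasoning yourself, here is a minimal sketch that recreates the test file from the question in a scratch directory and runs both patterns against it. The scratch directory and the variable names (workdir, first, second, third) are assumptions made for illustration. The file is passed to grep explicitly instead of relying on -r.

```shell
#!/bin/sh
# Recreate the question's test file in a temporary directory (hypothetical location).
workdir=$(mktemp -d)
printf '%s\n' Settings.xml blah Settings_1.xml blah Settings_2.xml > "$workdir/test.txt"

# "Setting", zero or more "s", then "xml" right away: no line in the file fits.
# "|| true" keeps the script going when grep finds nothing and exits non-zero.
first=$(grep -i 'Settings*xml' "$workdir/test.txt" || true)

# "Setting", zero or more "s", any ONE character, then "xml".
second=$(grep -i 'Settings*.xml' "$workdir/test.txt" || true)

# A made-up input line that the first pattern does match.
third=$(printf '%s\n' Settingssxml | grep -i 'Settings*xml' || true)

echo "first:  [$first]"
echo "second: [$second]"
echo "third:  [$third]"

rm -rf "$workdir"
```

The first variable should come back empty, the second should hold only Settings.xml, and the third should hold Settingssxml.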

Solution 2

The other answer explains what happened and answers your explicit questions. My answer is intended to introduce a broader context.

I guess you expected * to match zero or more characters (any characters) and . to mean a literal period. This is how shell globbing works. For example, if you had files like this:

$ ls -1
Settings.xml
blah
Settings_1.xml
Settings_2.xml

then (say, in bash) you could do:

$ echo Settings*.xml
Settings.xml Settings_1.xml Settings_2.xml
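
If you want to try the globbing behaviour without touching your own files, something like the following sketch should work. It creates the four example files in a scratch directory. The directory and the variable names (workdir, globbed, quoted) are assumptions for illustration, and the order in which the shell lists the expanded names can depend on your locale.

```shell
#!/bin/sh
# Create the example files in a temporary directory (hypothetical location).
workdir=$(mktemp -d)
touch "$workdir/Settings.xml" "$workdir/blah" \
      "$workdir/Settings_1.xml" "$workdir/Settings_2.xml"

# Unquoted: the shell expands the glob into matching file names before echo runs.
globbed=$(cd "$workdir" && echo Settings*.xml)

# Quoted: no expansion happens, so echo receives the pattern text itself.
quoted=$(cd "$workdir" && echo "Settings*.xml")

echo "unquoted: $globbed"
echo "quoted:   $quoted"

rm -rf "$workdir"
```

Here * is interpreted by the shell as a glob, not by grep as a regex operator, which is why it behaves the way you expected.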

You didn't get what you expected because grep uses regex syntax where:

  • . matches (almost) any character,
  • * means "zero or more repetitions of the preceding character",
  • \ forces the next character to be interpreted literally.

That's why instead of "Settings*.xml" you should have used "Settings.*\.xml". In this case:

  • .* does what you thought * would do,
  • \. does what you thought . would do.
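
As a quick check of the corrected pattern, here is a small sketch run against the same file contents as in the question. The scratch directory, the variable names (workdir, fixed, count, loose, strict), and the sample line SettingsAxml are made up for illustration.

```shell
#!/bin/sh
# Recreate the question's test file in a temporary directory (hypothetical location).
workdir=$(mktemp -d)
printf '%s\n' Settings.xml blah Settings_1.xml blah Settings_2.xml > "$workdir/test.txt"

# .* matches any run of characters (including none); \. matches a literal period.
fixed=$(grep -i 'Settings.*\.xml' "$workdir/test.txt" || true)
count=$(grep -ic 'Settings.*\.xml' "$workdir/test.txt" || true)

# Why the backslash matters: an unescaped . also accepts a letter before "xml".
loose=$(printf '%s\n' SettingsAxml | grep -c 'Settings.*.xml' || true)
strict=$(printf '%s\n' SettingsAxml | grep -c 'Settings.*\.xml' || true)

echo "$fixed"
echo "matching lines: $count"
echo "SettingsAxml matched without escape: $loose, with escape: $strict"

rm -rf "$workdir"
```

All three Settings lines should be reported, and the made-up SettingsAxml line should match only the pattern with the unescaped period.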
Author: Ben Sandeen

Interested in the intersection of physics and computer science. Enjoy wondering "what if". Excited by making things more efficient. Passionate about the environment.

Updated on September 18, 2022

Comments

  • Ben Sandeen, over 1 year ago

    I'm not sure if I don't fully understand grep or if regexes are the source of my problem, so I have two questions. I have a simple test file named test.txt with the following contents:

    $ cat test.txt
    Settings.xml
    blah
    Settings_1.xml
    blah
    Settings_2.xml

    When I run grep in a directory containing only the above test file with the following command, it returns with no matches:

    $ grep -ir "Settings*xml"

    1) Why is the wildcard * not catching the period?

    And when I run grep as such:

    $ grep -ir "Settings*.xml"

    the only difference being the period after the wildcard, the results are:

    test.txt:Settings.xml

    2) Why is grep not finding the other two matches?

    • Ben Sandeen, almost 7 years ago
      It looks like the * wasn't doing what I thought (see the answer from @ArkadiuszDrabczyk). A solution that does return what I want is: $ grep -ir "Settings[[:alnum:][:punct:]]*.xml"
  • slhck, almost 7 years ago
    You mean * in your first sentences.