Grep doesn't seem to find all matches in a file I expect
Solution 1
The reason is that * is a special character in regular expressions: it means zero or more repetitions of the preceding character. To match a literal * character, you have to escape it with \. So in your examples:
grep -ir "Settings*xml"
would search for a string that starts with Setting, followed by zero or more s characters, with xml at the end. There is no such string in your file because xml is always preceded by a . character. And this:
grep -ir "Settings*.xml"
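would search for a string that starts with Setting, followed by zero or more s characters, then any single character (an unescaped . matches any character, not only a literal dot), then xml. Only Settings.xml fits, because in Settings_1.xml and Settings_2.xml more than one character separates Settings from xml.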
Your first regex would match something like this:
Settingssxml
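Both patterns can be tried side by side. The following is a minimal sketch, not the asker's exact setup: it recreates the test file from the question in a temporary file, appends one hypothetical extra line (Settingssxml), and passes the file explicitly instead of using -r.

```shell
# Recreate the test file from the question, plus one hypothetical extra line.
tmpfile=$(mktemp)
printf '%s\n' Settings.xml blah Settings_1.xml blah Settings_2.xml Settingssxml > "$tmpfile"

# "Setting", zero or more "s", then "xml": only the extra line qualifies.
first=$(grep -i 'Settings*xml' "$tmpfile" || true)

# "Setting", zero or more "s", any one character, then "xml".
second=$(grep -i 'Settings*.xml' "$tmpfile" || true)

printf 'first:\n%s\nsecond:\n%s\n' "$first" "$second"
rm "$tmpfile"
```

The first pattern prints only Settingssxml; the second prints Settings.xml and Settingssxml, and neither finds Settings_1.xml or Settings_2.xml.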
Solution 2
The other answer explains what happened and answers your explicit questions. My answer is intended to provide broader context.
I guess you expected * to match zero or more characters (any characters) and . to literally mean a dot. This is how shell globbing works, e.g. if you had files like this:
$ ls -1
Settings.xml
blah
Settings_1.xml
Settings_2.xml
then (say, in bash) you could do:
$ echo Settings*.xml
Settings.xml Settings_1.xml Settings_2.xml
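To see how differently the same text is read as a glob and as a regex, here is a small sketch. The scratch directory is an assumption made for illustration; the file names are the ones from the listing above.

```shell
# Hypothetical scratch directory holding the four file names from the listing.
tmpdir=$(mktemp -d)
touch "$tmpdir/Settings.xml" "$tmpdir/blah" "$tmpdir/Settings_1.xml" "$tmpdir/Settings_2.xml"

# Unquoted, the shell expands the glob itself: * is any run of characters, . is a literal dot.
glob_count=$(cd "$tmpdir" && set -- Settings*.xml && echo "$#")

# Quoted, the same text reaches grep unexpanded and is read as a regex.
regex_count=$(ls "$tmpdir" | grep -c 'Settings*.xml')

echo "glob matched $glob_count names, regex matched $regex_count"
# prints: glob matched 3 names, regex matched 1
rm -r "$tmpdir"
```

The glob picks up all three Settings files, while grep, given the identical characters, accepts only Settings.xml.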
You didn't get what you expected because grep uses regex syntax where:
- . matches (almost) any character,
- * means "zero or more repetitions of the preceding character",
- \ forces the next character to be interpreted literally.
That's why instead of "Settings*.xml" you should have used "Settings.*\.xml". In this case:
- .* does what you thought * would do,
- \. does what you thought . would do.
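A quick check of the corrected pattern, as a sketch under a few assumptions: the file is recreated from the contents shown in the question, the temporary path is made up for the example, and -r is dropped because the file is passed explicitly.

```shell
# Recreate the test file from the question in a temporary location.
tmpfile=$(mktemp)
printf '%s\n' Settings.xml blah Settings_1.xml blah Settings_2.xml > "$tmpfile"

# .* allows any run of characters; \. requires a literal dot before xml.
# Single quotes keep the shell from touching the pattern.
fixed=$(grep -i 'Settings.*\.xml' "$tmpfile")

echo "$fixed"
rm "$tmpfile"
```

This prints Settings.xml, Settings_1.xml and Settings_2.xml, one per line, and skips both blah lines.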
Ben Sandeen
Updated on September 18, 2022
Comments
-
Ben Sandeen over 1 year
I'm not sure if I don't fully understand grep or if regexes are the source of my problem, so I have two questions. I have a simple test file named test.txt with the following contents:
$ cat test.txt
Settings.xml
blah
Settings_1.xml
blah
Settings_2.xml
When I run grep in a directory containing only the above test file with the following command, it returns no matches:
$ grep -ir "Settings*xml"
1) Why is the wildcard * not catching the period?
And when I run grep as such:
$ grep -ir "Settings*.xml"
the only difference being the period after the wildcard, the results are:
test.txt:Settings.xml
2) Why is grep not finding the other two matches?
-
Ben Sandeen almost 7 years
It looks like the * wasn't doing what I thought (see the answer from @ArkadiuszDrabczyk). A solution that does return what I want is:
$ grep -ir "Settings[[:alnum:][:punct:]]*.xml"
-
slhck almost 7 years
You mean * in your first sentences.