Difference between .* and * in regular expressions

18,537

Solution 1

notation (.*)

The * in the regular expressions .* and * is a count, not a character per se: it means 'zero or more of the preceding item'. The . means 'any single character'.

So when you put them together you get 'zero or more of any character'. For example, strings like these:

  • linux
  • linnnnnx
  • lnx
  • hi linux
  • lx

would all be matched by the pattern l.*x. "hi linux" matches because grep looks for the pattern anywhere in the line. The last one is important: it shows that .* can match nothing at all.
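As a quick check, here is a small sketch that feeds the strings above to grep. The extra line "unix" is not from the list; it is only there as an assumed control that should not match, because it contains no 'l'.

```shell
# The five sample strings from the list, plus "unix" as a control line.
matched=$(printf '%s\n' linux linnnnnx lnx 'hi linux' lx unix | grep 'l.*x')

# Prints the five sample strings; "unix" is filtered out.
printf '%s\n' "$matched"
```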

notation (*)

As I said, * on its own is a counter. So when you put it after a letter such as 'l', the * is saying 'zero or more l's'.

Notice that if we grep for l*x, it will match l...x, but probably not for the reason you'd think.

% echo "l...x" | grep "l*x"
l...x

It's matching on the trailing 'x'. The 'l' has nothing to do with why this is getting matched, other than the fact that the 'x' is preceded by 'zero or more l's'.
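To see exactly which part was matched, something like the following should work. This is a sketch that assumes a grep supporting -o (print only the matched text) and the \< and \> word-boundary operators; GNU grep has both, but neither is required by POSIX. The input "allx" is an assumed extra example, and the five lines in the last command are the test file from the question.

```shell
# -o prints only the text the pattern matched, not the whole line.
only_x=$(echo 'l...x' | grep -o 'l*x')
echo "$only_x"      # x

# When l's sit directly before the x, l* consumes them.
with_ls=$(echo 'allx' | grep -o 'l*x')
echo "$with_ls"     # llx

# With word boundaries, only "l...x" matches: its x follows a '.', so the x
# is a word of its own (zero l's, then x). In "linux" the x is in the middle
# of a word, so \< cannot match right before it.
words=$(printf '%s\n' linux Unixlinux Linuxunix "it's linux" l...x | grep '\<l*x\>')
echo "$words"       # l...x
```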

Solution 2

For the shell (e.g. bash), when wildcards are used to match filenames, * and ? are the characters themselves: * stands for any run of characters and ? for exactly one character.

For regular expressions, on the other hand, *, ?, {n,m} (range of occurrences) and + (egrep only) are nothing by themselves. They always refer to the previous character/atom, whether this is an actual character (e.g. L or 5), the . (wildcard), which can represent any character, a range of characters (e.g. [a-f]), or a pattern of several characters (egrep only; e.g. (abba), where "abba" is considered a unit). The * and ? thus represent nothing by themselves, but tell how many times the previous character (which may be a wildcard for any character, or a group treated as a unit) should be repeated.

Once you remember this distinction between the way the shell and regex use * and ?, it should fall into place.
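The difference can be tried out side by side. In this sketch, glob_match is a hypothetical helper (not a standard command) that uses the shell's own case statement for the wildcard test, and grep -x is used so that the regex has to match the whole line.

```shell
# Hypothetical helper: succeeds if string $1 matches shell wildcard pattern $2.
glob_match() {
  case $1 in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

# As a shell wildcard, * stands for any run of characters by itself.
glob_match linux 'l*x' && glob_result=match || glob_result='no match'
echo "glob  l*x  vs linux: $glob_result"     # match

# As a regex, * only repeats the preceding 'l', so "linux" as a whole fails.
echo linux | grep -qx 'l*x' && regex_result=match || regex_result='no match'
echo "regex l*x  vs linux: $regex_result"    # no match

# The regex counterpart of the wildcard l*x is l.*x.
echo linux | grep -qx 'l.*x' && dot_result=match || dot_result='no match'
echo "regex l.*x vs linux: $dot_result"      # match
```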

So for regex:

  • . - represents exactly one occurrence of any character
  • a..a - matches two a's with exactly two characters of any sort between them
  • .* - matches zero or more occurrences of any character
  • B* - matches zero or more occurrences of "B"
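The patterns in this list can be checked one at a time. whole_match below is a hypothetical helper written for this sketch; it uses grep -x so that the entire string has to match, which makes the "zero occurrences" cases visible. The test strings are assumed examples.

```shell
# Hypothetical helper: prints "yes" if the whole string $2 matches regex $1.
whole_match() {
  if printf '%s\n' "$2" | grep -qx -- "$1"; then echo yes; else echo no; fi
}

whole_match 'a..a' 'abba'   # yes: two characters between the a's
whole_match 'a..a' 'aba'    # no:  only one character between the a's
whole_match '.*'   ''       # yes: zero occurrences of any character
whole_match 'B*'   'BBB'    # yes: three B's
whole_match 'B*'   ''       # yes: zero B's
whole_match 'B*'   'BAB'    # no:  the A is not a B
```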

Solution 3

If you wanted to match anything starting with "l" and ending in "x", try the regular expression "l.*x". Here "." and "*" are special characters: "." stands for any single character, and "*" means that whatever precedes it is repeated zero or more times. Here what precedes "*" is a ".", so whatever comes in the place of "." is repeated according to the definition of "*" above.
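One caveat: grep looks for the pattern anywhere in the line, so if the whole line has to start with "l" and end in "x", the pattern needs the anchors ^ and $. A small sketch with assumed sample lines:

```shell
lines=$(printf '%s\n' linux 'hi linux' 'linux rocks')

# Unanchored: every line that contains an l followed later by an x.
unanchored=$(printf '%s\n' "$lines" | grep 'l.*x')
printf '%s\n' "$unanchored"    # all three lines

# Anchored: the line itself must start with l and end with x.
anchored=$(printf '%s\n' "$lines" | grep '^l.*x$')
printf '%s\n' "$anchored"      # linux
```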

Author by

ravi

Updated on September 18, 2022

Comments

  • ravi
    ravi over 1 year

    I've a file named "test" that contains

    linux
    Unixlinux
    Linuxunix
    it's linux
    l...x
    

    Now when I use grep '\<l.*x\>', it matches:

    linux
    it's linux
    l...x
    

    but when I use grep '\<l*x\>', it only matches:

    l...x

    But according to the reference guide, when using *, the preceding item will be matched zero or more times, i.e. it should match anything that starts with 'l' and ends with 'x'.

    Can anyone explain why it's not showing the desired result, or whether I've understood it wrong?

    • Bernhard
      Bernhard about 11 years
      Why are you using \< and \>?
    • Bernhard
      Bernhard about 11 years
      Please note that . is a special character that should be escaped if you want to use it as a literal dot.
    • Pavan Kumar
      Pavan Kumar about 11 years
      Run grep with the --color option; that will help you understand what happens (hint: x is a word starting with zero l's).
    • ravi
      ravi about 11 years
      thanks @guido, --color , really helped, and will also help in future
    • erch
      erch about 11 years
      @Bernhard \< matches the beginning and \> matches the end of a 'word' [ paragraph GNU Word Boundaries on regular-expressions.info/wordboundaries.html ]
  • Bernhard
    Bernhard about 11 years
    According to your explanation l*x should match neither l...x nor linux. Right?
  • slm
    slm about 11 years
    No, it will match l...x because the final x will be matched as zero l's followed by an x. Let me update my answer to make that clearer, thanks.
  • ravi
    ravi about 11 years
    And as @guido has written in reply to the question, using --color will actually show what's been matched.