grep with regex containing pipe character


It appears that grep accepts \| as a separator between alternative search expressions (like | in egrep, where \| matches a literal |). So your pattern is read as {{flag OR [a-z|A-Z\s]+}}, and every line containing {{flag matches.
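Here is a minimal way to see the difference. It assumes GNU grep: \| as alternation in a basic regex is a GNU extension, so other greps may behave differently. The same pattern a\|b matches every line in basic mode but only the literal a|b under -E.

```shell
# Three sample lines: one containing a literal pipe, two without.
sample='a|b
a
b'

# Basic regex (GNU): \| means "or", so every line matches.
bre_count=$(printf '%s\n' "$sample" | grep -c 'a\|b')

# Extended regex: \| is an escaped, literal pipe, so only "a|b" matches.
ere_count=$(printf '%s\n' "$sample" | grep -E -c 'a\|b')

echo "basic: $bre_count, extended: $ere_count"
```

With GNU grep this prints "basic: 3, extended: 1".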

Apart from that, your expression has other problems:-

  • An unescaped + is a repetition operator in egrep (or grep -E) only; in basic grep it matches a literal +.
  • \s is not supported within a [] character group; there it just matches \ or s.
  • I don't see the need for | in the character group.
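The first two points can be checked with a small sketch, assuming GNU grep. The [[:space:]] class at the end is not part of the original pattern. It is the standard POSIX way to put "whitespace" inside brackets, shown here only as an alternative to \s.

```shell
# In a basic regex an unescaped + is an ordinary character.
plus_bre=$(printf 'aa\na+\n' | grep -c 'a+')      # matches only the line "a+"
plus_ere=$(printf 'aa\na+\n' | grep -E -c 'a+')   # matches "aa" and "a+"

# Inside brackets, \s is just the two characters '\' and 's', not whitespace.
bracket_s=$(printf 'a b\nasb\n' | grep -c 'a[\s]b')              # matches only "asb"
# [:space:] inside a bracket expression does mean whitespace.
bracket_space=$(printf 'a b\nasb\n' | grep -c 'a[[:space:]]b')   # matches only "a b"

echo "$plus_bre $plus_ere $bracket_s $bracket_space"
```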

So the following works for grep:-

grep "{{flag|[a-zA-Z ][a-zA-Z ]*}}" <temp
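To confirm the filtering, this sketch feeds two of the question's sample lines through the basic-regex command: one {{flag| line and one {{flagicon| decoy. It writes them to a scratch file from mktemp instead of the answer's temp, so that no existing file is overwritten.

```shell
# Two of the question's sample lines in a scratch file.
tmp=$(mktemp)
cat >"$tmp" <<'EOF'
| 155||NA||{{flag|Central African Republic}}||2.693||NA||0.000||0.000||0.019||0.271||0.281||0.057||2.066
|{{flagicon|Kosovo}} ''[[Kosovo]]'' <ref name="KOS" group=Note>{{Kosovo-note}}</ref>
EOF

# Basic regex: {, } and | are literal here, and [a-zA-Z ][a-zA-Z ]*
# means "one or more letters or spaces".
out=$(grep "{{flag|[a-zA-Z ][a-zA-Z ]*}}" <"$tmp")
printf '%s\n' "$out"   # only the {{flag|Central African Republic}} line

rm -f "$tmp"
```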

Or (thanks to Glenn Jackman's input):-

grep "{{flag|[a-zA-Z ]\+}}" <temp
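The same check works with the \+ form. \+ in a basic regex is a GNU extension, so this sketch assumes GNU grep; a strictly POSIX grep may not accept it.

```shell
tmp=$(mktemp)
cat >"$tmp" <<'EOF'
| 155||NA||{{flag|Central African Republic}}||2.693||NA||0.000||0.000||0.019||0.271||0.281||0.057||2.066
|{{flagicon|Kosovo}} ''[[Kosovo]]'' <ref name="KOS" group=Note>{{Kosovo-note}}</ref>
EOF

# GNU basic regex: \+ means "one or more" of the preceding bracket expression.
out=$(grep "{{flag|[a-zA-Z ]\+}}" <"$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```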

In egrep the {, } and | characters have special significance, so they need to be escaped:-

egrep "\{\{flag\|[a-zA-Z ]+\}\}" <temp
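The extended-regex version can be run the same way. This sketch spells it grep -E, which GNU grep documents as equivalent to egrep; recent GNU grep releases warn that egrep is obsolescent.

```shell
tmp=$(mktemp)
cat >"$tmp" <<'EOF'
| 155||NA||{{flag|Central African Republic}}||2.693||NA||0.000||0.000||0.019||0.271||0.281||0.057||2.066
|{{flagicon|Kosovo}} ''[[Kosovo]]'' <ref name="KOS" group=Note>{{Kosovo-note}}</ref>
EOF

# Extended regex: escape { } | so they are literal; + is the quantifier.
out=$(grep -E '\{\{flag\|[a-zA-Z ]+\}\}' <"$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```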

Note that I have removed the unnecessary use of cat.
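This example relates to the question's Regex101 test. Regex101 defaults to a PCRE flavor, and GNU grep can use Perl-compatible syntax via -P when it was built with PCRE support; not every grep has it. The sketch assumes such a grep. Under -P, \s does work inside a character class, so a pattern close to the question's original intent becomes usable, though {, } and | are still escaped here so they stay literal.

```shell
tmp=$(mktemp)
cat >"$tmp" <<'EOF'
| 155||NA||{{flag|Central African Republic}}||2.693||NA||0.000||0.000||0.019||0.271||0.281||0.057||2.066
|{{flagicon|Kosovo}} ''[[Kosovo]]'' <ref name="KOS" group=Note>{{Kosovo-note}}</ref>
EOF

# Perl-compatible regex: \s inside [] means whitespace, + is a quantifier.
out=$(grep -P '\{\{flag\|[a-zA-Z\s]+\}\}' <"$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```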



Author: XPLOT1ON


Updated on September 18, 2022

Comments

  • XPLOT1ON
    XPLOT1ON over 1 year

    I am trying to grep with a regex that contains the pipe character |. However, it doesn't work as expected: the | in the regex is not matched as a literal character.

    This is my bash command:

    cat data | grep "{{flag\|[a-z|A-Z\s]+}}"

    The sample data are the following:

    | 155||NA||{{flag|Central African Republic}}||2.693||NA||0.000||0.000||0.019||0.271||0.281||0.057||2.066
    |{{flagicon|Kosovo}} ''[[Kosovo]]'' <ref name="KOS" group=Note>{{Kosovo-note}}</ref>
    |{{flagicon|Somaliland}} [[Somaliland|Somaliland region]]
    |{{flagicon|Palestine}} ''[[Palestinian Territories]]''{{refn|See the following on statehood criteria:
    

    The expected output is:

    | 155||NA||{{flag|Central African Republic}}||2.693||NA||0.000||0.000||0.019||0.271||0.281||0.057||2.066
    

    However, when I tested it on Regex101.com, the result came out as expected.

    • jpaugh
      jpaugh over 6 years
      POSIX (grep, e.g.), Vim, and Perl are the three major regex syntaxes you'll encounter; unfortunately, each of them is quite different, in both ability and syntax. Luckily, almost all modern software has settled on Perl's syntax. That's why any online service will disagree somewhat with grep: JavaScript's regex engine is based on Perl syntax and semantics.
  • jpaugh
    jpaugh over 6 years
    You can also write <temp egrep ... to put temp back at the beginning, where it feels natural.
  • AFH
    AFH over 6 years
    @jpaugh - Yes, I know, but I usually avoid it because it doesn't work with internal commands.
  • jpaugh
    jpaugh over 6 years
    Really? I started using it to break myself of the cat habit; I always think of the file first. Looking at help in bash, I can't see many built-ins which read stdin. It does break read, though; sure enough.
  • glenn jackman
    glenn jackman over 6 years
    With basic grep regexes, you can change [a-zA-Z ][a-zA-Z ]* to [a-zA-Z ]\+ (a GNU extension) -- ref: gnu.org/software/gnulib/manual/html_node/…
  • AFH
    AFH over 6 years
    @glennjackman - Thanks very much: I didn't know about that, and I shall certainly use it from now on. I wonder how many of the other ERE specifics can be escaped in BRE...
  • AFH
    AFH over 6 years
    @jpaugh - I not infrequently use things like while read -r l; do ...; done to handle lines of input, and this works fine with redirection at the end, but not at the beginning. It is annoying, and almost qualifies as a bug, but I have learned to live with it. Otherwise, I'd use your format as a matter of course.
  • AFH
    AFH over 6 years
    This does not use the same search criteria as the questioner wants: it can yield more lines than are required. If you simplify the search to this extent, then the grep string is equally simplified.
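The redirection-placement point from the comments can be tried with a short sketch. It assumes bash (or another POSIX shell) and a scratch file from mktemp. The limitation it shows concerns compound commands such as while loops rather than built-ins in general, which is one reading of the remark about internal commands.

```shell
# Scratch input file (mktemp so nothing existing is overwritten).
tmp=$(mktemp)
printf 'first line\nsecond line\n' >"$tmp"

# Prefix redirection is fine for a simple command such as grep (prints 2).
<"$tmp" grep -c line

# For a while loop, a compound command, the redirection goes after "done".
n=0
while IFS= read -r l; do
  n=$((n + 1))
  printf 'read: %s\n' "$l"
done <"$tmp"
echo "$n lines read"

# Putting the redirection in front of "while" instead is a syntax error in bash:
# a reserved word such as while is only recognised at the start of a command.

rm -f "$tmp"
```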