grep with regex containing pipe character
It appears that grep accepts \| as a separator between alternative search expressions (like | in egrep, where \| matches a literal |).
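The swapped roles of | and \| can be checked with a few throwaway lines. This is a minimal sketch that assumes GNU grep (\| in basic mode is a GNU extension, not POSIX); the sample text and variable names are made up for illustration, and grep -E stands in for egrep.

```shell
# Assumption: GNU grep. The sample lines below are illustrative only.
sample='a|b
a
b
c'

# Basic regex: \| is alternation, so lines containing "a" or "b" are kept.
bre_alt=$(printf '%s\n' "$sample" | grep 'a\|b')

# Basic regex: a bare | is just a literal pipe character.
bre_lit=$(printf '%s\n' "$sample" | grep 'a|b')

# Extended regex: the roles swap. | alternates, \| is a literal pipe.
ere_alt=$(printf '%s\n' "$sample" | grep -E 'a|b')
ere_lit=$(printf '%s\n' "$sample" | grep -E 'a\|b')

printf 'grep    a\\|b : %s\n' "$(printf '%s' "$bre_alt" | tr '\n' ' ')"
printf 'grep    a|b  : %s\n' "$bre_lit"
printf 'grep -E a|b  : %s\n' "$(printf '%s' "$ere_alt" | tr '\n' ' ')"
printf 'grep -E a\\|b : %s\n' "$ere_lit"
```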
Apart from that, your expression has other problems:-
- An unescaped + is a quantifier in egrep (or grep -E) only; in plain grep it matches a literal +.
- \s is not supported within a [] character group.
- I don't see the need for | in the character group.
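Each of the three points can be tried in isolation. The following is a small sketch, again assuming GNU grep; the test strings are invented for the demonstration, and [[:space:]] is shown only as one possible way to say "whitespace" inside a bracket expression, not as something the answer prescribes.

```shell
# Assumption: GNU grep. All input strings are illustrative.

# 1. In a basic regex an unescaped + is an ordinary character.
bre_plus=$(printf 'aaa\na+\n' | grep 'a+')       # only the line with a literal +
ere_plus=$(printf 'aaa\na+\n' | grep -E 'a+')    # both lines

# 2. Inside [] the sequence \s is a backslash and an s, not whitespace.
bracket_s=$(printf 'a b\nasb\n' | grep 'a[\s]b')
bracket_space=$(printf 'a b\nasb\n' | grep 'a[[:space:]]b')

# 3. Inside [] a | is a literal pipe, so it only widens the set.
bracket_pipe=$(printf 'a|b\naxb\nayb\n' | grep 'a[x|]b')

printf 'BRE a+          -> %s\n' "$bre_plus"
printf 'ERE a+          -> %s\n' "$(printf '%s' "$ere_plus" | tr '\n' ' ')"
printf 'a[\\s]b          -> %s\n' "$bracket_s"
printf 'a[[:space:]]b   -> %s\n' "$bracket_space"
printf 'a[x|]b          -> %s\n' "$(printf '%s' "$bracket_pipe" | tr '\n' ' ')"
```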
So the following works for grep:-
grep "{{flag|[a-zA-Z ][a-zA-Z ]*}}" <temp
Or (thanks to Glenn Jackman's input):-
grep "{{flag|[a-zA-Z ]\+}}" <temp
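To compare these two commands with the question's original pattern, here is a sketch run against an abbreviated copy of the sample data (the numeric columns are trimmed, and the temporary file stands in for temp). It assumes GNU grep, since \+ and \| in basic mode are GNU extensions.

```shell
# Abbreviated copy of the question's sample data, written to a scratch file.
tmp=$(mktemp)
cat >"$tmp" <<'EOF'
| 155||NA||{{flag|Central African Republic}}||2.693||NA
|{{flagicon|Kosovo}} ''[[Kosovo]]''
|{{flagicon|Somaliland}} [[Somaliland|Somaliland region]]
EOF

# Portable basic-regex form: "one or more" spelled as X followed by X*.
star=$(grep "{{flag|[a-zA-Z ][a-zA-Z ]*}}" <"$tmp")

# GNU shorthand: \+ is the escaped form of the extended + quantifier.
plus=$(grep "{{flag|[a-zA-Z ]\+}}" <"$tmp")

# The question's pattern, for comparison: \| splits it into
# "{{flag" OR "[a-z|A-Z\s]+}}", so every line containing {{flag is kept.
orig_count=$(grep -c "{{flag\|[a-z|A-Z\s]+}}" <"$tmp")

printf '%s\n' "$star"
printf 'original pattern kept %s lines\n' "$orig_count"
rm -f "$tmp"
```

Both corrected forms keep only the {{flag|...}} row, while the original pattern keeps the {{flagicon|...}} rows as well.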
In egrep the {} characters have special significance, so they need to be escaped:-
egrep "\{\{flag\|[a-zA-Z ]+\}\}" <temp
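The effect of escaping the pipe in extended mode can be sketched the same way. This example uses grep -E in place of egrep (recent GNU grep releases print an obsolescence warning for egrep) and an abbreviated, illustrative copy of the sample data.

```shell
# Abbreviated copy of the question's sample data, written to a scratch file.
tmp=$(mktemp)
cat >"$tmp" <<'EOF'
| 155||NA||{{flag|Central African Republic}}||2.693||NA
|{{flagicon|Kosovo}} ''[[Kosovo]]''
|{{flagicon|Somaliland}} [[Somaliland|Somaliland region]]
EOF

# Escaped braces and an escaped pipe: all are matched literally.
escaped=$(grep -E "\{\{flag\|[a-zA-Z ]+\}\}" <"$tmp")

# Leaving the pipe unescaped turns it into alternation:
# "\{\{flag" OR "[a-zA-Z ]+\}\}", which also keeps the flagicon lines.
unescaped_count=$(grep -Ec "\{\{flag|[a-zA-Z ]+\}\}" <"$tmp")

printf '%s\n' "$escaped"
printf 'unescaped pipe kept %s lines\n' "$unescaped_count"
rm -f "$tmp"
```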
Note that I have removed the unnecessary use of cat.
XPLOT1ON
Updated on September 18, 2022
Comments
-
XPLOT1ON over 1 year
I am trying to grep with a regex that contains a pipe character |. However, it doesn't work as expected: the regex does not include the | in the match, as seen in the attached image below. This is my bash command:
cat data | grep "{{flag\|[a-z|A-Z\s]+}}"
The sample data are the following:
| 155||NA||{{flag|Central African Republic}}||2.693||NA||0.000||0.000||0.019||0.271||0.281||0.057||2.066
|{{flagicon|Kosovo}} ''[[Kosovo]]'' <ref name="KOS" group=Note>{{Kosovo-note}}</ref>
|{{flagicon|Somaliland}} [[Somaliland|Somaliland region]]
|{{flagicon|Palestine}} ''[[Palestinian Territories]]''{{refn|See the following on statehood criteria:
The expected output is:
| 155||NA||{{flag|Central African Republic}}||2.693||NA||0.000||0.000||0.019||0.271||0.281||0.057||2.066
However, having tested it with Regex101.com, the result came out as expected.
-
jpaugh over 6 years
POSIX (grep, e.g.), Vim, and Perl are the three major syntaxes for regexes you'll encounter; unfortunately, each of them is quite different, in both ability and syntax. Luckily, almost all modern software has settled on Perl's syntax. That's the reason any online service will disagree somewhat with grep: JavaScript's regex engine is based on Perl syntax and semantics.
-
-
jpaugh over 6 years
You can also do <temp egrep ..., to put temp back at the beginning where it feels natural
-
AFH over 6 years
@jpaugh - Yes, I know, but I usually avoid it because it doesn't work with internal commands.
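One way to see the limitation (presumably the case meant here) is to compare a simple command with a compound one such as a while loop. This is an illustrative sketch; the scratch file and variable names are hypothetical.

```shell
# Scratch input file with two lines (illustrative data).
tmp=$(mktemp)
printf 'one\ntwo\n' >"$tmp"

# Simple command: the redirection may be written first.
front_count=$( <"$tmp" grep -c 'o' )

# Compound command: the redirection has to follow "done".
n=0
while read -r l; do
  n=$((n + 1))
done <"$tmp"

# Putting the redirection before "while" is rejected by the shell parser,
# because "while" is then no longer in command position.
if sh -c '<"$1" while read -r l; do :; done' sh "$tmp" 2>/dev/null; then
  leading_ok=yes
else
  leading_ok=no
fi

printf 'grep with leading redirect counted %s lines\n' "$front_count"
printf 'while loop read %s lines\n' "$n"
printf 'leading redirect before while accepted: %s\n' "$leading_ok"
rm -f "$tmp"
```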
-
jpaugh over 6 years
Really? I started using it to break myself of the cat habit; I always think of the file first. Looking at help in bash, I can't see many built-ins which use stdio. It does break read, though; sure enough.
-
glenn jackman over 6 years
With basic grep regex, you can change [a-zA-Z ][a-zA-Z ]* to [a-zA-Z ]\+ -- ref: gnu.org/software/gnulib/manual/html_node/…
-
AFH over 6 years
@glennjackman - Thanks very much: I didn't know about that, and I shall certainly use it from now on. I wonder how many of the other ERE specifics can be escaped in BRE...
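As a partial answer to that question, here is a quick sketch of a few other escaped forms in basic mode. It assumes GNU grep: \? (like \+ and \|) is a GNU extension, whereas \{m\} and \( \) are part of POSIX basic regular expressions. The input words are made up for illustration.

```shell
# Assumption: GNU grep for \? ; the interval and group forms are POSIX BRE.

# \? : zero or one of the preceding item (matches both spellings).
optional=$(printf 'color\ncolour\n' | grep -c 'colou\?r')

# \{3\} : exactly three repetitions (only "aaa" matches).
interval=$(printf 'aa\naaa\n' | grep -c '^a\{3\}$')

# \( \) : grouping, here combined with an interval (only "abab" matches).
group=$(printf 'abab\nab\n' | grep -c '^\(ab\)\{2\}$')

printf 'colou\\?r matched %s lines\n' "$optional"
printf '^a\\{3\\}$ matched %s lines\n' "$interval"
printf '^\\(ab\\)\\{2\\}$ matched %s lines\n' "$group"
```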
-
AFH over 6 years
@jpaugh - I not infrequently use things like while read -r l; do ...; done to handle lines of input, and this works fine with redirection at the end, but not at the beginning. It is annoying, and almost qualifies as a bug, but I have learned to live with it. Otherwise, I'd use your format as a matter of course.
-
AFH over 6 years
This does not use the same search criteria as the questioner wants: it can yield more lines than are required. If you simplify the search to this extent, then the grep string is equally simplified.