Regular expression in bash script
Solution 1
From man 7 regex:
A bracket expression is a list of characters enclosed in "[]". …
… To include a literal '-', make it the first or last character…. [A]ll other special characters, including '\', lose their special significance within a bracket expression.
Trying the regexp with egrep gives an error:
$ echo "username : username usergroup" | egrep "^([a-zA-Z0-9\-_]+ : [a-zA-Z0-9\-_]+) (usergroup)$"
egrep: Invalid range end
Here is a simpler version, that also gives an error:
$ echo 'hi' | egrep '[\-_]'
egrep: Invalid range end
Since \ is not special inside a bracket expression, [\-_] contains a range running from \ to _, just like [a-z] contains a range. You need to put your - at the end, like [_-], or in full:
$ echo "username : username usergroup" | egrep "^([a-zA-Z0-9_-]+ : [a-zA-Z0-9_-]+) (usergroup)$"
username : username usergroup
This should work regardless of your libc version (in either egrep or bash).
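Applied to the script from the question, the corrected bracket expression might look like the following minimal sketch. The function name check_groups is made up for illustration, and the logging to a file from the question is left out.

```bash
#!/bin/bash
# Hypothetical helper (the name is not from the original question).
# The '-' is the last character of each bracket expression, so it is a
# literal dash and no range is formed.
regex='^([a-zA-Z0-9_-]+ : [a-zA-Z0-9_-]+) (usergroup)$'

check_groups() {
    # $regex is left unquoted on purpose: quoting the right-hand side of =~
    # makes bash match it as a literal string instead of a regex.
    if [[ "$1" =~ $regex ]]; then
        echo "Match!"
    else
        echo "No match"
    fi
}

check_groups "username : username usergroup"
check_groups "username : username othergroup"
```

The first call should print Match! and the second No match, since only the first line ends in the expected group name.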
edit: This actually depends on your locale settings too. The manpage does warn about this:
Ranges are very collating-sequence-dependent, and portable programs should avoid relying on them.
For example:
$ echo '\_' | LC_ALL=en_US.UTF8 egrep '[\-_]'
egrep: Invalid range end
$ echo '\_' | LC_ALL=C egrep '[\-_]'
\_
Of course, even though it didn't error, it isn't doing what you want:
$ echo '\^_' | LC_ALL=C egrep '^[\-_]+$'
\^_
It's a range, which in ASCII includes \, ], ^, and _ (bytes 0x5C through 0x5F).
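To see which characters each bracket expression accepts, a small loop can test them one at a time. This is only a sketch: the helper name matches is made up, and it assumes a grep that supports -E and a C locale in which ranges follow byte order.

```bash
#!/bin/bash
# Hypothetical helper: succeeds if the whole STRING ($2) matches PATTERN ($1).
# LC_ALL=C is forced so that the range is interpreted in ASCII byte order.
matches() {
    printf '%s\n' "$2" | LC_ALL=C grep -Eq -- "^$1\$"
}

# Compare the accidental range [\-_] with the intended set [_-].
for ch in '\' ']' '^' '_' '-' '['; do
    if matches '[\-_]' "$ch"; then r1=yes; else r1=no; fi
    if matches '[_-]' "$ch"; then r2=yes; else r2=no; fi
    printf '%s  [\\-_]: %-3s  [_-]: %s\n' "$ch" "$r1" "$r2"
done
```

Under those assumptions, [\-_] should accept \, ], ^ and _ but reject a plain -, while [_-] should accept only _ and -.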
Solution 2
General rule with regexps (and any bugs in larger pieces of code): cut it down and rebuild it step by step or use bisecting - whatever works better for you.
In this case the culprit turned out to be the underscore - escaping it with a backslash has made it work.
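The step-by-step approach can be sketched in bash roughly as follows. The helper name try_regex and the list of steps are illustrative assumptions, not part of the original script. The sketch relies on the documented behaviour that [[ ... =~ ... ]] returns status 2 when the regex is syntactically invalid, which a plain if/else would otherwise report as "No match".

```bash
#!/bin/bash
# Sample input taken from the question.
sample="username : username usergroup"

# Hypothetical helper: prints "match", "no match", or "invalid regex"
# for regex $1 applied to string $2.
try_regex() {
    local status=0
    [[ "$2" =~ $1 ]] || status=$?
    case $status in
        0) echo "match" ;;
        1) echo "no match" ;;
        *) echo "invalid regex" ;;   # bash returns 2 for a malformed regex
    esac
}

# Progressively longer pieces of the target regex.
steps=(
    '^[a-z]+'
    '^[a-zA-Z0-9_-]+'
    '^[a-zA-Z0-9_-]+ : '
    '^([a-zA-Z0-9_-]+ : [a-zA-Z0-9_-]+)'
    '^([a-zA-Z0-9_-]+ : [a-zA-Z0-9_-]+) (usergroup)$'
)

for re in "${steps[@]}"; do
    printf '%-50s %s\n' "$re" "$(try_regex "$re" "$sample")"
done
```

The first step that stops printing "match" points at the piece to inspect. Swapping in the original [a-zA-Z0-9\-_] at the second step is one way to check how a given system and locale treat it.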
Adam Westh
Updated on September 18, 2022
Comments
-
Adam Westh almost 2 years
This is my first time bash scripting so I'm probably making an easy mistake.
Basically, I'm trying to write a script that gets the groups of a user, and if they are in a certain group, it will log that accordingly. Evidently there will be more functionality, but there's no point building that when I can't even get the regex working!
So far, I have this:
#!/bin/bash
regex="^([a-zA-Z0-9\-_]+ : [a-zA-Z0-9\-_]+) (usergroup)$"
# example output
groups="username : username usergroup"
echo "$groups" >> /home/jrdn/log
if [[ "$groups" =~ $regex ]]; then
    echo "Match!" >> /home/jrdn/log
else
    echo "No match" >> /home/jrdn/log
fi
Every place I've tried that regex, it works. But in the bash script, it only ever outputs the $groups, followed by No match. So can someone tell me what's wrong with it?
-
manatwork over 10 years
What makes you think anything is wrong with it?
-
Adam Westh over 10 years
It echoes "No match". Could be something wrong with the comparison, there's something wrong somewhere.
-
peterph over 10 years
Works for me. What version of bash do you have?
-
Adam Westh over 10 years
GNU bash, version 4.2.37(1)-release (x86_64-pc-linux-gnu)
-
manatwork over 10 years
Works for me too, with bash 4.1.10(4). pastebin.com/PgyiZujJ Actually I see no reason for it not to work. How do you run it?
-
peterph over 10 years
Interesting, looks like something in your environment. How about trying a much simpler regex, like matching ^a on "asd" and "qwe", and then expanding it piece by piece?
-
Adam Westh over 10 years
@manatwork: just running it like: ./install.sh
@peterph: running ^([a]) against abc and dbc returns the proper results
-
peterph over 10 years
@jrdnhannah then try to slowly re-create your target regexp: first match ^([a-zA-Z0-9\-_]+), then add the colon and so on... you should find out pretty soon where the problem is.
-
Adam Westh over 10 years
@peterph I just tried running it on my Mac, on the off chance it works... And it does. I will simplify it, though, and work out what my box doesn't like, and then try to figure out why it doesn't like it. Thanks
-
terdon over 10 years
Same here with bash 4.2.45. Escaping the underscore fixed it. Weird. @jrdnhannah could you write that up as an answer and accept it please?
-
Adam Westh over 10 years
Since I've only just signed up to the Unix SE, it requires me to wait 8 hours before answering my own. Happy to mark it as answered if somebody else does, though.
-
peterph over 10 years
There you go. Interesting thing is that my Bash 4.2.45 was OK with the unescaped underscore.
-
derobert over 10 years
Sounds like a bug in bash and/or [e]glibc. Broken on my Debian 4.2.45(1). Same problem with egrep; so this is probably eglibc, not bash. I have 2.17-92+b1. Actually, by the docs, the regex is wrong...
-
terdon over 10 years
@peterph seriously? It worked on your bash 4.2.45(1)? Which distro?
-
derobert over 10 years
@terdon bash just calls libc's regex functions, probably. So it depends on the libc version, not the bash version. See my answer... (Or maybe even on the collation sequence you have in use)
-
peterph over 10 years
@terdon seems that my LC_COLLATE=POSIX (which is the only thing differing from my [ll_CC].utf8) "saved" me again. :)
-
-
manatwork over 10 years
Interesting. My egrep gives no error, just matches it correctly.
-
derobert over 10 years
@manatwork your collation sequence probably allows the range....
-
manatwork over 10 years
I don't know much about collation. You mean this: LC_COLLATE="en_US.UTF-8"?
-
derobert over 10 years
@manatwork I've edited the question to give an example. Note it may be different on your system, because sometimes those collation (sorting) sequences change.
-
manatwork over 10 years
Yes, thank you. I noticed the edit too late.
-
derobert over 10 years
@manatwork It's OK, I almost filed a bug report before I noticed the attempt to escape -
...