Is this C++11 regex error me or the compiler?
Solution 1
Update: <regex> is now implemented and released in GCC 4.9.0.
Old answer:
ECMAScript syntax accepts [0-9], \s, \w, etc.; see ECMA-262 (15.10). Here's an example with boost::regex, which also uses the ECMAScript syntax by default:
#include <boost/regex.hpp>

int main(int argc, char* argv[]) {
    using namespace boost;
    regex e("[0-9]");
    return argc > 1 ? !regex_match(argv[1], e) : 2;
}
It works:
$ g++ -std=c++0x *.cc -lboost_regex && ./a.out 1
According to the C++11 standard (28.8.2), basic_regex() uses the regex_constants::ECMAScript flag by default, so it must understand this syntax.
As for "Is this C++11 regex error me or the compiler?": gcc-4.6.1 doesn't support C++11 regular expressions (28.13).
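To make the default-grammar point concrete, here is a minimal sketch using std::regex instead of boost::regex. It assumes a standard library with a working <regex> (for example libstdc++ from GCC 4.9 onward); the function names are made up for illustration.

```cpp
#include <regex>
#include <string>

// Returns true if a regex built without a flag argument ends up with the
// same flags as one built with an explicit ECMAScript flag.
bool default_matches_explicit_ecmascript() {
    std::regex implicit_re("[0-9]");  // no flag: the default grammar applies
    std::regex explicit_re("[0-9]", std::regex_constants::ECMAScript);
    return implicit_re.flags() == explicit_re.flags();
}

// Returns true if `text` is exactly one decimal digit.
bool is_single_digit(const std::string& text) {
    static const std::regex digit("[0-9]");
    return std::regex_match(text, digit);
}
```

On a library where <regex> is still a stub, the constructor may throw std::regex_error before either function returns.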
Solution 2
The error occurs because constructing a regex uses ECMAScript syntax by default, and this library version fails on bracket expressions in that grammar (ECMAScript itself does allow them). As a workaround, declare the expression with the basic or extended flag:
std::regex r4("[0-9]", std::regex_constants::basic);
Edit: It seems that libstdc++ (the C++ standard library that ships with GCC) doesn't fully implement regular expressions yet. Its status document says that the modified ECMAScript regular expression grammar is not implemented yet.
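As a rough sketch of selecting a grammar explicitly, the helper below passes the flag through to the std::regex constructor. The name matches_with is hypothetical, and the code assumes a library where the chosen grammar is actually implemented.

```cpp
#include <regex>
#include <string>

// Matches the whole of `text` against `pattern`, compiled with the grammar
// flag given in `grammar` (basic, extended, ECMAScript, ...).
bool matches_with(const std::string& pattern, const std::string& text,
                  std::regex_constants::syntax_option_type grammar) {
    std::regex re(pattern, grammar);
    return std::regex_match(text, re);
}
```

For example, matches_with("[0-9]", "7", std::regex_constants::basic) selects the POSIX basic grammar. Note that shorthand classes such as \s and \w come from ECMAScript; under the POSIX grammars, bracket classes like [[:space:]] or [[:digit:]] are the portable spelling.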
Solution 3
Regex support improved between gcc 4.8.2 and 4.9.2. For example, the regex =[A-Z]{3} was failing for me with:
Regex error
After upgrading to gcc 4.9.2, it works as expected.
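If you are unsure whether a given toolchain is affected, one option is to probe at run time instead of letting the exception terminate the program. This is only a sketch; regex_probe is a hypothetical helper, not part of any library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if the standard library can compile
// `pattern` and finds a match for it somewhere in `sample`.
bool regex_probe(const std::string& pattern, const std::string& sample) {
    try {
        std::regex re(pattern);  // throws std::regex_error if unsupported
        return std::regex_search(sample, re);
    } catch (const std::regex_error&) {
        return false;  // pattern rejected by this library (or malformed)
    }
}
```

Calling regex_probe("=[A-Z]{3}", "key=ABC") at startup gives a quick check for the pattern mentioned above; a false result means either the pattern is rejected or the sample does not match.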
Shay Guy
Updated on December 23, 2020
OK, this isn't the original program I had this problem in, but I duplicated it in a much smaller one. Very simple problem.
main.cpp:
#include <cstdio>
#include <iostream>
#include <regex>

using namespace std;

int main() {
    regex r1("S");
    printf("S works.\n");
    regex r2(".");
    printf(". works.\n");
    regex r3(".+");
    printf(".+ works.\n");
    regex r4("[0-9]");
    printf("[0-9] works.\n");
    return 0;
}
Compiled successfully with this command, no error messages:
$ g++ -std=c++0x main.cpp
The last line of g++ -v, by the way, is:
gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3)
And the result when I try to run it:
$ ./a.out
S works.
. works.
.+ works.
terminate called after throwing an instance of 'std::regex_error'
  what():  regex_error
Aborted
It happens the same way if I change r4 to \\s, \\w, or [a-z]. Is this a problem with the compiler? I might be able to believe that C++11's regex engine has different ways of saying "whitespace" or "word character," but square brackets not working is a stretch. Is it something that's been fixed in 4.6.2?
EDIT:
Joachim Pileborg has supplied a partial solution, using an extra regex_constants parameter to enable a syntax that supports square brackets, but neither basic, extended, awk, nor ECMAScript seems to support backslash-escaped terms like \\s, \\w, or \\t.
EDIT 2:
Using raw strings (R"(\w)" instead of "\\w") doesn't seem to work either.
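For reference, the raw and escaped spellings produce the same pattern string, so switching between them cannot change what the regex engine sees. The sketch below checks that, and shows \w matching on a library with a working ECMAScript grammar (assumed here; the function names are illustrative).

```cpp
#include <regex>
#include <string>

// Both literals spell the same two characters: a backslash followed by 'w'.
bool raw_and_escaped_are_identical() {
    return std::string("\\w") == std::string(R"(\w)");
}

// Returns true if `text` consists only of word characters (ECMAScript \w).
bool is_word(const std::string& text) {
    static const std::regex word(R"(\w+)");
    return std::regex_match(text, word);
}
```

Since the two literals are identical, a failure with one but not the other would point at the compiler's handling of raw strings rather than at the regex library.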