Is this C++11 regex error me or the compiler?


Solution 1

Update: <regex> is now implemented and released in GCC 4.9.0


Old answer:

ECMAScript syntax accepts [0-9], \s, \w, etc.; see ECMA-262 (15.10). Here's an example with boost::regex, which also uses ECMAScript syntax by default:

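#include <boost/regex.hpp>

// Exits with 0 if argv[1] is exactly one digit, 1 if it is not,
// and 2 if no argument was given.
int main(int argc, char* argv[]) {
  using namespace boost;
  regex e("[0-9]");  // boost's default syntax is ECMAScript (Perl-style)
  return argc > 1 ? !regex_match(argv[1], e) : 2;
}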

It works:

$ g++ -std=c++0x *.cc -lboost_regex && ./a.out 1

According to the C++11 standard (28.8.2), basic_regex() uses the regex_constants::ECMAScript flag by default, so it must understand this syntax.

As for whether the error is yours or the compiler's: gcc-4.6.1 doesn't support C++11 regular expressions (28.13).

Solution 2

The error occurs because a regex constructed without a flag uses the ECMAScript grammar, and this libstdc++ release fails on brackets in that grammar. Declare the expression with the basic or extended flag instead:

std::regex r4("[0-9]", std::regex_constants::basic);

Edit: It seems libstdc++ (the C++ standard library implementation that ships with GCC) doesn't fully implement regular expressions yet. Its status document says the Modified ECMAScript regular expression grammar is not implemented yet.

Solution 3

Regex support improved between gcc 4.8.2 and 4.9.2. For example, the regex =[A-Z]{3} was failing for me with:

Regex error

After upgrading to gcc 4.9.2, it works as expected.

Author: Shay Guy

Updated on December 23, 2020

Comments

  • Shay Guy

    OK, this isn't the original program where I ran into this problem, but I reproduced it in a much smaller one. It's a very simple problem.

    main.cpp:

    #include <cstdio>   // printf
    #include <iostream>
    #include <regex>
    using namespace std;
    
    int main()
    {
        regex r1("S");
        printf("S works.\n");
        regex r2(".");
        printf(". works.\n");
        regex r3(".+");
        printf(".+ works.\n");
        regex r4("[0-9]");
        printf("[0-9] works.\n");
        return 0;
    }
    

    It compiles successfully with this command, with no error messages:

    $ g++ -std=c++0x main.cpp
    

    The last line of g++ -v, by the way, is:

    gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3)
    

    And the result when I try to run it:

    $ ./a.out 
    S works.
    . works.
    .+ works.
    terminate called after throwing an instance of 'std::regex_error'
      what():  regex_error
    Aborted
    

    It happens the same way if I change r4 to \\s, \\w, or [a-z]. Is this a problem with the compiler? I might be able to believe that C++11's regex engine has different ways of saying "whitespace" or "word character," but square brackets not working is a stretch. Is it something that's been fixed in 4.6.2?

    EDIT:

    Joachim Pileborg has supplied a partial solution: passing an extra regex_constants parameter enables a syntax that supports square brackets. However, none of basic, extended, awk, or ECMAScript seems to support backslash-escaped terms like \\s, \\w, or \\t.

    EDIT 2:

    Using raw strings (R"(\w)" instead of "\\w") doesn't seem to work either.