Is gcc 4.8 or earlier buggy about regular expressions?

Solution 1

<regex> was implemented and released in GCC 4.9.0.

In your (older) version of GCC, it is not implemented.

That prototype <regex> code was added when all of GCC's C++0x support was highly experimental, tracking early C++0x drafts and being made available for people to experiment with. That allowed people to find problems and give feedback to the standards committee before the standard was finalised. At the time, lots of people were grateful to have access to bleeding-edge features long before C++11 was finished and before many other compilers provided any support, and that feedback really helped improve C++11. This was a Good Thing™.

The <regex> code was never in a useful state, but was added as a work-in-progress like many other bits of code at the time. It was checked in and made available for others to collaborate on if they wanted to, with the intention that it would be finished eventually.

That's often how open source works: Release early, release often -- unfortunately in the case of <regex> we only got the early part right and not the often part that would have finished the implementation.

Most parts of the library were more complete and are now almost fully implemented, but <regex> was not, so it stayed in the same unfinished state it was in when it was added.

Seriously though, who thought that shipping an implementation of regex_search that only does "return false" was a good idea?

It wasn't such a bad idea a few years ago, when C++0x was still a work in progress and we shipped lots of partial implementations. No-one thought it would remain unusable for so long so, with hindsight, maybe it should have been disabled and required a macro or build-time option to enable it. But that ship sailed long ago. There are exported symbols from the libstdc++.so library that depend on the regex code, so simply removing it (in, say, GCC 4.8) would not have been trivial.

Solution 2

Feature Detection

This snippet uses C preprocessor defines to detect whether the libstdc++ <regex> implementation is actually usable:

#include <regex>
#if __cplusplus >= 201103L &&                             \
    (!defined(__GLIBCXX__) || (__cplusplus >= 201402L) || \
        (defined(_GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT) || \
         defined(_GLIBCXX_REGEX_STATE_LIMIT)           || \
             (defined(_GLIBCXX_RELEASE)                && \
             _GLIBCXX_RELEASE > 4)))
#define HAVE_WORKING_REGEX 1
#else
#define HAVE_WORKING_REGEX 0
#endif

Macros

  • _GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT is defined in bits/regex.tcc in 4.9.x
  • _GLIBCXX_REGEX_STATE_LIMIT is defined in bits/regex_automaton.h in 5+
  • _GLIBCXX_RELEASE was added in 7+ as a result of this answer and holds the GCC major version
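
As a rough sketch of how the detection result might be consumed (this is not part of the original answer): the hypothetical helper is_known_code below calls std::regex_match when HAVE_WORKING_REGEX is 1 and falls back to plain string comparisons otherwise. The helper name is an assumption, and the st|mt|tr pattern is borrowed from the question purely for illustration.

```cpp
#include <regex>
#include <string>

// Detection block from the answer, repeated so this sketch compiles on its own.
#if __cplusplus >= 201103L &&                             \
    (!defined(__GLIBCXX__) || (__cplusplus >= 201402L) || \
        (defined(_GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT) || \
         defined(_GLIBCXX_REGEX_STATE_LIMIT)           || \
             (defined(_GLIBCXX_RELEASE)                && \
             _GLIBCXX_RELEASE > 4)))
#define HAVE_WORKING_REGEX 1
#else
#define HAVE_WORKING_REGEX 0
#endif

// Hypothetical helper (an assumption, not from the answer): returns true when
// `s` is exactly one of "st", "mt" or "tr".
bool is_known_code(const std::string& s) {
#if HAVE_WORKING_REGEX
  // Usable <regex>: compile the pattern once and match the whole string.
  static const std::regex re("st|mt|tr");
  return std::regex_match(s, re);
#else
  // <regex> not detected as usable: avoid calling it and compare directly.
  return s == "st" || s == "mt" || s == "tr";
#endif
}
```

Calling is_known_code("tr") should then give the same answer on either branch, which is the point of routing every regex use through one guarded helper.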

Testing

You can test it with GCC like this:

cat << 'EOF' | g++ --std=c++11 -x c++ - && ./a.out
#include <regex>

#if __cplusplus >= 201103L &&                             \
    (!defined(__GLIBCXX__) || (__cplusplus >= 201402L) || \
        (defined(_GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT) || \
         defined(_GLIBCXX_REGEX_STATE_LIMIT)           || \
             (defined(_GLIBCXX_RELEASE)                && \
             _GLIBCXX_RELEASE > 4)))
#define HAVE_WORKING_REGEX 1
#else
#define HAVE_WORKING_REGEX 0
#endif

#include <cstdlib>
#include <iostream>
#include <string>

int main() {
  const std::regex regex(".*");
  const std::string string = "This should match!";
  const auto result = std::regex_search(string, regex);
#if HAVE_WORKING_REGEX
  std::cerr << "<regex> works, look: " << std::boolalpha << result << std::endl;
#else
  std::cerr << "<regex> doesn't work, look: " << std::boolalpha << result << std::endl;
#endif
  return result ? EXIT_SUCCESS : EXIT_FAILURE;
}
EOF

Results

Here are some results for various compilers:


$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ./a.out
<regex> doesn't work, look: false

$ gcc --version
gcc (GCC) 6.2.1 20160830
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ./a.out
<regex> works, look: true

$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ./a.out
<regex> works, look: true

$ gcc --version
gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ./a.out
<regex> works, look: true

$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ./a.out
<regex> works, look: true

$ gcc --version
gcc (GCC) 6.2.1 20160830
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ clang --version
clang version 3.9.0 (tags/RELEASE_390/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ ./a.out  # compiled with 'clang -lstdc++'
<regex> works, look: true

Here be Dragons

This is totally unsupported and relies on detecting private macros that the GCC developers have put into the bits/regex* headers. They could change or go away at any time. Hopefully, they won't be removed in the current 4.9.x, 5.x, 6.x releases, but they could go away in the 7.x releases.

If the GCC developers added a #define _GLIBCXX_HAVE_WORKING_REGEX 1 (or something, hint hint nudge nudge) in the 7.x release that persisted, this snippet could be updated to include that and later GCC releases would work with the snippet above.

As far as I know, all other compilers have a working <regex> when __cplusplus >= 201103L but YMMV.

Obviously this would completely break if someone defined the _GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT or _GLIBCXX_REGEX_STATE_LIMIT macros outside of the libstdc++-v3 headers.
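
Because the macro check depends on private names, one possible complement is a runtime probe that does not look at any macros. The sketch below is an assumption and not part of the original answer: the hypothetical function regex_seems_to_work runs a few known-answer checks (the patterns are taken from the question and the test program above) and treats a thrown std::regex_error as "not working".

```cpp
#include <regex>

// Hypothetical probe (an assumption, not from the answer): returns true only
// if a handful of known-answer checks behave as the standard requires.
bool regex_seems_to_work() {
  try {
    const std::regex alt("st|mt|tr");
    const std::regex any(".*");
    return std::regex_match("tr", alt) &&    // last alternative must match
           !std::regex_match("xx", alt) &&   // unrelated text must not match
           std::regex_search("This should match!", any);
  } catch (const std::regex_error&) {
    // An unfinished implementation may reject a valid pattern outright.
    return false;
  }
}
```

A program could call this once at startup and switch to a non-regex code path when it returns false. It only proves that these particular patterns behave, so it is a sanity check rather than a guarantee.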

Author: tunnuz

Updated on July 08, 2022

Comments

  • tunnuz
    tunnuz almost 2 years

    I am trying to use std::regex in a C++11 piece of code, but it appears that the support is a bit buggy. An example:

    #include <regex>
    #include <iostream>
    
    int main (int argc, const char * argv[]) {
        std::regex r("st|mt|tr");
        std::cerr << "st|mt|tr" << " matches st? " << std::regex_match("st", r) << std::endl;
        std::cerr << "st|mt|tr" << " matches mt? " << std::regex_match("mt", r) << std::endl;
        std::cerr << "st|mt|tr" << " matches tr? " << std::regex_match("tr", r) << std::endl;
    }
    

    outputs:

    st|mt|tr matches st? 1
    st|mt|tr matches mt? 1
    st|mt|tr matches tr? 0
    

    when compiled with gcc (MacPorts gcc47 4.7.1_2) 4.7.1, either with

    g++ *.cc -o test -std=c++11
    g++ *.cc -o test -std=c++0x
    

    or

    g++ *.cc -o test -std=gnu++0x
    

    Besides, the regex works well if I only have two alternative patterns, e.g. st|mt, so it looks like the last one is not matched for some reason. The code works well with the Apple LLVM compiler.

    Any ideas about how to solve the issue?

    Update: one possible solution is to use groups to implement multiple alternatives, e.g. (st|mt)|tr.

    • kennytm
      kennytm over 11 years
      Yes, libstdc++'s <regex> support is incomplete. What can we help you with?
    • tunnuz
      tunnuz over 11 years
      I just wanted to know if it was possible to solve it in a different way (e.g. some flags for the compiler, or by using a specific version of libstdc++).
    • ecatmur
      ecatmur over 11 years
      For the status of regex in libstdc++, see gcc.gnu.org/onlinedocs/libstdc++/manual/…
    • Paul Rubel
      Paul Rubel over 11 years
      Seriously though, who thought that shipping an implementation of regex_search that only does "return false" was a good idea? "Oh, we documented it" seems kind of a weak reply.
    • im so confused
      im so confused over 11 years
      IMO, that's the problem with freely distributed software. No one is held accountable for errors.
    • rubenvb
      rubenvb over 11 years
      @AK4749: this is not an error. It's just outright unimplemented. Although the number of times this question shows up is alarming, especially since nothing changed about the libstdc++ <regex> in the past 3-4 years (as in: it remains unimplemented).
    • im so confused
      im so confused over 11 years
      That's true, I'll concede that.
    • Ed S.
      Ed S. over 11 years
      @rubenvb: It's not surprising at all; people typically expect things to work, or be absent. Not both at the same time. Counting on all of your users to read the documentation is not realistic (though it would be nice!) and, in this case, they have to perform research just to find out that... this is an unimplemented "feature". Should just be absent.
    • NoSenseEtAl
      NoSenseEtAl over 11 years
      To make matters worse, VS has had a cra*y regex implementation also... I hope they fixed it in VS 2012. I guess Boost.Regex is the safest bet. :)
    • Keith Thompson
      Keith Thompson over 11 years
      It's important to note that the <regex> header and the associated code that implements it (or doesn't) isn't part of gcc. On my system, it's part of the "libstdc++6-4.7-dev" package. It's possible that another system might provide the gcc compiler with a different implementation of the C++ standard library.
    • Jonathan Wakely
      Jonathan Wakely over 11 years
      @KeithThompson, while it's true that <regex> is provided by libstdc++ (the GCC standard library) not gcc (the compiler front end), it is part of GCC (the project). See "libstdc++-v3 is developed and released as part of GCC". If your distro chooses to split it into a separate package that's nothing to do with GCC.
    • tjwrona1992
      tjwrona1992 about 5 years
      I just wasted hours because of this... SO FRUSTRATING! >:(
    • Alexis
      Alexis about 2 years
      Why doesn't the compiler raise an error at compile time? Why can a non-working feature be allowed in a standard if the compiler doesn't support it?
  • Jonathan Wakely
    Jonathan Wakely over 7 years
    Very nice! I was going to suggest checking for the header guard macro from one of the headers that is new in GCC 4.9, but they don't have guards :-\ The macros aren't changing for GCC 7, but theoretically they could do for GCC 8+, so please file an enhancement request at gcc.gnu.org/bugzilla asking for something like _GLIBCXX_REGEX_IS_OK_NOW_KTHXBAI in the headers, so it doesn't get forgotten - thanks!
  • Matt Clarkson
    Matt Clarkson over 7 years
    @JonathanWakely have added 78905. I'm not sure how to make that into an enhancement bug but it's in the system now.
  • Jonathan Wakely
    Jonathan Wakely over 6 years
    "At this moment (using std=c++14 in g++ (GCC) 4.9.2) is still not accepting regex_match." That's not true, you're probably using it wrong.
  • Jonathan Wakely
    Jonathan Wakely over 6 years
    Your code is not "an approach that works like regex_match" because that function tries to match sub-strings, not the entire string, so I still think you're using it wrong. You can do it with std::regex_search though, see wandbox.org/permlink/rLbGyYcYGNsBWsaB
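
The whole-string versus sub-string distinction in the last comment can be sketched as follows. The code the comment refers to is not shown on this page, so the three helpers below are hypothetical illustrations rather than a reconstruction of it: std::regex_match must consume the entire input, std::regex_search accepts a match anywhere in it, and anchoring the pattern with ^ and $ is one way to get whole-string behaviour out of std::regex_search (assuming the default ECMAScript grammar).

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only if the entire string is "st", "mt" or "tr".
bool matches_whole(const std::string& s) {
  static const std::regex re("st|mt|tr");
  return std::regex_match(s, re);
}

// Hypothetical helper: true if any sub-string of `s` matches the pattern.
bool contains_match(const std::string& s) {
  static const std::regex re("st|mt|tr");
  return std::regex_search(s, re);
}

// Hypothetical helper: regex_search with ^...$ anchors, so only a match that
// spans the whole string is accepted.
bool matches_whole_via_search(const std::string& s) {
  static const std::regex re("^(?:st|mt|tr)$");
  return std::regex_search(s, re);
}
```

For an input such as "xstx", matches_whole and matches_whole_via_search reject it while contains_match accepts it. All of this assumes a working <regex>, i.e. GCC 4.9 or later when using libstdc++.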