Regex grouping matches with C++ 11 regex library


Solution 1

Your regular expression is incorrect because neither capture group does what you want. The first one matches a single character from the set [a-zA-Z0-9] followed by <space>:, which works for single-character usernames but nothing else. The second capture group will always be empty because it looks for zero or more characters but also makes the match non-greedy (lazy). A zero-character match is therefore a valid result, and since nothing follows the group in the pattern, the engine stops there.

Fixing both of these, your regex becomes

std::regex rgx("WEBMSG #([a-zA-Z0-9]+) :(.*)");

But simply instantiating a regex and a match_results object does not produce matches; you need to apply one of the regex algorithms. Since you only want to match part of the input string, the appropriate algorithm in this case is regex_search.

std::regex_search(s, matches, rgx);

Putting it all together:

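#include <cstddef>
#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string s{R"(
tХB:[email protected] Connected
tХB:[email protected] WEBMSG #Username :this is a message
tХB:[email protected] Status: visible
)"};

    // Group 1 captures the username, group 2 the rest of the line.
    std::regex rgx("WEBMSG #([a-zA-Z0-9]+) :(.*)");
    std::smatch matches;

    // regex_search looks for the pattern anywhere in s.
    if (std::regex_search(s, matches, rgx)) {
        std::cout << "Match found\n";

        // matches[0] is the whole match; matches[1] and up are the capture groups.
        for (std::size_t i = 0; i < matches.size(); ++i) {
            std::cout << i << ": '" << matches[i].str() << "'\n";
        }
    } else {
        std::cout << "Match not found\n";
    }
}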

Live demo

Solution 2

"WEBMSG #([a-zA-Z0-9]) :(.*?)"

This regex only matches strings that contain a username exactly one character long, followed by any message after the colon. Even then, the second group will always be empty, because the lazy quantifier *? prefers the shortest possible match of zero or more characters, and zero characters already satisfies the pattern.

This should work:

"WEBMSG #([a-zA-Z0-9]+) :(.*)"

Author: Vivendi

Updated on June 11, 2020

Comments

  • Vivendi
    Vivendi almost 4 years

    I'm trying to use a regex for group matching. I want to extract two strings from one big string.

    The input string looks something like this:

    tХB:[email protected] Connected
    tХB:[email protected] WEBMSG #Username :this is a message
    tХB:[email protected] Status: visible
    

    The Username can be anything. The same goes for the message at the end (this is a message).

    What I want to do is extract the Username that comes after the pound sign #, and not from any other place in the string, since that can vary as well. I also want to get the message that comes after the colon :.

    I tried that with the following regex, but it never outputs any results.

    regex rgx("WEBMSG #([a-zA-Z0-9]) :(.*?)");
    smatch matches;
    
    for(size_t i=0; i<matches.size(); ++i) {
        cout << "MATCH: " << matches[i] << endl;
    }
    

    I'm not getting any matches. What is wrong with my regex?

    • Galik
      Galik about 9 years
      Is it essential to solve this using regex? It seems to me that stream extraction functions could achieve this.
  • Hugh Perkins
    Hugh Perkins about 2 years
    The demo page doesn't load.