C++ - Split string by regex


Solution 1

You don't need to use regular expressions if you just want to split a string by multiple spaces. Writing your own regex library is overkill for something that simple.

The answer you linked to in your comments, "Split a string in C++?", can easily be changed so that it doesn't produce empty elements when the input contains several consecutive spaces.

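#include <sstream>
#include <string>
#include <vector>

// Split s on delim, appending only the non-empty pieces to elems.
std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        if (item.length() > 0) {  // skip pieces produced by consecutive delimiters
            elems.push_back(item);
        }
    }
    return elems;
}

// Convenience overload that returns a new vector.
std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, elems);
    return elems;
}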

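By checking that item.length() > 0 before pushing item onto the elems vector, you no longer get extra empty elements when your input contains consecutive delimiters (spaces, in your case). Note that this still splits on a single delimiter character, so tabs and newlines are not treated as separators the way they are with \\s+.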
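If the separator really is "any run of whitespace" (which is what \\s+ means), a C++03-compatible alternative is to let stream extraction do the work: operator>> into a std::string skips leading whitespace and stops at the next whitespace character, so runs of spaces, tabs and newlines never yield empty tokens. This is a minimal sketch, not taken from the answer above; the function name split_ws is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split s on runs of whitespace (spaces, tabs, newlines, ...); C++03-compatible.
// operator>> skips leading whitespace and reads up to the next whitespace
// character, so no empty tokens are ever produced.
std::vector<std::string> split_ws(const std::string &s) {
    std::istringstream iss(s);
    std::vector<std::string> tokens;
    std::string word;
    while (iss >> word) {
        tokens.push_back(word);
    }
    return tokens;
}
```

For example, split_ws("  foo\tbar \n baz  ") should give the three tokens foo, bar and baz.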

Solution 2

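#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string string_to_split = "foo bar  baz";  // example input
    std::regex rgx("\\s+");  // separator: one or more whitespace characters
    // -1 selects the text between matches rather than the matches themselves
    std::sregex_token_iterator iter(string_to_split.begin(),
        string_to_split.end(),
        rgx,
        -1);
    std::sregex_token_iterator end;
    for ( ; iter != end; ++iter)
        std::cout << *iter << '\n';
}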

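The -1 is the key here: when the iterator is constructed, it points at the text that precedes the first match, and after each increment it points at the text that follows the previous match. In other words, a submatch index of -1 yields the pieces between the separators rather than the separators themselves.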
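To make the role of that last constructor argument concrete, here is a small sketch (the helper name tokens is hypothetical) that collects everything the iterator yields for a given submatch index: -1 yields the text between matches, while 0 yields the matches themselves. With -1, a separator at the very start of the input produces an empty first token, whereas a trailing separator does not produce an empty last token.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect everything a std::sregex_token_iterator yields for one submatch
// index: -1 selects the text between matches, 0 selects the whole matches.
std::vector<std::string> tokens(const std::string &s, const std::regex &re, int submatch) {
    std::vector<std::string> out;
    std::sregex_token_iterator iter(s.begin(), s.end(), re, submatch);
    std::sregex_token_iterator end;
    for (; iter != end; ++iter) {
        out.push_back(iter->str());
    }
    return out;
}
```

With the input " a  b " and the regex \\s+, index -1 should give "", "a", "b", and index 0 should give the three separators " ", "  " and " ".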

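If you don't have C++11, the same thing should work with TR1 or (possibly with slight modification) with Boost. Be aware, though, that libstdc++ did not ship a working <regex> implementation until GCC 4.9, so with an older g++ such as 4.4.5 this code is unlikely to work as expected even if it compiles.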

Solution 3

To expand on the answer by @Pete Becker, here is an example of a resplit function that splits text using a regular expression:

#include <regex>
#include <string>
#include <vector>

// Split s on every match of sep_regex (default: runs of whitespace).
std::vector<std::string> resplit(const std::string &s, const std::regex &sep_regex = std::regex{"\\s+"}) {
  std::sregex_token_iterator iter(s.begin(), s.end(), sep_regex, -1);
  std::sregex_token_iterator end;
  return {iter, end};
}

This works as follows:

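   std::string s1 = "first   second third    ";
   std::vector<std::string> v22 = resplit(s1);

   for (const auto & e: v22) {
       std::cout << "Token:" << e << std::endl;
   }

   //Token:first
   //Token:second
   //Token:third


   std::string s222 = "first|second:third,fourth";
   // std::regex's constructor from a C string is explicit, so the pattern
   // must be wrapped in std::regex{...} when passed to resplit.
   std::vector<std::string> v222 = resplit(s222, std::regex{"[|:,]"});

   for (const auto & e: v222) {
       std::cout << "Token:" << e << std::endl;
   }

   //Token:first
   //Token:second
   //Token:third
   //Token:fourth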
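One caveat with resplit: as with any -1 token iterator, input that starts with a separator yields an empty first token (for example, "  first second" gives "", "first", "second"), and adjacent single-character separators such as "a,,b" with [|:,] yield an empty token in between. If those are unwanted, a variant can simply drop empty tokens. This is a minimal sketch; the name resplit_nonempty is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Like resplit, but drops empty tokens (e.g. the one produced when the
// input begins with a separator, or between two adjacent separators).
std::vector<std::string> resplit_nonempty(const std::string &s,
                                          const std::regex &sep_regex = std::regex{"\\s+"}) {
    std::vector<std::string> out;
    std::sregex_token_iterator iter(s.begin(), s.end(), sep_regex, -1);
    std::sregex_token_iterator end;
    for (; iter != end; ++iter) {
        if (iter->length() > 0) {
            out.push_back(iter->str());
        }
    }
    return out;
}
```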

Solution 4

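#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string s = "foo bar  baz";
    std::regex e("\\s+");
    std::regex_token_iterator<std::string::iterator> i(s.begin(), s.end(), e, -1);
    std::regex_token_iterator<std::string::iterator> end;
    while (i != end)
        std::cout << " [" << *i++ << "]";
}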

This prints: [foo] [bar] [baz]
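An alternative to splitting on the separators is to match the tokens themselves. With std::sregex_iterator and a pattern such as \\S+ (one or more non-whitespace characters), leading or trailing whitespace never produces an empty token. This is a sketch of that different technique, not part of the answer above; the function name match_tokens is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every non-overlapping match of token_regex, i.e. the tokens
// themselves rather than the text between separators.
std::vector<std::string> match_tokens(const std::string &s,
                                      const std::regex &token_regex = std::regex{"\\S+"}) {
    std::vector<std::string> out;
    std::sregex_iterator iter(s.begin(), s.end(), token_regex);
    std::sregex_iterator end;
    for (; iter != end; ++iter) {
        out.push_back(iter->str());  // str() with no argument is the whole match
    }
    return out;
}
```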


Author: nothing-special-here (Maciej Kowalski, Freelance - Ruby / JRuby / Rails / Backbone / AngularJS / Ember.js)

Updated on July 09, 2022

Comments

  • nothing-special-here, almost 2 years

    I want to split a std::string by a regex.

    I have found some solutions on Stack Overflow, but most of them split the string on a single space or use external libraries like Boost.

    I can't use Boost.

    I want to split the string by the regex "\\s+".

    I am using g++ (Debian 4.4.5-8) 4.4.5 and I can't upgrade.

    • nothing-special-here, almost 11 years
      Right now I am using this function to split the string: stackoverflow.com/a/236803/418518, but it only works with a single-character delimiter. The regex itself is correct; I have already used it in a Java project, where it works brilliantly.
    • nothing-special-here, almost 11 years
      The problem is that I don't know C++ very well, and I just want to know how to split a std::string using the old C++ standard (probably C++03). If you have some links or code, just paste them. :) Thanks!
    • melwil, almost 11 years
      Can you show example input and desired output?
    • Bernhard Barker, almost 11 years
      Using Boost may be an option.
    • nothing-special-here, almost 11 years
      @melwil: Desired input / output: gist.github.com/maciejkowalski/af7e0ce2b92d967e050c
    • nothing-special-here, almost 11 years
      @Dukeling: Unfortunately, I can't use Boost. ;/
    • Bernhard Barker, almost 11 years
      If that version of g++ is C++11 compliant, this / this may be a starting point. Otherwise, splitting by a regex pattern without an external library will probably require writing a regex parser (which is no small task, or a small copy-paste task if you can find code to do it). However, if you just want to split by multiple spaces, a simple iterative solution probably won't be too difficult; alternatively, split by a single space and ignore empty strings.
    • n. m., almost 11 years
      C++03 does not come with a regex library. C++11 does, but your compiler doesn't support C++11. You need to either use an existing third-party regex library or write your own.
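As suggested in the comments above, without C++11 <regex> a simple iterative approach covers the common case where the separator is a run of characters from a fixed set (as with \\s+, or [|:,]+). This is a minimal C++03-compatible sketch built on std::string::find_first_not_of and find_first_of; the function name split_any is hypothetical.

```cpp
#include <string>
#include <vector>

// Split s on runs of any characters in delims (C++03-compatible, no regex).
// Runs of delimiters, and delimiters at either end, produce no empty tokens.
std::vector<std::string> split_any(const std::string &s, const std::string &delims) {
    std::vector<std::string> tokens;
    std::string::size_type start = s.find_first_not_of(delims);
    while (start != std::string::npos) {
        std::string::size_type stop = s.find_first_of(delims, start);
        if (stop == std::string::npos) {
            tokens.push_back(s.substr(start));  // last token runs to end of string
            break;
        }
        tokens.push_back(s.substr(start, stop - start));
        start = s.find_first_not_of(delims, stop);
    }
    return tokens;
}
```

Passing " \t\n\v\f\r" as delims approximates \\s+ for plain ASCII input.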
  • nothing-special-here, almost 11 years
    Well, we figured it out the same way at the same time. :) But you were actually faster (~10 min) at posting the answer on SO. +1 & accept.
  • Pete Becker, about 9 years
    @Narek - either that, or add explicit template arguments: regex_token_iterator<std::string::iterator>. sregex_token_iterator is easier. Fixed. Thanks.
  • Lu4, almost 9 years
    You should also agree that using C++ to split a string looks like even larger overkill; in C# you just do str.Split(...) ;)
  • solstice333, over 7 years
    The last example in the cplusplus.com reference documentation is similar to this answer.