Splitting a stringstream that contains comma separated entries

Solution 1

When applied to a string, the istream >> operator skips any leading whitespace and then reads up to the next whitespace character.

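It works the same way for every type, including int. Your code works for the numbers because the int extraction stops at the ',' (a comma cannot be part of an integer) and leaves it in the stream for peek() and ignore() to deal with. A comma is a perfectly valid character in a string, though, so for text1 the first extraction reads the whole of "you,are,good" as a single token.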
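To make the difference visible, here is a minimal sketch of the loop from the question, with std::vector standing in for Array (the Array class itself is not shown in the question, so that substitution and the name read_tokens are my assumptions):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Same loop as in the question: extract a T with operator>>, then skip
// one ',' if it is the next character. std::vector replaces Array here.
template <typename T>
std::vector<T> read_tokens(const std::string& text) {
    std::istringstream is(text);
    std::vector<T> out;
    T value;
    while (is >> value) {
        out.push_back(value);
        if (is.peek() == ',')
            is.ignore();
    }
    return out;
}
```

Calling read_tokens<int>("1,2,3,4,5") gives five elements, while read_tokens<std::string>("you,are,good") gives a single element holding the whole line, because nothing in the input counts as whitespace for the string extraction.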

The simplest way to read comma-separated strings is to use the std::getline function, passing ',' as the delimiter.

In your case, your template function

template <typename T>
std::istream &operator>>(std::istream &is, Array<T> &t)
{ ...... }

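remains valid, but needs an additional overload for Array<std::string> (a plain non-template function, which the compiler prefers over the template when both match):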

std::istream &operator>>(std::istream &is, Array<std::string> &t)
{
    std::string r;
    while(std::getline(is,r,','))
        t.push_back(r);
    return is;
}
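Put together, the two operators can live side by side. The following is only a sketch: Array is a hypothetical stand-in that simply derives from std::vector, since the real class is not shown in the question.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the question's Array<T>; the real class is not shown.
template <typename T>
struct Array : std::vector<T> {};

// Generic version: relies on operator>> for T and skips one ',' per element.
template <typename T>
std::istream& operator>>(std::istream& is, Array<T>& t) {
    T value;
    while (is >> value) {
        t.push_back(value);
        if (is.peek() == ',')
            is.ignore();
    }
    return is;
}

// Non-template overload: preferred over the template for Array<std::string>.
// std::getline splits at ',' and keeps any spaces inside a field.
inline std::istream& operator>>(std::istream& is, Array<std::string>& t) {
    std::string token;
    while (std::getline(is, token, ','))
        t.push_back(token);
    return is;
}
```

With this in place, extracting from a stream over "you,are,good" into an Array<std::string> yields three entries, and extracting from "1,2,3,4,5" into an Array<int> still goes through the template and yields five. Note that both loops run until extraction fails, so the stream is left in a failed state afterwards.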

Solution 2

If you need to split the string exactly at commas, the easiest approach I'm aware of is to redefine what counts as whitespace for the stream. This is easily done by replacing the std::ctype<char> facet. Here is a version of this which I posted before:

#include <iostream>
#include <iterator>
#include <string>
#include <set>
#include <algorithm>

using namespace std;

typedef string T; // to simplify, always consider T as string

template<typename input_iterator>
void do_something(const input_iterator& first, const input_iterator& last) {
    const ostream_iterator<T> os(cout, "\n");
    const set<T> words(first, last);
    copy(words.begin(), words.end(), os);
}

#include <locale>
template <char S0, char S1>
struct commactype_base {
    commactype_base(): table_() {
        std::transform(std::ctype<char>::classic_table(),
                       std::ctype<char>::classic_table() + std::ctype<char>::table_size,
                       this->table_, 
                       [](std::ctype_base::mask m) -> std::ctype_base::mask {
                           return m & ~(std::ctype_base::space);
                       });
        this->table_[static_cast<unsigned char>(S0)] |= std::ctype_base::space;
        this->table_[static_cast<unsigned char>(S1)] |= std::ctype_base::space;
    }
    std::ctype<char>::mask table_[std::ctype<char>::table_size];
    static std::ctype_base::mask clear_space(std::ctype_base::mask m) {
        return m & ~(std::ctype_base::space);
    }
};
template <char S0, char S1 = S0>
struct ctype:
    commactype_base<S0, S1>,
    std::ctype<char>
{
    ctype(): std::ctype<char>(this->table_, false) {}
};

int main() {
    std::cin.imbue(std::locale(std::locale(), new ::ctype<',', '\n'>));
    const istream_iterator<T> is(cin), eof;
    do_something(is, eof);
    return 0;
}
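The program above imbues std::cin, but the same idea should carry over to the string streams in the question. Here is a trimmed-down sketch under that assumption; the names comma_is_space and read_comma_separated are made up for this example, and only ',' is treated as whitespace:

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// A ctype facet whose only whitespace character is ','.
struct comma_is_space : std::ctype<char> {
    comma_is_space() : std::ctype<char>(make_table()) {}

    static const mask* make_table() {
        // Copy the classic "C" table once, clear the space bit everywhere,
        // then set it for ',' only.
        static const std::vector<mask> table = [] {
            std::vector<mask> t(classic_table(), classic_table() + table_size);
            for (auto& m : t)
                m = m & ~space;
            t[static_cast<unsigned char>(',')] |= space;
            return t;
        }();
        return table.data();
    }
};

// Reads every comma-separated value of type T from text using plain operator>>.
template <typename T>
std::vector<T> read_comma_separated(const std::string& text) {
    std::istringstream is(text);
    // The locale takes ownership of the facet and deletes it when no longer used.
    is.imbue(std::locale(is.getloc(), new comma_is_space));
    std::vector<T> out;
    T value;
    while (is >> value)
        out.push_back(value);
    return out;
}
```

The same function then handles both read_comma_separated<std::string>("you,are,good") and read_comma_separated<int>("1,2,3,4,5") without any peek() or ignore() calls. One difference from the std::getline approach is worth keeping in mind: consecutive commas are skipped like any run of whitespace, so empty fields are dropped rather than returned as empty strings.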
Author by

The Quantum Physicist

Updated on June 04, 2022

Comments

  • The Quantum Physicist almost 2 years

    I have two strings that look as follows:

    string text1 = "you,are,good";
    string text2 = "1,2,3,4,5";
    stringstream t1(text1);
    stringstream t2(text2);
    

    I'm using the following code to parse it as comma-separated data:

    template <typename T>
    std::istream &operator>>(std::istream &is, Array<T> &t)
    {
        T i;
        while (is >> i)
        {
            t.push_back(i);
    
            if (is.peek() == ',')
                is.ignore();
        }
        return is;
    }
    

    where "is" is t1 or t2. This separates text2 but fails with text1. Could you guys please help me with that and tell me why it doesn't work with strings? I need a general code that would parse strings and numbers.

    Thanks for any efforts :)

  • Dietmar Kühl over 11 years
    Sure. You could either set up the stream using the locale as above, or you could handle strings specifically. The thing is that reading a string stops when the stream finds a space. Since your input doesn't contain any spaces, the commas have to be treated as spaces (or strings have to be handled differently).