Changing the delimiter for cin (C++)

45,416

Solution 1

It is possible to change the inter-word delimiter for cin or any other std::istream, using std::ios_base::imbue to add a custom ctype facet.

If you are reading a file in the style of /etc/passwd, the following program will read each :-delimited word separately.

#include <locale>
#include <iostream>
#include <string>


struct colon_is_space : std::ctype<char> {
  colon_is_space() : std::ctype<char>(get_table()) {}
  static mask const* get_table()
  {
    static mask rc[table_size];
    rc[':'] = std::ctype_base::space;
    rc['\n'] = std::ctype_base::space;
    return &rc[0];
  }
};

int main() {
  using std::string;
  using std::cin;
  using std::locale;

  cin.imbue(locale(cin.getloc(), new colon_is_space));

  string word;
  while(cin >> word) {
    std::cout << word << "\n";
  }
}

Solution 2

For strings, you can use the std::getline overloads to read using a different delimiter.
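
As a minimal sketch of that approach, the three-argument std::getline can split a stream on any single character. The helper name split_fields is hypothetical, not part of the standard library.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Split everything readable from `in` into fields separated by `delim`.
// split_fields is a hypothetical helper name, not a standard function.
std::vector<std::string> split_fields(std::istream& in, char delim)
{
    std::vector<std::string> fields;
    std::string field;
    // getline stops at `delim`, discards it, and treats every other
    // character (including blanks and newlines) as part of the field.
    while (std::getline(in, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}
```

Calling split_fields(std::cin, ':') would read colon-separated fields from standard input. Note that a newline is not a delimiter here, so for line-oriented files it is usually simpler to read each line with the two-argument std::getline first and then split that line through a std::istringstream.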

For number extraction, the delimiter isn't really "whitespace" to begin with, but any character invalid in a number.
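
A small sketch can make this visible: extract an int and then look at the character that stopped the extraction. The names parse_result and parse_leading_int are hypothetical and exist only for this illustration.

```cpp
#include <sstream>
#include <string>

// Result of reading an int from the front of a string.
// parse_result and parse_leading_int are hypothetical names.
struct parse_result {
    int value;
    char stopped_at; // next unread character, or '\0' if the input ran out
};

parse_result parse_leading_int(const std::string& text)
{
    std::istringstream in(text);
    parse_result r{0, '\0'};
    in >> r.value;
    // The stream is still good only if extraction succeeded and stopped
    // before the end of the input, so there is a character left to peek at.
    if (in.good()) {
        r.stopped_at = static_cast<char>(in.peek());
    }
    return r;
}
```

With the inputs "123|456", "123_456" and "123 456", value is 123 in each case and stopped_at is '|', '_' and ' ' respectively, so the blank is just one of many characters that end a number.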

Solution 3

This is an improvement on Robᵩ's answer, because that is the right one (and I'm disappointed that it hasn't been accepted.)

What you need to do is change the array that ctype looks at to decide what a delimiter is.

In the simplest case you could create your own:

const ctype<char>::mask foo[ctype<char>::table_size] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ctype_base::space};

On my machine '\n' is 10. I've set that element of the array to the delimiter value: ctype_base::space. A ctype initialized with foo would only delimit on '\n' not ' ' or '\t'.

This is a problem, though, because the array passed to ctype defines more than just what a delimiter is; it also classifies letters, digits, punctuation, and the other categories that streaming relies on. (Ben Voigt's answer touches on this.) So what we really want is to modify an existing mask table, not create one from scratch.

That can be accomplished like this:

const auto temp = ctype<char>::classic_table();
vector<ctype<char>::mask> bar(temp, temp + ctype<char>::table_size);

bar[' '] ^= ctype_base::space;
bar['\t'] &= ~(ctype_base::space | ctype_base::cntrl);
bar[':'] |= ctype_base::space;

A ctype initialized with bar would delimit on '\n' and ':' but not ' ' or '\t'.

You set up cin, or any other istream, to use your custom ctype like this (bar must outlive the facet, because the facet stores only a pointer to the table):

cin.imbue(locale(cin.getloc(), new ctype<char>(data(bar))));

You can also switch between ctypes and the behavior will change mid-stream:

cin.imbue(locale(cin.getloc(), new ctype<char>(foo)));

If you need to go back to default behavior, just do this:

cin.imbue(locale(cin.getloc(), new ctype<char>));
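
Putting the pieces above together, a self-contained sketch might look like the following. It assumes the same intent as the snippets above (':' and '\n' separate words, ' ' and '\t' do not); make_colon_table and read_words are hypothetical helper names.

```cpp
#include <istream>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Copy the classic "C" table and change only the space bit.
// make_colon_table and read_words are hypothetical helper names.
std::vector<std::ctype<char>::mask> make_colon_table()
{
    using char_ctype = std::ctype<char>;
    const char_ctype::mask* classic = char_ctype::classic_table();
    std::vector<char_ctype::mask> table(classic, classic + char_ctype::table_size);
    table[' ']  &= ~std::ctype_base::space;  // a blank no longer separates words
    table['\t'] &= ~std::ctype_base::space;  // neither does a tab
    table[':']  |= std::ctype_base::space;   // a colon now does
    return table;
}

std::vector<std::string> read_words(std::istream& in)
{
    // Static so the table outlives the facet, which keeps only a pointer to it.
    static const std::vector<std::ctype<char>::mask> table = make_colon_table();
    in.imbue(std::locale(in.getloc(), new std::ctype<char>(table.data())));

    std::vector<std::string> words;
    for (std::string word; in >> word;) {
        words.push_back(word);
    }
    return words;
}
```

Passing std::cin to read_words would apply the same behavior to standard input. The table is kept in a function-local static purely to sidestep the lifetime question; any storage that outlives the stream's locale would do.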


Solution 4

This is an improvement on Jon's answer, and the example from cppreference.com. So this follows the same premise as both, but combines them with parameterized delimiters.

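#include <cstddef>
#include <locale>
#include <string>
#include <vector>

struct delimiter_ctype : std::ctype<char> {
    static const mask* make_table(const std::string& delims)
    {
        // make a copy of the "C" locale table; it is static, so every
        // delimiter_ctype instance shares (and overwrites) this one table
        static std::vector<mask> v(classic_table(), classic_table() + table_size);
        for(mask& m : v){   // by reference, so the table itself is modified
            m &= ~space;
        }
        for(unsigned char d : delims){   // unsigned, so the index is never negative
            v[d] |= space;
        }
        return &v[0];
    }
    delimiter_ctype(const std::string& delims, std::size_t refs = 0) : ctype(make_table(delims), false, refs) {}
};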
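
Because the table in make_table is a function-local static, two delimiter_ctype facets with different delimiters would end up sharing one table. One possible way around that, sketched here under the assumption that per-facet tables are wanted, is to let each facet own its table through a helper base class. The names table_holder, owning_delimiter_ctype and split_words are hypothetical.

```cpp
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Owns the classification table. It is listed as the first base class
// below so that it is constructed before std::ctype<char> needs it.
// table_holder and owning_delimiter_ctype are hypothetical names.
struct table_holder {
    std::vector<std::ctype_base::mask> masks;

    explicit table_holder(const std::string& delims)
        : masks(std::ctype<char>::classic_table(),
                std::ctype<char>::classic_table() + std::ctype<char>::table_size)
    {
        for (auto& m : masks) {
            m &= ~std::ctype_base::space;       // drop the default separators
        }
        for (unsigned char d : delims) {
            masks[d] |= std::ctype_base::space; // keep only the requested ones
        }
    }
};

struct owning_delimiter_ctype : private table_holder, std::ctype<char> {
    explicit owning_delimiter_ctype(const std::string& delims, std::size_t refs = 0)
        : table_holder(delims), std::ctype<char>(masks.data(), false, refs) {}
};

// Usage sketch: split `text` on any of the characters in `delims`.
std::vector<std::string> split_words(const std::string& text, const std::string& delims)
{
    std::istringstream in(text);
    in.imbue(std::locale(in.getloc(), new owning_delimiter_ctype(delims)));
    std::vector<std::string> words;
    for (std::string word; in >> word;) {
        words.push_back(word);
    }
    return words;
}
```

For example, split_words("a b,c;d", ",;") treats only ',' and ';' as separators, so the blank stays inside the first word. The same facet can be imbued into cin in place of the istringstream.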

Cheers!

Author: yotamoo

Updated on March 21, 2020

Comments

  • yotamoo
    yotamoo about 4 years

    I've redirected cin to read from a file stream with cin.rdbuf(inF.rdbuf()). When I use the extraction operator, it reads until it reaches a whitespace character.

    Is it possible to use another delimiter? I went through the api in cplusplus.com, but didn't find anything.

  • Earth Engine
    Earth Engine about 11 years
    Using new in an uncontrolled way is evil, not to mention that you have not deleted your struct (and there is no way to delete an unnamed pointer). ALWAYS try shared_ptr instead when possible.
  • matth
    matth about 11 years
    That is generally excellent advice which does not apply in this specific case. In this case, std::locale::facet is reference-counted, std::locale::locale requires a raw pointer, not a shared pointer, and std::locale::~locale is defined to delete the facet pointer. If you have a problem with the interface to locale, take it up with the standards committee, not me. See the example program at en.cppreference.com/w/cpp/locale/locale/locale
  • Earth Engine
    Earth Engine about 11 years
    Even so, I would suggest defining a wrapper function get_locale to wrap this unusual use of new, with comments, so the code reviewer realizes there is something wrong with the interface, not with the code writer. This is what I mean by a "controlled" way of using new.
  • Earth Engine
    Earth Engine about 11 years
    If you are not creating new functions, a better way to represent the ownership transfer could be unique_ptr<colon_is_space>(new colon_is_space).release(). Although it is basically the same thing as your code, only more verbose, it indicates that you are transferring pointer ownership.
  • Jonathan Mee
    Jonathan Mee over 9 years
    I'm not sure how you can say the delimiter isn't "whitespace" for numbers, if foo is an int, istringstream("123 456") >> foo; puts 123 in foo, not 123456.
  • Ben Voigt
    Ben Voigt over 9 years
    @JonathanMee: I didn't say that whitespace aren't delimiters, I said the set of delimiters is not only whitespace. Try istringstream("123_456") >> foo; or Try istringstream("123|456") >> foo;
  • Jonathan Mee
    Jonathan Mee over 9 years
    Ahhh, I understand, you're saying that rather than looking for a character defined as ctype_base::space the stream is looking for a character not defined as ctype_base::digit.
  • Ben Voigt
    Ben Voigt over 9 years
    @JonathanMee: Right, although it's more complex than that, some punctuation characters are allowed during numeric parsing. And obviously whether it is classified as a space may affect the status flags, but whitespace is not the only thing that causes numeric extraction to stop.
  • Ben Voigt
    Ben Voigt over 9 years
    that will set bar['\t'] to zero, probably not intended. To clear a bit, use &~ (bit-wise AND with bit-wise NOT). ! is logical NOT and won't have the desired effect.
  • Jonathan Mee
    Jonathan Mee over 9 years
    @BenVoigt Thank you, I wanted to strip out the space and cntrl bits and I accidentally got everything.
  • Wolf
    Wolf over 8 years
    Does it make sense to expect that std::getline is optimized for performance?
  • Jonathan Mee
    Jonathan Mee over 8 years
    @Wolf streams in general are one of the least performant things in the standard. But typically you're going to use streams with input/output so slow performance will be negligible relative to the cost of the input/output operation. For performance reasons though arrays should be preferred over streams.
  • Ben Voigt
    Ben Voigt over 8 years
    @JonathanMee: "slow performance will be negligible relative to the cost of the input/output operation" has NEVER been true in my experience. The fact is that in many applications both file I/O and parsing are negligible compared to the cost of other processing, or waiting for the user to hit the start button, or network requests. But in I/O heavy applications built with iostreams, it's the iostream code, not the I/O operations, that dominates.
  • Jonathan Mee
    Jonathan Mee over 8 years
    Hmmm... I guess it's the type of project that I have a history with. Thanks for the clarification. It's good to have a balancing point of view. I suppose a better answer for @Wolf's question would be: "getline is no slower than the stream is as a whole, but if performance is a concern for you, you should look for non-stream options."