changing the delimiter for cin (c++)
Solution 1
It is possible to change the inter-word delimiter for cin or any other std::istream, using std::ios_base::imbue to add a custom ctype facet.
If you are reading a file in the style of /etc/passwd, the following program will read each :-delimited word separately.
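#include <locale>
#include <iostream>
#include <string>
// A ctype facet whose table classifies only ':' and '\n' as whitespace.
struct colon_is_space : std::ctype<char> {
    colon_is_space() : std::ctype<char>(get_table()) {}
    static mask const* get_table()
    {
        // Static storage is zero-initialized, so no other character has any class.
        static mask rc[table_size];
        rc[':'] = std::ctype_base::space;
        rc['\n'] = std::ctype_base::space;
        return &rc[0];
    }
};
int main() {
    using std::string;
    using std::cin;
    using std::locale;
    // The locale takes ownership of the facet and deletes it when no longer used.
    cin.imbue(locale(cin.getloc(), new colon_is_space));
    string word;
    while(cin >> word) {
        std::cout << word << "\n";
    }
}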
Solution 2
For strings, you can use the std::getline overloads to read using a different delimiter.
For number extraction, the delimiter isn't really "whitespace" to begin with, but any character invalid in a number.
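A minimal sketch of both points, using a string stream in place of cin. The helper names split_on and first_unread_after_int are hypothetical and exist only for this illustration: the first uses the three-argument std::getline overload, the second shows that integer extraction stops at the first character that cannot be part of the number, whitespace or not.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split `line` on `delim` with the
// std::getline(stream, string, delimiter) overload.
std::vector<std::string> split_on(const std::string& line, char delim) {
    std::istringstream in(line);
    std::vector<std::string> fields;
    std::string field;
    while (std::getline(in, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}

// Hypothetical helper: extract one int from `text` and return the character
// that stopped the extraction, or '\0' if nothing readable is left.
char first_unread_after_int(const std::string& text, int& value) {
    std::istringstream in(text);
    in >> value;
    const int next = in.peek();
    return next == std::char_traits<char>::eof() ? '\0' : static_cast<char>(next);
}
```

With these definitions, split_on("root:x:0:0", ':') yields the four fields root, x, 0 and 0, and first_unread_after_int("123_456", n) stores 123 in n and returns '_', even though '_' is not whitespace.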
Solution 3
This is an improvement on Robᵩ's answer, because that is the right one (and I'm disappointed that it hasn't been accepted.)
What you need to do is change the array that ctype looks at to decide what a delimiter is.
In the simplest case you could create your own:
const ctype<char>::mask foo[ctype<char>::table_size] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ctype_base::space};
On my machine '\n' is 10. I've set that element of the array to the delimiter value: ctype_base::space. A ctype initialized with foo would only delimit on '\n', not ' ' or '\t'.
Now this is a problem because the array passed into ctype defines more than just what a delimiter is; it also defines letters, numbers, symbols, and some other junk needed for streaming. (Ben Voigt's answer touches on this.) So what we really want to do is modify a mask, not create one from scratch.
That can be accomplished like this:
const auto temp = ctype<char>::classic_table();
vector<ctype<char>::mask> bar(temp, temp + ctype<char>::table_size);
bar[' '] ^= ctype_base::space;
bar['\t'] &= ~(ctype_base::space | ctype_base::cntrl);
bar[':'] |= ctype_base::space;
A ctype initialized with bar would delimit on '\n' and ':' but not ' ' or '\t'.
You go about setting up cin, or any other istream, to use your custom ctype like this:
cin.imbue(locale(cin.getloc(), new ctype<char>(data(bar))));
You can also switch between ctypes and the behavior will change mid-stream:
cin.imbue(locale(cin.getloc(), new ctype<char>(foo)));
If you need to go back to default behavior, just do this:
cin.imbue(locale(cin.getloc(), new ctype<char>));
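Putting the pieces together, here is a self-contained sketch that applies a modified copy of the classic table to a string stream instead of cin. The function name tokenize_passwd_style is hypothetical. One assumption worth stating loudly: the facet does not copy the table it is given, so the vector has to outlive every locale that holds the facet; in this sketch that is arranged by declaring the vector before the stream.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: tokenize `text` so that ':' and '\n' separate words
// while ' ' and '\t' do not.
std::vector<std::string> tokenize_passwd_style(const std::string& text) {
    typedef std::ctype<char> char_ctype;

    // Start from the classic "C" table so letters, digits, etc. keep their classes.
    const char_ctype::mask* classic = char_ctype::classic_table();
    std::vector<char_ctype::mask> table(classic, classic + char_ctype::table_size);
    table[' ']  &= ~std::ctype_base::space;  // space no longer delimits
    table['\t'] &= ~std::ctype_base::space;  // tab no longer delimits
    table[':']  |= std::ctype_base::space;   // colon now delimits

    // `table` is declared before `in`, so it is destroyed after the stream
    // and after the facet that points into it.
    std::istringstream in(text);
    in.imbue(std::locale(in.getloc(), new char_ctype(table.data())));

    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}
```

Given the line "root:x:0:0:System Administrator:/root\n", this returns six tokens, and the fifth is "System Administrator" with its space intact.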
Solution 4
This is an improvement on Jon's answer, and the example from cppreference.com. So this follows the same premise as both, but combines them with parameterized delimiters.
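struct delimiter_ctype : std::ctype<char> {
    static const mask* make_table(const std::string& delims)
    {
        // make a copy of the "C" locale table (shared by every delimiter_ctype)
        static std::vector<mask> v(classic_table(), classic_table() + table_size);
        // iterate by reference so the space bit is actually cleared
        for (mask& m : v) {
            m &= ~space;
        }
        for (char d : delims) {
            // index as unsigned char so a negative char cannot go out of bounds
            v[static_cast<unsigned char>(d)] |= space;
        }
        return &v[0];
    }
    delimiter_ctype(const std::string& delims, std::size_t refs = 0) : ctype(make_table(delims), false, refs) {}
};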
Cheers!
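For illustration, here is one possible variant of the parameterized facet together with a usage helper. It departs from the answer in one respect, which is my assumption rather than anything the answer states: each facet owns its own table instead of sharing a static one, so two facets with different delimiter sets cannot affect each other. The names delimiter_table, masks and split_with are hypothetical.

```cpp
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper base: owns a private copy of the classification table.
// It is listed first among the bases, so the table exists before
// std::ctype<char> receives a pointer to it and is destroyed after the facet.
struct delimiter_table {
    std::vector<std::ctype_base::mask> masks;

    explicit delimiter_table(const std::string& delims)
        : masks(std::ctype<char>::classic_table(),
                std::ctype<char>::classic_table() + std::ctype<char>::table_size) {
        for (std::ctype_base::mask& m : masks) {
            m &= ~std::ctype_base::space;  // no character is whitespace by default
        }
        for (char d : delims) {
            masks[static_cast<unsigned char>(d)] |= std::ctype_base::space;
        }
    }
};

struct delimiter_ctype : private delimiter_table, std::ctype<char> {
    explicit delimiter_ctype(const std::string& delims, std::size_t refs = 0)
        : delimiter_table(delims), std::ctype<char>(masks.data(), false, refs) {}
};

// Hypothetical helper: split `text` on any character contained in `delims`.
std::vector<std::string> split_with(const std::string& text, const std::string& delims) {
    std::istringstream in(text);
    // The locale takes ownership of the facet (refs == 0).
    in.imbue(std::locale(in.getloc(), new delimiter_ctype(delims)));
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}
```

With these definitions, split_with("a,b;c d", ",;") returns the three words a, b and "c d": only the listed characters delimit, so the space stays inside the last word.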
yotamoo
Updated on March 21, 2020
Comments
-
yotamoo about 4 years
I've redirected "cin" to read from a file stream
cin.rdbuf(inF.rdbuf())
When I use the extraction operator it reads until it reaches a whitespace character. Is it possible to use another delimiter? I went through the API on cplusplus.com, but didn't find anything.
-
Earth Engine about 11 years
Using new in an uncontrolled way is evil, not to mention that you never delete your struct (and there is no way to delete an unnamed pointer). ALWAYS try shared_ptr instead when possible.
-
matth about 11 years
That is generally excellent advice which does not apply in this specific case. In this case, std::locale::facet is reference-counted, std::locale::locale requires a raw pointer, not a shared pointer, and std::locale::~locale is defined to delete the facet pointer. If you have a problem with the interface to locale, take it up with the standards committee, not me. See the example program at en.cppreference.com/w/cpp/locale/locale/locale
-
Earth Engine about 11 years
Even so, I would suggest defining a wrapper function get_locale to wrap this unusual use of new, with comments. That way the code reviewer will realize there is something wrong with the interface, not with the code's author. This is what I mean by a "controlled" way of using new.
-
Earth Engine about 11 years
If you are not creating new functions, a better way to represent the ownership transfer could be unique_ptr<colon_is_space>(new colon_is_space).release(). Although it does basically the same thing as your code and is more verbose, it indicates that you are transferring pointer ownership.
-
Jonathan Mee over 9 years
I'm not sure how you can say the delimiter isn't "whitespace" for numbers. If foo is an int, istringstream("123 456") >> foo; puts 123 in foo, not 123456.
-
Ben Voigt over 9 years
@JonathanMee: I didn't say that whitespace characters aren't delimiters, I said the set of delimiters is not only whitespace. Try istringstream("123_456") >> foo; or try istringstream("123|456") >> foo;
-
Jonathan Mee over 9 years
Ahhh, I understand, you're saying that rather than looking for a character defined as ctype_base::space the stream is looking for a character not defined as ctype_base::digit.
-
Ben Voigt over 9 years
@JonathanMee: Right, although it's more complex than that: some punctuation characters are allowed during numeric parsing. And obviously whether a character is classified as a space may affect the status flags, but whitespace is not the only thing that causes numeric extraction to stop.
-
Ben Voigt over 9 years
That will set bar['\t'] to zero, probably not intended. To clear a bit, use &~ (bit-wise AND with bit-wise NOT). ! is logical NOT and won't have the desired effect.
-
Jonathan Mee over 9 years
@BenVoigt Thank you, I wanted to strip out the space and cntrl bits and I accidentally got everything.
-
Wolf over 8 years
Does it make sense to expect that std::getline is optimized for performance?
-
Jonathan Mee over 8 years
@Wolf Streams in general are one of the least performant things in the standard. But typically you're going to use streams with input/output, so slow performance will be negligible relative to the cost of the input/output operation. For performance reasons, though, arrays should be preferred over streams.
-
Ben Voigt over 8 years
@JonathanMee: "slow performance will be negligible relative to the cost of the input/output operation" has NEVER been true in my experience. The fact is that in many applications both file I/O and parsing are negligible compared to the cost of other processing, or waiting for the user to hit the start button, or network requests. But in I/O heavy applications built with iostreams, it's the iostream code, not the I/O operations, that dominates.
-
Jonathan Mee over 8 years
Hmmm... I guess it's the type of project that I have a history with. Thanks for the clarification. It's good to have a balancing point of view. I suppose a better answer for @Wolf's question would be: "getline is no slower than the stream is as a whole, but if performance is a concern for you, you should look for non-stream options."