remove whitespace in std::string
Solution 1
A simple combination of std::remove_if and std::string::erase.
Not totally safe version
s.erase( std::remove_if( s.begin(), s.end(), ::isspace ), s.end() );
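For a safer version, replace ::isspace with
std::bind( std::isspace<char>, _1, std::locale::classic() )
(Include all relevant headers: <algorithm>, <functional>, <locale> and <string>. The _1 placeholder lives in namespace std::placeholders.)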
For a version that works with other character types, replace <char> with <ElementType>, or whatever your templated character type is. You can of course also replace the locale with a different one. If you do, take care to avoid the inefficiency of re-creating the locale facet too many times.
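One way to avoid repeated facet lookups is to fetch the std::ctype facet once and reuse it for every character. The following is a minimal sketch, not part of the original answer; the helper name strip_whitespace and its default locale argument are assumptions made for illustration.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical helper (name chosen for this sketch): returns a copy of s
// without any character that the given locale classifies as whitespace.
std::string strip_whitespace(std::string s,
                             const std::locale& loc = std::locale::classic())
{
    // Look the ctype facet up once, rather than once per character.
    const std::ctype<char>& facet = std::use_facet<std::ctype<char>>(loc);

    s.erase(std::remove_if(s.begin(), s.end(),
                           [&facet](char ch) {
                               return facet.is(std::ctype_base::space, ch);
                           }),
            s.end());
    return s;
}
```

Calling strip_whitespace("\t\tHELLO WORLD\r\nHELLO\t\nWORLD \t") should yield HELLOWORLDHELLOWORLD under the classic locale.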
In C++11 you can make the safer version into a lambda with:
[]( char ch ) { return std::isspace<char>( ch, std::locale::classic() ); }
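The comments further down trace the "not totally safe" label to ::isspace taking an int: where char is signed, any byte above 0x7F arrives as a negative value, which the C library does not define behaviour for. A commonly used alternative to the locale overload is to convert to unsigned char first. This is a sketch under that assumption, and the helper name strip_whitespace_c is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (name chosen for this sketch): removes whitespace as
// classified by the C library's isspace in the current C locale.
std::string strip_whitespace_c(std::string s)
{
    s.erase(std::remove_if(s.begin(), s.end(),
                           // Taking the parameter as unsigned char keeps the
                           // value passed to isspace non-negative.
                           [](unsigned char ch) { return std::isspace(ch) != 0; }),
            s.end());
    return s;
}
```

Bytes above 0x7F, such as 0xE9 in Latin-1, are then passed through unchanged in the default "C" locale rather than triggering undefined behaviour.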
Solution 2
In C++03:
struct RemoveDelimiter
{
    // const, so the functor can also be called through a const reference
    bool operator()(char c) const
    {
        return (c == '\r' || c == '\t' || c == ' ' || c == '\n');
    }
};
std::string s("\t\tHELLO WORLD\r\nHELLO\t\nWORLD \t");
s.erase( std::remove_if( s.begin(), s.end(), RemoveDelimiter()), s.end());
Or use a C++11 lambda:
s.erase(std::remove_if( s.begin(), s.end(),
    [](char c){ return (c == '\r' || c == '\t' || c == ' ' || c == '\n'); }), s.end() );
PS: this uses the erase-remove idiom.
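The idiom has two steps: std::remove_if moves the kept characters to the front and returns the new logical end without shrinking the string, and erase then drops the stale tail. The sketch below separates the two steps and also shows the pitfall raised in the comments, where forgetting the second argument to erase still compiles. The function names are hypothetical and chosen only for this illustration.

```cpp
#include <algorithm>
#include <string>

// Predicate matching the four delimiters used in this solution.
inline bool is_delimiter(char c)
{
    return c == '\r' || c == '\t' || c == ' ' || c == '\n';
}

// Correct form: erase the whole range [new_end, end()).
std::string remove_delimiters(std::string s)
{
    // After remove_if the string still has its original size; only the
    // characters before new_end are meaningful.
    std::string::iterator new_end =
        std::remove_if(s.begin(), s.end(), is_delimiter);
    s.erase(new_end, s.end());
    return s;
}

// Pitfall: the single-iterator overload of erase removes exactly one
// character, so the rest of the stale tail stays in the string.
std::string remove_delimiters_buggy(std::string s)
{
    std::string::iterator new_end =
        std::remove_if(s.begin(), s.end(), is_delimiter);
    if (new_end != s.end())   // erasing end() itself would be undefined
        s.erase(new_end);
    return s;
}
```

For the input "a b c", the correct form returns a three-character string, while the buggy form returns four characters because only one of the two leftover positions is erased.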
Solution 3
C++11 (requires <regex>, <string> and <iostream>):
std::string input = "\t\tHELLO WORLD\r\nHELLO\t\nWORLD \t";
auto rs = std::regex_replace(input,std::regex("\\s+"), "");
std::cout << rs << std::endl;
/tmp ❮❮❮ ./play
HELLOWORLDHELLOWORLD
Solution 4
In C++11 you can use a lambda rather than using std::bind:
str.erase(
    std::remove_if(str.begin(), str.end(),
                   [](char c) -> bool
                   {
                       return std::isspace<char>(c, std::locale::classic());
                   }),
    str.end());
Solution 5
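You could use Boost.Algorithm's erase_all. Note that it removes only the substring you pass in, here a single space, so tabs and newlines need their own calls: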
#include <boost/algorithm/string/erase.hpp>
#include <iostream>
#include <string>
int main()
{
    std::string s = "Hello World!";
    // or the more expensive one-liner in case your string is const
    // std::cout << boost::algorithm::erase_all_copy(s, " ") << "\n";
    boost::algorithm::erase_all(s, " ");
    std::cout << s << "\n";
}
NOTE: as mentioned in the comments, trim_copy (or its cousins trim_left_copy and trim_right_copy) only removes whitespace from the beginning and end of a string.
Mr. Smith
Updated on August 17, 2020
Comments
-
Mr. Smith almost 4 years
In C++, what's an easy way to turn:
This std::string
\t\tHELLO WORLD\r\nHELLO\t\nWORLD \t
Into:
HELLOWORLDHELLOWORLD
-
Mr. Smith over 11 years: @tomislav-maric I don't think it's a duplicate of that post; the OP there was working with a cin stream, and thus using iostream functions.
-
CashCow over 11 years: Similar but not an exact duplicate, so not voting to close.
-
tmaric over 11 years: @CashCow I checked it again... you are right, sorry about that.
-
Ivaylo Strandjev over 11 years: @chris ::isspace includes the new line as well: cplusplus.com/reference/cctype/isspace
-
CashCow over 11 years: It will. isspace will return true for newlines.
-
R. Martinho Fernandes over 11 years: isspace has UB for all characters except those in the basic something something. C99 §7.4/1.
-
CashCow over 11 years: How did you perform your output? Are you sure you didn't stick one in, e.g. (std::cout << s << std::endl)?
-
chris over 11 years: Never mind, it was me being completely stupid and not passing the second argument to erase (I typed one up before the answer).
-
CashCow over 11 years: @R.MartinhoFernandes Does the C99 standard apply to C++? C++ has its own standard.
-
R. Martinho Fernandes over 11 years: C++98 delegates the behaviour of the C standard library to C89, and C++11 delegates the behaviour of the C standard library to C99.
-
CashCow over 11 years: @chris Yes, as std::remove_if returns an iterator, and erase has an overload for a single iterator, it will indeed compile and not give you the result you want if you forget the second s.end().
-
Mr. Smith over 11 years: I saw some solutions that used Boost, but I'm not after a trim function. Trimming, I believe, is doing something like XX___XX_ -> XX_XX whereas I want the final solution to be XXXX.
-
chris over 11 years: @CashCow, I know, it's completely irritating when you forget it. In my case, I never saw the second argument when reading it how many times before I finally used it, so it's still wired in my brain that it only takes one.
-
Steve Jessop over 11 years: Doesn't work when there are adjacent space characters. The first one is erased, moving the second one down to position i. Then you go around the loop, increment i, and never check the second one.
-
SelectricSimian over 11 years: You're right. Fixed it.
-
CashCow over 11 years: Presumably the -1 from Mr Fernandes is for the use of ::isspace. Perhaps he will enlighten us as to the special locale-based / character-set-based issue? You know, for perfect UTF-8 it is not necessarily even a one-to-one relationship between character and char, so no functor / lambda will work here officially. The only thing that will work for perfect UTF-8 iteration that might be multi-character is a custom iterator.
-
R. Martinho Fernandes over 11 years: FWIW, all the whitespace characters in the example are encoded as single byte sequences in UTF-8, so yes, a simple lambda works for UTF-8.
-
CashCow over 11 years: You are saying that what looks like a whitespace will never appear as part of a multibyte character? I don't know the UTF-8 standard. The only thing I see as "undefined" are things like the non-breaking space, which is commonly ASCII 160 (or 0xA0) but might vary in other character sets.
-
R. Martinho Fernandes over 11 years: My apologies. I got slightly confused about the true nature of the problem :) I knew using isspace was wrong, but I got confused as to the why. The why is related to isspace taking an int and to char being signed. Here is a small program that explains the issue: stacked-crooked.com/view?id=817f92f4a2482e5da0b7533285e53edb.
-
R. Martinho Fernandes over 11 years: (And as a side note, NBSP is not in ASCII. ASCII has only 128 values.)
-
R. Martinho Fernandes over 11 years: (And note how this is not about multibyte encodings; any byte with a value higher than 0x7F in the source, regardless of encoding, will trigger this issue; even single byte encodings like Latin-1 or Windows-1252 will cause it. Only 7-bit encodings like ASCII work fine.)
-
CashCow over 11 years: OK, I have given the alternative answer that uses std::isspace with a locale.
-
PatchyFog almost 9 years: Doesn't the lambda version require a "return" statement?
-
bmatovu over 7 years: For C++ newbies like me: _1 is from std::placeholders, and represents future arguments.
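Steve Jessop's comment in the thread above describes an index-based loop that skips the second of two adjacent spaces. The answer it refers to is not shown on this page, so the following is a hypothetical reconstruction of that kind of loop, next to one possible fix; both function names are invented for this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical reconstruction of the bug: after erasing position i, the next
// character slides into position i, and ++i then steps over it unchecked.
std::string strip_spaces_buggy(std::string s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == ' ')
            s.erase(i, 1);
    }
    return s;
}

// One possible fix: only advance when nothing was erased at position i.
std::string strip_spaces_fixed(std::string s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        if (s[i] == ' ')
            s.erase(i, 1);   // stay at i and re-check the character now there
        else
            ++i;
    }
    return s;
}
```

With two adjacent spaces, as in "a  b", the buggy loop leaves one space behind while the fixed loop removes both. Each erase shifts the rest of the string, so the fixed loop is still quadratic in the worst case; the erase-remove idiom used in the solutions does the same job in a single pass.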