Remove whitespace in std::string

Solution 1

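A simple combination of std::remove_if and std::string::erase.

A not entirely safe version (passing a negative char value to ::isspace is undefined behavior, which can happen for bytes above 0x7F when char is signed):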

s.erase( std::remove_if( s.begin(), s.end(), ::isspace ), s.end() );

For a safer version, replace ::isspace with

std::bind( std::isspace<char>, _1, std::locale::classic() )

(Include the relevant headers: <algorithm>, <functional>, <locale> and <string>. Here _1 is std::placeholders::_1.)

For a version that works with other character types, replace <char> with <ElementType> or whatever your templated character type is. You can of course also replace the locale with a different one. If you do, take care not to recreate the locale or look up its facet more often than necessary.
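
As a rough sketch of the two points above (other character types, and not repeating the facet lookup), the function below fetches the std::ctype facet once and reuses it for every character. The name remove_whitespace and its default locale argument are illustrative assumptions, not part of the original answer.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical helper (not from the original answer): removes every character
// that the given locale classifies as whitespace, for any character type.
template <typename CharT>
void remove_whitespace(std::basic_string<CharT>& s,
                       const std::locale& loc = std::locale::classic())
{
    // Look the ctype facet up once, rather than once per character.
    const std::ctype<CharT>& facet = std::use_facet<std::ctype<CharT> >(loc);

    s.erase(std::remove_if(s.begin(), s.end(),
                           [&facet](CharT ch) {
                               return facet.is(std::ctype_base::space, ch);
                           }),
            s.end());
}
```

Calling remove_whitespace(s) on a std::string or a std::wstring uses the classic "C" locale. Passing a different std::locale as the second argument changes which characters count as whitespace.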

In C++11 you can make the safer version into a lambda with:

[]( char ch ) { return std::isspace<char>( ch, std::locale::classic() ); }

Solution 2

In C++03, use a function object:

struct RemoveDelimiter
{
  // Returns true for the four whitespace characters we want to drop.
  bool operator()(char c) const
  {
    return (c =='\r' || c =='\t' || c == ' ' || c == '\n');
  }
};

std::string s("\t\tHELLO WORLD\r\nHELLO\t\nWORLD     \t");
s.erase( std::remove_if( s.begin(), s.end(), RemoveDelimiter()), s.end());

Or use a C++11 lambda:

s.erase(std::remove_if( s.begin(), s.end(), 
     [](char c){ return (c =='\r' || c =='\t' || c == ' ' || c == '\n');}), s.end() );

P.S. Both versions use the erase-remove idiom.
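
To make the idiom's two steps visible, here is a small sketch. The helper names is_delimiter, compact_only and remove_delimiters are made up for illustration. std::remove_if only moves the kept characters to the front and returns the new logical end; the string keeps its old size until erase is called with both iterators.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical predicate mirroring the four characters tested above.
inline bool is_delimiter(char c)
{
    return c == '\r' || c == '\t' || c == ' ' || c == '\n';
}

// Step 1 only: shift the kept characters to the front and report how many
// were kept. The size of s is unchanged; the tail holds leftover characters
// whose values are unspecified.
inline std::size_t compact_only(std::string& s)
{
    std::string::iterator new_end =
        std::remove_if(s.begin(), s.end(), is_delimiter);
    return static_cast<std::size_t>(new_end - s.begin());
}

// Steps 1 and 2 together: erase the leftover tail [new_end, s.end()).
inline void remove_delimiters(std::string& s)
{
    s.erase(std::remove_if(s.begin(), s.end(), is_delimiter), s.end());
}
```

Leaving out the second s.end() still compiles, because erase also has a single-iterator overload, but that overload erases only one character.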

Solution 3

C++11:

std::string input = "\t\tHELLO WORLD\r\nHELLO\t\nWORLD     \t";

auto rs = std::regex_replace(input,std::regex("\\s+"), "");

std::cout << rs << std::endl;

Running the compiled program (./play) prints:

HELLOWORLDHELLOWORLD
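
The snippet above omits its headers. A self-contained sketch of the same approach might look like the following; the wrapper name strip_whitespace_regex is hypothetical, and building the regex once as a function-local static is a design choice of this sketch rather than something the original answer does.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper around the regex_replace call shown above.
inline std::string strip_whitespace_regex(const std::string& input)
{
    // Constructing a std::regex is comparatively expensive, so build it once.
    static const std::regex whitespace("\\s+");

    // Replace every run of whitespace with the empty string.
    return std::regex_replace(input, whitespace, "");
}
```

This returns a new string and leaves the input untouched, which is convenient when the original is const.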

Solution 4

In C++11 you can use a lambda rather than using std::bind:

str.erase(
    std::remove_if(str.begin(), str.end(), 
        [](char c) -> bool
        { 
            return std::isspace<char>(c, std::locale::classic()); 
        }), 
    str.end());

Solution 5

You could use Boost.Algorithm's erase_all. Note that, as written, this example removes only the space character, not tabs or newlines:

#include <boost/algorithm/string/erase.hpp>
#include <iostream>
#include <string>

int main()
{
    std::string s = "Hello World!";
    // or the more expensive one-liner in case your string is const
    // std::cout << boost::algorithm::erase_all_copy(s, " ") << "\n";
    boost::algorithm::erase_all(s, " "); 
    std::cout << s << "\n";
}

NOTE: as mentioned in the comments, trim_copy (and its cousins trim_left_copy and trim_right_copy) only removes whitespace from the beginning and end of a string.

Author: Mr. Smith

Updated on August 17, 2020

Comments

  • Mr. Smith (almost 4 years ago)

    In C++, what's an easy way to turn:

    This std::string

    \t\tHELLO WORLD\r\nHELLO\t\nWORLD     \t
    

    Into:

    HELLOWORLDHELLOWORLD
    
    • Mr. Smith (over 11 years ago)
      @tomislav-maric I don't think it's a duplicate of that post, the OP there was working with a cin stream, and thus using iostream functions.
    • CashCow (over 11 years ago)
      similar but not exact duplicate, so not voting to close.
    • tmaric (over 11 years ago)
      @CashCow I checked it again.. you are right, sorry about that.
  • Ivaylo Strandjev (over 11 years ago)
    @chris ::isspace includes the new line as well: cplusplus.com/reference/cctype/isspace
  • CashCow (over 11 years ago)
    it will. isspace will return true for newlines.
  • R. Martinho Fernandes (over 11 years ago)
    isspace has UB for all characters except those in the basic something something. C99 §7.4/1.
  • CashCow (over 11 years ago)
    How did you perform your output? Are you sure you didn't stick one in, e.g. (std::cout << s << std::endl)?
  • chris (over 11 years ago)
    Never mind, it was me being completely stupid and not passing the second argument to erase (I typed one up before the answer).
  • CashCow (over 11 years ago)
    @R.MartinhoFernandes does C99 standard apply to C++? C++ has its own standard.
  • R. Martinho Fernandes (over 11 years ago)
    C++98 delegates the behaviour of the C standard library to C89, and C++11 delegates the behaviour of the C standard library to C99.
  • CashCow (over 11 years ago)
    @chris Yes as std::remove_if returns an iterator, and erase has an overload for a single iterator, it will indeed compile and not give you the result you want if you forget the second s.end()
  • Mr. Smith (over 11 years ago)
    I saw some solutions that used Boost, but I'm not after a trim function, trimming I believe is doing something like XX___XX_ -> XX_XX whereas I want the final solution to be XXXX.
  • chris (over 11 years ago)
    @CashCow, I know, it's completely irritating when you forget it. In my case, I never saw the second argument when reading it how many times before I finally used it, so it's still wired in my brain that it only takes one.
  • Steve Jessop (over 11 years ago)
    Doesn't work when there are adjacent space characters. The first one is erased, moving the second one down to position i. Then you go around the loop, increment i, and never check the second one.
  • SelectricSimian (over 11 years ago)
    You're right. Fixed it.
  • CashCow (over 11 years ago)
    Presumably the -1 from Mr Fernandes for use of ::isspace. perhaps he will enlighten us as to the special locale-based / character-set-based? You know for perfect UTF-8 it is not necessarily even a character-char one-to-one relationship so no functor / lambda will work here officially. The only thing that will work for perfect UTF-8 iteration that might be multi-character is a custom iterator.
  • R. Martinho Fernandes (over 11 years ago)
    FWIW, all the whitespace characters in the example are encoded as single byte sequences in UTF-8, so yes, a simple lambda works for UTF-8.
  • CashCow (over 11 years ago)
    You are saying that what looks like a whitespace will never appear as part of a multibyte character? I don't know the UTF-8 standard. The only thing I see as "undefined" are things like &nbsp; (non-breaking space) which is commonly ASCII 160 (or 0xA0) but might vary in other character sets.
  • R. Martinho Fernandes (over 11 years ago)
    My apologies. I got slightly confused about the true nature of the problem :) I knew using isspace was wrong, but I got confused as to the why. The why is related to isspace taking an int and to char being signed. Here is a small program that explains the issue: stacked-crooked.com/view?id=817f92f4a2482e5da0b7533285e53edb
  • R. Martinho Fernandes (over 11 years ago)
    (And as a side note, NBSP is not in ASCII. ASCII has only 128 values).
  • R. Martinho Fernandes (over 11 years ago)
    (And note how this is not about multibyte encodings; any byte with a value higher than 0x7F in the source, regardless of encoding will trigger this issue; even single byte encodings like Latin-1 or Windows-1252 will cause it. Only 7-bit encodings like ASCII work fine)
  • CashCow (over 11 years ago)
    Ok I have given the alternative answer that uses std::isspace with a locale.
  • PatchyFog (almost 9 years ago)
    Doesn't the lambda version require a "return" statement?
  • bmatovu (over 7 years ago)
    For C++ newbies like me _1 is from std::placeholders, and represent future arguments
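
The signed-char issue raised in the comments above (isspace takes an int, and a plain char above 0x7F may convert to a negative value) can also be sidestepped without a locale by casting to unsigned char first. This is a minimal sketch of that commonly used alternative; it is not one of the posted answers, and the function name remove_whitespace_cctype is made up for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: erase-remove with the <cctype> isspace, made safe for
// bytes above 0x7F by converting each char to unsigned char before the call.
inline void remove_whitespace_cctype(std::string& s)
{
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](char c) {
                               // Without the cast, a negative char value
                               // would be undefined behavior for isspace.
                               return std::isspace(static_cast<unsigned char>(c)) != 0;
                           }),
            s.end());
}
```

Which bytes above 0x7F count as whitespace here depends on the current C locale. In the default "C" locale none of them do, so Latin-1 or UTF-8 text passes through with only the ASCII whitespace removed.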