C++ check if string is space or null

Solution 1

Since you haven't specified an interpretation of characters > 0x7f, I'm assuming ASCII (i.e. no high characters in the string).

#include <string>
#include <cctype>

// Returns true if the string is empty or contains only ASCII whitespace
// Returns false if the string contains any non-whitespace characters
// Returns false if the string contains any non-ASCII characters
bool is_only_ascii_whitespace( const std::string& str )
{
    auto it = str.begin();
    do {
        if (it == str.end()) return true;
    } while (*it >= 0 && *it <= 0x7f && std::isspace(*(it++)));
             // one of these conditions will be optimized away by the compiler,
             // which one depends on whether char is signed or not
    return false;
}

Solution 2

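#include <cctype>
#include <string>

// Returns true if s is empty or contains only whitespace characters.
bool isWhitespace(const std::string& s){
    for(std::size_t index = 0; index < s.length(); index++){
        // Cast to unsigned char: passing a negative char to std::isspace is undefined behavior.
        if(!std::isspace(static_cast<unsigned char>(s[index])))
            return false;
    }
    return true;
}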

Solution 3

std::string str = ...;
if (str.empty() || str == " ") {
    // It's empty or a single space.
}
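
The check above matches only an empty string or exactly one space. For runs of several whitespace characters, one possible sketch uses std::string::find_first_not_of. The helper name is_blank and the chosen set of whitespace characters are assumptions for illustration, not part of the original answer:

```cpp
#include <string>

// Hypothetical helper: true if str is empty or contains only the
// listed ASCII whitespace characters (space, tab, newline, vertical
// tab, form feed, carriage return).
bool is_blank(const std::string& str)
{
    // npos means no character outside the set was found.
    return str.find_first_not_of(" \t\n\v\f\r") == std::string::npos;
}
```

With this sketch, is_blank("") and is_blank("   \t") are both true, while is_blank(" x ") is false.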

Solution 4

std::string mystr = "hello";

if (mystr == " " || mystr.empty()) {
    // do something
}

In breaking a string down, std::stringstream can be helpful.
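
As a rough sketch of how a string stream can break a line into pieces, the following splits on whitespace. The helper name split_words is hypothetical, and splitting on whitespace is only one assumed interpretation of "breaking a string down":

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a line into whitespace-separated tokens.
// A line that is empty or all whitespace yields an empty vector.
std::vector<std::string> split_words(const std::string& line)
{
    std::istringstream in(line);
    std::vector<std::string> words;
    std::string word;
    while (in >> word)          // operator>> skips leading whitespace
        words.push_back(word);
    return words;
}
```

A side effect of this approach is that split_words(line).empty() is another way to detect an empty or whitespace-only line, at the cost of building the token list.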

Solution 5

You don't have a nullstring "in some of the lines of the files".

But you can have an empty string, i.e. an empty line.

You can check for that with e.g. std::string::length or std::string::empty, or, if you like C better, the strlen function.

In order to check for whitespace, the isspace function is handy, but note that for char characters the argument should be cast to unsigned char, e.g., off the cuff,

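#include <cctype>

bool isSpace( char c )
{
    // Convert to unsigned char first: a negative value passed to isspace is undefined behavior.
    return std::isspace( static_cast<unsigned char>( c ) ) != 0;
}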

Cheers & hth.,
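
The per-character check with the unsigned char cast can be extended to a whole string. This is a minimal sketch, assuming a single-byte character set and the current C locale; the name is_all_space is hypothetical and not from the answer above:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true if s is empty or every character is
// whitespace according to the current C locale.
bool is_all_space(const std::string& s)
{
    return std::all_of(s.begin(), s.end(), [](char c) {
        // Convert to unsigned char before calling std::isspace;
        // a negative argument (other than EOF) is undefined behavior.
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    });
}
```

std::all_of returns true for an empty range, so the empty line and the whitespace-only line are handled by the same call.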

Author: Mark

Updated on April 06, 2020

Comments

  • Mark, over 3 years ago

    Basically, I have strings of whitespace (" ") or blocks of whitespace, or empty strings (""), in some of the lines of the files, and I would like to know if there is a function in C++ that checks this.

    *note:* As a side question, in C++ if I want to break a string down and check it for a pattern, which library should I use? If I want to code it myself, which basic functions should I know to manipulate strings? Are there any good references?

  • Lightness Races in Orbit, over 12 years ago
    ...which would be completely overkill for such a simple scenario like this. This answer is also lacking any detail.
  • Daniel, over 12 years ago
    he asked for "if I want to break a string down and check it for a pattern which library should I use"
  • Lightness Races in Orbit, over 12 years ago
    As a "side question". Stack Overflow does not do "side questions". And "regexp" is not a library. It is a broad description of a wide range of Regular Expression engines, implemented by a wide range of libraries.
  • Cheers and hth. - Alf, over 12 years ago
    -1: generally incorrect call of std::isspace. The argument needs to be cast to unsigned char (or an equivalent expression). Please fix.
  • Ben Voigt, over 12 years ago
    @Alf: Casting to unsigned char wouldn't be correct either. When you start supporting non-ASCII characters, you need to know an encoding, start thinking about multi-byte characters, etc.
  • Ben Voigt, over 12 years ago
    This doesn't handle strings at all, never mind that a "string of whitespace (characters)" has arbitrary length (whitespace is uncountable). And blindly casting to unsigned char is usually the wrong thing to do with non-ASCII strings.
  • Ben Voigt, over 12 years ago
    @Alf: I fixed it to never pass negative numbers to std::isspace. Do you think there's still a problem?
  • Cheers and hth. - Alf, over 12 years ago
    @Ben: Positive, you have understood correctly that the function doesn't handle a string. It handles a char. You have failed to understand the purpose of the cast. It is not a good idea to fill in that void by an assumption of "blindly casting". This cast is necessary to avoid Undefined Behavior, in general. It exemplifies how to use this function & family correctly. Cheers & hth.,
  • Cheers and hth. - Alf, over 12 years ago
    @Ben: yes, there's still a problem, namely failure to recognize as whitespace a value that is negative as char. That is, that the function can produce a false negative. Simply cast to unsigned char to fix it, for the default encoding (the actual argument is then implicitly promoted further up to int, but the total effect is not the same as a direct cast to int: you should cast to unsigned char).
  • Ben Voigt, over 12 years ago
    @Alf: You replace undefined behavior with probably wrong behavior. e.g. most non-ASCII characters are represented in UTF-8 these days, and ::isspace will do the wrong thing if you pass it a UTF-8 lead byte.
  • Ben Voigt, over 12 years ago
    @Alf: What part of "If it's not ASCII, you need to account for multi-byte characters" is unclear? This function is written (and now documented) to work correctly on ASCII strings. If the input isn't ASCII, the logic would be (1) encoding-dependent and (2) much more complicated. A cast to unsigned char is not an appropriate fix.
  • Cheers and hth. - Alf, over 12 years ago
    @Ben: Your argument, if correct, would apply to most of the C++ standard library's character handling... :-( Dealing with UTF-8 and other variable length encodings is much more difficult, because the standard library has a fixed size assumption. The function above is the most efficient and general function. As such it can be wrapped with any conditions you want, at the cost of efficiency. Going the other way, producing the efficient and most general function from a limited function, is in general not possible. In essence, you can't get rid of inefficiency once you add it in at bottom.
  • Ben Voigt, over 12 years ago
    @Alf: AFAICT, all of the C++ standard library character handling is only specified for the basic character set, which means ASCII on all current platforms. Handling of OEM characters > 0x7f is completely implementation-defined and non-portable. C++ doesn't even specify what the range of valid values for a character is (except that it definitely is a strict superset of 0-127).
  • Ben Voigt, over 12 years ago
    @Alf: But here, I'll make it obvious that it tests for a string of ASCII whitespace characters, by putting ASCII in the function name. Is that good?
  • Cheers and hth. - Alf, over 12 years ago
    @Ben: sorry, I don't believe you believe that yourself. You must be aware of the zillions of applications made using C++. Heh.
  • Cheers and hth. - Alf, over 12 years ago
    @Ben: Yes, in itself. But consider the simpler and more efficient is_sbcs_whitespace function, utilizing the check isspace( (unsigned char) ch ), where the cast in practice generates no machine code at all. Since ASCII is a Single Byte Character Set, any users needing this functionality for ASCII could then use your simpler & more efficient function, and users with characters encoded as Latin-1 or Windows ANSI Western or other SBCSes could then use it too. So, while the new contract is good ... it can be even better. Well, unless you want to sell also a Latin-1 version, etc... ;-)
  • Ben Voigt, over 12 years ago
    @Alf: I don't think so. isspace will be correct for ASCII and exactly one unspecified SBCS. If it works for Latin-1, then it doesn't work for ANSI.
  • Ben Voigt, over 12 years ago
    @Alf: I'm pretty sure that anyone using an extended character set who cares about correctness is using some NLS library (typically iconv) and not relying on C++ standard library string manipulation.
  • Cheers and hth. - Alf, over 12 years ago
    @Ben: isspace isn't hardcoded that way. Its effect depends, as I recall, on the C library's locale. That locale can be selected by setlocale. And for that reason it's generally a good idea to call setlocale( LC_ALL, "" ) at the start of the program. This changes the locale from the C locale (with pure ASCII) to whatever the natural user's locale is on the machine, e.g. one with Windows ANSI. Perhaps I should have mentioned that. I forgot, so, thanks for drilling down to that.
  • Cheers and hth. - Alf, over 12 years ago
    @Ben: tell that to Microsoft. :-)
  • Ben Voigt, over 12 years ago
    @Alf: And at some point in there, the extra locale handling becomes more expensive than a non-negativity check (if I really cared about performance, I'd tighten the upper and lower bounds to be strict on the set of ASCII whitespace characters, and then do lookup in a small table). Then, there's the overload of isspace in <locale> for handling extended characters in a controlled way.
  • Ben Voigt, over 12 years ago
    @Alf: Microsoft provides NLS in the Win32 API, IIRC. Unfortunately, UTF-8 is not among the supported codepages for nearly all these functions.
  • Cheers and hth. - Alf, over 12 years ago
    @Ben: I was not talking about any "extra" locale handling. Just that isspace isn't hardcoded the way that you apparently thought, and that an initial call to setlocale is a good idea to be able to use those C lib functions. The idea you have for improving performance is, AFAIK, how a typical isspace implementation works. PS: Latin-1 is a strict subset of Windows ANSI Western, and I don't think the latter adds any whitespace characters. That's irrelevant to our discussion, though, except that you inadvertently (most probably incorrectly?) used those two in your example. Cheers,
  • Ben Voigt, over 12 years ago
    @Alf: Not sure, but I think most implementations of isspace do a table lookup for all characters, with no short-circuiting. But what I was saying was that if I cared about performance, I'd use a single table without the locale-specific indirection. And those two codepages -- I assumed that you'd mentioned two significantly different codepages in your earlier comment. My mistake.
  • Ben Voigt, almost 10 years ago
    I got an unexplained downvote with regard to this topic today as well. So it might be strategic voting. Or someone who thinks they know something we don't, but is unwilling to share.
  • Paul Hazen, almost 3 years ago
    Doesn't account for multiple spaces
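
The comments above mention the isspace overload in <locale> as a way to classify characters against an explicitly chosen locale. A minimal sketch of that idea follows; the helper name is_all_space_in and the default of the classic "C" locale are assumptions for illustration, and it still treats the string as single-byte characters, so it does not address the multi-byte concerns raised in the discussion:

```cpp
#include <locale>
#include <string>

// Hypothetical helper: true if s is empty or every character is
// classified as whitespace by the given locale.
// Defaults to the classic "C" locale.
bool is_all_space_in(const std::string& s,
                     const std::locale& loc = std::locale::classic())
{
    for (char c : s)
        if (!std::isspace(c, loc))   // two-argument overload from <locale>
            return false;
    return true;
}
```

Passing the locale explicitly avoids depending on whatever setlocale was last called with elsewhere in the program.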