C++ check if string is space or null
Solution 1
Since you haven't specified an interpretation of characters > 0x7f, I'm assuming ASCII (i.e. no high characters in the string).
#include <string>
#include <cctype>
// Returns false if the string contains any non-whitespace characters
// Returns false if the string contains any non-ASCII characters
bool is_only_ascii_whitespace( const std::string& str )
{
    auto it = str.begin();
    do {
        if (it == str.end()) return true;
    } while (*it >= 0 && *it <= 0x7f && std::isspace(*(it++)));
    // one of these conditions will be optimized away by the compiler,
    // which one depends on whether char is signed or not
    return false;
}
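As a rough sketch of a stricter variant: instead of range-checking and then calling std::isspace, each character can be compared directly against the six whitespace characters of the default "C" locale, which makes the result independent of any later locale change. The name is_only_basic_whitespace is made up for this illustration and does not come from the answer.

```cpp
#include <string>

// Hypothetical helper (not from the original answer): returns true if every
// character is one of the six whitespace characters of the "C" locale.
// An empty string also returns true, like is_only_ascii_whitespace.
bool is_only_basic_whitespace(const std::string& str)
{
    for (char c : str) {
        switch (c) {
        case ' ': case '\t': case '\n': case '\v': case '\f': case '\r':
            break;          // whitespace: keep scanning
        default:
            return false;   // anything else, including bytes above 0x7f
        }
    }
    return true;
}
```

Because std::isspace is never called, the question of signed versus unsigned char does not arise here.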
Solution 2
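#include <cctype>
#include <string>

// Returns true if every character is whitespace (an empty string also returns true)
bool isWhitespace(const std::string& s){
    for(std::string::size_type index = 0; index < s.length(); index++){
        // Cast to unsigned char: passing a negative char to std::isspace is undefined behavior
        if(!std::isspace(static_cast<unsigned char>(s[index])))
            return false;
    }
    return true;
}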
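The same loop can probably be written more compactly with std::all_of. This is only a sketch; the name isWhitespaceAllOf is hypothetical, and it keeps the contract of isWhitespace (an empty string counts as whitespace).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical name, chosen for this illustration only.
bool isWhitespaceAllOf(const std::string& s)
{
    // The lambda takes unsigned char, so each char is converted before it
    // reaches std::isspace and the argument is never negative.
    return std::all_of(s.begin(), s.end(), [](unsigned char c) {
        return std::isspace(c) != 0;
    });
}
```

Like the loop version, this treats each byte as one character, so it is only meaningful for single-byte encodings.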
Solution 3
std::string str = ...;
if (str.empty() || str == " ") {
// It's empty or a single space.
}
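The check above only matches a single space. If runs of several spaces should also count, one possible sketch uses std::string::find_first_not_of; the helper name is_empty_or_spaces is an assumption made for this example.

```cpp
#include <string>

// Hypothetical helper: true for "" and for strings made up only of ' '.
// find_first_not_of returns npos when no character other than ' ' exists,
// which is also the case for an empty string.
bool is_empty_or_spaces(const std::string& str)
{
    return str.find_first_not_of(' ') == std::string::npos;
}
```

Passing a set such as " \t\r\n" instead of ' ' would extend the check to tabs and line endings.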
Solution 4
std::string mystr = "hello";
if(mystr == " " || mystr == "") {
    // do something
}
In breaking a string down, std::stringstream can be helpful.
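A minimal sketch of what breaking a string down with a string stream might look like, assuming whitespace-separated words are wanted. The function name split_words is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a line into whitespace-separated words.
// operator>> skips leading whitespace, so runs of blanks are ignored.
std::vector<std::string> split_words(const std::string& line)
{
    std::istringstream in(line);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}
```

A line that is empty or holds only whitespace yields an empty vector, so split_words(line).empty() can also serve as a blank-line test, at the cost of some extra allocation.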
Solution 5
You don't have a nullstring "in some of the lines of the files".
But you can have an empty string, i.e. an empty line.
You can use e.g. std::string::length or, if you like C better, the strlen function.
In order to check for whitespace, the isspace function is handy, but note that for char characters the argument should be cast to unsigned char, e.g., off the cuff,
#include <ctype.h>

bool isSpace( char c )
{
    typedef unsigned char UChar;
    return bool( ::isspace( UChar( c ) ) );
}
Cheers & hth.,
Mark
Updated on April 06, 2020
Comments
-
Mark over 3 years
Basically, I have a string of whitespace " " or blocks of whitespace, or "" (empty), in some of the lines of the files, and I would like to know if there is a function in C++ that checks this.
Note: As a side question, in C++ if I want to break a string down and check it for a pattern, which library should I use? If I want to code it myself, which basic functions should I know to manipulate strings? Are there any good references?
-
Lightness Races in Orbit over 12 years
...which would be completely overkill for such a simple scenario like this. This answer is also lacking any detail.
-
Daniel over 12 years
He asked for "if I want to break a string down and check it for a pattern which library should I use".
-
Lightness Races in Orbit over 12 years
As a "side question". Stack Overflow does not do "side questions". And "regexp" is not a library. It is a broad description of a wide range of Regular Expression engines, implemented by a wide range of libraries.
-
Cheers and hth. - Alf over 12 years
-1: generally incorrect call of std::isspace. The argument needs to be cast to unsigned char (or an equivalent expression). Please fix.
-
Ben Voigt over 12 years
@Alf: Casting to unsigned char wouldn't be correct either. When you start supporting non-ASCII characters, you need to know an encoding, start thinking about multi-byte characters, etc.
-
Ben Voigt over 12 years
This doesn't handle strings at all, never mind that a "string of whitespace (characters)" has arbitrary length (whitespace is uncountable). And blindly casting to unsigned char is usually the wrong thing to do with non-ASCII strings.
-
Ben Voigt over 12 years
@Alf: I fixed it to never pass negative numbers to std::isspace. Do you think there's still a problem?
-
Cheers and hth. - Alf over 12 years
@Ben: Positive, you have understood correctly that the function doesn't handle a string. It handles a char. You have failed to understand the purpose of the cast. It is not a good idea to fill in that void by an assumption of "blindly casting". This cast is necessary to avoid Undefined Behavior, in general. It exemplifies how to use this function & family correctly. Cheers & hth.,
-
Cheers and hth. - Alf over 12 years
@Ben: yes, there's still a problem, namely failure to recognize as whitespace a value that is negative as char. That is, that the function can produce a false negative. Simply cast to unsigned char to fix it, for the default encoding (the actual argument is then implicitly promoted further up to int, but the total effect is not the same as a direct cast to int: you should cast to unsigned char).
-
Ben Voigt over 12 years
@Alf: You replace undefined behavior with probably wrong behavior. e.g. most non-ASCII characters are represented in UTF-8 these days, and ::isspace will do the wrong thing if you pass it a UTF-8 lead byte.
-
Ben Voigt over 12 years
@Alf: What part of "If it's not ASCII, you need to account for multi-byte characters" is unclear? This function is written (and now documented) to work correctly on ASCII strings. If the input isn't ASCII, the logic would be (1) encoding-dependent and (2) much more complicated. A cast to unsigned char is not an appropriate fix.
-
Cheers and hth. - Alf over 12 years
@Ben: Your argument, if correct, would apply to most of the C++ standard library's character handling... :-( Dealing with UTF-8 and other variable length encodings is much more difficult, because the standard library has a fixed size assumption. The function above is the most efficient and general function. As such it can be wrapped with any conditions you want, at the cost of efficiency. Going the other way, producing the efficient and most general function from a limited function, is in general not possible. In essence, you can't get rid of inefficiency once you add it in at bottom.
-
Ben Voigt over 12 years
@Alf: AFAICT, all of the C++ standard library character handling is only specified for the basic character set, which means ASCII on all current platforms. Handling of OEM characters > 0x7f is completely implementation-defined and non-portable. C++ doesn't even specify what the range of valid values for a character is (except that it definitely is a strict superset of 0-127).
-
Ben Voigt over 12 years
@Alf: But here, I'll make it obvious that it tests for a string of ASCII whitespace characters, by putting ASCII in the function name. Is that good?
-
Cheers and hth. - Alf over 12 years
@Ben: sorry, I don't believe you believe that yourself. You must be aware of the zillions of applications made using C++. Heh.
-
Cheers and hth. - Alf over 12 years
@Ben: Yes, in itself. But consider the simpler and more efficient is_sbcs_whitespace function, utilizing the check isspace( (unsigned char) ch ), where the cast in practice generates no machine code at all. Since ASCII is a Single Byte Character Set, any users needing this functionality for ASCII could then use your simpler & more efficient function, and also users with characters encoded as Latin-1 or Windows ANSI Western or other SBCS'es could then use it. So, while the new contract is good ... it can be even better. Well, unless you want to sell also a Latin-1 version, etc... ;-)
-
Ben Voigt over 12 years
@Alf: I don't think so. isspace will be correct for ASCII and exactly one unspecified SBCS. If it works for Latin-1, then it doesn't work for ANSI.
-
Ben Voigt over 12 years
@Alf: I'm pretty sure that anyone using an extended character set who cares about correctness is using some NLS library (typically iconv) and not relying on C++ standard library string manipulation.
-
Cheers and hth. - Alf over 12 years
@Ben: isspace isn't hardcoded that way. Its effect depends, as I recall, on the C library's locale. That locale can be selected by setlocale. And for that reason it's generally a good idea to call setlocale( LC_ALL, "" ) at the start of the program. This changes the locale from the C locale (with pure ASCII) to whatever the natural user's locale is on the machine, e.g. one with Windows ANSI. Perhaps I should have mentioned that. I forgot, so, thanks for drilling down to that.
-
Cheers and hth. - Alf over 12 years
@Ben: tell that to Microsoft. :-)
-
Ben Voigt over 12 years
@Alf: And at some point in there, the extra locale handling becomes more expensive than a non-negativity check (if I really cared about performance, I'd tighten the upper and lower bounds to be strict on the set of ASCII whitespace characters, and then do lookup in a small table). Then, there's the overload of isspace in <locale> for handling extended characters in a controlled way.
-
Ben Voigt over 12 years
@Alf: Microsoft provides NLS in the Win32 API, IIRC. Unfortunately, UTF-8 is not among the supported codepages for nearly all these functions.
-
Cheers and hth. - Alf over 12 years
@Ben: I was not talking about any "extra" locale handling. Just that isspace isn't hardcoded the way that you apparently thought, and that an initial call to setlocale is a good idea to be able to use those C lib functions. The idea you have for improving performance is, AFAIK, how a typical isspace implementation works. PS: Latin-1 is a strict subset of Windows ANSI Western, and I don't think the latter adds any whitespace characters. That's irrelevant to our discussion, though, except that you inadvertently (most probably incorrectly?) used those two in your example. Cheers,
-
Ben Voigt over 12 years
@Alf: Not sure, but I think most implementations of isspace do a table lookup for all characters, with no short-circuiting. But what I was saying was that if I cared about performance, I'd use a single table without the locale-specific indirection. And those two codepages -- I assumed that you'd mentioned two significantly different codepages in your earlier comment. My mistake.
-
Ben Voigt almost 10 years
I got an unexplained downvote with regard to this topic today as well. So it might be strategic voting. Or someone who thinks they know something we don't, but is unwilling to share.
-
Paul Hazen almost 3 years
Doesn't account for multiple spaces.