Difference between strlen(str.c_str()) and str.length() for std::string
Solution 1
Your understanding is incorrect. Sort of.
std::string may contain chars with the value '\0'; when you extract a C-string, you have no way of knowing how long it is other than to scan for \0s, which by necessity cannot account for "binary data".
This is a limitation of strlen, and one that std::string "fixes" by actually remembering this metadata as a count of chars that it knows are encapsulated.
The standard doesn't really need to "say" anything about it, except that std::string::length gives you the string length, no matter what the values of the chars you inserted into the string, and that it is not prohibited to insert a '\0'. By contrast, strlen is defined to tell you how many chars exist up to the next \0, which is a fundamentally different definition.
There is no explicit wording about this, because there does not need to be. If there were an exception to the very simple rules ("there is a string; it has chars; it can tell you how many chars it has") then that would be stated explicitly… and it's not.
Solution 2
The standard C function std::strlen calculates the length of a character array based on the presence of the terminating zero in the array.
On the other hand, objects of class std::string may have embedded zeroes. Thus the function strlen applied to c_str() can yield a result that differs from the value returned by the member function length.
Consider a simple example:
std::string s( 10, '\0' );
std::cout << s.length() << std::endl;
std::cout << std::strlen( s.c_str() ) << std::endl;
In this case the first output statement will output 10 while the second output statement will output 0.
Moreover, if you have a string such as
std::string s( "Hello" );
and then call the member function resize
s.resize( 10 );
then the function appends five values of type char(), that is, zeroes, to the original string. And the member function s.length() returns 10.
Solution 3
The standard has this to say about length() from string:
Returns: size().
And size() is defined as:
Returns: A count of the number of char-like objects currently in the string.
So, as you can see, you will get the number of char-like objects in the string even if the value of some of those char-like objects is '\0'.
Xlea
Updated on June 19, 2022
Comments
-
Xlea almost 2 years
As an implicit understanding, I always thought that every implementation of std::string necessarily must satisfy strlen(str.c_str()) == str.length() for every string str. What does the C++ standard say about this? (Does it?)
Background: At least the implementations shipped with Visual C++ and gcc do not have this property. Consider this example (see here for a live example):
// Output:
// string says its length is: 13
// strlen says: 5
#include <cstring>
#include <iostream>
#include <string>

int main() {
    std::string str = "Hello, world!";
    str[5] = 0;  // overwrite the ',' with '\0'
    std::cout << "string says its length is: " << str.length() << std::endl;
    std::cout << "strlen says: " << std::strlen(str.c_str()) << std::endl;
    return 0;
}
Of course, the write operation without str noticing is causing "the problem". But that's not my question. I want to know what the standard has to say about this behavior.
-
Lightness Races in Orbit about 9 years
@Xlea: It is not, and does not need to be, specified explicitly. Strings accept chars (with no mention of a restriction on their value), and the string class can tell you the number of chars in the string. There is no wording to state "except that this is broken in the same way as strlen is, despite the fact that the entire purpose of this class is to improve upon the horrible C-string semantics". I can't prove a negative!
-
Lightness Races in Orbit about 9 years
What is ambiguous about "working consistently"? There are strings. Strings contain chars. Strings can tell you how many chars are in them. This function will always work. That's it.
-
Steve Jessop about 9 years
@Xlea: AFAIK there's no explicit statement in the standard that "str.length() might give a different answer from std::strlen(str.c_str())". But there doesn't have to be, since you can work it out from the definitions of the functions. string::length is defined to return the number of characters in the string, whereas strlen is defined to return the number of characters up to the first 0 character.
-
Jonathan Wakely about 9 years
@LightnessRacesinOrbit, because that's what 21.2 [strings.general] defines to refer to any character type stored in a basic_string: "This Clause describes components for manipulating sequences of any non-array POD (3.9) type. In this Clause such types are called char-like types, and objects of char-like types are called char-like objects or simply characters."
-
Lightness Races in Orbit about 9 years
@JonathanWakely: Seems strange wording, though? It could have said "CharT objects" instead. I acknowledge that it's well-defined in this clause, but why go to the trouble of introducing a new term? Oh well.
-
NathanOliver about 9 years
The quote was from the definition of basic_string, 21.4.4, and maybe they are covering for the fact that it can also hold wide characters, which are char-like and not char.
-
Jonathan Wakely about 9 years
@LightnessRacesinOrbit, in some contexts where it's used in Clause 21 there is no _CharT template parameter "in scope" for the wording.
-
chris about 9 years
@LightnessRacesinOrbit, "In the rest of this Clause, the type of the char-like objects held in a basic_string object is designated by charT." I guess it's just that charT is defined in terms of this, not the other way around.
-
Xlea about 9 years
@LightnessRacesinOrbit: std::string semantics (the most sensible I can imagine) are "vector of chars", then. I perfectly agree that it is the right design for an std::string when designing from scratch. Yet there is backward compatibility, and with these semantics at hand you have to be much more careful when porting old code to std::string. Btw, I can also define "consistent behavior" for C-style strings (i.e., with a special terminating symbol, e.g., given in a trait). So "working consistently" refers right back to the definition of what a "string" is and thus explains nothing.
-
Xlea about 9 years
Again, that's a matter of definition of the underlying semantics. What is "inside a string"? For example, C defines the string as all characters from the start up to, but not including, the first 0 character (assuming for simplicity that there is one). I guess the best hint is the absolute symmetry of all characters.
-
Lightness Races in Orbit about 9 years
@Xlea: C doesn't really define string at all, which is the key here.
-
Lightness Races in Orbit about 9 years
@Xlea: "Working consistently" means "functioning per the requirements of the language, for all valid inputs, always". Period.
-
Xlea about 9 years
@LightnessRacesinOrbit: Precisely. My question was exactly what those "requirements of the language" are, question mark. But let's end the discussion here. I got the point, and that it's (only) implicit in the standard.
-
Jonathan Wakely about 9 years
@Xlea, nonsense, it's not implicit. The standard clearly states that length() and size() are the number of elements held by the string, not the number of leftmost non-zero elements. See NathanOliver's answer. As for backward compatibility, std::string is not supposed to be backward compatible with char*; it's supposed to have other semantics, such as storing its length. If you want a C string, then call c_str() and use the C string functions.
-
Xlea about 9 years
@JonathanWakely: I disagree. You assume an implicit understanding of what it means for a string to "hold" an element. Why should the C answer be incorrect? (It may seem pedantic, but a precise, explicit semantic definition is exactly the point of my question.) But let's leave it at that.
-
Jonathan Wakely about 9 years
@Xlea, no, the standard is very explicit in several places. See the basic_string(const char_type* s, size_type n, const Allocator&) constructor, which sets the string length to n, not strlen(s); see the fact that length() is required to be constant time (not linear like strlen); see the fact that resize(size_type) increases the size by adding '\0' characters to the string! Obviously you can't change the result of strlen() by appending zero bytes to the end! Did you even try to find the answer in the standard before requesting that others provide references for you?
-
Xlea about 9 years
That's more convincing. Btw, the discussion evolved from a first version of the post, just stating std::string::length 'works consistently', which eludes the answer. So, yeah, sorry for asking a question on stackoverflow.
-
Lightness Races in Orbit about 9 years
There was nothing wrong with the original wording (plus I think you meant "evades", not "eludes"); I only changed it because you (as the OP) failed to comprehend it!
-
Xlea about 9 years
Yes, I'm not a native speaker, thanks for pointing that out; but I don't think rubbing it in improves the mood here, do you? Now, I think it unfair of Jonathan (or you, for that matter) to downvote the question just because you disagree with the comments. (Correct me if it wasn't either of you and, in that case, consider this my apology.) I would have understood downvoting the comment (is there such a thing?), but not downvoting the question. I even accepted your (edited) answer, didn't I? What's more, I even upvoted it.
-
Lightness Races in Orbit about 9 years
@Xlea: I'm not "rubbing it in", I'm kindly informing you for free so that you can improve in the future. You're welcome. And who is "downvoting the question just because we disagree with the comments"? No idea where you got that notion from. Sorry, but I don't really understand what you're talking about.
-
Xlea about 9 years
Of course. I think it's quite a coincidence that the downvote came less than 5 minutes after Jonathan's rather emotionally charged reply. That's no proof so, naturally, mea culpa! Btw, is there some kind of "private chat feature" on stackoverflow? I don't think this is still relevant to the question.
-
Lightness Races in Orbit about 9 years
@Xlea: It's a million miles from proof. I didn't downvote the question, and I doubt Jonathan did either. No need for "private chat" (though yes, there is such a thing), as the question is solved!