Difference between strlen(str.c_str()) and str.length() for std::string

13,315

Solution 1

Your understanding is incorrect. Sort of.

A std::string may contain chars with the value '\0'. Once you extract a C-string with c_str(), the only way to find its length is to scan for the first '\0', and that scan necessarily stops short when the string holds "binary data" with embedded zeros.

This is a limitation of strlen, and one that std::string "fixes" by actually remembering this metadata as a count of chars that it knows are encapsulated.

The standard doesn't really need to "say" anything about it, except that std::string::length gives you the string length, no matter what the value of the chars you inserted into the string, and that it is not prohibited to insert a '\0'. By contrast, strlen is defined to tell you how many chars exist up to the next \0, which is a fundamentally different definition.

There is no explicit wording about this, because there does not need to be. If there were an exception to the very simple rules ("there is a string; it has chars; it can tell you how many chars it has") then that would be stated explicitly… and it's not.

Solution 2

The standard C function std::strlen calculates the length of a character array by looking for the terminating zero in the array. Objects of class std::string, on the other hand, may contain embedded zeroes. Thus strlen applied to c_str() can yield a result that differs from the value returned by the member function length.

Consider a simple example

std::string s( 10, '\0' );

std::cout << s.length() << std::endl;
std::cout << std::strlen( s.c_str() ) << std::endl;

In this case the first output statement will output 10 while the second output statement will output 0.

Moreover if you have a string like for example

std::string s( "Hello" );

and then call member function resize

s.resize( 10 );

then the function extends the original string with five values of type char(), that is, with zeroes (10 - 5 = 5). The member function s.length() then returns 10, while std::strlen( s.c_str() ) still returns 5.

Solution 3

The standard has this to say about length() from string

Returns: size().

And size() is defined as

Returns: A count of the number of char-like objects currently in the string.

So, as you can see, you get the number of char-like objects in the string, even if the value of some of those char-like objects is '\0'.

Author by

Xlea

Updated on June 19, 2022

Comments

  • Xlea
    Xlea almost 2 years

    As an implicit understanding, I always thought that every implementation of std::string necessarily must satisfy strlen(str.c_str()) == str.length() for every string str.

    What does the C++ standard say about this? (Does it?)

    Background: At least the implementations shipped with Visual C++ and gcc do not have this property. Consider this example (see here for a live example):

    // Output:
    // string says its length is: 13
    // strlen says: 5
    #include <iostream>
    #include <cstring>
    #include <string>
    
    int main() {
      std::string str = "Hello, world!";
      str[5] = 0;
      std::cout << "string says its length is: " << str.length() << std::endl;
      std::cout << "strlen says: " << std::strlen(str.c_str()) << std::endl;
      return 0;
    }
    

    Of course, the writing operation without str noticing is causing "the problem". But that's not my question. I want to know what the standard has to say about this behavior.

  • Lightness Races in Orbit
    Lightness Races in Orbit about 9 years
    @Xlea: It is not and does not need to be specified explicitly. Strings accept chars (with no mention of a restriction on their value), and the string class can tell you the number of chars in the string. There is no wording to state "except that this is broken in the same way as strlen is, despite the fact that the entire purpose of this class is to improve upon the horrible C-string semantics". I can't prove a negative!
  • Lightness Races in Orbit
    Lightness Races in Orbit about 9 years
    What is ambiguous about "working consistently"? There are strings. Strings contain chars. Strings can tell you how many chars are in them. This function will always work. That's it....
  • Steve Jessop
    Steve Jessop about 9 years
    @Xlea: AFAIK there's no explicit statement in the standard that "str.length() might give a different answer from std::strlen(str.c_str())". But there doesn't have to be, since you can work it out from the definitions of the functions. string::length is defined to return the number of characters in the string, whereas strlen is defined to return the number of characters up to the first 0 character.
  • Jonathan Wakely
    Jonathan Wakely about 9 years
    @LightnessRacesinOrbit, because that's what 21.2 [strings.general] defines to refer to any character type stored in a basic_string. "This Clause describes components for manipulating sequences of any non-array POD (3.9) type. In this Clause such types are called char-like types, and objects of char-like types are called char-like objects or simply characters."
  • Lightness Races in Orbit
    Lightness Races in Orbit about 9 years
    @JonathanWakely: Seems strange wording though? It could have said "CharT objects" instead. I acknowledge that it's well-defined in this clause, but why go to the trouble of introducing a new term? Oh well.
  • NathanOliver
    NathanOliver about 9 years
    The quote was from the definition of basic_string in 21.4.4, and maybe they are covering for the fact that it can also hold wide characters, which are char-like but not char.
  • Jonathan Wakely
    Jonathan Wakely about 9 years
    @LightnessRacesinOrbit, in some contexts where it's used in Clause 21 there is no _CharT template parameter "in scope" for the wording.
  • chris
    chris about 9 years
    @LightnessRacesinOrbit, In the rest of this Clause, the type of the char-like objects held in a basic_string object is designated by charT. - I guess it's just that charT is defined in terms of this, not the other way around.
  • Xlea
    Xlea about 9 years
    @LightnessRacesinOrbit: std::string semantics (it's the most sensible I can imagine) is "vector of chars", then. I perfectly agree that it is the right design for an std::string when designing from scratch. Yet, there is backward compatibility and with this semantics at hand you have to be much more careful when porting old code to std::string. Btw, I can also define "consistent behavior" in C-style strings (i.e., with a special terminating symbol, e.g., given in a trait). So "working consistently" refers right back to the definition of what a "string" is and thus explains nothing.
  • Xlea
    Xlea about 9 years
    Again, that's a matter of definition of the underlying semantics. What is "inside a string"? For example, C defines to be in the string all characters to the right of the start and strictly to the left of the first 0-character (assuming for simplicity that there's one). I guess the best hint is the absolute symmetry of all characters.
  • Lightness Races in Orbit
    Lightness Races in Orbit about 9 years
    @Xlea: C doesn't really define string at all, which is the key here.
  • Lightness Races in Orbit
    Lightness Races in Orbit about 9 years
    @Xlea: "Working consistently" means "functioning per the requirements of the language, for all valid inputs, always". Period.
  • Xlea
    Xlea about 9 years
    @LightnessRacesinOrbit: Precisely. My question was exactly what are those "requirements of the language", question mark. But, let's end the discussion here. I got the point and that it's (only) implicit in the standard.
  • Jonathan Wakely
    Jonathan Wakely about 9 years
    @Xlea, nonsense, it's not implicit. The standard clearly states that length() and size() are the number of elements held by the string, not the number of leftmost non-zero elements. See NathanOliver's answer. As for backward compatibility, std::string is not supposed to be backward compatible with char*, it's supposed to have other semantics such as storing its length. If you want a C string then call c_str() and use the C string functions.
  • Xlea
    Xlea about 9 years
    @JonathanWakely: I disagree. You assume an implicit understanding of what it means for a string to "hold" an element. Why should the C-answer be incorrect? (It may seem pedantic, but a precise, explicit semantic definition is exactly the point of my question.) But let's leave it at that.
  • Jonathan Wakely
    Jonathan Wakely about 9 years
    @Xlea, no, the standard is very explicit in several places. See the basic_string(const char_type* s, size_type n, const Allocator&) constructor which sets the string length to n not strlen(s), see the fact that length() is required to be constant time (not linear like strlen), see the fact that resize(size_type) increases the size by adding '\0' characters to the string! Obviously you can't change the result of strlen() by appending zero bytes to the end! Did you even try to find the answer in the standard before requesting that others provide references for you?
  • Xlea
    Xlea about 9 years
    That's more convincing. Btw, the discussion evolved from a first version of the post, just stating std::string::length 'works consistently' which eludes the answer. So, yeah, sorry for asking a question on stackoverflow.
  • Lightness Races in Orbit
    Lightness Races in Orbit about 9 years
    There was nothing wrong with the original wording (plus I think you meant "evades", not "eludes"); I only changed it because you (as the OP) failed to comprehend it!
  • Xlea
    Xlea about 9 years
    Yes, I'm not a native speaker, thanks for pointing that out; I don't think to rub it in does not improve the emotions here, do you? Now, I think it unfair for Jonathan (or you at that) to downvote the question, just because you disagree with the comments. (Correct me if it wasn't either of you and, in that case, consider this as my apologies.) I would have understood downvoting the comment (is there such a thing?), but not downvoting the answer. I even accepted your (edited) answer, didn't I? What's more, I even upvoted it.
  • Lightness Races in Orbit
    Lightness Races in Orbit about 9 years
    @Xlea: I'm not "rubbing it in", I'm kindly informing you for free so that you can improve in the future. You're welcome. And who is "downvoting the question just because we disagree with the comments"? No idea where you got that notion from. Sorry but I don't really understand what you're talking about.
  • Xlea
    Xlea about 9 years
    Of course. I think it's quite a coincidence that the downvote is not 5min after Jonathan's emotionally quite charged answer. That's no proof so, naturally, mea culpa! Btw, is there a kind of "private chat feature" in stackoverflow, because I don't think this is still relevant for the question?
  • Lightness Races in Orbit
    Lightness Races in Orbit about 9 years
    @Xlea: It's a million miles from proof. I didn't downvote the question and I doubt Jonathan did either. No need for "private chat" (though yes, there is such a thing) as the question is solved!