c++ std::ostringstream vs std::string::append

c++ string stream

29,533

Solution 1

std::ostringstream is not necessarily stored as a sequential (contiguous) array of characters in memory. You need a contiguous array of characters when sending those HTTP headers, so the stream might have to copy or modify its internal buffer to make it sequential.

A std::string with an appropriate std::string::reserve call has no reason to be slower than std::ostringstream in this situation.

However, std::ostringstream is probably faster for appending if you have absolutely no idea what size to reserve. If you use std::string and the string grows, it eventually requires reallocation and copying of the whole buffer. A single call to std::ostringstream::str() that makes the data sequential in one step would be better than the multiple reallocations that would happen otherwise.
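A minimal sketch of the reserve-based approach, assuming the sizes of the pieces are known up front. The function name `buildHeader` is hypothetical and only serves the illustration:

```cpp
#include <string>

// Hypothetical helper: builds the Content-Type header line with a single
// up-front allocation instead of growing the string piece by piece.
std::string buildHeader(const std::string& contentType, const std::string& charset)
{
    std::string header;
    // The fixed pieces are "Content-Type: " (14), ";charset=" (9) and "\r\n" (2).
    header.reserve(25 + contentType.size() + charset.size());
    header.append("Content-Type: ");
    header.append(contentType);
    header.append(";charset=");
    header.append(charset);
    header.append("\r\n");
    return header;
}
```

Calling `buildHeader("text/html", "utf-8")` yields the complete header line. Whether the single allocation is measurably faster depends on the implementation, so it is worth benchmarking.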

P.S. Before C++11, std::string was not required to be sequential either, although almost all libraries implement it that way. You could risk it or use std::vector<char> instead. You would need the following to append:

const char str[] = ";charset=";
vector.insert(vector.end(), str, str + sizeof(str) - 1); // sizeof(str) - 1 skips the terminating '\0'

std::vector<char> would probably be best for performance because it is most likely cheaper to construct, although the difference from std::string is probably insignificant next to the time either one actually takes to construct. I have done something similar to what you are attempting and went with std::vector<char>, purely for logical reasons: a vector seemed to fit the job better, since you do not actually want string manipulation here. Benchmarks I ran later also showed it performing better, though that may only be because I did not implement the operations well enough with std::string.
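A rough sketch of how a std::vector<char> buffer could be filled. The helper names `appendBytes` and `buildHeaderBytes` are hypothetical, and the 2048-byte reservation is an assumed upper bound for typical headers, not a measured value:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: appends a C string without its terminating '\0'.
inline void appendBytes(std::vector<char>& buf, const char* s)
{
    buf.insert(buf.end(), s, s + std::strlen(s));
}

// Hypothetical helper: appends the contents of a std::string.
inline void appendBytes(std::vector<char>& buf, const std::string& s)
{
    buf.insert(buf.end(), s.begin(), s.end());
}

std::vector<char> buildHeaderBytes(const std::string& contentType,
                                   const std::string& charset)
{
    std::vector<char> buf;
    buf.reserve(2048); // assumed bound; tune it for the real workload
    appendBytes(buf, "Content-Type: ");
    appendBytes(buf, contentType);
    appendBytes(buf, ";charset=");
    appendBytes(buf, charset);
    appendBytes(buf, "\r\n");
    return buf;
}
```

For a non-empty buffer, the bytes can be handed to a C-style send function as `&buf[0]` together with `buf.size()`; this also works before C++11, where `data()` is not available.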

When choosing, the container that meets your requirements with the fewest extra features usually does the job best.

Solution 2

Constructing a stream object is a significantly more complex operation than constructing a string object, because it has to hold (and, therefore, construct) its std::locale member, among other things needed to maintain state (but the locale is by a large margin the heaviest).

Appending is similar: both maintain a contiguous array of characters, and both allocate more when the capacity is exceeded. The only differences I can think of are that when appending to a stream, there is one virtual member function call per overflow (in addition to memory allocation/copying, which dominates overflow handling anyway), and that operator<< has to do some extra checks of the stream state.

Also, note that you're calling str(), which copies the entire string one more time, so based on what your code is written to do, the stream example does more and should be slower.

Let's test:

#include <sstream>
#include <string>
#include <numeric>

volatile unsigned int sink;
std::string contentType(50, ' ');
std::string charset(50, ' ');
int main()
{
    for (long n = 0; n < 10000000; ++n)
    {
#ifdef TEST_STREAM
        std::ostringstream os;
        os << "Content-Type: " << contentType << ";charset=" << charset << "\r\n";
        std::string header = os.str();
#endif
#ifdef TEST_STRING
        std::string header("Content-Type: ");
        header.append(contentType);
        header.append(";charset=");
        header.append(charset);
        header.append("\r\n");
#endif
        sink += std::accumulate(header.begin(), header.end(), 0);
    }
}

That's 10 million repetitions.

On my Linux machine, I get:

                   stream         string
g++ 4.8          7.9 seconds      4.4 seconds
clang++/libc++  11.3 seconds      3.3 seconds

So, for this use case, in these two implementations, strings appear to work faster, but obviously both ways have a lot of room for improvement (reserve() the string, move stream construction out of the loop, use a stream that doesn't require copying to access its buffer, etc.).
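One of the improvements listed above, moving stream construction out of the loop, might look like the following sketch. The function name `formatHeader` is hypothetical; the caller is assumed to construct the std::ostringstream once and pass it in on every iteration:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats one header line into a caller-owned stream,
// so the stream (and its locale) is constructed only once by the caller.
std::string formatHeader(std::ostringstream& os,
                         const std::string& contentType,
                         const std::string& charset)
{
    os.str("");  // discard the contents left over from the previous call
    os.clear();  // reset any error state flags
    os << "Content-Type: " << contentType << ";charset=" << charset << "\r\n";
    return os.str(); // note: this still copies the buffer once
}
```

This removes only the per-iteration construction cost; the copy made by str() remains, so the effect on the timings above would have to be measured for each implementation.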

Solution 3

With a stream, you can overload operator<< for your class MyClass so that you can write

MyClass x;
std::ostringstream y;
y << x;

For append, you need a ToString method (or something similar), since you can't add overloads to the append function of string.

For small pieces of code, use whatever you feel more comfortable with. Use streams for bigger projects, where it's useful to be able to simply stream an object.
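As a sketch of both routes, assuming a hypothetical MyClass with two illustrative members (`name` and `value` are not from the question):

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical class used only for illustration.
struct MyClass
{
    std::string name;
    int value;
};

// Free-function overload: lets any std::ostream print a MyClass.
std::ostream& operator<<(std::ostream& os, const MyClass& x)
{
    return os << x.name << '=' << x.value;
}

// ToString-style counterpart needed for the std::string::append route.
// Here it simply reuses the stream overload.
std::string toString(const MyClass& x)
{
    std::ostringstream os;
    os << x;
    return os.str();
}
```

With these in place, `y << x;` works for any output stream, while the string route needs the explicit call `header.append(toString(x));`.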


Author by

NickSoft


Updated on November 09, 2020

Comments

NickSoft over 3 years
In all the examples I see that use some kind of buffering, a stream is used instead of a string. How do std::ostringstream and operator<< differ from using string::append? Which one is faster, and which one uses fewer resources (memory)?

One difference I know is that you can output different types into output stream (like integer) rather than the limited types that string::append accepts.

Here is an example:
```
std::ostringstream os;
os << "Content-Type: " << contentType << ";charset=" << charset << "\r\n";
std::string header = os.str();
```
vs
```
std::string header("Content-Type: ");
header.append(contentType);
header.append(";charset=");
header.append(charset);
header.append("\r\n");
```
Obviously using a stream is shorter, but I think append returns a reference to the string, so it can be written like this:
```
std::string header("Content-Type: ");
header.append(contentType)
  .append(";charset=")
  .append(charset)
  .append("\r\n");
```
And with output stream you can do:
```
std::string content;
...
os << "Content-Length: " << content.length() << "\r\n";
```
But what about memory usage and speed? Especially when used in a big loop.

Update:

To be clearer, the question is: which one should I use, and why? Are there situations where one is preferred over the other? For performance and memory ... well, I think a benchmark is the only way, since every implementation could be different.

Update 2:

Well, I don't get a clear idea of what I should use from the answers, which means that any of them will do the job, plus vector. Cubbi did a nice benchmark, with the addition from Dietmar Kühl that the biggest difference is the construction of those objects. If you are looking for an answer you should check that too. I'll wait a bit longer for other answers (see the previous update), and if I don't get one I think I'll accept Tolga's answer, because his suggestion to use vector has been done before, which means vector should be less resource hungry.
Havenard over 10 years

Each type invokes a different function, and this choice is made at compile time. I don't think this aspect weighs on performance at all.
jthill over 10 years

Growing buffers on the fly is counterintuitively cheap.
Havenard over 10 years

With an appropriate reserve I agree; otherwise it implies continuous reallocation of memory and therefore lower performance. And the fact that ostringstream doesn't store it sequentially (for performance reasons) doesn't mean you cannot fetch it in a contiguous buffer with str().c_str().
Cubbi over 10 years

Stream buffers are sequential in memory; their entire non-virtual interface (sgetc/sputc/etc.) relies on it, since it works through pointers.
NickSoft over 10 years

@Tolga I don't quite understand why I should care how the stream is stored, sequentially or not. When I need it I can always fetch sequential data, as Havenard said, using .str().c_str() or .str().data() combined with .str().length() or size(). The same is valid for std::string. Regardless of implementation you get sequential memory using c_str() or data().
Etherealone over 10 years

@NickSoft It requires an extra operation to make it sequential, while you can already have it sequential without any operation, if you need absolute performance. He is probably trying to write a high-performance web server, and these string operations are usually where the bottleneck is, since they are mostly all the web server does.
NickSoft over 10 years

But as Dieter Lücking pointed out, you could use + to append strings. You can easily overload the + operator.
Etherealone over 10 years

@Havenard You are right, I have edited my answer to make the relation between reallocation and serialization more clear.
Slava over 10 years

@Havenard did I say something otherwise?
Sorin over 10 years

True, but not the append function. If you overload the + operator you can run into trouble for not overloading all operand orders, or when the compiler decides to evaluate some other operation first. I'd recommend against overloading the + operator unless your class is some scalar or vector value.
Slava over 10 years

You are forgetting handling something like std::ios_base::width
Havenard over 10 years

Now that you mention std::vector<char>, I have seen it used before to implement protocols; I just don't know whether that was done for performance or because it can contain null bytes, ignores charsets, etc. This can be important when building buffers that must be binary safe.
Cubbi over 10 years

@Slava edited in an honorable mention as extra payload for stream construction: string's operator<< doesn't have anything special to do when width is zero.
Dietmar Kühl over 10 years

Changing the setup slightly to construct the stream outside the loop and merely resetting it (os.str("")) changes the numbers in interesting ways: the stream is now faster on gcc but slower on clang. I get gcc/string=4.5s, gcc/stream=2.5s, clang/string=2.25s, clang/stream=4.1s: nicely crossed over ;)
NickSoft over 10 years

@Tolga Yes, I know it's an extra operation, but it's one operation. Growing the buffer in string::append() copies the data on every append/grow unless memory is preallocated, but you don't know the final size to pass to string::reserve(). It still seems strange to me to use vector. Is it really a good option, and how do you convert the vector to a string when you need to send it?
NickSoft over 10 years

@Havenard Strings and streams have no problems with null bytes, but this is one more example usage of vector. I wonder why they bothered to create streams when people would use vector instead.
NickSoft over 10 years

So unless you are constructing the stream every time, it's actually comparable to using a string.
Etherealone over 10 years

@NickSoft You would just send the buffer using vector.data(). I explicitly said he should know the size to reserve. This is an HTTP server/client; the headers won't get bigger than a certain size for 90% of requests, so he does not need to know the exact size to reserve.
NickSoft over 10 years

vector::data() is C++11 according to cplusplus.com. My project is C++98. If there is no way to get the raw data from vector other than char by char with vector::at() then it can't be used as buffer (efficiently).
Etherealone over 10 years

@NickSoft Since vector is sequential, you can access its buffer by accessing its first element: char const* data = &vector[0];
Etherealone over 10 years

@NickSoft I don't think any headers will exceed 2KB, which seems to be a good value for reserve. Even if you have 1 million clients connected it would only use 2GB of RAM, which is nothing compared to what your database server will need to perform decently with that amount of traffic, assuming only 5~10% of actual users will be on simultaneously (of course this is a rough assumption, and database operations may not exist at all if not very cheap). You can even log statistics and do calculations once a day to find the right size to reserve dynamically.
NickSoft over 10 years

@Tolga From the description on cplusplus.com I can only assume the vector is sequential "outside". They say nothing about how it MUST be implemented internally. "Individual elements are accessed by their position in this sequence": position in the sequence, i.e. index. No one talks about memory addresses. I need a bit more to assume sequential memory. Can you give me a quote from a well-known source?
NickSoft over 10 years

@Tolga Yes, I will pre-allocate headers memory. I can spend as much as I want since I'm replacing php which uses minimum memory of tens of MB. 2kb is nothing compared to that. I'll probably pre-allocate memory for the web document too. I just want to figure out what buffer implementation to use.
Etherealone over 10 years

@NickSoft en.cppreference.com/w/cpp/container/vector : The elements are stored contiguously, which means that elements can be accessed not only through iterators, but also using offsets on regular pointers to elements. This means that a pointer to an element of a vector may be passed to any function that expects a pointer to an element of an array. (By the way I suggest using cppreference.com to lookup things)
NickSoft over 10 years

Well that is written pretty clearly. I use google to search for reference because I'm lazy. The description at cppreference.com is way better (at least about vector).
Vassilis almost 4 years

@jthill, could you explain why it is cheap? Any references or examples?
jthill almost 4 years

@Vassilis Say you double the buffer size and copy the existing contents every time it winds up too small. Worst case, assuming you started with a one-byte buffer, on average every element has been copied twice: all of them once, half of them once more, a quarter once more, an eighth once more, and so on; 1.111111... in binary is two.
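The doubling argument can be checked with a small counting sketch. The function name `relocationCopies` is hypothetical, and the model is simplified: it counts only the copies made when the buffer grows, starting from a capacity of one element, and does not count writing each new element the first time:

```cpp
#include <cstddef>

// Counts the element copies caused by reallocation when n elements are
// appended one at a time to a buffer that starts with capacity 1 and
// doubles whenever it is full.
std::size_t relocationCopies(std::size_t n)
{
    std::size_t capacity = 1;
    std::size_t size = 0;
    std::size_t copies = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        if (size == capacity)
        {
            copies += size;  // growing moves every existing element once
            capacity *= 2;
        }
        ++size;              // the new element itself is written in place
    }
    return copies;
}
```

In this model the relocation copies sum to 1 + 2 + 4 + ..., which stays below 2n for n appended elements, so the cost per element is bounded by a constant regardless of how large the buffer gets.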