Why do std::string operations perform poorly?

c++ python performance node.js stl

19,845

Solution 1

It's not that std::string performs poorly (as much as I dislike C++), it's that string handling is so heavily optimized for those other languages.

Your comparisons of string performance are misleading, and presumptuous if they are intended to represent more than just that.

I know for a fact that Python string objects are completely implemented in C, and indeed on Python 2.7, numerous optimizations exist due to the lack of separation between unicode strings and bytes. If you ran this test on Python 3.x you will find it considerably slower.

Javascript has numerous heavily optimized implementations. It's to be expected that string handling is excellent here.

Your Java result may be due to improper string handling, or some other poor case. I expect that a Java expert could step in and fix this test with a few changes.

As for your C++ example, I'd expect performance to slightly exceed the Python version. It does the same operations, with less interpreter overhead. This is reflected in your results. Preceding the test with s.reserve(limit); would remove reallocation overhead.

I'll repeat that you're only testing a single facet of the languages' implementations. The results for this test do not reflect the overall language speed.

I've provided a C version to show how silly such pissing contests can be:

#define _GNU_SOURCE
#include <string.h>
#include <stdio.h>

void test()
{
    int limit = 102 * 1024;
    char s[limit];
    size_t size = 0;
    while (size < limit) {
        s[size++] = 'X';
        if (memmem(s, size, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", 26)) {
            fprintf(stderr, "zomg\n");
            return;
        }
    }
    printf("x's length = %zu\n", size);
}

int main()
{
    test();
    return 0;
}

Timing:

matt@stanley:~/Desktop$ time ./smash 
x's length = 104448

real    0m0.681s
user    0m0.680s
sys     0m0.000s

Solution 2

So I went and played a bit with this on ideone.org.

Here a slightly modified version of your original C++ program, but with the appending in the loop eliminated, so it only measures the call to std::string::find(). Note that I had to cut the number of iterations to ~40%, otherwise ideone.org would kill the process.

#include <iostream>
#include <string>

int main()
{
    const std::string::size_type limit = 42 * 1024;
    unsigned int found = 0;

    //std::string s;
    std::string s(limit, 'X');
    for (std::string::size_type i = 0; i < limit; ++i) {
        //s += 'X';
        if (s.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) != std::string::npos)
            ++found;
    }

    if(found > 0)
        std::cout << "Found " << found << " times!\n";
    std::cout << "x's length = " << s.size() << '\n';

    return 0;
}

My results at ideone.org are time: 3.37s. (Of course, this is highly questionably, but indulge me for a moment and wait for the other result.)

Now we take this code and swap the commented lines, to test appending, rather than finding. Note that, this time, I had increased the number of iterations tenfold in trying to see any time result at all.

#include <iostream>
#include <string>

int main()
{
    const std::string::size_type limit = 1020 * 1024;
    unsigned int found = 0;

    std::string s;
    //std::string s(limit, 'X');
    for (std::string::size_type i = 0; i < limit; ++i) {
        s += 'X';
        //if (s.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) != std::string::npos)
        //    ++found;
    }

    if(found > 0)
        std::cout << "Found " << found << " times!\n";
    std::cout << "x's length = " << s.size() << '\n';

    return 0;
}

My results at ideone.org, despite the tenfold increase in iterations, are time: 0s.

My conclusion: This benchmark is, in C++, highly dominated by the searching operation, the appending of the character in the loop has no influence on the result at all. Was that really your intention?

Solution 3

The idiomatic C++ solution would be:

#include <iostream>
#include <string>
#include <algorithm>

int main()
{
    const int limit = 102 * 1024;
    std::string s;
    s.reserve(limit);

    const std::string pattern("ABCDEFGHIJKLMNOPQRSTUVWXYZ");

    for (int i = 0; i < limit; ++i) {
        s += 'X';
        if (std::search(s.begin(), s.end(), pattern.begin(), pattern.end()) != s.end())
            std::cout << "Omg Wtf found!";
    }
    std::cout << "X's length = " << s.size();
    return 0;
}

I could speed this up considerably by putting the string on the stack, and using memmem -- but there seems to be no need. Running on my machine, this is over 10x the speed of the python solution already..

[On my laptop]

time ./test X's length = 104448 real 0m2.055s user 0m2.049s sys 0m0.001s

Solution 4

That is the most obvious one: please try to do s.reserve(limit); before main loop.

Documentation is here.

I should mention that direct usage of standard classes in C++ in the same way you are used to do it in Java or Python will often give you sub-par performance if you are unaware of what is done behind the desk. There is no magical performance in language itself, it just gives you right tools.

Solution 5

What you are missing here is the inherent complexity of the find search.

You are executing the search 102 * 1024 (104 448) times. A naive search algorithm will, each time, try to match the pattern starting from the first character, then the second, etc...

Therefore, you have a string that is going from length 1 to N, and at each step you search the pattern against this string, which is a linear operation in C++. That is N * (N+1) / 2 = 5 454 744 576 comparisons. I am not as surprised as you are that this would take some time...

Let us verify the hypothesis by using the overload of find that searches for a single A:

Original: 6.94938e+06 ms
Char    : 2.10709e+06 ms

About 3 times faster, so we are within the same order of magnitude. Therefore the use of a full string is not really interesting.

Conclusion ? Maybe that find could be optimized a bit. But the problem is not worth it.

Note: and to those who tout Boyer Moore, I am afraid that the needle is too small, so it won't help much. May cut an order of magnitude (26 characters), but no more.

View more solutions

19,845

Author by

Wu Shu

Updated on June 15, 2022

Comments

Wu Shu almost 2 years

I made a test to compare string operations in several languages for choosing a language for the server-side application. The results seemed normal until I finally tried C++, which surprised me a lot. So I wonder if I had missed any optimization and come here for help.

The test are mainly intensive string operations, including concatenate and searching. The test is performed on Ubuntu 11.10 amd64, with GCC's version 4.6.1. The machine is Dell Optiplex 960, with 4G RAM, and Quad-core CPU.

in Python (2.7.2):

def test():
    x = ""
    limit = 102 * 1024
    while len(x) < limit:
        x += "X"
        if x.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) > 0:
            print("Oh my god, this is impossible!")
    print("x's length is : %d" % len(x))

test()

which gives result:

x's length is : 104448

real    0m8.799s
user    0m8.769s
sys     0m0.008s

in Java (OpenJDK-7):

public class test {
    public static void main(String[] args) {
        int x = 0;
        int limit = 102 * 1024;
        String s="";
        for (; s.length() < limit;) {
            s += "X";
            if (s.indexOf("ABCDEFGHIJKLMNOPQRSTUVWXYZ") > 0)
            System.out.printf("Find!\n");
        }
        System.out.printf("x's length = %d\n", s.length());
    }
}

which gives result:

x's length = 104448

real    0m50.436s
user    0m50.431s
sys     0m0.488s

in Javascript (Nodejs 0.6.3)

function test()
{
    var x = "";
    var limit = 102 * 1024;
    while (x.length < limit) {
        x += "X";
        if (x.indexOf("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) > 0)
            console.log("OK");
    }
    console.log("x's length = " + x.length);
}();

which gives result:

x's length = 104448

real    0m3.115s
user    0m3.084s
sys     0m0.048s

in C++ (g++ -Ofast)

It's not surprising that Nodejs performas better than Python or Java. But I expected libstdc++ would give much better performance than Nodejs, whose result really suprised me.

#include <iostream>
#include <string>
using namespace std;
void test()
{
    int x = 0;
    int limit = 102 * 1024;
    string s("");
    for (; s.size() < limit;) {
        s += "X";
        if (s.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) != string::npos)
            cout << "Find!" << endl;
    }
    cout << "x's length = " << s.size() << endl;
}

int main()
{
    test();
}

which gives result:

x length = 104448

real    0m5.905s
user    0m5.900s
sys     0m0.000s

Brief Summary

OK, now let's see the summary:

javascript on Nodejs(V8): 3.1s
Python on CPython 2.7.2 : 8.8s
C++ with libstdc++: 5.9s
Java on OpenJDK 7: 50.4s

Surprisingly! I tried "-O2, -O3" in C++ but noting helped. C++ seems about only 50% performance of javascript in V8, and even poor than CPython. Could anyone explain to me if I had missed some optimization in GCC or is this just the case? Thank you a lot.

NPE over 12 years

On my machine adding s.reserve(limit) before the loop makes no perceptible difference to performance.
Szabolcs over 12 years

I agree with what you are saying in general, but have you tested this? With gcc 4.6 I don't get any speedup when using string::reserve. Can you show how to do the concatenation in a fast way, exploiting the knowledge of how the classes work in the background?
Admin over 12 years

Is that really an issue here? Each string::operator++ is only appending a single character, so memory reallocation and copying shouldn't be a big drain.
Wu Shu over 12 years

Yes I tried s.reserve(102 * 1024), but no help. It's about 5.895s and little improvement. I think the bottle neck is in string search.
swegi over 12 years

I have just checked the example code myself, and it turns out that almost all of the runtime is spend during string search.
Mike Nakis over 12 years

o_O -- OK, then there is something totally weird going on. Prior to posting my answer I checked the documentation of all the find() and indexOf() methods in all of the above languages to make sure that they all perform straight non-regex, case-sensitive search. So, if search is the problem despite the triviality of the task, I do not know what to say.
Duncan over 12 years

FWIW the difference between Python 2.7 and 3.2 is just under 10%. It is possible that PEP 393 will remove that difference in Python 3.3. Also it might be worth mentioned that searching strings in Python uses a form of Boyer-Moore so when the string gets longer it should gain an advantage over languages that do a plain search.
swegi over 12 years

Well, I checked only the C++ example, I think you are right for the really poor performance of the Java example.
Mihails Strasuns over 12 years

Well, checked this in practice. Replacing s += "X" with string s(102*1024, 'X'); made enormous speed improvement ( real 0m0.003s in my VBox ). std::string::reserve not helping though, despite what I have said ( should have had same effect in my opinion though ). Need to investigate a bit more. Edited: lol, only now have paid attention to the way for loop is stated :) ok, rollback everything
Admin over 12 years

@MikeNakis - a straight search can benefit from knowing the length of the search pattern - limiting the range where it searches and even dismissing the possibility of searching for a long pattern in a short string at all. A C string doesn't know it's own length, so either the find operation will use a linear-time strlen on the pattern for each call, or else it will work without that advanced knowledge of the length and do more work searching as a result.
Duncan over 12 years

@swegi which languages did you check? I expect it may vary between them. With Python 2.7 the code as written takes 13.1s on my system, removing the find call it takes 0.019s. So the string concatenation (at least on Python) is the irrelevant part of the test. This is probably only true because the C version of Python uses reference counting and can do the concatenation in-place when it can detect that there is only one reference to the string.
Steve Jessop over 12 years

std::string::operator+= is the construct offered by C++ for avoiding the thing that in Java causes String concatenation to be slow. std::string is a mutable class, same as StringBuilder. TBH I think it's a bit confusing that the question is "why is C++ slow?", but includes Java results that are waaay slower, prompting various people to explain why the Java results are slow. Which is irrelevant to the question ;-)
Mike Nakis over 12 years

@Steve314 I thought the same, but I tried searching for "A" instead of "ABCDEFGHIJKLMNOPQRSTUVWXYZ" and there is no change in the running time. (Speaking of the C++ sample.)
Matthieu M. over 12 years

@sbi: and that is when one notes than in C++ find is O(N), while in Python indexOf uses Boyer-Moore (as noted by Duncan in a comment). Once again, "batteries included".
Matthieu M. over 12 years

Of course building the string made enormous speed improvement. You then bypass the loop completely... You need to change the loop condition to iterate on a i = 0 variable if you allocate the string first, and then you'll notice that the search is the real issue.
Wu Shu over 12 years

@Matt: Well, the C program is too extreme... I didn't tried to make a battle or contest between languages, for each language has its optimization in different ways. I just want to find a language that can proceed strings with quite good efficiency. The program just described a case that a program reads from an input(console or socket), maybe decrypts it then, and searches through the string for a specified pattern. My test program simplified the procedure and was just a demo, of course. The result just reminds me C++ is not always the sharpest knife. And thanks anyway :)
Admin over 12 years

@MikeNakis - actually, now I think about it, I'd expect that result. A straight string search for "ABC..." in a string of "XXX..." reduces to a straight search for 'A' (each check starting at a particular position fails at the first character - 'A' != 'X'). This even raises questions about just how smart optimizers may be - find the right language and compiler and maybe all the searching will be optimized away (proven unnecessary) at compile-time. With Haskell and GHC, for example, you may find that the final executable just outputs a constant "x's length ..." string.
Wu Shu over 12 years

Well, if you remove string::find, this is just string concatenation, and this would be no much difference between languages/runtimes optimized for string: string in C++ is also much more optimized than in C (string as an array of char). string::find is not just a test for searching algorithm, but also a test for traversing string. I'll make another test.
swegi over 12 years

@Wu Shu: If the specific pattern to search for is fixed and predetermined, you can construct an automaton to search for the pattern. This will be much faster than repeated calls to std::string::find.
Mike Nakis over 12 years

@Steve314 I beg to differ; if that was possible, then the "Halting Problem" would be solvable. en.wikipedia.org/wiki/Halting_problem.
Admin over 12 years

@MikeNakis - this program isn't all possible programs. Haskell may well do what I said, not by recognising a specific special case, but because there's no run-time specific data to block it from executing pretty much the whole program at compile-time. For the more specific optimisation, is it possible for a particular compiler to spot that the string to search will never contain an "A"? Yes. But it's an unlikely case in itself, except in that it might be a special case of a more general (but still not universal) optimisation.
Mike Nakis over 12 years

@Matthieu M.: Boyer-Moore does not gain you anything here, because the first character of the search-for string is not found at all in the search-into string. On the contrary, it may be adding some overhead, needlessly processing the search-for string in every iteration of the loop.
Admin over 12 years

@MikeNakis - while I won't say that will happen, and especially not in any particular language, if you have studied Haskell to any depth it might change some of your assumptions about what is and what is not plausible.
Mike Nakis over 12 years

OK, I got your point, you pointed out my mistake when you said "this program isn't all possible programs".
UncleBens over 12 years

There is no A in the hay-stack, so it should just check each character in the string that it is not found and not look at the other characters of the pattern. You seem to be describing find_any_of method, which again should find the 'X' very fast here.
aman.gupta over 12 years

Are we sure that string::find(const char*) isn't just implemented in terms of string::find(const string&)? If it was, memory allocations could be expensive here.
Matthieu M. over 12 years

@UncleBens: not at all, I am talking about find, which even for a string pattern should stop on the first character of the pattern if it does not match and move on in the haystack. The fact that looking for a single character A (the first character of the pattern) is only 3 times faster confirms my suspicion that it is not the pattern search that is slow, but simply that looking for a pattern in such a long string so many times is plain slow in itself.
Matthieu M. over 12 years

@Kylotan: I tested both. No visible difference.
Matthieu M. over 12 years

@MikeNakis: Indeed, I tested it and even doing the loop invariant code motion by hand (to move the pattern analysis out of the loop) the boyer-moore search was still slower. Therefore I suspect they use something even trickier, perhaps closer to memmem.
Matthieu M. over 12 years

@WuShu: actually, C and C++ are probably the sharpest knives. It's just that Python and Node.js may be chainsaws. It's heavy and sometimes overkill, but when you tire in C++ you appreciate the "batteries included" approach they took in Python.
Alpedar over 12 years

In java, using StringBuilder instead of String speeds it up (on my machine) aproximately 4 times, rest is searching. In java, string is immutable, so what he is doing is atrociously wrong string manipulation in java. Then there is issue of timing VM start instead of timing usefulactions (it is issue for all languages on VM, not just java)
Darren Cook over 12 years

Confirmed. g++ 4.4.3. In my test 5s for search, 12.5s for find (both in the same exe; my test times are longer as I pre-created the string with std::string s(limit,'X'); I.e. search and find had more work to do.) CONCLUSION: stdlib find() on g++ has lots of potential for optimization!
Darren Cook over 12 years

Wow; added a memmem() version, and it is 0.75s (using the same string, accessed via c_str()). (Actually, it was 0; the whole loop seemed to get optimized away; so I added some minor computation to the loop to stop that.) NEW CONCLUSION: find() and search() are doing something weird, that even -O3 cannot optimize, or memmem is using some special CPU feature. Fascinating!
Heptic over 12 years

The reason std::search is faster than std::string::search is the because (by convention?) std::search is implemented in the header which gives the compiler much more room to optimise. std::string::search on the other hand is not. (And because this is calling the function so many times, it makes a big different)
sbi over 11 years

@Heptic: Um. std::string is but a typedef for std::basic_string<char>, which is a template, and as such fully implemented in headers.
Tim Seguine over 10 years

@MatthieuM. WRT C++ find being linear time, am I missing something? Boyer-Moore is also linear time, isn't it?
Matthieu M. over 10 years

@Tim: linear can be faster than linear :)
Tim Seguine over 10 years

@MatthieuM. of course, but the wording of your comment implied a contrast of some sort. OK, no more language lawyering from me.