How exactly is std::string_view faster than const std::string&?


Solution 1

std::string_view is faster in a few cases.

First, std::string const& requires the data to be in a std::string, and not a raw C array, a char const* returned by a C API, a std::vector<char> produced by some deserialization engine, etc. Skipping that conversion avoids copying bytes and, if the string is longer than the SBO¹ capacity of the particular std::string implementation, avoids a memory allocation.

#include <iostream>
#include <string_view>

void foo( std::string_view bob ) {
  std::cout << bob << "\n";
}
int main(int argc, char const*const* argv) {
  foo( "This is a string long enough to avoid the std::string SBO" );
  if (argc > 1)
    foo( argv[1] );
}

No allocations are done in the string_view case, but there would be if foo took a std::string const& instead of a string_view.

The second really big reason is that it permits working with substrings without a copy. Suppose you are parsing a 2 gigabyte JSON string (!)². If you parse it into std::string objects, every parse node that stores the name or value of a node copies that data out of the original 2 GB string into its own storage.

Instead, if you parse it to std::string_views, the nodes refer to the original data. This can save millions of allocations and halve memory requirements during parsing.

The speedup you can get is simply ridiculous.

This is an extreme case, but other "get a substring and work with it" cases can also generate decent speedups with string_view.
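
As a minimal sketch of the substring case (split_views is a hypothetical helper, not something from the answer), the function below splits a buffer into fields that all point back into the original data, so no field is copied:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `text` on `sep` and returns views into `text`.
// The only allocation is the vector itself; no characters are copied.
std::vector<std::string_view> split_views(std::string_view text, char sep) {
    std::vector<std::string_view> fields;
    std::size_t start = 0;
    while (true) {
        assert(start <= text.size());  // invariant: substr(start) is valid
        const std::size_t end = text.find(sep, start);
        if (end == std::string_view::npos) {
            fields.push_back(text.substr(start));  // last field
            break;
        }
        fields.push_back(text.substr(start, end - start));
        start = end + 1;  // skip the separator
    }
    return fields;
}
```

The returned views are only valid while the buffer they were cut from is alive and unmodified, which is the price of not copying.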

An important part of the decision is what you lose by using std::string_view. It isn't much, but it is something.

You lose implicit null termination, and that is about it. So if the same string will be passed to 3 functions all of which require a null terminator, converting to std::string once may be wise. Thus if your code is known to need a null terminator, and you don't expect strings fed from C-style sourced buffers or the like, maybe take a std::string const&. Otherwise take a std::string_view.

If std::string_view had a flag that stated if it was null terminated (or something fancier) it would remove even that last reason to use a std::string const&.
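
To illustrate the null-termination point, here is a small sketch; c_length is a hypothetical wrapper standing in for any C API that expects a terminated string. A view of a prefix still points into the larger buffer, so handing its data() to a C function reads past the end of the view:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical wrapper around a C API that needs a null terminator.
std::size_t c_length(std::string_view s) {
    // s.data() is not guaranteed to be terminated at s.size(),
    // so make a terminated copy before calling into C.
    const std::string terminated(s);
    return std::strlen(terminated.c_str());
}

void pitfall_demo() {
    const char* buffer = "foobar";
    const std::string_view prefix(buffer, 3);      // views "foo"
    assert(prefix.size() == 3);
    assert(std::strlen(prefix.data()) == 6);       // C sees all of "foobar"
    assert(c_length(prefix) == 3);                 // copy restores the terminator
}
```

If several callees need the terminator, that copy is the cost the answer suggests paying once up front.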

There is a case where taking a std::string by value (no const&) is better than taking a std::string_view. If you need to own a copy of the string indefinitely after the call, taking by value is efficient. You'll either be in the SBO case (no allocations, just a few character copies to duplicate it), or you'll be able to move the heap-allocated buffer into a local std::string. Having two overloads, std::string&& and std::string_view, might be faster, but only marginally, and it would cause modest code bloat (which could cost you all of the speed gains).
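
A sketch of that by-value "sink" pattern, using a hypothetical Widget class that stores the name it is given:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class that keeps its own copy of the name.
class Widget {
public:
    // Take by value, then move into the member. A caller passing an
    // rvalue pays only for moves; a caller passing an lvalue pays for
    // exactly one copy, which it would have needed anyway.
    explicit Widget(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }

private:
    std::string name_;
};

void sink_demo() {
    std::string long_name(64, 'x');
    Widget moved(std::move(long_name));  // no character copy expected here
    std::string kept = "short";
    Widget copied(kept);                 // one copy; `kept` is untouched
    assert(moved.name().size() == 64);
    assert(kept == "short");
    assert(copied.name() == "short");
}
```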


¹ Small Buffer Optimization

² Actual use case.

Solution 2

One way that string_view improves performance is that it allows removing prefixes and suffixes cheaply. Under the hood, string_view can just add the prefix size to its pointer into some string buffer, or subtract the suffix size from its byte count; both are constant-time operations. std::string, on the other hand, has to copy its bytes when you do something like substr: you get a new string that owns its buffer, but in many cases you just want part of the original string without copying. Example:

std::string str{"foobar"};
auto bar = str.substr(3);
assert(bar == "bar");

With std::string_view:

std::string str{"foobar"};
std::string_view bar{str.c_str(), str.size()};
bar.remove_prefix(3);
assert(bar == "bar");

Update:

I wrote a very simple benchmark to add some real numbers, using the Google Benchmark library. The benchmarked functions are:

#include <benchmark/benchmark.h>
#include <stdexcept>
#include <string>
#include <string_view>

// Copies everything after the first 3 characters into a new string.
std::string remove_prefix(const std::string &str) {
  return str.substr(3);
}
// Only adjusts the view's pointer and length; no bytes are copied.
std::string_view remove_prefix(std::string_view str) {
  str.remove_prefix(3);
  return str;
}
static void BM_remove_prefix_string(benchmark::State& state) {
  std::string example{"asfaghdfgsghasfasg3423rfgasdg"};
  while (state.KeepRunning()) {
    auto res = remove_prefix(example);
    // auto res = remove_prefix(std::string_view(example)); for string_view
    if (res != "aghdfgsghasfasg3423rfgasdg") {
      throw std::runtime_error("bad op");
    }
  }
}
// BM_remove_prefix_string_view is similar, I skipped it to keep the post short

Results

(x86_64 linux, gcc 6.2, "-O3 -DNDEBUG"):

Benchmark                             Time           CPU Iterations
-------------------------------------------------------------------
BM_remove_prefix_string              90 ns         90 ns    7740626
BM_remove_prefix_string_view          6 ns          6 ns  120468514

Solution 3

There are 2 main reasons:

  • string_view is a slice of an existing buffer, so it does not require a memory allocation
  • string_view is passed by value, not by reference

The advantages of having a slice are multiple:

  • you can use it with char const* or char[] without allocating a new buffer
  • you can take multiple slices and subslices into an existing buffer without allocating
  • substring is O(1), not O(N)
  • ...

Better and more consistent performance all over.


Passing by value also has advantages over passing by reference, because of aliasing.

Specifically, when you have a std::string const& parameter, there is no guarantee that the referenced string will not be modified. As a result, the compiler must re-fetch the string's state (pointer to data, length, ...) after each call into an opaque function.

On the other hand, when passing a string_view by value, the compiler can statically determine that no other code can modify the length and data pointers now on the stack (or in registers). As a result, it can "cache" them across function calls.
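
The aliasing argument can be made concrete with a small sketch. The names g_text and opaque_call are hypothetical; opaque_call stands in for any function the optimizer cannot see through:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

std::string g_text = "hello world";  // hypothetical shared state

// Stands in for an opaque call that happens to modify the string.
void opaque_call() { g_text.resize(5); }

std::size_t size_after_call_ref(const std::string& s) {
    opaque_call();
    return s.size();  // must be re-read: s may alias g_text
}

std::size_t size_after_call_view(std::string_view s) {
    opaque_call();
    // The length lives in this function's own copy of the view, so the
    // call cannot change it. Only size() is read here; the characters
    // past the new end of g_text must not be touched.
    return s.size();
}

void aliasing_demo() {
    g_text = "hello world";
    assert(size_after_call_ref(g_text) == 5);    // sees the modification
    g_text = "hello world";
    assert(size_after_call_view(g_text) == 11);  // keeps its own length
}
```

The same property that lets the compiler keep the length in a register is what makes a view go stale when the underlying string changes.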

Solution 4

One thing it can do is avoid constructing a std::string object in the case of an implicit conversion from a null-terminated string:

void foo(const std::string& s);

...

foo("hello, world!"); // std::string object created, possible dynamic allocation.
char msg[] = "good morning!";
foo(msg); // std::string object created, possible dynamic allocation.
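
For comparison, a rough sketch of what each parameter type ends up looking at. The two helper functions are hypothetical; they only report whether the parameter refers to the caller's own bytes:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// True if the std::string parameter reads directly from `original`.
bool string_ref_aliases(const std::string& s, const char* original) {
    return s.data() == original;
}

// True if the string_view parameter reads directly from `original`.
bool view_aliases(std::string_view s, const char* original) {
    return s.data() == original;
}

void conversion_demo() {
    char msg[] = "good morning!";
    // The view is just a pointer and length over msg itself.
    assert(view_aliases(msg, msg));
    // The const std::string& binds to a temporary that holds a copy.
    assert(!string_ref_aliases(msg, msg));
}
```

When the caller already has a std::string, both versions see the same buffer and the difference disappears.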

Solution 5

std::string_view is basically just a wrapper around a const char* (plus a length). Passing a const char* means there is one less pointer to chase than when passing a const string* (or const string&), because a string* implies something like:

string* -> char* -> char[]
           |   string    |

Clearly for the purpose of passing const arguments the first pointer is superfluous.

p.s. One substantial difference between std::string_view and const char*, nevertheless, is that string_views are not required to be null-terminated (they have a built-in size), which allows in-place slicing of longer strings.

Author: Patryk

Software Engineer C++/Go/shell/python coder Linux enthusiast Github profiles: https://github.com/pmalek https://github.com/pmalekn

Updated on June 10, 2020

Comments

  • Patryk
    Patryk almost 4 years

    std::string_view has made it to C++17 and it is widely recommended to use it instead of const std::string&.

    One of the reasons is performance.

    Can someone explain how exactly std::string_view is/will be faster than const std::string& when used as a parameter type? (let's assume no copies in the callee are made)

    • QuestionC
      QuestionC over 7 years
      std::string_view is just an abstraction of the (char * begin, char * end) pair. You use it when making a std::string would be an unnecessary copy.
    • TheArchitect
      TheArchitect over 6 years
      In my opinion the question is not exactly which one is faster, but when to use each. If I need to manipulate a string temporarily and/or keep the original value, string_view is perfect because I don't need to copy the string. But if I only need to check something in the string, using string::find for example, then the reference is better.
    • sehe
      sehe about 6 years
      @QuestionC you use it when you don't want your API to restrict to std::string (string_view can accept raw arrays, vectors, std::basic_string<> with non-default allocators etc. etc. etc. Oh, and other string_views obviously)
  • Martin Bonner supports Monica
    Martin Bonner supports Monica over 7 years
    It might be worth saying that const std::string str{"goodbye!"}; foo(str); probably won't be any faster with string_view than with string&
  • Daniel Kamil Kozar
    Daniel Kamil Kozar over 7 years
    It's great that you provided an actual benchmark. This really shows what can be gained in relevant use cases.
  • Pavel Davydov
    Pavel Davydov over 7 years
    @DanielKamilKozar Thanks for the feedback. I also think benchmarks are valuable, sometimes they change everything.
  • n.caillou
    n.caillou over 7 years
    What's with the downvotes? std::string_views are just fancy const char*s, period. GCC implements them like this: class basic_string_view {const _CharT* _M_str; size_t _M_len;}
  • mlvljr
    mlvljr over 7 years
    just get to 65K rep (from your current 65) and this would be the accepted answer (waves to the cargo-cult crowds) :)
  • sehe
    sehe over 6 years
    @mlvljr Nobody passes std::string const*. And that diagram is unintelligible. @ n.caillou: Your own comment is already more accurate than the answer. That makes string_view more than "fancy char const*" - it's really quite obvious.
  • mlvljr
    mlvljr over 6 years
    @sehe I could be that nobody, no problemo (i.e. passing a pointer (or reference) to a const string, why not?) :)
  • n.caillou
    n.caillou over 6 years
    @sehe You do understand that from an optimization or execution perspective, std::string const* and std::string const& are the same, don't you?
  • balki
    balki over 6 years
    Won't string_view be slow, as it has to copy two pointers as opposed to the one pointer in const string&?
  • Deduplicator
    Deduplicator over 6 years
    You also lose ownership. Which is only of interest if the string is returned and it might have to be anything besides a sub-string of a buffer which is guaranteed to survive long enough. Actually, the loss of ownership is a very two-edged weapon.
  • phuclv
    phuclv over 4 years
    SBO sounds strange. I've always heard SSO (small string optimization)
  • Yakk - Adam Nevraumont
    Yakk - Adam Nevraumont over 4 years
    @phu Sure; but strings are not the only thing you use the trick on.
  • Daniel Langr
    Daniel Langr about 4 years
    @phuclv SSO is just a specific case of SBO, which stands for small buffer optimization. Alternative terms are small data opt., small object opt., or small size opt..
  • einpoklum
    einpoklum over 3 years
    Couldn't a compiler, theoretically, optimize away the construction of the string in memory, based on the actual use pattern of a const std::string&?
  • Deduplicator
    Deduplicator over 3 years
    A std::string_view is also often more uniform than a std::string when accessing the stored sequence. That also counts for something. Not to mention that a std::string const& has an additional indirection compared to a std::string_view, with all the costs involved.
  • Deduplicator
    Deduplicator over 3 years
    @einpoklum Sure, if its static type is const std::string (thus disallowing casting const away), or it has all info on how it is used, the compiler should have a good chance of getting it done. The compiler is explicitly allowed to omit allocations by providing the space needed differently as it sees fit.
  • digito_evo
    digito_evo over 2 years
    As a side note, char const*const* argv just shows how far stupidity and ambiguity can go in C++ code...
  • Yakk - Adam Nevraumont
    Yakk - Adam Nevraumont over 2 years
    @digito what is ambiguous about that?
  • green diod
    green diod over 2 years
    @Deduplicator Care providing a simple example of your comment?
  • Deduplicator
    Deduplicator over 2 years
    A valid (though bad) hello world program (newlines omitted) to demonstrate: #include <string> #include <iostream> void f(std::string const& s) { const_cast<std::string&>(s)[0] = 'H'; std::cout << s; } int main() { f("hello world!\n"); }
  • Matthew M.
    Matthew M. about 2 years
    I recommend against this rationale: "and you don't expect strings fed from C-style sourced buffers or the like" Just assume that the string in a std::string_view will never be null-terminated. Don't try to rationalize where the string originated from before the std::string_view. That's just asking for trouble.