What are the mechanics of short string optimization in libc++?

46,019

Solution 1

The libc++ basic_string is designed to have a sizeof 3 words on all architectures, where sizeof(word) == sizeof(void*). You have correctly dissected the long/short flag, and the size field in the short form.

what value would __min_cap, the capacity of short strings, take for different architectures?

In the short form, there are 3 words to work with:

  • 1 bit goes to the long/short flag.
  • 7 bits goes to the size.
  • Assuming char, 1 byte goes to the trailing null (libc++ will always store a trailing null behind the data).

This leaves 3 words minus 2 bytes to store a short string (i.e. largest capacity() without an allocation).

On a 32 bit machine, 10 chars will fit in the short string. sizeof(string) is 12.

On a 64 bit machine, 22 chars will fit in the short string. sizeof(string) is 24.

A major design goal was to minimize sizeof(string), while making the internal buffer as large as possible. The rationale is to speed move construction and move assignment. The larger the sizeof, the more words you have to move during a move construction or move assignment.

The long form needs a minimum of 3 words to store the data pointer, size and capacity. Therefore I restricted the short form to those same 3 words. It has been suggested that a 4 word sizeof might have better performance. I have not tested that design choice.

_LIBCPP_ABI_ALTERNATE_STRING_LAYOUT

There is a configuration flag called _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT which rearranges the data members such that the "long layout" changes from:

struct __long
{
    size_type __cap_;
    size_type __size_;
    pointer   __data_;
};

to:

struct __long
{
    pointer   __data_;
    size_type __size_;
    size_type __cap_;
};

The motivation for this change is the belief that putting __data_ first will have some performance advantages due to better alignment. An attempt was made to measure the performance advantages, and it was difficult to measure. It won't make the performance worse, and it may make it slightly better.

The flag should be used with care. It is a different ABI, and if accidentally mixed with a libc++ std::string compiled with a different setting of _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT will create run time errors.

I recommend this flag only be changed by a vendor of libc++.

Solution 2

The libc++ implementation is a bit complicated, I'll ignore its alternate design and suppose a little endian computer:

template <...>
class basic_string {
/* many many things */

    struct __long
    {
        size_type __cap_;
        size_type __size_;
        pointer   __data_;
    };

    enum {__short_mask = 0x01};
    enum {__long_mask  = 0x1ul};

    enum {__min_cap = (sizeof(__long) - 1)/sizeof(value_type) > 2 ?
                      (sizeof(__long) - 1)/sizeof(value_type) : 2};

    struct __short
    {
        union
        {
            unsigned char __size_;
            value_type __lx;
        };
        value_type __data_[__min_cap];
    };

    union __ulx{__long __lx; __short __lxx;};

    enum {__n_words = sizeof(__ulx) / sizeof(size_type)};

    struct __raw
    {
        size_type __words[__n_words];
    };

    struct __rep
    {
        union
        {
            __long  __l;
            __short __s;
            __raw   __r;
        };
    };

    __compressed_pair<__rep, allocator_type> __r_;
}; // basic_string

Note: __compressed_pair is essentially a pair optimized for the Empty Base Optimization, aka template <T1, T2> struct __compressed_pair: T1, T2 {};; for all intents and purposes you can consider it a regular pair. Its importance just comes up because std::allocator is stateless and thus empty.

Okay, this is rather raw, so let's check the mechanics! Internally, many functions will call __get_pointer() which itself calls __is_long to determine whether the string is using the __long or __short representation:

bool __is_long() const _NOEXCEPT
    { return bool(__r_.first().__s.__size_ & __short_mask); }

// __r_.first() -> __rep const&
//     .__s     -> __short const&
//     .__size_ -> unsigned char

To be honest, I am not too sure this is Standard C++ (I know the initial subsequence provision in union but do not know how it meshes with an anonymous union and aliasing thrown together), but a Standard Library is allowed to take advantage of implementation defined behavior anyway.

Share:
46,019

Related videos on Youtube

ValarDohaeris
Author by

ValarDohaeris

Updated on July 08, 2022

Comments

  • ValarDohaeris
    ValarDohaeris almost 2 years

    This answer gives a nice high-level overview of short string optimization (SSO). However, I would like to know in more detail how it works in practice, specifically in the libc++ implementation:

    • How short does the string have to be in order to qualify for SSO? Does this depend on the target architecture?

    • How does the implementation distinguish between short and long strings when accessing the string data? Is it as simple as m_size <= 16 or is it a flag that is part of some other member variable? (I imagine that m_size or part of it might also be used to store string data).

    I asked this question specifically for libc++ because I know that it uses SSO, this is even mentioned on the libc++ home page.

    Here are some observations after looking at the source:

    libc++ can be compiled with two slightly different memory layouts for the string class, this is governed by the _LIBCPP_ALTERNATE_STRING_LAYOUT flag. Both of the layouts also distinguish between little-endian and big-endian machines which leaves us with a total of 4 different variants. I will assume the "normal" layout and little-endian in what follows.

    Assuming further that size_type is 4 bytes and that value_type is 1 byte, this is what the first 4 bytes of a string would look like in memory:

    // short string: (s)ize and 3 bytes of char (d)ata
    sssssss0;dddddddd;dddddddd;dddddddd
           ^- is_long = 0
    
    // long string: (c)apacity
    ccccccc1;cccccccc;cccccccc;cccccccc
           ^- is_long = 1
    

    Since the size of the short string is in the upper 7 bits, it needs to be shifted when accessing it:

    size_type __get_short_size() const {
        return __r_.first().__s.__size_ >> 1;
    }
    

    Similarly, the getter and setter for the capacity of a long string uses __long_mask to work around the is_long bit.

    I am still looking for an answer to my first question, i.e. what value would __min_cap, the capacity of short strings, take for different architectures?

    Other standard library implementations

    This answer gives a nice overview of std::string memory layouts in other standard library implementations.

    • Matthieu M.
      Matthieu M. about 10 years
      libc++ being open-source, you can find its string header here, I am checking it out at the moment :)
    • Ali
      Ali about 10 years
    • ValarDohaeris
      ValarDohaeris about 10 years
      @Matthieu M.: I had seen that before, unfortunately it's a very large file, thanks for the help in checking it out.
    • ValarDohaeris
      ValarDohaeris about 10 years
      @Ali: I have stumbled over this in googling around. However, this blog post explicitly says that it is only an illustration of SSO and not a highly optimized variant that would be used in practice.
  • ValarDohaeris
    ValarDohaeris about 10 years
    Thank you for this detailed answer! The only piece I am missing is what __min_cap would evaluate to for different architectures, I am not sure what sizeof() will return and how it is influenced by aliasing.
  • justin
    justin about 10 years
    @ValarDohaeris it's implementation defined. typically, you would expect 3 * the size of one pointer in this case, which would be 12 octets on a 32 bit arch and 24 on a 64 bit arch.
  • TemplateRex
    TemplateRex about 10 years
    Not sure if there is license compatibility between libc++ and Facebook Folly, but the FBstring manages to store an extra char (i.e. 23) by changing size to remaining capacity, so that it can do double duty as null terminator for a short string of 23 chars.
  • Howard Hinnant
    Howard Hinnant about 10 years
    @TemplateRex: That is clever. However if libc++ adopts it would require libc++ to give up one other characteristic that I like about its std::string: A default constructed string is all 0 bits. That makes default construction super efficient. And if you're willing to bend the rules, sometimes even free. E.g. you could calloc memory and just declare it to be full of default constructed strings.
  • TemplateRex
    TemplateRex about 10 years
    Ah, 0-init is nice indeed! BTW, FBstring has 2 flag bits, indicating short, intermediate and large strings. It uses the SSO for strings up to 23 chars, and then uses a malloc-ed memory region for strings up to 254 chars and beyond that they do COW (no longer legal in C++11, I know).
  • phuclv
    phuclv over 7 years
    Why can't size and capacity be stored in ints so that the class can be packed to only 16 bytes on 64-bit architectures?
  • Howard Hinnant
    Howard Hinnant over 7 years
    @LưuVĩnhPhúc: I wanted to allow strings greater than 2Gb on 64-bit. The cost is admittedly a greater sizeof. But at the same time the internal buffer for char goes from 14 to 22, which is a pretty good benefit.
  • Kerrek SB
    Kerrek SB over 7 years
    @HowardHinnant: Thanks! Don't you also get an extra byte out of this new layout, because the terminal null can now double as the short flag?
  • Howard Hinnant
    Howard Hinnant over 7 years
    @KerrekSB: That sounds like a good idea, but to the best of my knowledge, has not been implemented. At least I didn't do it when I added _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT in 2013. If implemented, it should be reflected in __min_cap (which it currently isn't).
  • DevSolar
    DevSolar about 6 years
    Thank you for the explanations, that is really helpful. One question though... why not use the MSBit as short / long flag? Wouldn't that free up 7 more bits for long string capacity...? Granted, that's mostly theoretical as the OS probably won't support heap allocations of that size. I was just wondering if I am missing a point somehow.
  • Howard Hinnant
    Howard Hinnant about 6 years
    There's actually a twisty maze of options between _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT and big/little endian where that short/long bit jumps around between MSB and LSB and I honestly don't recall the details.