Legality of COW std::string implementation in C++11

31,732

Solution 1

It's not allowed, because as per the standard 21.4.1 p6, invalidation of iterators/references is only allowed for

— as an argument to any standard library function taking a reference to non-const basic_string as an argument.

— Calling non-const member functions, except operator[], at, front, back, begin, rbegin, end, and rend.

For a COW string, calling non-const operator[] would require making a copy (and invalidating references), which is disallowed by the paragraph above. Hence, it's no longer legal to have a COW string in C++11.

Solution 2

The answers by Dave S and gbjbaanb are correct. (And Luc Danton's is correct too, although it's more a side-effect of forbidding COW strings rather than the original rule that forbids it.)

But to clear up some confusion, I'm going to add some further exposition. Various comments link to a comment of mine on the GCC bugzilla which gives the following example:

std::string s("str");
const char* p = s.data();
{
    std::string s2(s);
    (void) s[0];
}
std::cout << *p << '\n';  // p is dangling

The point of that example is to demonstrate why GCC's reference counted (COW) string is not valid in C++11. The C++11 standard requires this code to work correctly. Nothing in the code permits the p to be invalidated in C++11.

Using GCC's old reference-counted std::string implementation, that code has undefined behaviour, because p is invalidated, becoming a dangling pointer. (What happens is that when s2 is constructed it shares the data with s, but obtaining a non-const reference via s[0] requires the data to be unshared, so s does a "copy on write" because the reference s[0] could potentially be used to write into s, then s2 goes out of scope, destroying the array pointed to by p).

The C++03 standard explicitly permits that behaviour in 21.3 [lib.basic.string] p5 where it says that subsequent to a call to data() the first call to operator[]() may invalidate pointers, references and iterators. So GCC's COW string was a valid C++03 implementation.

The C++11 standard no longer permits that behaviour, because no call to operator[]() may invalidate pointers, references or iterators, irrespective of whether they follow a call to data().

So the example above must work in C++11, but does not work with libstdc++'s kind of COW string, therefore that kind of COW string is not permitted in C++11.

Solution 3

It is, CoW is an acceptable mechanism for making faster strings... but...

it makes multithreading code slower (all that locking to check if you're the only one writing kills performance when using a lot of strings). This was the main reason CoW was killed off years ago.

The other reasons are that the [] operator will return you the string data, without any protection for you to overwrite a string someone else expects to be unchanging. The same applies to c_str() and data().

Quick google says that the multithreading is basically the reason it was effectively disallowed (not explicitly).

The proposal says :

Proposal

We propose to make all iterator and element access operations safely concurrently executable.

We are increasing the stability of operations even in sequential code.

This change effectively disallows copy-on-write implementations.

followed by

The largest potential loss in performance due to a switch away from copy-on-write implementations is the increased consumption of memory for applications with very large read-mostly strings. However, we believe that for those applications ropes are a better technical solution, and recommend a rope proposal be considered for inclusion in Library TR2.

Ropes are part of STLPort and SGIs STL.

Solution 4

From 21.4.2 basic_string constructors and assignment operators [string.cons]

basic_string(const basic_string<charT,traits,Allocator>& str);

[...]

2 Effects: Constructs an object of class basic_string as indicated in Table 64. [...]

Table 64 helpfully documents that after construction of an object via this (copy) constructor, this->data() has as value:

points at the first element of an allocated copy of the array whose first element is pointed at by str.data()

There are similar requirements for other similar constructors.

Solution 5

Is COW basic_string prohibited in C++11 and later?

Regarding

Am I correct that C++11 does not admit COW based implementations of std::string?

Yes.

Regarding

If so, is this restriction explicitly stated somewhere in the new standard (where)?

Almost directly, by requirements of constant complexity for a number of operations that would require O(n) physical copying of the string data in a COW implementation.

For example, for the member functions

auto operator[](size_type pos) const -> const_reference;
auto operator[](size_type pos) -> reference;

… which in a COW implementation would ¹both trigger string data copying to un-share the string value, the C++11 standard requires

C++11 §21.4.5/4:

Complexity: constant time.

… which rules out such data copying, and hence, COW.

C++03 supported COW implementations by not having these constant complexity requirements, and by, under certain restrictive conditions, allowing calls to operator[](), at(), begin(), rbegin(), end(), or rend() to invalidate references, pointers and iterators referring to the string items, i.e. to possibly incur a COW data copying. This support was removed in C++11.


Is COW also prohibited via the C++11 invalidation rules?

In another answer which at the time of writing is selected as solution, and which is heavily upvoted and therefore apparently believed, it's asserted that

For a COW string, calling non-const operator[] would require making a copy (and invalidating references), which is disallowed by the [quoted] paragraph above [C++11 §21.4.1/6]. Hence, it's no longer legal to have a COW string in C++11.

That assertion is incorrect and misleading in two main ways:

  • It incorrectly indicates that only the non-const item accessors need to trigger a COW data copying.
    But also the const item accessors need to trigger data copying, because they allow client code to form references or pointers that (in C++11) it's not permitted to invalidate later via the operations that can trigger COW data copying.
  • It incorrectly assumes that COW data copying can cause reference invalidation.
    But in a correct implementation COW data copying, un-sharing the string value, is done at a point before there are any references that can be invalidated.

To see how a correct C++11 COW implementation of basic_string would work, when the O(1) requirements that make this invalid are ignored, think of an implementation where a string can switch between ownership policies. A string instance starts out with policy Sharable. With this policy active there can be no external item references. The instance can transition to Unique policy, and it must do so when an item reference is potentially created such as with a call to .c_str() (at least if that produces a pointer to the internal buffer). In the general case of multiple instances sharing ownership of the value, this entails copying the string data. After that transition to Unique policy the instance can only transition back to Sharable by an operation that invalidates all references, such as assignment.

So, while that answer's conclusion, that COW strings are ruled out, is correct, the reasoning offered is incorrect and strongly misleading.

I suspect the cause of this misunderstanding is a non-normative note in C++11's annex C:

C++11 §C.2.11 [diff.cpp03.strings], about §21.3:

Change: basic_string requirements no longer allow reference-counted strings
Rationale: Invalidation is subtly different with reference-counted strings. This change regularizes behavor (sic) for this International Standard.
Effect on original feature: Valid C ++ 2003 code may execute differently in this International Standard

Here the rationale explains the primary why one decided to remove the C++03 special COW support. This rationale, the why, is not how the standard effectively disallows COW implementation. The standard disallows COW via the O(1) requirements.

In short, the C++11 invalidation rules don't rule out a COW implementation of std::basic_string. But they do rule out a reasonably efficient unrestricted C++03-style COW implementation like the one in at least one of g++'s standard library implementations. The special C++03 COW support allowed practical efficiency, in particular using const item accessors, at the cost of subtle, complex rules for invalidation:

C++03 §21.3/5 which includes “first call” COW support:

References, pointers, and iterators referring to the elements of a basic_string sequence may be invalidated by the following uses of that basic_string object:
— As an argument to non-member functions swap() (21.3.7.8), operator>>() (21.3.7.9), and getline() (21.3.7.9).
— As an argument to basic_string::swap().
— Calling data() and c_str() member functions.
— Calling non-const member functions, except operator[](), at(), begin(), rbegin(), end(), and rend().
— Subsequent to any of the above uses except the forms of insert() and erase() which return iterators, the first call to non-const member functions operator[](), at(), begin(), rbegin(), end(), or rend().

These rules are so complex and subtle that I doubt many programmers, if any, could give a precise summary. I could not.


What if O(1) requirements are disregarded?

If the C++11 constant time requirements on e.g. operator[] are disregarded, then COW for basic_string could be technically feasible, but difficult to implement.

Operations which could access the contents of a string without incurring COW data copying include:

  • Concatenation via +.
  • Output via <<.
  • Using a basic_string as argument to standard library functions.

The latter because the standard library is permitted to rely on implementation specific knowledge and constructs.

Additionally an implementation could offer various non-standard functions for accessing string contents without triggering COW data copying.

A main complicating factor is that in C++11 basic_string item access must trigger data copying (un-sharing the string data) but is required to not throw, e.g. C++11 §21.4.5/3 “Throws: Nothing.”. And so it can't use ordinary dynamic allocation to create a new buffer for COW data copying. One way around this is to use a special heap where memory can be reserved without being actually allocated, and then reserve the requisite amount for each logical reference to a string value. Reserving and un-reserving in such a heap can be constant time, O(1), and allocating the amount that one has already reserved, can be noexcept. In order to comply with the standard's requirements, with this approach it seems there would need to be one such special reservation-based heap per distinct allocator.


Notes:
¹ The const item accessor triggers a COW data copying because it allows the client code to obtain a reference or pointer to the data, which it's not permitted to invalidate by a later data copying triggered by e.g. the non-const item accessor.

Share:
31,732
acm
Author by

acm

LinkedIn: http://www.linkedin.com/in/andrewmorrow GitHub: https://github.com/acmorrow

Updated on July 12, 2022

Comments

  • acm
    acm almost 2 years

    It had been my understanding that copy-on-write is not a viable way to implement a conforming std::string in C++11, but when it came up in discussion recently I found myself unable to directly support that statement.

    Am I correct that C++11 does not admit COW based implementations of std::string?

    If so, is this restriction explicitly stated somewhere in the new standard (where)?

    Or is this restriction implied, in the sense that it is the combined effect of the new requirements on std::string that precludes a COW based implementation of std::string. In this case, I'd be interested in a chapter and verse style derivation of 'C++11 effectively prohibits COW based std::string implementations'.

  • mfontanini
    mfontanini over 11 years
    I think strings in C++11 are guaranteed to be stored contiguously.
  • David Rodríguez - dribeas
    David Rodríguez - dribeas over 11 years
    In the past you had to do the copies in all those situations and it was not a problem...
  • Dirk Holsopple
    Dirk Holsopple over 11 years
    @mfontanini yes, but they weren't previously
  • Christopher Smith
    Christopher Smith over 9 years
    The operator[] issue isn't really a problem. The const variant does offer protection, and the non-const variant always has the option of doing the CoW at that time (or being really crazy and setting up a page fault to trigger it).
  • M.M
    M.M over 9 years
    Some rationale: N2534
  • Cheers and hth. - Alf
    Cheers and hth. - Alf over 9 years
    -1 The logic doesn't hold water. At the time of a COW copying there are no references or iterators that can be invalidated, the whole point of doing the copying is that such references or iterators are now being obtained, so copying is necessary. But it may still be that C++11 disallows COW implementations.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf over 9 years
    +1 Goes to the issues.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf over 9 years
    +1 Explains how C++11 (at least partially) prohibits COW.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf over 9 years
    Sorry, I was tired. It doesn't explain anything more than that a call of .data() must trigger COW copying if buffer currently shared. Still it's useful info so I let the upvote stand.
  • Dave S
    Dave S over 9 years
    @Cheersandhth.-Alf: The logic can be seen in the following if COW were allowed: std::string a("something"); char& c1 = a[0]; std::string b(a); char& c2 = a[1]; c1 is a reference to a. You then "copy" a. Then, when you attempt to take the reference the second time, it has to make a copy to get a non-const reference since there are two strings that point to the same buffer. This would have to invalidate the first reference taken, and is against the section quoted above.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf over 9 years
    @DaveS: In the case you sketch there is no buffer sharing, since there are (possible) external references to the original string's buffer. Thus the conlusion about making a copy later is wrong. Just to top it off the assumption that that would invalidate references to the original string's buffer is also wrong, but very irrelevant since the late copying doesn't happen here. More generally it's logically impossible to construct an example that does what you want, because avoiding that case is what COW is designed to do. That's why every attempt ends up doing umpteen fallacies.
  • Tavian Barnes
    Tavian Barnes over 9 years
    @Cheersandhth.-Alf, according to this, at least GCC's COW implementation does do exactly what DaveS is saying. So at least that style of COW is prohibited by the standard.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf about 9 years
    @BenVoigt: The downvote stands: the answer is incorrect and is based simple fallacies. Readers can demonstrate that to themselves by trying to construct a concrete example. (Tavian Barnes links to a bug report for gcc and asserts that the bug demonstrates that a correct implementation could not work within the rules of the standard. That's a fallacy again. Is that what you think shows that my comments are incorrect?)
  • Ben Voigt
    Ben Voigt about 9 years
    @Alf: This answer argues that non-const operator[] (1) must make a copy and that (2) it is illegal for it to do so. Which of those two points do you disagree with? Looking at your first comment, it seems that an implementation could share the string, at least under this requirement, up until the time it is accessed, but that both read and write accesses would need to unshare it. Is that your reasoning?
  • Cheers and hth. - Alf
    Cheers and hth. - Alf about 9 years
    @BenVoigt: quoting myself, "At the time of a COW copying there are no references or iterators that can be invalidated". The assertion that copying necessarily invalidates references is just wrong. No example can be constructed, because it is not true. And I am pretty sure you understood that already.
  • Ben Voigt
    Ben Voigt about 9 years
    For clarity, by "COW copying" are you talking about incrementing the sharing count, or unsharing?
  • Cheers and hth. - Alf
    Cheers and hth. - Alf about 9 years
    The unsharing. Sorry. I thought that "copy" would indicate ... copying.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf about 9 years
    @BenVoigt: I added a proper answer. Contrary to what I expected (but as I held open in my first comment here in this list) it is indeed practically impossible to provide a conforming ref-counted implementation for C++11 and later, due to new non-throw requirements.
  • Jonathan Wakely
    Jonathan Wakely about 9 years
    Although C++11 does guarantee strings are contiguous, that is orthogonal to banning COW strings. GCC's COW string is contiguous, so clearly your claim that "it's not possible to make a useful COW implementation" is bogus.
  • Jonathan Wakely
    Jonathan Wakely about 9 years
    @Cheersandhth.-Alf, what is wrong with my reasoning at gcc.gnu.org/bugzilla/show_bug.cgi?id=21334#c47 ?
  • Cheers and hth. - Alf
    Cheers and hth. - Alf about 9 years
    @JonathanWakely: I see nothing wrong with it. It's a fine bug report. Your reasoning (or rather observations) there appear to 100% valid. On the other hand, the view that that bug means something about the formal, that someone else has claimed, is nonsense.
  • Jonathan Wakely
    Jonathan Wakely about 9 years
  • Jonathan Wakely
    Jonathan Wakely about 9 years
    @Cheersandhth.-Alf, I have added my own answer, which simply expands on the correct answer by Dave S above. Maybe if you now re-read my comments above you will see that I was asking you to provide some justification for your own seemingly extraordinary claims such as "On the other hand, the view that that bug means something about the formal, that someone else has claimed, is nonsense." (that was my own view that you labelled as nonsense, yet I'm the one accused of ad hominem attacks and getting personal when I use the same word! How offensive!)
  • Cheers and hth. - Alf
    Cheers and hth. - Alf about 9 years
    An implementation that unshares on the call to .data() (and on every return of pointer, reference, or iterator) does not suffer from that problem. I.e. (invariant) a buffer is at any time unshared, or else shared with no external refs. I thought you had intended the comment about this example as an informal bug report-as-a-comment, very sorry for misunderstanding it! But as you can see by considering such implementation as I describe here, which works fine in C++11 when noexcept requirements are ignored, the example doesn't say anyting about the formal. I can provide code if you want.
  • Jonathan Wakely
    Jonathan Wakely about 9 years
    If you unshare on almost every access to the string then you lose all the benefit of sharing. A COW implementation has to be practical for a standard library to bother using that as std::string, and I sincerely doubt you can demonstrate a useful, performant COW string that meets the C++11 invalidation requirements. So I maintain that the noexcept specifications that were added at the last minute are a consequence of banning COW strings, not the underlying reason. N2668 seems perfectly clear, why do you continue to deny the clear evidence of the committee's intent outlined there?
  • Jonathan Wakely
    Jonathan Wakely about 9 years
    Also, remember that data() is a const member function, so must be safe to call concurrently with other const members, and for example to call data() concurrently with another thread making a copy of the string. So you're going to need all the overhead of a mutex for every string operation, even const ones, or the complexity of a lock-free mutable reference-counted structure, and after all that you only get sharing if you never modify or access your strings, so many, many strings will have a reference-count of one. Please do provide code, feel free to ignore noexcept guarantees.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf about 9 years
    Just cobbling together some code now I discovered that there are 129 basic_string member function, plus free functions. Abstraction cost: this off-the-cuff non-optimized fresh zeroth version code is 50 to 100% slower with both g++ and MSVC. It doesn't do thread safety (easy enough leveraging shared_ptr, I think) and it's just enough to support sorting a dictionary for purposes of timing, but modulo bugs it proves the point that a reference counted basic_string is permitted, except for C++ noexcept requirements. github.com/alfps/In-principle-demo-of-ref-counted-basic_stri‌​ng
  • Jonathan Wakely
    Jonathan Wakely about 9 years
    shared_ptr won't help you get around the thread-safety problem in my last comment: you need to be able to call data() concurrently in two threads, which means allocating a new string and replacing the currently held characters. shared_ptr does not support concurrent updates, only concurrent reads. So you still need a mutex. Also if you have a const string its shared_ptr member would be const and can't be updated anyway. Given that the main motivation for outlawing COW strings was thread-safety you should not try to bolt that on afterwards, you need to design it in.
  • Jonathan Wakely
    Jonathan Wakely about 9 years
    OK, I see the mutable on your shared_ptr member. That doesn't help with the data races though. There's no need to waste time with allocator support, that's orthogonal to the question at hand. Likewise assign/replace/find etc.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf about 9 years
    The simple idea for thread safety of const functions, to copy the shared_ptr to a local one in each such function, then use the local one. I believe there are some atomic support functions that can do that copying? Not used them.
  • Jonathan Wakely
    Jonathan Wakely about 9 years
    Yep, you can use those to atomically update the shared_ptr, now all your strings are sharing a global pool of mutexes, which you need to use on every call to data() or operator[] or begin() even on a const string. Hello contention. And you'd still have a TOCTTOU race in Refcounted_string_::ensure_unique_buffer(). You might be able to prove that a C++11 COW string is possible, if your std::lib implementation is happy to utterly destroy the program's performance. But back in the real world, that's not an option.
  • Jonathan Wakely
    Jonathan Wakely about 9 years
    It's also not enough to use the atomic ops "in each such function", it's a data race to do e.g. return p_buffer_->capacity(); in one thread while another thread is doing atomic_load(&p_buffer_), so you need to use the atomic ops for every single access to p_buffer_.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf about 9 years
    The vistas you paint about mutexes etc. are a bit too rich (not sure I communicated this well enough), but I think it's safe to say that just as the noexcept constraints are the formal barriers in place for a ref-counted implementation, so the thread safety requirements are the practical barriers, impacting on performance in certain scenarios. I think maybe we can agree on that. Then remains the interpretation of what the OP means by "COW", where I see a reasonable reference counted string type, which is only formally stopped by noexcept.
  • Jonathan Wakely
    Jonathan Wakely about 9 years
    "impacting on performance in certain scenarios". ROFL. OK, now try meeting the complexity requirements in 21.4.5 [string.access]. noexcept is not the problem. It's OK to be wrong sometimes, Alf.
  • Jonathan Wakely
    Jonathan Wakely about 9 years
  • Cheers and hth. - Alf
    Cheers and hth. - Alf about 9 years
    Good observation, those are new C++11 requirements, and they can't be met by a COW implementation. In addition to noexcept can't be met. Re “OK to be wrong”, will you delete this answer? When I get the time (sorry, I've used far too much already) I'll update my answer with both your observations: that (1) C++03 made a special exemption for first time calls, where e.g. s[i] can invalidate a pointer obtained from s.data(), even though in general client code can't know whether a call is first time or not, and (2) new O(1) complexity requirements on string operations, where C++03 had none.
  • Erik Aronesty
    Erik Aronesty about 9 years
    it's just silly that a std::cow_string class wasn't included, with lock_buffer(), etc. there are lots of times i know threading isn't an issue. more often than not, actually.
  • supercat
    supercat over 8 years
    @JonathanWakely: I can imagine some cases where a string which shares backing stores until they're accessed could be useful. For example, concatenating two strings with sharable backing stores could yield a new string object that holds references to both backing stores; concatenating that to something else could build up a tree. If intermediate strings are not used for any purpose except as fodder for future concatenations, such an approach could seem useful. It wouldn't quite be "copy-on-write" in the usual sense, but the idea of sharing data as long as practical would remain.
  • supercat
    supercat over 8 years
    @JonathanWakely: If one concatenates two strings and ask the resulting string for its backing store, it must report a contiguous sequence of characters. Are there any requirements about the storage of strings whose backing store has never been exposed?
  • Jonathan Wakely
    Jonathan Wakely over 8 years
    @supercat, asking for the backing store (e.g. by calling c_str()) must be O(1) and cannot throw, and must not introduce data races, so it's very difficult to meet those requirements if you lazily concatenate. In practice the only reasonable option is to always store contiguous data.
  • Jonathan Wakely
    Jonathan Wakely over 8 years
    @supercat, that would add a lot of overhead but would pessimize all uses that don't involve concatenating strings. Also see the point I made at stackoverflow.com/questions/12199710/…
  • Cheers and hth. - Alf
    Cheers and hth. - Alf almost 8 years
    −1 "The answers by Dave S and gbjbaanb are correct." is a falsehood. In particular, Dave's assertion is irrational, with language that has the look & feel of logic, but isn't. The conclusion at the end in this answer, that the buggy COW in g++'s libstdc++ isn't permitted, is OK.
  • Jonathan Wakely
    Jonathan Wakely almost 8 years
    "But in a correct implementation COW data copying, un-sharing the string value, is done at a point before there are any references that can be invalidated." The example in my answer contradicts this, showing how taking a reference before any sharing takes place gives you a reference that can later become invalid. You can't unshare a string that isn't shared yet. If you're going to downvote other answers because you disagree with the details you need to get your specifics right.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf almost 8 years
    @JonathanWakely: Your example is a good example of an incorrect-for- C++11 implementation. Possibly it was correct for C++03. But this SO question is about C++11 and later. If you're going to downvote other answers because you disagree with the details you need to get your specifics right.
  • Jonathan Wakely
    Jonathan Wakely almost 8 years
    "Your example is a good example of an incorrect-for- C++11 implementation. Possibly it was correct for C++03." Yes that's the point of the example. It shows a COW string that was legal in C++03 because it doesn't break the old iterator invalidation rules and is not legal in C++11 because it does break the new iterator invalidation rules. And it also contradicts the statement I quoted in the comment above.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf almost 8 years
    @JonathanWakely: No, your example of an incorrect implementation doesn't contradict anything. This is the same as with an incorrect attempt to open a door, e.g. sniffing on the hinges. The fact that that incorrect approach doesn't open door, has no bearing at all on whether the door can be opened: it's just stupid.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf almost 8 years
    @Jonathan: In case you fail to see how a correct implementation (modulo ignoring the O(1) requirements, i.e. correct apart from that) would work, when s.c_str() is called s becomes unshared, if it was shared/sharable.
  • Jonathan Wakely
    Jonathan Wakely almost 8 years
    What if s isn't shared yet when you call s.c_str()? How do you deal with later invalidation of the pointer returned by c_str()? The string needs to remember if any reference was ever taken, even while it was uniquely owned, and then never allow sharing. That's impractical in the real world.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf almost 8 years
    @Jonathan: A COW string is initially shared. This means that it acts as if it has a separate value with a reference count, like a shared_ptr.
  • Jonathan Wakely
    Jonathan Wakely almost 8 years
    No, it's initially uniquely owned. That's not shared. It becomes shared when the reference count goes above one. If you're going to redefine what COW means, distinct from what the question is about, then do so somewhere else.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf almost 8 years
    Assuming an incorrect implementation like you do with "No, it's initially uniquely owned", leads nowhere: it's circular logic. Or worse. My sniffing doesn't open the door, it can't be opened. What, use the handle? No, proper door opening, as I define it, is done exclusively with hinge-sniffing.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf almost 8 years
    @JonathanWakely: Again assuming that you mean what you write, and just don't entirely grok things, think of an implementation where a string can switch between ownership policies. A COW string starts out with policy Sharable. It can transition to Unique policy, and it must do so when an item reference is potentially created, such as with a call to .c_str() (at least if that produces a pointer to the internal buffer). This generally entails copying the string data. After that it can only transition Sharable by an operation that invalidates all references, e.g. assignment. Is that clear?
  • Jonathan Wakely
    Jonathan Wakely almost 8 years
    If you'd said sharable not initially shared I would not have argued. Saying something is initially shared is just confusing. Shared with itself? That's not what the word means. But I repeat: your attempt to argue that the C++11 iterator invalidation rules don't forbid some hypothetical COW string that was never used in practice (and would have unacceptable performance), when they most certainly do forbid the kind of COW string that was used in practice, is somewhat academic and pointless.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf almost 8 years
    @JonathanWakely: No, the question about what C++11 permits, is not about g++'s old C++03 COW strings. That just misdirection from you. The question is what C++11 permits or not. And C++11 does not permit COW implementations of basic_string. Is it right to downvote an answer that gets that basic fact right, when it has other bits wrong? Yes, I downvote for incorrectness, when the person refuses to fix things.
  • Nicol Bolas
    Nicol Bolas almost 8 years
    Your proposed COW string is interesting, but I'm not sure how useful it would be. The point of a COW string is to only copy the string data in the event that the two strings are written to. Your suggested implementation requires copying when any user-defined read operation happens. Even if the compiler knows its only a read, it must still copy. Furthermore, copying a Unique string will result in a copy of its string data (presumably to a Sharable state), which again makes COW rather pointless. So without the complexity guarantees, you could write... a really crappy COW string.
  • Nicol Bolas
    Nicol Bolas almost 8 years
    So while you are technically correct that the complexity guarantees are what prevents you from writing any form of COW, it really is [basic.string]/5 that prevents you from writing any genuinely useful form of COW string.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf almost 8 years
    "requires copying when any user-defined read operation happens", no, that is incorrect.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf almost 8 years
    "it really is [basic.string]/5 that prevents you from writing any genuinely useful form of COW string", happily the question wasn't about subjective opinion of usefulness :-), but about what's absolutely permitted or not.
  • Jonathan Wakely
    Jonathan Wakely almost 8 years
    The OP said "It had been my understanding that copy-on-write is not a viable way to implement a conforming std::string in C++11". Unacceptable performance would not be a viable way to implement a conforming std::string. My users would not accept it, and they wouldn't accept it if I told them their subjective opinion on its usefulness was not relevant.
  • Cheers and hth. - Alf
    Cheers and hth. - Alf almost 8 years
    @JonathanWakely: (1) Your quote is not the question. Here is the question: “Am I correct that C++11 does not admit COW based implementations of std::string? If so, is this restriction explicitly stated somewhere in the new standard (where)?” (2) Your opinion that a COW std::string, when disregarding the O(1) requirements, would be inefficient, is your opinion. I don't know what the performance could be, but I think that that assertion is put forward more for the feel of it, for the vibes that it conveys, than for any relevance to this answer.
  • Peregring-lk
    Peregring-lk over 7 years
    @DaveS Is it really invalidated? Because c1 is still alive through b. It is true that &c1 == &a[0] is no longer true, but c1 is a reference to an object (the storage region of the character itself), not to a name (a[0]), and the object is still alive so the reference is well-defined.
  • Voltaire
    Voltaire almost 5 years
    I like the suggestion of an alternative, i.g. ropes. I wonder if there are any other alternatives types and implementations available.