C++11 rvalues and move semantics confusion (return statement)

Solution 1

First example

std::vector<int> return_vector(void)
{
    std::vector<int> tmp {1,2,3,4,5};
    return tmp;
}

std::vector<int> &&rval_ref = return_vector();

The first example returns a temporary which is caught by rval_ref. That temporary will have its life extended beyond the rval_ref definition and you can use it as if you had caught it by value. This is very similar to the following:

const std::vector<int>& rval_ref = return_vector();

except that in my rewrite you obviously can't use rval_ref in a non-const manner.
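
The lifetime extension described above can be observed with a small instrumented type. This is only a sketch: Tracer, make_tracer and alive_while_bound are hypothetical names introduced for illustration, not part of the question.

```cpp
#include <cassert>

// Hypothetical tracer type: counts how many instances are currently alive.
struct Tracer {
    static int alive;
    int value = 0;
    Tracer() { ++alive; }
    Tracer(const Tracer& other) : value(other.value) { ++alive; }
    Tracer(Tracer&& other) noexcept : value(other.value) { ++alive; }
    ~Tracer() { --alive; }
};
int Tracer::alive = 0;

Tracer make_tracer() {
    Tracer tmp;
    return tmp;
}

// Returns the number of live Tracer objects while the reference is bound.
int alive_while_bound() {
    Tracer&& ref = make_tracer();  // the returned temporary now lives as long as ref
    ref.value = 7;                 // non-const use is allowed through Tracer&&
    assert(ref.value == 7);
    return Tracer::alive;          // the temporary has not been destroyed yet
}
```

alive_while_bound() should report one live object while the reference is in scope, and Tracer::alive should drop back to zero once the function returns. Binding with const Tracer& instead would extend the lifetime the same way but reject the assignment to ref.value.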

Second example

std::vector<int>&& return_vector(void)
{
    std::vector<int> tmp {1,2,3,4,5};
    return std::move(tmp);
}

std::vector<int> &&rval_ref = return_vector();

In the second example you have created a run-time error. rval_ref now holds a dangling reference to tmp, which was destroyed when the function returned. Using it is undefined behavior; with any luck, this code would crash immediately.

Third example

std::vector<int> return_vector(void)
{
    std::vector<int> tmp {1,2,3,4,5};
    return std::move(tmp);
}

std::vector<int> &&rval_ref = return_vector();

Your third example is roughly equivalent to your first. The std::move on tmp is unnecessary and can actually be a performance pessimization as it will inhibit return value optimization.
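
One way to see the pessimization is to count constructor calls. The following is a rough sketch; Counted, plain_return, moved_return and moves_for are hypothetical names, and the exact counts for the plain version depend on the compiler and its flags.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that records how it gets copied or moved.
struct Counted {
    static int copies;
    static int moves;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

Counted plain_return() {
    Counted tmp;
    return tmp;             // eligible for NRVO; otherwise an implicit move
}

Counted moved_return() {
    Counted tmp;
    return std::move(tmp);  // not a plain variable name, so NRVO cannot apply
}

// Runs one of the two functions and reports how many move constructions happened.
int moves_for(bool use_std_move) {
    Counted::copies = 0;
    Counted::moves = 0;
    if (use_std_move) {
        Counted result = moved_return();
        (void)result;
    } else {
        Counted result = plain_return();
        (void)result;
    }
    assert(Counted::copies == 0);  // neither version ever copies
    return Counted::moves;
}
```

With a typical optimizing compiler, moves_for(false) is expected to be 0 because NRVO kicks in, while moves_for(true) is at least 1 because the explicit std::move forces a move construction.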

The best way to code what you're doing is:

Best practice

std::vector<int> return_vector(void)
{
    std::vector<int> tmp {1,2,3,4,5};
    return tmp;
}

std::vector<int> rval_ref = return_vector();

I.e. just as you would in C++03. tmp is implicitly treated as an rvalue in the return statement. It will either be returned via return-value optimization (no copy, no move), or, if the compiler decides it cannot perform RVO, it will use vector's move constructor to do the return. Only if RVO is not performed and the returned type has no move constructor would the copy constructor be used for the return.

Solution 2

None of them will copy, but the second will refer to a destroyed vector. Named rvalue references almost never exist in regular code. You write it just how you would have written a copy in C++03.

std::vector<int> return_vector()
{
    std::vector<int> tmp {1,2,3,4,5};
    return tmp;
}

std::vector<int> rval_ref = return_vector();

Except now, the vector is moved. The user of a class doesn't deal with its rvalue references in the vast majority of cases.
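
A quick way to check that the elements are not copied is to compare the address of the vector's buffer inside and outside the function. This is a sketch only: the seen_inside out-parameter and buffer_was_reused are hypothetical additions for the demonstration, and the result relies on the behavior of mainstream standard library implementations, where a moved or elided vector keeps its heap buffer.

```cpp
#include <cassert>
#include <vector>

// seen_inside is a hypothetical out-parameter used only for this demonstration.
std::vector<int> return_vector(const int*& seen_inside) {
    std::vector<int> tmp {1, 2, 3, 4, 5};
    seen_inside = tmp.data();  // address of the heap buffer while still inside
    return tmp;
}

// True if the caller's vector owns the very same buffer that tmp allocated.
bool buffer_was_reused() {
    const int* inside = nullptr;
    std::vector<int> result = return_vector(inside);
    assert(inside != nullptr);
    return result.data() == inside;
}
```

Whether the compiler elides the return or falls back to the move constructor, the five ints stay where they were allocated; only ownership of the buffer changes hands.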

Solution 3

The simple answer is that you should write code for rvalue references just as you would for regular references, and you should treat them the same mentally 99% of the time. This includes all the old rules about returning references (i.e. never return a reference to a local variable).

Unless you are writing a template container class that needs to take advantage of std::forward and be able to write a generic function that takes either lvalue or rvalue references, this is more or less true.
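
For completeness, the generic case mentioned above might look roughly like this. NameList and name_list_demo are hypothetical names used only to illustrate std::forward.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper: a single template accepts both lvalues and rvalues.
class NameList {
public:
    template <class T>
    void add(T&& name) {
        // std::forward keeps lvalues as lvalues (copy) and rvalues as rvalues (move).
        names_.push_back(std::forward<T>(name));
    }
    std::size_t size() const { return names_.size(); }
    const std::string& at(std::size_t i) const { return names_[i]; }
private:
    std::vector<std::string> names_;
};

void name_list_demo() {
    NameList list;
    std::string kept = "lvalue";
    list.add(kept);                   // copied, kept is left untouched
    list.add(std::string("rvalue"));  // moved from the temporary
    assert(kept == "lvalue");
    assert(list.size() == 2);
    assert(list.at(1) == "rvalue");
}
```

Outside of this kind of generic code, a plain const reference parameter or a by-value parameter is usually all that is needed.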

One of the big advantages to the move constructor and move assignment is that if you define them, the compiler can use them in cases where the RVO (return value optimization) and NRVO (named return value optimization) fail to be invoked. This is pretty huge for returning expensive objects like containers and strings by value efficiently from methods.

Where things get interesting with rvalue references is that you can also use them as arguments to normal functions. This allows you to write containers that have overloads for both const reference (const foo& other) and rvalue reference (foo&& other). Even if the argument is too unwieldy to pass with a mere constructor call it can still be done:

std::vector<MyCheapType> vec;
for(int x=0; x<10; ++x)
{
    // automatically uses rvalue reference constructor if available
    // because MyCheapType is an unnamed temporary variable
    vec.push_back(MyCheapType(0.f));
}


std::vector<MyExpensiveType> vec;
for(int x=0; x<10; ++x)
{
    MyExpensiveType temp(1.0, 3.0);
    temp.initSomeOtherFields(malloc(5000));

    // old way, passed via const reference, expensive copy
    vec.push_back(temp);

    // new way, passed via rvalue reference, cheap move
    // just don't use temp again,  not difficult in a loop like this though . . .
    vec.push_back(std::move(temp));
}

The STL containers have been updated to have move overloads for nearly everything (hash keys and values, vector insertion, etc.), and that is where you will see them the most.
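
The overload pair itself can be sketched with a toy container. StringBag and string_bag_demo are hypothetical names; the counters exist only to show which overload gets picked.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container with a const-reference and an rvalue-reference overload.
class StringBag {
public:
    void add(const std::string& s) { items_.push_back(s); ++copied; }
    void add(std::string&& s)      { items_.push_back(std::move(s)); ++moved; }
    int copied = 0;  // calls that went through the copying overload
    int moved = 0;   // calls that went through the moving overload
private:
    std::vector<std::string> items_;
};

void string_bag_demo(StringBag& bag) {
    std::string name = "texture";
    bag.add(name);                 // lvalue: const& overload, copies
    bag.add(std::string("temp"));  // unnamed temporary: && overload, moves
    bag.add(std::move(name));      // explicit move: && overload, moves
    assert(bag.copied + bag.moved == 3);
}
```

Overload resolution does the selection: an lvalue can only bind to the const reference version, while temporaries and std::move results prefer the rvalue reference version.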

You can also use them with normal functions, and if you only provide an rvalue reference argument you can force the caller to create the object and let the function do the move. This is more of an example than a really good use, but in my rendering library, I have assigned a string to all the loaded resources, so that it is easier to see what each object represents in the debugger. The interface is something like this:

TextureHandle CreateTexture(int width, int height, ETextureFormat fmt, string&& friendlyName)
{
    std::unique_ptr<TextureObject> tex = D3DCreateTexture(width, height, fmt);
    tex->friendlyName = std::move(friendlyName);
    return tex;
}

It is a form of 'leaky abstraction' but allows me to take advantage of the fact that I had to create the string already most of the time, and avoid making yet another copy of it. This isn't exactly high-performance code but is a good example of the possibilities as people get the hang of this feature. This code actually requires that the variable either be a temporary to the call, or std::move invoked:

// move from temporary
TextureHandle htex = CreateTexture(128, 128, A8R8G8B8, string("Checkerboard"));

or

// explicit move (not going to use the variable 'str' after the create call)
string str("Checkerboard");
TextureHandle htex = CreateTexture(128, 128, A8R8G8B8, std::move(str));

or

// explicitly make a copy and pass the temporary of the copy down
// since we need to use str again for some reason
string str("Checkerboard");
TextureHandle htex = CreateTexture(128, 128, A8R8G8B8, string(str));

but this won't compile!

string str("Checkerboard");
TextureHandle htex = CreateTexture(128, 128, A8R8G8B8, str);
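
The same accept/reject behavior can be checked at compile time without triggering an actual error. The sketch below uses a hypothetical Label type as a stand-in for the rvalue-only parameter of CreateTexture; it is not part of the rendering library.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for an interface that only accepts rvalue strings.
struct Label {
    explicit Label(std::string&& text) : text(std::move(text)) {}
    std::string text;
};

// Temporaries and std::move results (rvalues) are accepted.
static_assert(std::is_constructible<Label, std::string>::value,
              "rvalue strings should be accepted");
// Named strings (lvalues) are rejected, const or not.
static_assert(!std::is_constructible<Label, std::string&>::value,
              "lvalue strings should be rejected");
static_assert(!std::is_constructible<Label, const std::string&>::value,
              "const lvalue strings should be rejected");

void label_demo() {
    std::string str("Checkerboard");
    Label from_copy{std::string(str)};  // explicit copy, str stays usable
    Label from_move{std::move(str)};    // explicit move, str should not be reused
    assert(from_copy.text == "Checkerboard");
    assert(from_move.text == "Checkerboard");
}
```

The second static_assert is the compile-time counterpart of the call that fails above: an lvalue cannot bind to a string&& parameter.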

Solution 4

Not an answer per se, but a guideline. Most of the time there is not much sense in declaring a local T&& variable (as you did with std::vector<int>&& rval_ref). You will still have to std::move() it to pass it to foo(T&&)-style functions. There is also the problem, already mentioned, that when you try to return such an rval_ref from a function you get the standard reference-to-destroyed-temporary fiasco.

Most of the time I would go with following pattern:

// Declarations
A a(B&&, C&&);
B b();
C c();

auto ret = a(b(), c());

You don't hold any references to the returned temporary objects, so you avoid the (inexperienced) programmer's error of trying to use a moved-from object:

auto bRet = b();
auto cRet = c();
auto aRet = a(std::move(bRet), std::move(cRet));

// Either these just fail (assert/exception), or you won't get 
// your expected results due to their clean state.
bRet.foo();
cRet.bar();
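
What "using a moved object" means can be made concrete with std::unique_ptr, whose moved-from state is guaranteed to be null. This is a sketch with hypothetical names (consume, source_is_empty_after_move); most other types only promise a valid but unspecified state after a move.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical sink that takes ownership through an rvalue reference.
int consume(std::unique_ptr<int>&& p) {
    std::unique_ptr<int> owned = std::move(p);  // ownership actually transfers here
    return *owned;
}

bool source_is_empty_after_move() {
    std::unique_ptr<int> value(new int(42));
    int got = consume(std::move(value));
    assert(got == 42);
    // value is still a valid object, but it no longer owns anything.
    return value == nullptr;
}
```

Dereferencing value after the call would be undefined behavior, which is exactly the kind of mistake that passing b() and c() directly as arguments rules out.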

Obviously there are (although rather rare) cases where a function truly returns a T&& which is a reference to a non-temporary object that you can move into your object.

Regarding RVO: these mechanisms generally work and the compiler can nicely avoid copying, but in cases where the return path is not obvious (exceptions, if conditionals determining which named object you will return, and probably a couple of others) rvalue references are your saviors (even if potentially more expensive).
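
The conditional case can be sketched as follows. MoveCounter, pick and copies_when_branching are hypothetical names, and the number of moves is what common compilers are expected to produce rather than something the standard mandates.

```cpp
#include <cassert>

// Hypothetical type that records how it gets copied or moved.
struct MoveCounter {
    static int copies;
    static int moves;
    MoveCounter() = default;
    MoveCounter(const MoveCounter&) { ++copies; }
    MoveCounter(MoveCounter&&) noexcept { ++moves; }
};
int MoveCounter::copies = 0;
int MoveCounter::moves = 0;

MoveCounter pick(bool first) {
    MoveCounter a;
    MoveCounter b;
    if (first) return a;  // two candidates, so NRVO is generally not possible
    return b;             // each return still treats the local as an rvalue
}

// Reports how many copy constructions a branching return caused.
int copies_when_branching() {
    MoveCounter::copies = 0;
    MoveCounter::moves = 0;
    MoveCounter result = pick(true);
    (void)result;
    assert(MoveCounter::moves <= 2);  // typically exactly one move
    return MoveCounter::copies;
}
```

Even though the compiler cannot construct both a and b in the return slot, no std::move is needed in pick: the return statements select the move constructor on their own.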

Solution 5

None of those will do any extra copying. Even if RVO isn't used, I believe the new standard says that move construction is preferred to copy construction when returning.

I do believe that your second example causes undefined behavior though because you're returning a reference to a local variable.

Author by

Tarantula

Updated on March 06, 2020

Comments

  • Tarantula
    Tarantula about 4 years

    I'm trying to understand rvalue references and move semantics of C++11.

    What is the difference between these examples, and which of them is going to do no vector copy?

    First example

    std::vector<int> return_vector(void)
    {
        std::vector<int> tmp {1,2,3,4,5};
        return tmp;
    }
    
    std::vector<int> &&rval_ref = return_vector();
    

    Second example

    std::vector<int>&& return_vector(void)
    {
        std::vector<int> tmp {1,2,3,4,5};
        return std::move(tmp);
    }
    
    std::vector<int> &&rval_ref = return_vector();
    

    Third example

    std::vector<int> return_vector(void)
    {
        std::vector<int> tmp {1,2,3,4,5};
        return std::move(tmp);
    }
    
    std::vector<int> &&rval_ref = return_vector();
    
  • Tarantula
    Tarantula over 13 years
    Are you really sure that the third example is going to do vector copy ?
  • Puppy
    Puppy over 13 years
    @Tarantula: It's going to bust your vector. Whether or not it did or didn't copy it before breaking doesn't really matter.
  • fredoverflow
    fredoverflow over 13 years
    I don't see any reason for the busting you propose. It is perfectly fine to bind a local rvalue reference variable to an rvalue. In that case, the temporary object's lifetime is extended to the lifetime of the rvalue reference variable.
  • Keith
    Keith about 11 years
    So, from what I gather the best thing to do is for objects to have a move constructor. I probably should just google this, but I'm being lazy at the moment; are there any common guidelines for compilers on RVO?
  • Howard Hinnant
    Howard Hinnant about 11 years
    Compilers will RVO when you return a local object by value, and the type of the local and the return of the function are the same, and neither is cv-qualified (don't return const types). Stay away from returning with the conditional (?:) operator as it can inhibit RVO. Don't wrap the local in some other function that returns a reference to the local. Just return my_local;. Multiple return statements are ok and will not inhibit RVO.
  • boycy
    boycy about 11 years
    There is a caveat: when returning a member of a local object, the move must be explicit.
  • NoSenseEtAl
    NoSenseEtAl about 11 years
    hi, can you elaborate on this: " rval_ref now holds a reference to the destructed tmp inside the function. " Do you mean temporary created in the return line, or func local variable named tmp.
  • Howard Hinnant
    Howard Hinnant about 11 years
    @NoSenseEtAl: There is no temporary created on the return line. move doesn't create a temporary. It casts an lvalue to an xvalue, making no copies, creating nothing, destroying nothing. That example is the exact same situation as if you returned by lvalue-reference and removed the move from the return line: Either way you've got a dangling reference to a local variable inside the function and which has been destructed.
  • Daniel Frey
    Daniel Frey about 11 years
    Just a nit: Since you named the variable (tmp) in the "Best practice" section, it is the NRVO that kicks in, not the RVO. These are two different optimizations. Other than that, great answer!
  • greenoldman
    greenoldman about 10 years
    @HowardHinnant, why result type cannot have const for RVO?
  • Howard Hinnant
    Howard Hinnant about 10 years
    @greenoldman: I was mistaken. RVO can work with const return types. It is just a bad idea to do so. If the RVO fails, move semantics will not kick in.
  • Deduplicator
    Deduplicator almost 10 years
    "Multiple return statements are ok and will not inhibit RVO": Only if they return the same variable.
  • Howard Hinnant
    Howard Hinnant almost 10 years
    @Deduplicator: You are correct. I was not speaking as accurately as I intended. I meant that multiple return statements do not forbid the compiler from RVO (even though it does make it impossible to implement), and therefore the return expression is still considered an rvalue.
  • void.pointer
    void.pointer almost 10 years
    In all of this we are talking about the RETURN operation possibly being an implicit move. In the case of an implicit move, how is the subsequent assignment affected? The return value of the function is by-value, not an rvalue reference, so how will std::vector know to use a move for the construction of the local variable at the call site?
  • Howard Hinnant
    Howard Hinnant almost 10 years
    @RobertDailey: The expression return_vector() is an rvalue, since the function is returning an object by value. When that expression is used to construct an object at the call site, overload resolution will choose a move constructor if it exists. If the object is already constructed, then overload resolution will instead choose an assignment operator. Since the rhs is an rvalue, it will choose the move assignment operator if it exists.
  • Mark Lakata
    Mark Lakata over 9 years
    Just a point of clarification, since I'm learning this. In this new example, the vector tmp is not moved into rval_ref, but written directly into rval_ref using RVO (i.e. copy elision). There is a distinction between std::move and copy elision. A std::move may still involve some data to be copied; in the case of a vector, a new vector is actually constructed in the move constructor, but the bulk of the data array is only copied by copying the pointer (essentially). The copy elision avoids 100% of all copies.
  • gedamial
    gedamial about 8 years
    I don't understand the "if the compiler decides"... I DECIDE, I'M the programmer. Why should it do something I didn't tell it to do?
  • Howard Hinnant
    Howard Hinnant about 8 years
    @gedamial: The C++ standard says that the compiler writers get to make some of the decisions. One of those is RVO. Under a very specific set of circumstances, the compiler is allowed but not required to perform RVO. And the compiler, not you, gets to decide whether or not it performs RVO. There has been some talk about requiring RVO, but at this time, that has not been standardized.
  • gedamial
    gedamial about 8 years
    Actually I'm having troubles with NRVO/RVO, please see stackoverflow.com/questions/35506708/… and/or cplusplus.com/forum/general/187009
  • Howard Hinnant
    Howard Hinnant about 8 years
    @gedamial: Ok, I looked at the SO question. It looks like you answered it yourself, and I don't have anything to add to your answer.
  • gedamial
    gedamial about 8 years
    @HowardHinnant I don't think my answer is correct: I heard that copy elision CAN occur even when there are more than 1 return statements. I'd like to know when a copy elision is forbidden
  • Howard Hinnant
    Howard Hinnant about 8 years
    @gedamial: Ok, I've given it a shot.
  • Curious
    Curious over 6 years
    @HowardHinnant Strange question but why do standard library components return by rvalue reference and not by value? Is it to be more efficient in the case where the value is not required to be moved from and is just discarded after the fetch? I ask because this case causes undefined behavior wandbox.org/permlink/kUqjfOWWRP6N57eS
  • Howard Hinnant
    Howard Hinnant over 6 years
    @Curious: I can't think of a good reason to return a reference to an rvalue. I made this same mistake in 2005 when proposing rvalue-overloads for string+string but fortunately corrected it prior to C++11 being finalized. Here is a somewhat clunky way to work around it: wandbox.org/permlink/HQPGEOMAUXwUCMb4
  • Daniel Langr
    Daniel Langr about 6 years
    @MarkLakata This is NRVO, not RVO. NRVO is optional, even in C++17. If it is not applied, both return value and rval_ref variables are constructed using move constructor of std::vector. There is no copy constructor involved both with / without std::move. tmp is treated as an rvalue in return statement in this case.
  • Mark Lakata
    Mark Lakata about 6 years
    @DanielLangr is correct. In this case, because the return value is named tmp, then NRVO might apply (or may not, since it is optional). If return_vector was simply { return std::vector<int>{1,2,3,4,5}; } it would be RVO. My point was that with a decent compiler that can do RVO and NRVO, rval_ref is not copy constructed or move constructed - it is directly constructed as std::vector<int>{1,2,3,4,5}.
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com over 5 years
    Am I correct that using return_vector(std::vector& x) + vector::resize(0) could save a few mallocs across multiple return_vector calls, and be even more time efficient than NRVO, at the cost of larger memory usage overall? More precise code at: stackoverflow.com/questions/10476665/…
  • Howard Hinnant
    Howard Hinnant over 5 years
    @CiroSantilli新疆改造中心六四事件法轮功 Yes you are correct. And you are also correct that counting allocations/deallocations is a good technique for estimating performance.
  • gansub
    gansub over 4 years
    @HowardHinnant In this article - ibm.com/developerworks/community/blogs/… is the author actually recommending (2) ? You are saying it will crash. He is saying it works just fine. Am I incorrect ?
  • Howard Hinnant
    Howard Hinnant over 4 years
    The author says after this code: "(Note: We should not use this way in the real development, because it is a reference to a local object. Here just show how to make RVO happened.)." I agree that the author's wording is easy to misinterpret when skimming.
  • azerila
    azerila almost 2 years
    why can't we use the first example '& rval_ref' without non-const?