Lambdas capturing local variables by reference: accessing them after the scope ends


Solution 1

Yes, this causes undefined behavior. The lambdas will reference stack-allocated objects that have gone out of scope. (Technically, as I understand it, the behavior is defined until the lambdas access a and/or b. If you never invoke the returned lambdas then there is no UB.)

This is undefined behavior the same way that it's undefined behavior to return a reference to a stack-allocated local and then use that reference after the local goes out of scope, except that in this case it's being obfuscated a bit by the lambda.

Further, note that before C++17 the order in which the lambdas are invoked is unspecified: the compiler is free to invoke f.second() before f.first() because both are part of the same full-expression. Therefore, even if we fix the undefined behavior caused by using references to destroyed objects, both 2 0 and 2 1 are valid outputs from this program under those rules, and which you get depends on the order in which your compiler decides to execute the lambdas. (Since C++17, the operands of a chained << are evaluated left to right, so f.first() runs first and only 2 1 is possible.) Note that unspecified order is not undefined behavior: the compiler can't do just anything, it simply has some freedom in deciding the order in which to do some things.

(Keep in mind that << in your main() function is invoking a custom operator<< function, and before C++17 its operands are treated like ordinary function arguments, whose order of evaluation is unspecified. Compilers are free to emit code that evaluates all of the function arguments within the same full-expression in any order, with the constraint that all arguments to a function must be evaluated before that function is invoked.)
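
The point about argument evaluation order can be sketched with plain functions. This is only an illustration: call_log, first_value, second_value, and add_values are hypothetical names, not part of the original program. The first wrapper leaves the call order to the compiler, while the second pins it down by using separate statements.

```cpp
#include <string>
#include <vector>

// Hypothetical log that records the order in which the functions run.
std::vector<std::string> call_log;

int first_value()  { call_log.push_back("first");  return 1; }
int second_value() { call_log.push_back("second"); return 2; }
int add_values(int x, int y) { return x + y; }

// Both calls are arguments of the same function call, so the compiler
// may run either one first. Only the sum is guaranteed.
int sum_unspecified_order() {
    return add_values(first_value(), second_value());
}

// Separate statements: first_value() always runs before second_value().
int sum_fixed_order() {
    const int x = first_value();
    const int y = second_value();
    return add_values(x, y);
}
```

Both wrappers return the same sum here because the two calls are independent. The order only matters when the calls share state, as the two lambdas in the question do.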

To fix the first problem, use std::shared_ptr to create a reference-counted object. Capture this shared pointer by value, and the lambdas will keep the pointed-to object alive as long as they (and any copies thereof) exist. This heap-allocated object is where we will store the shared state of a and b.

To fix the second problem, evaluate each lambda in a separate statement.

Here is your code rewritten with the undefined behavior fixed, and with f.first() guaranteed to be invoked before f.second():

#include <functional>
#include <iostream>
#include <memory>
#include <utility>

std::pair<std::function<int()>, std::function<int()>> addSome() {
    // We store the "a" and "b" ints instead in a shared_ptr containing a pair.
    auto numbers = std::make_shared<std::pair<int, int>>(0, 0);

    // a becomes numbers->first
    // b becomes numbers->second

    // And we capture the shared_ptr by value.
    return std::make_pair(
        [numbers] {
            ++numbers->first;
            ++numbers->second;
            return numbers->first + numbers->second;
        },
        [numbers] {
            return numbers->first;
        }
    );
}

int main() {
    auto f = addSome();
    // We break apart the output into two statements to guarantee that f.first()
    // is evaluated prior to f.second().
    std::cout << f.first();
    std::cout << " " << f.second();
    return 0;
}


Solution 2

Unfortunately C++ lambdas can capture by reference but don't solve the "upwards funarg problem".

Doing so would require allocating captured locals in "cells" and using garbage collection or reference counting for deallocation. C++ does not do this, and unfortunately this makes C++ lambdas a lot less useful and more dangerous than closures in other languages like Lisp, Python or JavaScript.

More specifically, in my experience you should avoid at all costs implicit capture by reference (i.e. using the [&](…){…} form) for lambda objects that survive the local scope, because that's a recipe for random segfaults later during maintenance.

Always think carefully about what to capture, how to capture it, and how long the captured references stay valid.
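
As a minimal sketch of planning the capture explicitly, here is a lambda that outlives the function that creates it. The name make_adder is hypothetical and chosen only for illustration. Because the lambda is returned to the caller, the local is captured by value, so the lambda carries its own copy.

```cpp
#include <functional>

// Hypothetical factory: the returned lambda outlives this call, so `amount`
// is captured explicitly by value and the lambda owns its own copy.
// Writing [&amount] or [&] here would leave a dangling reference.
std::function<int(int)> make_adder(int amount) {
    return [amount](int x) { return x + amount; };
}
```

Naming each capture in the brackets also makes the lifetime question visible to whoever edits the function later.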

Of course, it's safe to capture everything by reference with [&] if you only use the lambda within the same scope, for example to pass code to algorithms like std::sort without having to define a named comparator function outside of the function, or as a locally used utility function. I find this use very readable and nice because you get a lot of context implicitly, and there is no need to 1. make up a global name for something that will never be reused anywhere else, or 2. pass a lot of context or create extra classes just for that context.
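
A small sketch of that safe, same-scope use follows. The function sort_by_distance is a hypothetical helper, not something from the question. The comparator reads the local pivot through [&], and that is fine because the lambda is gone before the function returns.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Hypothetical helper: sorts `values` by their distance from `pivot`.
// The lambda captures `pivot` by reference via [&], which is safe because
// std::sort uses and discards the lambda before this function returns.
void sort_by_distance(std::vector<int>& values, int pivot) {
    std::sort(values.begin(), values.end(), [&](int lhs, int rhs) {
        return std::abs(lhs - pivot) < std::abs(rhs - pivot);
    });
}
```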

An approach that can work sometimes is capturing by value a shared_ptr to a heap-allocated state. This is basically implementing by hand what Python does automatically (but pay attention to reference cycles to avoid memory leaks: Python has a garbage collector, C++ doesn't).
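
The warning about reference cycles can be illustrated with a sketch in which an object stores a callback that refers back to the object itself. Node and make_node are hypothetical names. If the callback captured the shared_ptr by value, the object would keep itself alive forever. Capturing a std::weak_ptr is one common way to break the cycle, not the only one.

```cpp
#include <functional>
#include <memory>

// Hypothetical object that stores a callback referring back to itself.
struct Node {
    int value = 0;
    std::function<int()> read;
};

std::shared_ptr<Node> make_node(int value) {
    auto node = std::make_shared<Node>();
    node->value = value;
    // Capturing `node` itself would create a cycle (the Node owns the
    // callback, the callback owns the Node) and the memory would leak.
    // A weak_ptr observes the Node without owning it.
    std::weak_ptr<Node> weak = node;
    node->read = [weak] {
        auto strong = weak.lock();           // null once the Node is gone
        return strong ? strong->value : -1;  // -1 is an arbitrary fallback
    };
    return node;
}
```

The cost of this design is that every call has to check whether the object still exists, which is the bookkeeping a garbage-collected language does for you.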

Author by

Ashish Negi


Updated on July 09, 2022

Comments

  • Ashish Negi
    Ashish Negi almost 2 years

    I am passing my local variables by reference to two lambdas. I call these lambdas outside of the function scope. Is this undefined?

    std::pair<std::function<int()>, std::function<int()>> addSome() {
        int a = 0, b = 0;
        return std::make_pair([&a,&b] {
            ++a; ++b;
            return a+b;
            }, [&a, &b] {
                return a;
            });
    }
    
    int main() {
        auto f = addSome();
        std::cout << f.first() << " " << f.second();
        return 0;
    }
    

    However, if it is not undefined, changes made in one lambda are not reflected in the other lambda.

    Am I misunderstanding pass-by-reference in the context of lambdas?

    I am writing to the variables, and it seems to work fine with no runtime errors, with output 2 0. If it works, then I would expect the output 2 1.