What does thread_local mean in C++11?

c++ multithreading c++11 thread-local thread-local-storage

105,706

Solution 1

Thread-local storage duration refers to data that appears to have global or static storage duration (from the viewpoint of the functions using it) but of which there is, in fact, one copy per thread.

It joins the existing storage durations: automatic (exists during a block or function), static (exists for the duration of the program) and dynamic (exists on the heap between allocation and deallocation).

Something that is thread-local is brought into existence at thread creation (or, at the latest, before the thread first uses it) and is destroyed when the thread exits.

Some examples follow.

Think of a random number generator where the seed must be maintained on a per-thread basis. Using a thread-local seed means that each thread gets its own random number sequence, independent of other threads.

If your seed was a local variable within the random function, it would be initialised every time you called it, giving you the same number each time. If it was a global, threads would interfere with each other's sequences.
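To make the seed example concrete, here is a minimal sketch. The names next_random and sequence_on_new_thread are made up for illustration, and the generator is a toy linear congruential generator, not something to use for real randomness. Every thread starts from the same hard-coded seed here, so two threads produce the same sequence without disturbing each other; real code would presumably seed each thread differently.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: a toy linear congruential generator whose state is
// thread_local, so each thread advances its own sequence.
std::uint32_t next_random() {
    // One copy of `state` per thread, initialised on that thread's first call.
    thread_local std::uint32_t state = 12345u;
    state = state * 1664525u + 1013904223u;  // unsigned arithmetic wraps mod 2^32
    return state;
}

// Hypothetical helper: draws `count` numbers on a fresh thread and returns them.
std::vector<std::uint32_t> sequence_on_new_thread(int count) {
    assert(count >= 0);
    std::vector<std::uint32_t> out;
    std::thread t([&out, count] {
        for (int i = 0; i < count; ++i) out.push_back(next_random());
    });
    t.join();  // after join, `out` is safe to read on the calling thread
    return out;
}
```

If state were an ordinary local, every call would return the same number; if it were a plain global, concurrent callers would race on it and disturb each other's sequences.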

Another example is something like strtok, where the tokenisation state is stored on a thread-specific basis. That way, a single thread can be sure that other threads won't interfere with its tokenisation, while still being able to maintain state over multiple calls to strtok. For the multi-threaded case this largely removes the need for strtok_r (the re-entrant version), although strtok_r is still required when one thread interleaves two tokenisation sequences.
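A rough sketch of the idea, using a made-up next_word function rather than the real strtok (whose actual implementation may or may not use thread-local state on a given platform). The cursor is thread_local, so a tokenisation running on one thread cannot clobber one running on another. It does not help when a single thread interleaves two tokenisations.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <thread>

// Hypothetical strtok-like helper: pass the input on the first call and
// nullptr afterwards to continue. Returns "" when the input is exhausted.
std::string next_word(const char* input) {
    thread_local const char* cursor = nullptr;  // one cursor per thread
    if (input != nullptr) cursor = input;       // start a new tokenisation
    if (cursor == nullptr) return "";
    cursor += std::strspn(cursor, " ");         // skip leading spaces
    std::size_t len = std::strcspn(cursor, " ");
    std::string word(cursor, len);
    cursor += len;
    return word;
}

// Hypothetical check: a tokenisation on another thread leaves the calling
// thread's cursor untouched.
bool other_thread_does_not_disturb_caller() {
    std::string first = next_word("a bc");      // caller starts on "a bc"
    std::thread other([] {
        std::string x = next_word("x y");       // uses the other thread's cursor
        assert(x == "x");
    });
    other.join();
    std::string second = next_word(nullptr);    // caller continues with "a bc"
    return first == "a" && second == "bc";
}
```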

Both these examples allow for the thread local variable to exist within the function that uses it. In pre-threaded code, it would simply be a static storage duration variable within the function. For threads, that's modified to thread local storage duration.

Yet another example would be something like errno. You don't want separate threads modifying errno after one of your calls fails but before you can check the variable, and yet you only want one copy per thread.
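As an illustration of the errno pattern, the sketch below uses a hypothetical errno-like variable called last_error and a hypothetical failing_call function; neither is part of any library. It shows that a failure recorded on one thread is not overwritten by a failure on another.

```cpp
#include <cassert>
#include <thread>

// Hypothetical errno-like status variable: one copy per thread.
thread_local int last_error = 0;

// Hypothetical operation that always fails and records a code for its caller.
bool failing_call(int code) {
    last_error = code;
    return false;
}

// Fails on the calling thread, lets another thread fail with a different
// code, then returns the code the calling thread still observes.
int error_seen_after_other_thread_fails() {
    bool ok = failing_call(1);
    assert(!ok);
    std::thread other([] { failing_call(2); });  // writes the other thread's copy
    other.join();
    return last_error;                           // unaffected by the other thread
}
```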

This site has a reasonable description of the different storage duration specifiers.

Solution 2

When you declare a variable thread_local then each thread has its own copy. When you refer to it by name, then the copy associated with the current thread is used. e.g.

#include <iostream>
#include <thread>

thread_local int i=0;

void f(int newval){
    i=newval;
}

void g(){
    std::cout<<i;
}

void threadfunc(int id){
    f(id);
    ++i;
    g();
}

int main(){
    i=9;
    std::thread t1(threadfunc,1);
    std::thread t2(threadfunc,2);
    std::thread t3(threadfunc,3);

    t1.join();
    t2.join();
    t3.join();
    std::cout<<i<<std::endl;
}

This code will output "2349", "3249", "4239", "4329", "2439" or "3429", but never anything else. Each thread has its own copy of i, which is assigned to, incremented and then printed. The thread running main also has its own copy, which is assigned to at the beginning and then left unchanged. These copies are entirely independent, and each has a different address.
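The independence of the copies can also be checked directly. The following is a small sketch with made-up names (value, other_thread_has_its_own_copy): it compares addresses while both threads are alive, and also confirms that the new thread starts from the initializer rather than from whatever the creating thread assigned.

```cpp
#include <cassert>
#include <thread>

thread_local int value = 0;

// Hypothetical check: returns true if a new thread sees `value` at a different
// address from the caller's copy, freshly initialised to 0.
bool other_thread_has_its_own_copy() {
    value = 9;                        // changes only the calling thread's copy
    int* caller_address = &value;
    bool differs = false;
    int other_initial = -1;
    std::thread other([&] {
        // Compared while both copies exist, so the addresses must differ.
        differs = (&value != caller_address);
        other_initial = value;        // 0 from the initializer, not 9
        value = 42;                   // does not touch the caller's copy
    });
    other.join();
    assert(value == 9);
    return differs && other_initial == 0;
}
```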

It is only the name that is special in that respect --- if you take the address of a thread_local variable then you just have a normal pointer to a normal object, which you can freely pass between threads. e.g.

#include <iostream>
#include <thread>

thread_local int i=0;

void thread_func(int*p){
    *p=42;
}

int main(){
    i=9;
    std::thread t(thread_func,&i);
    t.join();
    std::cout<<i<<std::endl;
}

Since the address of i is passed to the thread function, the copy of i belonging to the main thread can be assigned to from another thread even though it is thread_local. This program will thus output "42". If you do this, you need to take care that *p is not accessed after the thread it belongs to has exited; otherwise you have a dangling pointer and undefined behaviour, just as in any other case where the pointed-to object has been destroyed.

thread_local variables are initialized "before first use", so if they are never touched by a given thread then they are not necessarily ever initialized. This is to allow compilers to avoid constructing every thread_local variable in the program for a thread that is entirely self-contained and doesn't touch any of them. e.g.

#include <iostream>
#include <thread>

struct my_class{
    my_class(){
        std::cout<<"hello";
    }
    ~my_class(){
        std::cout<<"goodbye";
    }
};

void f(){
    thread_local my_class unused;
}

void do_nothing(){}

int main(){
    std::thread t1(do_nothing);
    t1.join();
}

In this program there are 2 threads: the main thread and the manually-created thread. Neither thread calls f, so the thread_local object is never used. It is therefore unspecified whether the compiler will construct 0, 1 or 2 instances of my_class, and the output may be "", "hellohellogoodbyegoodbye" or "hellogoodbye".
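For contrast, here is a sketch of what happens when a thread does use the object. The names Tracked, touch and constructions_on_new_thread are invented for the example. It relies on the variable being block-scope: as far as I know, such a variable is initialised the first time a thread's control passes through the declaration, so a thread that calls touch() several times constructs one object, and on mainstream implementations a thread that never calls it constructs none.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> constructed{0};
std::atomic<int> destroyed{0};

// Hypothetical type that counts its constructions and destructions.
struct Tracked {
    Tracked() { ++constructed; }
    ~Tracked() { ++destroyed; }
};

void touch() {
    thread_local Tracked t;  // constructed on this thread's first pass through here
    (void)t;
}

// Calls touch() `calls` times on a new thread and returns how many Tracked
// objects were constructed as a result.
int constructions_on_new_thread(int calls) {
    assert(calls >= 0);
    int before = constructed.load();
    std::thread t([calls] {
        for (int i = 0; i < calls; ++i) touch();
    });
    t.join();                // the thread's Tracked, if any, is destroyed at thread exit
    return constructed.load() - before;
}
```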

Solution 3

Thread-local storage is in every respect like static (= global) storage, except that each thread has a separate copy of the object. The object's lifetime starts either at thread start (for global variables) or at first initialization (for block-local statics), and ends when the thread ends (i.e. when the thread function returns; join() merely waits for that to happen).

Consequently, only variables that could also be declared static may be declared as thread_local, i.e. global variables (more precisely: variables "at namespace scope"), static class members, and block-static variables (in which case static is implied).
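The three permitted places can be sketched as follows. All names (at_namespace_scope, Widget, bump_block_local, caller_view_after_other_thread_writes) are invented for the illustration.

```cpp
#include <cassert>
#include <thread>

thread_local int at_namespace_scope = 1;        // namespace scope

struct Widget {
    static thread_local int per_thread_member;  // static data member
};
thread_local int Widget::per_thread_member = 2; // definition repeats thread_local

int bump_block_local() {
    thread_local int calls = 0;                 // block scope; static is implied
    return ++calls;
}

// Hypothetical check: another thread writes to all three variables, then the
// calling thread sums what it sees in its own copies.
int caller_view_after_other_thread_writes() {
    std::thread other([] {
        at_namespace_scope = 100;
        Widget::per_thread_member = 200;
        bump_block_local();
    });
    other.join();
    assert(at_namespace_scope == 1);            // caller's copy is untouched
    return at_namespace_scope + Widget::per_thread_member + bump_block_local();
}
```

On a thread that has not used these variables before, the first call returns 1 + 2 + 1 and the second returns 1 + 2 + 2, because only the calling thread's own block-local counter advances.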

As an example, suppose you have a thread pool and want to know how well your work load was being balanced:

thread_local Counter c;

void do_work()
{
    c.increment();
    // ...
}

int main()
{
    std::thread t(do_work);   // your thread-pool would go here
    t.join();
}

This would print thread usage statistics, e.g. with an implementation like this (in a real program, the definition of Counter has to come before the declaration of c):

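#include <iostream>
#include <thread>

struct Counter
{
     unsigned int c = 0;
     void increment() { ++c; }
     ~Counter()
     {
         // Runs at thread exit, once for each thread that constructed a Counter.
         std::cout << "Thread #" << std::this_thread::get_id() << " was called "
                   << c << " times" << std::endl;
     }
};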


polapts

Updated on August 01, 2020

Comments

polapts over 3 years

I am confused by the description of thread_local in C++11. My understanding is that each thread has a unique copy of the local variables in a function. The global/static variables can be accessed by all the threads (possibly with access synchronized using locks). And the thread_local variables are visible to all the threads but can only be modified by the thread for which they are defined? Is that correct?
James Kanze over 11 years

Using thread local doesn't solve the problems with strtok. strtok is broken even in a single threaded environment.
paxdiablo over 11 years

Sorry, let me rephrase that. It doesn't introduce any new problems with strtok :-)
Kerrek SB over 11 years

Actually, the r stands for "re-entrant", which has nothing to do with thread safety. It's true that you can make some things work thread-safely with thread-local storage, but you can't make them re-entrant.
MSalters over 11 years

In a single-threaded environment, functions need to be re-entrant only if they are part of a cycle in the call graph. A leaf function (one that doesn't call other functions) is by definition not part of a cycle, and there is no good reason why strtok should call other functions.
japreiss almost 10 years

this would mess it up: while (something) { char *next = strtok(whatever); someFunction(next); // someFunction calls strtok }
Tim Čas about 9 years

@MSalters: You get problems if you (try to) intertwine two strtok sequences in one thread; say, if you're processing two strings at the same time. That's where the reentrant variants come in handy (plus it's cleaner --- no globals are accessed).
Dr. Jekyll about 7 years

Does a thread_local object call its deallocator at the end of the thread?
Mark H almost 7 years

I think it is important to note that the thread-local copy of the variable is a newly initialized copy of variable. That is, if you add a g() call to the beginning of threadFunc, then the output will be 0304029 or some other permutation of the pairs 02, 03, and 04. That is, even though 9 is assigned to i before the threads are created, the threads get a freshly constructed copy of i where i=0. If i is assigned with thread_local int i = random_integer(), then each thread gets a new random integer.
Hongxu Chen over 5 years

Not exactly a permutation of 02, 03, 04, there may be other sequences like 020043
haxpor over 4 years

+1 Great example for strtok. I checked glibc from the tip, the implementation of strtok is by two lines and calls strtok_r.
jwd almost 4 years

Interesting tidbit I just found: GCC supports using the address of a thread_local variable as a template argument, but other compilers do not (as of this writing; tried clang, vstudio). I'm not sure what the standard has to say about that, or whether this is an unspecified area.
Ayberk Özgür over 2 years

Some code samples would be nice