Why is creating a Thread said to be expensive?

java multithreading performance concurrency

50,634

Solution 1

Why is creating a Thread said to be expensive?

Because it >>is<< expensive.

Java thread creation is expensive because there is a fair bit of work involved:

A large block of memory has to be allocated and initialized for the thread stack.
System calls need to be made to create / register the native thread with the host OS.
Descriptors need to be created, initialized and added to JVM-internal data structures.

It is also expensive in the sense that the thread ties down resources as long as it is alive; e.g. the thread stack, any objects reachable from the stack, the JVM thread descriptors, the OS native thread descriptors.

The costs of all of these things are platform specific, but they are not cheap on any Java platform I've ever come across.

A Google search found me an old benchmark that reports a thread creation rate of ~4000 per second on a Sun Java 1.4.1 on a 2002 vintage dual processor Xeon running 2002 vintage Linux. A more modern platform will give better numbers ... and I can't comment on the methodology ... but at least it gives a ballpark for how expensive thread creation is likely to be.

Peter Lawrey's benchmarking indicates that thread creation is significantly faster these days in absolute terms, but it is unclear how much of this is due improvements in Java and/or the OS ... or higher processor speeds. But his numbers still indicate a 150+ fold improvement if you use a thread pool versus creating/starting a new thread each time. (And he makes the point that this is all relative ...)

The above assumes native threads rather than green threads, but modern JVMs all use native threads for performance reasons. Green threads are possibly cheaper to create, but you pay for it in other areas.

Update: The OpenJDK Loom project aims to provide a light-weight alternative to standard Java threads, among other things. The are proposing virtual threads which are a hybrid of native threads and green threads. In simple terms, a virtual thread is rather like a green thread implementation that uses native threads underneath when parallel execution is required.

As of now (Jan 2021) the Project Loom work is still at the prototyping stage, with (AFAIK) no Java version targeted for the release.

I've done a bit of digging to see how a Java thread's stack really gets allocated. In the case of OpenJDK 6 on Linux, the thread stack is allocated by the call to pthread_create that creates the native thread. (The JVM does not pass pthread_create a preallocated stack.)

Then, within pthread_create the stack is allocated by a call to mmap as follows:

mmap(0, attr.__stacksize, 
     PROT_READ|PROT_WRITE|PROT_EXEC, 
     MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)

According to man mmap, the MAP_ANONYMOUS flag causes the memory to be initialized to zero.

Thus, even though it might not be essential that new Java thread stacks are zeroed (per the JVM spec), in practice (at least with OpenJDK 6 on Linux) they are zeroed.

Solution 2

Others have discussed where the costs of threading come from. This answer covers why creating a thread is not that expensive compared to many operations, but relatively expensive compared to task execution alternatives, which are relatively less expensive.

The most obvious alternative to running a task in another thread is to run the task in the same thread. This is difficult to grasp for those assuming that more threads are always better. The logic is that if the overhead of adding the task to another thread is greater than the time you save, it can be faster to perform the task in the current thread.

Another alternative is to use a thread pool. A thread pool can be more efficient for two reasons. 1) it reuses threads already created. 2) you can tune/control the number of threads to ensure you have optimal performance.

The following program prints....

Time for a task to complete in a new Thread 71.3 us
Time for a task to complete in a thread pool 0.39 us
Time for a task to complete in the same thread 0.08 us
Time for a task to complete in a new Thread 65.4 us
Time for a task to complete in a thread pool 0.37 us
Time for a task to complete in the same thread 0.08 us
Time for a task to complete in a new Thread 61.4 us
Time for a task to complete in a thread pool 0.38 us
Time for a task to complete in the same thread 0.08 us

This is a test for a trivial task which exposes the overhead of each threading option. (This test task is the sort of task that is actually best performed in the current thread.)

final BlockingQueue<Integer> queue = new LinkedBlockingQueue<Integer>();
Runnable task = new Runnable() {
    @Override
    public void run() {
        queue.add(1);
    }
};

for (int t = 0; t < 3; t++) {
    {
        long start = System.nanoTime();
        int runs = 20000;
        for (int i = 0; i < runs; i++)
            new Thread(task).start();
        for (int i = 0; i < runs; i++)
            queue.take();
        long time = System.nanoTime() - start;
        System.out.printf("Time for a task to complete in a new Thread %.1f us%n", time / runs / 1000.0);
    }
    {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService es = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        int runs = 200000;
        for (int i = 0; i < runs; i++)
            es.execute(task);
        for (int i = 0; i < runs; i++)
            queue.take();
        long time = System.nanoTime() - start;
        System.out.printf("Time for a task to complete in a thread pool %.2f us%n", time / runs / 1000.0);
        es.shutdown();
    }
    {
        long start = System.nanoTime();
        int runs = 200000;
        for (int i = 0; i < runs; i++)
            task.run();
        for (int i = 0; i < runs; i++)
            queue.take();
        long time = System.nanoTime() - start;
        System.out.printf("Time for a task to complete in the same thread %.2f us%n", time / runs / 1000.0);
    }
}
}

As you can see, creating a new thread only costs ~70 µs. This could be considered trivial in many, if not most, use cases. Relatively speaking it is more expensive than the alternatives and for some situations a thread pool or not using threads at all is a better solution.

Solution 3

In theory, this depends on the JVM. In practice, every thread has a relatively large amount of stack memory (256 KB per default, I think). Additionally, threads are implemented as OS threads, so creating them involves an OS call, i.e. a context switch.

Do realize that "expensive" in computing is always very relative. Thread creation is very expensive relative to the creation of most objects, but not very expensive relative to a random harddisk seek. You don't have to avoid creating threads at all costs, but creating hundreds of them per second is not a smart move. In most cases, if your design calls for lots of threads, you should use a limited-size thread pool.

Solution 4

There are two kinds of threads:

Proper threads: these are abstractions around the underlying operating system's threading facilities. Thread creation is, therefore, as expensive as the system's -- there's always an overhead.
"Green" threads: created and scheduled by the JVM, these are cheaper, but no proper paralellism occurs. These behave like threads, but are executed within the JVM thread in the OS. They are not often used, to my knowledge.

The biggest factor I can think of in the thread-creation overhead, is the stack-size you have defined for your threads. Thread stack-size can be passed as a parameter when running the VM.

Other than that, thread creation is mostly OS-dependent, and even VM-implementation-dependent.

Now, let me point something out: creating threads is expensive if you're planning on firing 2000 threads per second, every second of your runtime. The JVM is not designed to handle that. If you'll have a couple of stable workers that won't be fired and killed over and over, relax.

Solution 5

Creating Threads requires allocating a fair amount of memory since it has to make not one, but two new stacks (one for java code, one for native code). Use of Executors/Thread Pools can avoid the overhead, by reusing threads for multiple tasks for Executor.

View more solutions

50,634

kachanov

Updated on February 07, 2021

Comments

kachanov about 3 years

The Java tutorials say that creating a Thread is expensive. But why exactly is it expensive? What exactly is happening when a Java Thread is created that makes its creation expensive? I'm taking the statement as true, but I'm just interested in mechanics of Thread creation in JVM.

Thread lifecycle overhead. Thread creation and teardown are not free. The actual overhead varies across platforms, but thread creation takes time, introducing latency into request processing, and requires some processing activity by the JVM and OS. If requests are frequent and lightweight, as in most server applications, creating a new thread for each request can consume significant computing resources.

From Java Concurrency in Practice
By Brian Goetz, Tim Peierls, Joshua Bloch, Joseph Bowbeer, David Holmes, Doug Lea
Print ISBN-10: 0-321-34960-1
- Nanne about 13 years
  
  I don't know the context in which the tutorials you've read say this: do they imply that the creation itself is expensive, or that "creating a thread" is expensive. The difference I try to show is between the pure action of making the thread (lets call it instantiating it or something), or the fact that you have a thread (so using a thread: obviously having overhead). Which one is claimed // which one do you wish to ask about?
- Rémi Vennereau about 13 years
  
  @typoknig - Expensive compared to NOT creating a new thread :)
- Paul Draper almost 10 years
  
  possible duplicate of Java thread creation overhead
- Alexander Mills over 8 years
  
  threadpools for the win. no need to always to create new threads for tasks.
- Basil Bourque about 3 years
  
  Alternatively, the virtual threads feature (also known as fibers) coming to Java via Project Loom are not expensive. Loom maps many virtual threads to one actual platform/host thread to greatly improve performance in situations where threads often block. For more info, see the most recent presentations and interviews by Ron Pressler of Oracle. Early access to Loom-enabled JVMs is available now.
Stephen C about 13 years

"... a couple of stable workers that won't be fired and killed ..." Why did I start thinking about workplace conditions? :-)
Stephen C about 13 years

@Raedwald - it is the initialization part that is expensive. Somewhere, something (e.g. the GC, or the OS) will zero the bytes before the block is turned into a thread stack. That takes physical memory cycles on typical hardware.
Raedwald about 13 years

"Somewhere, something (e.g. the GC, or the OS) will zero the bytes". It will? The OS will if it requires allocation of a new memory page, for security reasons. But that will be uncommon. And the OS might keep a cache of already zero-ed pages (IIRC, Linux does so). Why would the GC bother, given that the JVM will prevent any Java program reading its content? Note that the standard C malloc() function, which the JVM might well use, does not guarantee that allocated memory is zero-ed (presumably to avoid just such performance problems).
Raedwald about 13 years

stackoverflow.com/questions/2117072/… concurs that "One major factor is the stack memory allocated to each thread".
Stephen C about 13 years

@Raedwald - see updated answer for info on how the stack is actually allocated.
Raedwald about 13 years

It is possible (probable even) that the memory pages allocated by the mmap() call are copy-on-write mapped to a zero page, so their initailisation happens not within mmap() itself, but when the pages are first written to, and then only one page at a time. That is, when the thread starts execution, with the cost bourne by the created thread rather than the creator thread.
bestsss almost 13 years

@Raedwald, the usual behavior of the newly create threads is starting off on the same CPU even if a lot cores/CPUS (on different sockets) are available. That creates local latency at the very least, also it can starve the creation thread off CPU cycles. (there have been some heuristics to help that thread, so it can actually spawn enough and delegate work). Registering the host OS requires kernel hop(s), separate calls to mmap on current linux kernels impose extra overhead.
Stephen C almost 13 years

@bestsss - that's OS / scheduler specific. @Raedwald, it is largely immaterial which thread bears the cost. The big picture is that the zeroing uses memory bandwidth, which potentially affects every thread / processor.
bestsss almost 13 years

@Stephen, it's an example how it works on a commodity OS (I think windows is practically the same except thread priorities work in different way). Perhaps it was unclear, the threads just start on the same core and after some time they are rescheduled to a different one.
bestsss almost 13 years

@Raedwald, what is the jvm that uses separate stacks?
Philip JF almost 13 years

As far as I know, all JVMs allocate two stacks per thread. It is helpful for garbage collection to treat Java code (even when JITed) differently from free-casting c.
Nicholas about 11 years

That's a great piece of code there. Concise, to the point and clearly displays its jist.
Victor Grazi over 10 years

In the last block, I believe the result is skewed, because in the first two blocks the main thread is removing in parallel as the worker threads are putting. However in the last block, the action of taking is all performed serially, so it is dilating the value. You could probably use queue.clear() and use a CountDownLatch instead to wait for the threads to complete.
Vishy over 10 years

@VictorGrazi I am assuming you want to collect the results centrally. It is doing the same amount of queuing work in each case. A count down latch would be slightly faster.
Victor Grazi over 10 years

Actually, why not just have it do something consistently fast, like incrementing a counter; drop the whole BlockingQueue thing. Check the counter at the end to prevent the compiler from optimizing out the increment operation
Vishy over 10 years

@grazi you could do that in this case but you wouldn't in most realistic cases as waiting on a counter might be inefficient. If you did that the difference between the examples would be even greater.
Vishy over 10 years

Btw kb = kilo-bit , kB = kilo byte. Gb = giga bit , GB = giga byte.
David T. over 10 years

@StephenC you mention "Descriptors needs to be created, initialized and added to JVM internal data structures." what does this mean? file descriptors? and what does internal data structures refer to? are you basically saying that now the OS creates certain object locks?
Stephen C over 10 years

@DavidT. Private descriptors in the JVM (user-space) that (for example) hold the native thread identifier corresponding to a Thread. (Check the OpenJDK codebase ...). Also there are native thread descriptors in kernel space. No I'm not referring to object locks. (The object lock descriptors typically only get created when an object lock is used. That's independent from this.)
Jack almost 8 years

@PeterLawrey do we capitalize the 'k' in 'kb' and 'kB', so there's symmetry to 'Gb' and 'GB' ? These things bug me.
Vishy almost 8 years

@Jack There is a K = 1024 and k = 1000. ;) en.wikipedia.org/wiki/Kibibyte
dagnelies over 6 years

I believe this is a bad example. The worse performance has most likely to do with fighting over the blocking queue, and not at all with the thread creation cost/overhead.
Gurinder about 6 years

@Philip JF Can you please elaborate? What do you mean by 2 stacks one for Java code and one for native code? What does it do?
Stephen C over 4 years

"As far as I know, all JVMs allocate two stacks per thread." - I have never seen any evidence that would support this. Perhaps you are misunderstanding the true nature of the opstack in the JVM spec. (It is a way of modelling the behavior of bytecodes, not something that needs to be used at runtime to execute them.)