Forcing multiple threads to use multiple CPUs when they are available


Solution 1

"When I run it, it only seems to use one CPU until it needs more, then it uses another CPU - is there anything I can do in Java to force different threads to run on different cores/CPUs?"

I interpret this part of your question as meaning that you have already addressed the problem of making your application multi-thread capable. And despite that, it doesn't immediately start using multiple cores.

The answer to "is there any way to force ..." is (AFAIK) not directly. Your JVM and/or the host OS decide how many 'native' threads to use, and how those threads are mapped to physical processors. You do have some options for tuning. For example, I found this page which talks about how to tune Java threading on Solaris. And this page talks about other things that can slow down a multi-threaded application.

Solution 2

There are two basic ways to multi-thread in Java. Each logical task you create with these methods should get scheduled onto a separate core when one is needed and available.

Method one: define a Runnable object, or a Thread object (which can take a Runnable in its constructor), and start it running with the Thread.start() method. It will execute on whatever core the OS gives it -- generally the least loaded one.

Tutorial: Defining and Starting Threads
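
As a minimal sketch of method one (the class name and the busy-work loop are made up for illustration), two threads are started from the same Runnable and the OS places them on whatever cores it likes:

public class StartThreadsExample {
    public static void main(String[] args) throws InterruptedException {
        Runnable task = new Runnable() {
            public void run() {
                long sum = 0;                 // CPU-bound busy work
                for (long i = 0; i < 500_000_000L; i++) {
                    sum += i;
                }
                System.out.println(Thread.currentThread().getName() + " -> " + sum);
            }
        };

        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start();   // start() returns immediately; run() executes on the new thread
        t2.start();
        t1.join();    // wait for both workers to finish
        t2.join();
    }
}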

Method two: define objects implementing the Runnable (if they don't return values) or Callable (if they do) interface, which contain your processing code. Pass these as tasks to an ExecutorService from the java.util.concurrent package. The java.util.concurrent.Executors class has a bunch of methods to create standard, useful kinds of ExecutorServices. Link to Executors tutorial.

From personal experience, the Executors fixed & cached thread pools are very good, although you'll want to tweak the thread counts. Runtime.getRuntime().availableProcessors() can be used at run-time to count the available cores. You'll also need to shut down thread pools when your application is done, otherwise the application won't exit, because the pool's worker threads are non-daemon and stay running.
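
For instance, a short sketch of that lifecycle, assuming a fixed pool sized to the core count and placeholder tasks; the key point is the shutdown() call at the end:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolLifecycleExample {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        for (int i = 0; i < cores * 4; i++) {
            final int taskId = i;
            pool.execute(new Runnable() {
                public void run() {
                    // placeholder for real CPU-bound work
                    System.out.println("task " + taskId + " ran on "
                            + Thread.currentThread().getName());
                }
            });
        }

        pool.shutdown();                            // stop accepting tasks; queued ones still run
        pool.awaitTermination(1, TimeUnit.MINUTES); // wait for the work to drain
        // without shutdown(), the pool's non-daemon threads keep the JVM alive
    }
}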

Getting good multicore performance is sometimes tricky, and full of gotchas:

  • Disk I/O slows down a LOT when run in parallel. Only one thread should do disk read/write at a time.
  • Synchronization of objects provides safety to multi-threaded operations, but slows down work.
  • If tasks are too trivial (small work bits, execute fast) the overhead of managing them in an ExecutorService costs more than you gain from multiple cores.
  • Creating new Thread objects is slow. The ExecutorServices will try to re-use existing threads if possible.
  • All sorts of crazy stuff can happen when multiple threads work on something. Keep your system simple and try to make tasks logically distinct and non-interacting.

One other problem: controlling the work is hard! A good practice is to have one manager thread that creates and submits tasks, and a pool of worker threads with work queues (using an ExecutorService), as in the example below.

I'm just touching on key points here -- multithreaded programming is considered one of the hardest programming subjects by many experts. It's non-intuitive, complex, and the abstractions are often weak.


Edit -- Example using ExecutorService:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public abstract class TaskThreader {

    // One DoStuff per input; each call() pushes its input through the three steps.
    class DoStuff implements Callable<Object> {
        Object in;

        public DoStuff(Object input) {
            in = input;
        }

        public Object call() {
            in = doStep1(in);
            in = doStep2(in);
            in = doStep3(in);
            return in;
        }
    }

    // Subclasses supply the actual processing steps.
    public abstract Object doStep1(Object input);
    public abstract Object doStep2(Object input);
    public abstract Object doStep3(Object input);

    // Runs every input through the pipeline on a pool sized to the core count.
    public List<Object> process(List<Object> inputs) throws Exception {
        ExecutorService exec = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        List<Callable<Object>> tasks = new ArrayList<Callable<Object>>();
        for (Object input : inputs) {
            tasks.add(new DoStuff(input));
        }
        List<Future<Object>> results = exec.invokeAll(tasks);
        exec.shutdown();                 // lets the JVM exit once the tasks finish
        List<Object> outputs = new ArrayList<Object>();
        for (Future<Object> f : results) {
            outputs.add(f.get());        // blocks until that task's result is ready
        }
        return outputs;
    }
}

Solution 3

First, you should prove to yourself that your program would run faster on multiple cores. Many operating systems put effort into running program threads on the same core whenever possible.

Running on the same core has many advantages. The CPU cache is hot, meaning that the program's data is already loaded into it. The lock/monitor/synchronization objects are in that cache too, which means that the other CPUs do not need to do cache-coherency operations across the bus (expensive!).

One thing that can very easily make your program run on the same CPU all the time is over-use of locks and shared memory. Your threads should not talk to each other. The less often your threads use the same objects in the same memory, the more often they will run on different CPUs. The more often they use the same memory, the more often they must block waiting for the other thread.

Whenever the OS sees one thread block waiting for another, it will run those threads on the same CPU whenever it can, because that reduces the amount of memory that has to move over the inter-CPU bus. That is my guess at what is causing the behavior you see in your program.
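
As an illustration of the "don't share" advice (the class name, slice size, and data are invented for this sketch), each task below reads and writes only its own disjoint slice of an array, so the threads never block on each other and nothing needs locking until the single combine step:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DisjointSlicesExample {
    public static void main(String[] args) throws Exception {
        final long[] data = new long[8_000_000];
        Arrays.fill(data, 1L);

        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        List<Future<Long>> partials = new ArrayList<Future<Long>>();

        int chunk = (data.length + cores - 1) / cores;
        for (int start = 0; start < data.length; start += chunk) {
            final int from = start;
            final int to = Math.min(start + chunk, data.length);
            partials.add(pool.submit(new Callable<Long>() {
                public Long call() {
                    long sum = 0;                 // touches only data[from..to)
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                    return sum;
                }
            }));
        }

        long total = 0;
        for (Future<Long> f : partials) {
            total += f.get();                     // the only point where results meet
        }
        pool.shutdown();
        System.out.println("total = " + total);   // prints 8000000
    }
}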

Solution 4

First, I'd suggest reading "Java Concurrency in Practice" by Brian Goetz.

This is by far the best book describing concurrent Java programming.

Concurrency is 'easy to learn, difficult to master'. I'd suggest reading plenty about the subject before attempting it. It's very easy to get a multi-threaded program to work correctly 99.9% of the time, and fail 0.1%. However, here are some tips to get you started:

There are two common ways to make a program use more than one core:

  1. Make the program run using multiple processes. An example is Apache compiled with the Pre-Fork MPM, which assigns requests to child processes. In a multi-process program, memory is not shared by default. However, you can map sections of shared memory across processes. Apache does this with its 'scoreboard'.
  2. Make the program multi-threaded. In a multi-threaded program, all heap memory is shared by default. Each thread still has its own stack, but can access any part of the heap (see the short sketch after this list). Typically, most Java programs are multi-threaded, and not multi-process.
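
To make the shared-heap versus per-thread-stack point concrete, here is a small illustrative sketch (the class and variable names are invented): both threads update the same heap object, while each has its own copy of the local variable.

import java.util.concurrent.atomic.AtomicInteger;

public class SharedHeapDemo {
    // lives on the heap: visible to every thread
    static final AtomicInteger shared = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = new Runnable() {
            public void run() {
                int onMyStack = 0;                 // each thread has its own copy
                for (int i = 0; i < 1000; i++) {
                    onMyStack++;
                    shared.incrementAndGet();      // both threads update the same object
                }
                System.out.println(Thread.currentThread().getName()
                        + " local count = " + onMyStack);   // always 1000
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start(); b.start();
        a.join();  b.join();
        System.out.println("shared count = " + shared.get()); // 2000
    }
}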

At the lowest level, one can create and destroy threads. Java makes it easy to create threads in a portable cross platform manner.

As it tends to get expensive to create and destroy threads all the time, Java now includes Executors to create re-usable thread pools. Tasks can be assigned to the executors, and the result can be retrieved via a Future object.
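
In sketch form (the task body is just placeholder arithmetic), assigning a task and getting its result back looks like this:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SubmitAndGetExample {
    public static void main(String[] args) throws Exception {
        ExecutorService exec = Executors.newCachedThreadPool(); // re-uses idle threads

        Future<Long> answer = exec.submit(new Callable<Long>() {
            public Long call() {
                long sum = 0;                        // placeholder computation
                for (int i = 1; i <= 1_000_000; i++) {
                    sum += i;
                }
                return sum;
            }
        });

        // the submitting thread can do other work here, then collect the result
        System.out.println("result = " + answer.get()); // get() blocks until done
        exec.shutdown();
    }
}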

Typically, one has a task which can be divided into smaller tasks, but the end results need to be brought back together. For example, with a merge sort, one can divide the list into smaller and smaller parts until every core is sorting a piece. However, as each sublist is sorted, it needs to be merged back in order to get the final sorted list. Since this "divide-and-conquer" pattern is fairly common, there is a JSR framework (the JSR 166y fork/join framework) which can handle the underlying distribution and joining. This framework will likely be included in Java 7.
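
That framework did ship in Java 7 as ForkJoinPool and ForkJoinTask in java.util.concurrent (see the comments below). As a rough sketch of the divide-and-conquer style it supports, here is a fork/join merge sort; the threshold value and class name are arbitrary choices for this illustration:

import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ParallelMergeSort extends RecursiveAction {
    private static final int THRESHOLD = 1 << 13; // below this size, sort sequentially
    private final int[] data;
    private final int lo, hi;                     // this task sorts data[lo..hi)

    ParallelMergeSort(int[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {
            Arrays.sort(data, lo, hi);            // small piece: plain sequential sort
            return;
        }
        int mid = (lo + hi) >>> 1;
        // divide: sort the two halves as subtasks (possibly on other cores)...
        invokeAll(new ParallelMergeSort(data, lo, mid),
                  new ParallelMergeSort(data, mid, hi));
        // ...then conquer: merge the two sorted halves back together
        merge(mid);
    }

    private void merge(int mid) {
        int[] left = Arrays.copyOfRange(data, lo, mid);
        int i = 0, j = mid, k = lo;
        while (i < left.length && j < hi) {
            data[k++] = (left[i] <= data[j]) ? left[i++] : data[j++];
        }
        while (i < left.length) {
            data[k++] = left[i++];                // any leftover from the left half
        }
    }

    public static void main(String[] args) {
        int[] numbers = new int[1_000_000];
        Random rnd = new Random(42);
        for (int i = 0; i < numbers.length; i++) {
            numbers[i] = rnd.nextInt();
        }
        new ForkJoinPool().invoke(new ParallelMergeSort(numbers, 0, numbers.length));
        System.out.println("sorted: " + isSorted(numbers));
    }

    private static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) return false;
        }
        return true;
    }
}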

Solution 5

There is no way to set CPU affinity in Java. http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4234402

If you have to do it, use JNI to create native threads and set their affinity.

Comments

  • Nosrama
    Nosrama almost 2 years

    I'm writing a Java program which uses a lot of CPU because of the nature of what it does. However, lots of it can run in parallel, and I have made my program multi-threaded. When I run it, it only seems to use one CPU until it needs more, then it uses another CPU - is there anything I can do in Java to force different threads to run on different cores/CPUs?

  • Michael Borgwardt
    Michael Borgwardt almost 15 years
    How would a multi-process application be easier to implement than a multi-threaded one?
  • Bastien Léonard
    Bastien Léonard almost 15 years
    @Michael: I agree with you, but this is subject to debate. The advantage of processes is that they are entirely separated, so you have to put in extra effort to make them communicate -- which means that if a process crashes, other processes may recover more easily.
  • user1066101
    user1066101 almost 15 years
    multi-process application done via a GNU/Linux pipeline is trivial. Linux scripts do this all the time with almost no effort. Read from stdin, write to stdout and it will use cores effectively.
  • Bastien Léonard
    Bastien Léonard almost 15 years
    @S. Lott: I can't find a trivial way to use this when, say, a server uses a process/thread for each client, and shares data structures which can be modified by any process/thread.
  • user1066101
    user1066101 almost 15 years
    Most "shared" data structures are read frequently and updated rarely. Enqueue update requests to a DB-writer process, but read the source file(s) directly as needed. This is very simple to implement.
  • BobMcGee
    BobMcGee almost 15 years
    I would agree if this were C, but not for Java. Java's handling of streams is slow due to synchronization, and its stdin/stdout I/O is a pain to work with. If this were a *nix C or C++ program, this would be the way to go. Java offers good synchronization/multithreading tools though, so why not use them?
  • Neil Coffey
    Neil Coffey almost 15 years
    Not sure multiple processes will necessarily help anyway-- depending on your OS, it probably schedules at thread level anyway.
  • user1066101
    user1066101 almost 15 years
    "Java offers good synchronization/multithreading tools" True. The point is that reading System.stdin and writing System.stdout is easier. Not faster. Easier.
  • BobMcGee
    BobMcGee almost 15 years
    @Lott: that doesn't do you much good if your goal is performance though, does it? You're basically making a slower version of a message passing interface. I agree with separating processing stages, but why do it via Stream when you can use work queues and worker threads?
  • user1066101
    user1066101 almost 15 years
    @BobMcGee: The unix pipeline isn't a "slower" version of a message passing interface. Try it. It's shared buffers and runs at an amazingly fast speed. You can use Queues and Workers, but it's not as simple. The point is that reading System.stdin and writing System.stdout is easier. Not faster. Easier.
  • BobMcGee
    BobMcGee almost 15 years
    @Lott Again, fast only in C -- the issue is Java's stream I/O being synchronized and checked on every I/O call, not the pipeline. Nor is it easier-- if you use stdout/stdin you need to define a communications protocol and work with parsing potentially. Don't forget exceptions writing into the StdOut too! Using a manager thread, ExecutorServices, and Runnable/Callable tasks is much simpler to implement. It's do-able in <100 lines of very simple code (with error checking), potentially very fast, and performs well.
  • BobMcGee
    BobMcGee almost 15 years
    @Lott: Look, see my code? It's longer, but it provides the complete application structure (minus input and output), and THREE steps. Your code gives a ONE step worker. It doesn't provide for task creation or overall process handling. You didn't specify communications protocol between processes, how multiple processes get started or communications set up or anything. How is your code still easier?
  • user1066101
    user1066101 almost 15 years
    @BobMcGee: It's easier because unix folks have been doing it this way for 40 years. There are numerous widely established precedents for doing it this way. The protocol for passing messages is -- 80% of the time -- simple serialization. The other 20% it's a simple text stream. The point is to be easier.
  • user1066101
    user1066101 almost 15 years
    @BobMcGee: "Your code gives a ONE step worker" Correct. They're all identical, so providing the other two seems specious to me. I could copy and paste two other clones, if you think that clarifies things.
  • Todd Gamblin
    Todd Gamblin almost 15 years
    I wish I could downvote this answer more than once. The question is about multithreaded programs, and you're putting forth stdin/stdout as some kind of general model of parallelism. This is a good approach for command-line tools that only need a FIFO, but it gets complicated if you have to transport anything of significance over the pipe. There are plenty of data structures in java.util.concurrent for exactly this kind of thing. They aren't hard to use if all you need is a pipeline, and you don't need to serialize, spawn multiple JVMs, or any of that to use them.
  • BobMcGee
    BobMcGee almost 15 years
    @Tgamplin: Agreed. I probably should drop the argument because it is obvious. @SLott: They're not identical -- you haven't provided code to set up the system, spawn and manage the processes or implement a messaging format to communicate tasks and pass out work / receive results and errors. UNIX does it this way because it made sense at the time & because multiple cores weren't that common. The "One Task One Program" philosophy also predominated. Java provides better concurrency primitives, and that approach is silly; why use IPC when the VM provides all you need?
  • user1066101
    user1066101 almost 15 years
    @BobMcGee: Agreed. They're not identical. Pipelines can be simpler because most of what you need is already provided. Protocol is serialization. Pipeline construction is allocated to the shell. Errors go to stderr. It's simple. Not equivalent. Not better. Simple.
  • BobMcGee
    BobMcGee almost 15 years
    @Lott: Perhaps you could flesh out your example to include more complete structure? That is to say, provide the main method, with process creation and destruction + very basic Serialized object communication for job start/end. I'd really like to see what the final code looks like, since I'm not as familiar with process-level concurrency. I think process parallelism should lend itself well to data-flow programming, which works better with concurrency than procedural. I'd just like to see the comparable structure. Also, maybe we should make a commmunity wiki post to discuss this?
  • BobMcGee
    BobMcGee almost 15 years
    @Lott: on anything but the latest Java releases from Sun, VM startup overhead is going to eat your performance if you use it like that. Simple, yes, but I thought you had a more elegant way of doing it.
  • user1066101
    user1066101 almost 15 years
    @BobMcGee: VM startup is only a penalty if your workload is poorly designed and spends a lot of time starting subprocesses. If your workload is well-designed, your processes run for so long that startup overhead is amortized over long running times.
  • buzz3791
    buzz3791 almost 12 years
    The JSR 166y framework has been included in Java 7 in the java.util.concurrent package's classes ForkJoinPool and ForkJoinTask docs.oracle.com/javase/tutorial/essential/concurrency/…
  • Sundeep
    Sundeep almost 12 years
    +1 for the reference. The link to PDF seems to be broken. Can you share the title if you still have that PDF?
  • toto_tico
    toto_tico over 9 years
    Brilliant! I went and read more about the topic because I was not clear about the advantage of the Executors. I am not yet sure about the others, but the FixedThreadPool seems great because it limits the number of threads running, (1) avoiding the overhead of switching between too many tasks, and (2) making sure that some threads finish first (so you get some results quickly). This is especially useful for running experiments.