Using 100% of all cores with the multiprocessing module


Solution 1

To use 100% of all cores, do not create and destroy new processes.

Create a few processes per core and link them with a pipeline.

At the OS-level, all pipelined processes run concurrently.

The less you write (and the more you delegate to the OS) the more likely you are to use as many resources as possible.

python p1.py | python p2.py | python p3.py | python p4.py ...

This will make maximal use of your CPU.
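
As a rough sketch of one such stage (the file name and the squaring step are placeholders, not something from the question): each script simply reads stdin, does its share of the work, and writes stdout for the next stage.

# p2.py -- a hypothetical middle stage of the pipeline:
# read numbers from stdin, transform them, write results to stdout.
import sys

for line in sys.stdin:            # blocks only while waiting for upstream data
    value = int(line)
    result = value * value        # placeholder for the real per-stage work
    sys.stdout.write(f'{result}\n')

Because each python pX.py in the shell pipeline is its own OS process, a pipeline of four CPU-bound stages can keep four cores busy as long as data keeps flowing through it.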

Solution 2

You can use psutil to pin each process spawned by multiprocessing to a specific CPU:

import multiprocessing as mp
import psutil


def spawn():
    procs = list()
    n_cpus = psutil.cpu_count()
    for cpu in range(n_cpus):
        affinity = [cpu]
        d = dict(affinity=affinity)
        p = mp.Process(target=run_child, kwargs=d)
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
        print('joined')


def run_child(affinity):
    proc = psutil.Process()  # get self pid
    print(f'PID: {proc.pid}')
    aff = proc.cpu_affinity()
    print(f'Affinity before: {aff}')
    proc.cpu_affinity(affinity)
    aff = proc.cpu_affinity()
    print(f'Affinity after: {aff}')


if __name__ == '__main__':
    spawn()

Note: As commented, psutil.Process.cpu_affinity is not available on macOS (it is supported only on Windows, Linux and FreeBSD).
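
If you are on Linux and want to avoid the extra dependency, here is a minimal sketch using the standard library's os.sched_setaffinity instead (Linux-only, so treat the platform as an assumption):

import os

# os.sched_setaffinity is Linux-only (not available on macOS or Windows);
# pid 0 means "the calling process".
os.sched_setaffinity(0, {2})       # pin this process to CPU 2
print(os.sched_getaffinity(0))     # -> {2}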

Solution 3

Minimum example in pure Python:

def f(x):
    while 1:
        # ---bonus: gradually use up RAM---
        x += 10000  # linear growth; use exponential for faster ending: x *= 1.01
        y = list(range(int(x))) 
        # ---------------------------------
        pass  # infinite loop, use up CPU

if __name__ == '__main__':  # __main__ guard to avoid recursive process creation on Windows
    import multiprocessing as mp
    n = mp.cpu_count() * 32  # oversubscribe in case cpu_count() reports only the active cores
    with mp.Pool(n) as p:
        p.map(f, range(n))

Usage: to warm up on a cold day (but feel free to change the loop to something less pointless).

Warning: to exit, don't pull the plug or hold the power button; use Ctrl-C instead.

Solution 4

Regarding code snippet 1: How many cores / processors do you have on your test machine? It isn't doing you any good to run 50 of these processes if you only have 2 CPU cores. In fact you're forcing the OS to spend more time context switching to move processes on and off the CPU than doing actual work.

Try reducing the number of spawned processes to the number of cores. So "for i in range(50):" should become something like:

import os
# assuming you're on Windows:
for i in range(int(os.environ["NUMBER_OF_PROCESSORS"])):
    ...

Regarding code snippet 2: You're using a multiprocessing.Lock which can only be held by a single process at a time so you're completely limiting all the parallelism in this version of the program. You've serialized things so that process 1 through 50 start, a random process (say process 7) acquires the lock. Processes 1-6, and 8-50 all sit on the line:

l.acquire()

While they sit there they are just waiting for the lock to be released. Depending on the implementation of the Lock primitive they are probably not using any CPU, they're just sitting there using system resources like RAM but are doing no useful work with the CPU. Process 7 counts and prints to 1000 and then releases the lock. The OS then is free to schedule randomly one of the remaining 49 processes to run. Whichever one it wakes up first will acquire the lock next and run while the remaining 48 wait on the Lock. This'll continue for the whole program.

Basically, code snippet 2 is an example of what makes concurrency hard. You have to manage access by lots of processes or threads to some shared resource. In this particular case there really is no reason that these processes need to wait on each other though.
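
For illustration, here is an untested sketch of snippet 2 with the lock narrowed so that it only serializes access to the shared resource (the console), while the CPU-bound counting runs fully in parallel; the names are kept from the question:

from multiprocessing import Process, Lock

def f(l, i):
    x = 0
    while x < 1000:              # CPU-bound counting, now fully parallel
        x += 1
    with l:                      # hold the lock only for the shared resource: stdout
        print('worker', i, 'counted to', x)

if __name__ == '__main__':
    lock = Lock()
    procs = [Process(target=f, args=(lock, num)) for num in range(50)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()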

So of these two, Snippet 1 is closer to more efficiently utilizing the CPU. I think properly tuning the number of processes to match the number of cores will yield a much improved result.
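
As an untested sketch of that tuning (the larger loop bound is my own adjustment so the load stays visible, and the per-number prints are dropped because, as the comments note, printing blocks on IO instead of using the CPU):

import multiprocessing

def worker():
    # pure CPU work; printing every number would spend the time on IO instead
    x = 0
    while x < 10000000:
        x += 1
    print('Worker done')

if __name__ == '__main__':
    jobs = []
    for i in range(multiprocessing.cpu_count()):   # one process per core
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
    for p in jobs:
        p.join()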

Solution 5

I'd recommend using the Joblib library; it's a good library for multiprocessing and is used in many ML applications, e.g. in sklearn.

from joblib import Parallel, delayed

Parallel(n_jobs=-1, prefer="processes", verbose=6)(
    delayed(function_name)(parameter1, parameter2, ...)
    for parameter1, parameter2, ... in object
)

Where n_jobs is the number of concurrent jobs. Set n_jobs=-1 if you want to use all available cores on the machine you're running your code on.

More details on parameters here: https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html

In your case, a possible implementation would be:

def worker(i):
    print('worker ', i)
    x = 0
    while x < 1000:
        print(x)
        x += 1

if __name__ == '__main__':  # guard needed when spawning processes on Windows
    Parallel(n_jobs=-1, prefer="processes", verbose=6)(
        delayed(worker)(num)
        for num in range(50)
    )
Author: Ggggggg

Updated on July 09, 2022

Comments

  • Ggggggg
    Ggggggg almost 2 years

    I have two pieces of code that I'm using to learn about multiprocessing in Python 3.1. My goal is to use 100% of all the available processors. However, the code snippets here only reach 30% - 50% on all processors.

    Is there any way to 'force' Python to use all 100%? Is the OS (Windows 7, 64-bit) limiting Python's access to the processors? While the code snippets below are running, I open the task manager and watch the processors spike, but they never reach and maintain 100%. In addition to that, I can see multiple python.exe processes created and destroyed along the way. How do these processes relate to processors? For example, if I spawn 4 processes, each process isn't using its own core. Instead, what are the processes using? Are they sharing all cores? And if so, is it the OS that is forcing the processes to share the cores?

    code snippet 1

    import multiprocessing
    
    def worker():
        #worker function
        print ('Worker')
        x = 0
        while x < 1000:
            print(x)
            x += 1
        return
    
    if __name__ == '__main__':
        jobs = []
        for i in range(50):
            p = multiprocessing.Process(target=worker)
            jobs.append(p)
            p.start()
    

    code snippet 2

    from multiprocessing import Process, Lock
    
    def f(l, i):
        l.acquire()
        print('worker ', i)
        x = 0
        while x < 1000:
            print(x)
            x += 1
        l.release()
    
    if __name__ == '__main__': 
        lock = Lock()
        for num in range(50):
            Process(target=f, args=(lock, num)).start()
    
    • Spike Gronim
      Spike Gronim about 13 years
      Remove the print statements. They force your process to pause and do IO instead of using pure CPU.
    • Rudy Garcia
      Rudy Garcia about 13 years
      The OS is responsible for scheduling your processes across all available cores. Processes aren't tied to specific cores and can (and will) be switched between cores by the OS. That's kind of the point of this whole "multitasking" thing that the OS is helping you do. However, if you have 4 cores, and 4 CPU bound processes you should be able to utilize all 4 cores.
    • MB.
      MB. about 7 years
      Spike Gronim's comment is the pertinent point here. There are several confounding problems coming into play here. One of them is properly setting CPU affinity as others have mentioned it, but more importantly, if your code is blocking on IO (in this case print), it will not be utilizing the CPU. You may be thinking of REALTIME_PRIORITY_CLASS on windows. But this is not what you want to do and simply won't solve your problem, as all it guarantees is that your thread will not be pre-empted. But blocking on IO will still result in the same underutilization of CPU.
  • ktdrv
    ktdrv about 13 years
    What you have above is what Python's multiprocessing module will do for you anyways.
  • user1066101
    user1066101 about 13 years
    A few processes which start once and move data through them is often more efficient than trying to start a large number of processes. Also, the OS schedules this very, very nicely, since it's built on OS API's directly with no wrappers or helpers.
  • Rudy Garcia
    Rudy Garcia about 13 years
    multiprocessing.Process creates a separate process with an API similar to that of threading.Thread, so those "python "processes"" are really new processes.
  • Ggggggg
    Ggggggg about 13 years
    I don't completely understand what you mean by 'pipelined processes', but it's enough to get me searching in the right direction. If you could post a code snippet illustrating this pipelined approach - I'd be eternally grateful. :)
  • user1066101
    user1066101 about 13 years
    I did provide a code snippet. What more do you want? You'd have to provide some kind of concrete problem. But each stage simply reads stdin and writes stdout. Not much to it. It's standard Unix/Linux design philosophy. Been around for decades. Still works.
  • Ggggggg
    Ggggggg about 13 years
    Ah, it's making more sense to me now. Sorry for bothering you.
  • Andy
    Andy over 12 years
    An alternative to using the environment variable: from multiprocessing import cpu_count; for i in xrange(cpu_count()): ...
  • Chilli
    Chilli over 6 years
    I have spent the last two days trying to figure why the combined cpu usage during multiprocessing or multithreading in Python NEVER exceeded 100%. This answer saved my life. Calling each program separately did the trick and got me to maximum usage of all the available CPUs, so much that it crashed my computer. So I would advise that you don't create more jobs than you have CPUs especially if your processes are extremely intense. But thank you for saving my life @S.Lott
  • fips
    fips over 6 years
    proc.cpu_affinity works on Windows, Linux and FreeBSD only.
  • EricP
    EricP almost 5 years
    In python 3.6 32-bit on Windows 10 that code gives: "RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase."
  • EricP
    EricP almost 5 years
    Thank you. I think adding if __name__ == '__main__' is what fixed it when I was working on this two days ago.
  • Jamie Nicholl-Shelley
    Jamie Nicholl-Shelley over 3 years
    I can't get this to use anything more than one core, much like pool.map. Any hints on how to perhaps split the tasks up further and achieve max CPU would be great.
  • Robert Cinca
    Robert Cinca over 3 years
    Try using backend="multiprocessing" instead of prefer="processes". You can also control the number of jobs using n_jobs, setting a specific number of cores if you'd like.
  • Jamie Nicholl-Shelley
    Jamie Nicholl-Shelley over 3 years
    Thanks, I'll give it a go. Does this work on Linux and Windows?
  • Robert Cinca
    Robert Cinca over 3 years
    It should work, though there have been known issues in the past for Windows, and there can be weird interactions with other programs that also use concurrency. I'd recommend having a look here, as it describes some of the issues with multi-processing.
  • Jamie Nicholl-Shelley
    Jamie Nicholl-Shelley over 3 years
    I'm testing today and will have results. Currently I'm only able to get 4 concurrent threads on Linux (32 cores available).