Python code performance decreases with threading
Solution 1
This is sadly how things are in CPython, mainly due to the Global Interpreter Lock (GIL). Python code that's CPU-bound simply doesn't scale across threads (I/O-bound code, on the other hand, might scale to some extent).
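To see the effect described above on your own machine, here is a minimal sketch, assuming a standard (GIL-enabled) CPython build. The names `cpu_bound` and `timed_run` are made up for illustration and do not come from the question's code. Each thread gets the same fixed amount of pure-Python work, so if threads really ran in parallel the elapsed time would stay roughly flat as threads are added.

```python
import threading
import time


def cpu_bound(n):
    """Pure-Python loop (hypothetical workload); it holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_run(num_threads, n):
    """Run cpu_bound(n) once in each of num_threads threads.

    Returns (elapsed_seconds, list_of_results).
    """
    results = [None] * num_threads

    def worker(slot):
        results[slot] = cpu_bound(n)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, results


if __name__ == "__main__":
    # One unit of work per thread: with true parallelism the time would stay
    # about constant; under the GIL it tends to grow with the thread count.
    for k in (1, 2, 4):
        elapsed, _ = timed_run(k, 1_000_000)
        print(f"{k} thread(s), {k} unit(s) of work: {elapsed:.2f}s")
```

The exact numbers depend on the machine and Python version, so treat the output as a rough indication rather than a benchmark.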
There is a highly informative presentation by David Beazley where he discusses some of the issues surrounding the GIL. The video can be found here (thanks @Ikke!)
My recommendation would be to use the multiprocessing module instead of multiple threads.
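As a rough sketch of what that could look like for a batch-parsing job: the `parse_blob` function below is a hypothetical stand-in for the real parser (it just sums unsigned 32-bit integers), and `parse_batch` is an assumed helper name, not anything from the question's code. In a real program the worker would more likely take a file path and open the file itself.

```python
import multiprocessing
import struct


def parse_blob(blob):
    """Hypothetical parser: sum the little-endian uint32 values in blob."""
    count = len(blob) // 4
    return sum(struct.unpack(f"<{count}I", blob[:count * 4]))


def parse_batch(blobs, processes=None):
    """Parse every blob in a pool of worker processes.

    pool.map returns results in the same order as the inputs.
    processes=None lets multiprocessing choose the worker count.
    """
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(parse_blob, blobs)


if __name__ == "__main__":
    # Fake "files": eight small binary blobs of three integers each.
    data = [struct.pack("<3I", i, i + 1, i + 2) for i in range(8)]
    print(parse_batch(data, processes=4))
```

The `if __name__ == "__main__":` guard matters here: on platforms where worker processes are spawned rather than forked, each worker re-imports the main module, and unguarded pool creation would run again in every worker. Each process has its own interpreter and its own GIL, which is why this approach can use several cores; the cost is that inputs and results are pickled and copied between processes.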
Solution 2
In CPython, the threading library does not actually run Python code on multiple cores simultaneously, so it gives no speedup for computation. For CPU-bound work, you should use the multiprocessing library instead.
dpitch40
Updated on July 26, 2020
dpitch40 almost 4 years
I've written a working program in Python that parses a batch of binary files, extracting data into a data structure. Each file takes around a second to parse, which translates to hours for thousands of files. I've successfully implemented a threaded version of the batch parsing method with an adjustable number of threads. I tested the method on 100 files with a varying number of threads, timing each run. Here are the results ("0 threads" refers to my original, pre-threading code; "1 threads" refers to the new version run with a single thread spawned):
0 threads: 83.842 seconds
1 threads: 78.777 seconds
2 threads: 105.032 seconds
3 threads: 109.965 seconds
4 threads: 108.956 seconds
5 threads: 109.646 seconds
6 threads: 109.520 seconds
7 threads: 110.457 seconds
8 threads: 111.658 seconds
Though spawning a thread confers a small performance increase over having the main thread do all the work, increasing the number of threads actually decreases performance. I would have expected to see performance increases, at least up to four threads (one for each of my machine's cores). I know threads have associated overhead, but I didn't think this would matter so much with single-digit numbers of threads.
I've heard of the "global interpreter lock", but as I move up to four threads I do see the corresponding number of cores at work--with two threads two cores show activity during parsing, and so on.
I also tested some different versions of the parsing code to see if my program is IO bound. It doesn't seem to be; just reading in the file takes a relatively small proportion of the time, and processing the file is almost all of it. If I skip the IO and process an already-read version of a file, adding a second thread damages performance and a third thread improves it slightly. I'm just wondering why I can't take advantage of my computer's multiple cores to speed things up. Please post any questions or ways I could clarify.