Is the max thread limit actually a non-relevant issue for Python / Linux?

18,453

Solution 1

  1. "One thread is running at a time because of the GIL." Well, sort of. The GIL means that only one thread can be executing Python code at a time. However, any number of threads could be doing IO, various other syscalls, or other code that doesn't hold the GIL.

    It sounds like your threads will be doing mostly network I/O, and any number of threads can do I/O simultaneously. The GIL competition might be pretty fierce with 1000 threads, but you can always create multiple Python processes and divide the I/O threads between them (i.e., fork a couple times before you start).

  2. "The Linux kernel by default only sets aside 8Meg for threads." I'm not sure where you heard that. Maybe what you actually heard was "On Linux, the default stack size is often 8 MiB," which is true. Each thread will use up 8 MiB of address space for stack (no problem on 64-bit) plus kernel resources for the additional memory maps and the thread process itself. You can change the stack size using the threading.stack_size library function, which helps if you have a lot of threads that don't make deep calls.

    >>> import threading
    >>> threading.stack_size()
    0 # platform default, probably 8 MiB
    >>> threading.stack_size(64*1024) # 64 KiB stack size for future threads
    
  3. Others in this thread have suggested using an asynchronous / nonblocking framework. Well, you can do that. However, on the modern Linux kernel, a multithreaded model is competitive with asynchronous (select/poll/epoll) I/O multiplexing techniques. Rewriting your code to use an asynchronous model is a non-trivial amount of work, so I'd only do it if I couldn't get the required performance from a threaded model. If your threads are really trying to simulate human latency (e.g., spend most of their time sleeping), there are a lot of scenarios in which the asynchronous approach is actually slower. I'm not sure if this applies to Python, where the reduced GIL contention alone might merit the switch.

Solution 2

Both of those are partially true:

  • Each thread does have a stack, and you can run out of address space for the stack if you create enough threads.

  • Python also does have something called a GIL, which only allows one Python thread to run at a time. However, once Python code calls into C code, that C code can run while a different Python thread runs. However, threads in Python are still physical, and there is still the stack space limit.

If you're planning on having many connections, rather than using many threads, consider using an asynchronous design. Twisted would probably work well here.

Share:
18,453

Related videos on Youtube

TedBurrows
Author by

TedBurrows

Updated on June 04, 2022

Comments

  • TedBurrows
    TedBurrows almost 2 years

    The current Python application that I'm working on has a need to utilize 1000+ threads (Pythons threading module). Not that any single thread is working at max cpu cycles, this is just a web server load test app I'm creating. I.E. emulate 200 firefox clients all longing into web server and downloading small web components, basically emulating humans that operate in seconds as opposed to microseconds.

    So, I was reading through the various topics such as "how many threads does python support on Linux / windows, etc, and I saw a lot of varied answers. One users said its all about memory and the Linux kernel by default only sets aside 8Meg for threads, if it exceeds that then threads start being killed by the Kernel.

    One guy stated this is a non issue for CPython because only 1 thread is running at a time anyway (because of the GIL) so we can specify a gazillion threads??? What's the actual truth on this?

    • Amber
      Amber about 12 years
      Have you considered using something like Tornado which can do many async HTTP requests in a single thread?
    • josh3736
      josh3736 about 12 years
      ...or just use something that has already solved HTTP load testing.
  • icktoofay
    icktoofay about 12 years
    Should you decide to use Twisted, it has an HTTP client module (docs).
  • icktoofay
    icktoofay about 12 years
    gevent looks like it could be an alternative to Twisted.