A clean, lightweight alternative to Python's twisted?

57,973

Solution 1

I liked the concurrence Python module, which relies on either Stackless Python microthreads or Greenlets for lightweight threading. All blocking network I/O is transparently made asynchronous through a single libevent loop, so it should be nearly as efficient as a real asynchronous server.

I suppose it's similar to Eventlet in this way.

The downside is that its API is quite different from Python's sockets/threading modules; you need to rewrite a fair bit of your application (or write a compatibility shim layer).

Edit: It seems that there's also cogen, which is similar, but uses Python 2.5's enhanced generators for its coroutines, instead of Greenlets. This makes it more portable than concurrence and other alternatives. Network I/O is done directly with epoll/kqueue/iocp.
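As a rough illustration of the generator-based approach mentioned above, here is a minimal sketch in plain Python. It does not use cogen's API; `worker` and `run_round_robin` are hypothetical names, and the bare `yield` stands in for the point where a real library would wait on a socket.

```python
from collections import deque


def worker(name, steps, log):
    """A coroutine: each bare `yield` hands control back to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # a real library would suspend here while waiting on I/O


def run_round_robin(tasks):
    """Resume each generator in turn until all of them are exhausted."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # finished, so do not reschedule it
        ready.append(task)


log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

The two workers interleave on a single thread, which is the essence of cooperative scheduling: nothing runs in parallel, but no task blocks the others while it waits.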

Solution 2

Twisted is complex, you're right about that. Twisted is not bloated.

If you take a look here: http://twistedmatrix.com/trac/browser/trunk/twisted you'll find an organized, comprehensive, and very well tested suite of many protocols of the internet, as well as helper code to write and deploy very sophisticated network applications. I wouldn't confuse bloat with comprehensiveness.

It's well known that the Twisted documentation isn't the most user-friendly at first glance, and I believe this turns away an unfortunate number of people. But Twisted is amazing (IMHO) if you put in the time. I did and it proved to be worth it, and I'd recommend that others try the same.

Solution 3

gevent is eventlet cleaned up.

API-wise it follows the same conventions as the standard library (in particular, threading and multiprocessing modules) where it makes sense. So you have familiar things like Queue and Event to work with.
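For readers unfamiliar with the conventions being referred to, this is the standard-library producer/consumer idiom built on `queue.Queue` and `threading.Event`. The sketch uses only the stdlib and makes no gevent calls; the assumption is that gevent's counterparts (for example `gevent.queue.Queue` and `gevent.event.Event`) are used in much the same way. The `consumer` function is a hypothetical name.

```python
import queue
import threading


def consumer(jobs, done, results):
    """Pull items until the None sentinel arrives, then signal completion."""
    while True:
        item = jobs.get()
        if item is None:
            break
        results.append(item * 2)
    done.set()


jobs = queue.Queue()
done = threading.Event()
results = []

t = threading.Thread(target=consumer, args=(jobs, done, results))
t.start()
for n in (1, 2, 3):
    jobs.put(n)
jobs.put(None)         # sentinel: no more work
done.wait(timeout=5)   # block until the consumer signals it has finished
t.join()
print(results)  # [2, 4, 6]
```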

It only supports libevent (update: libev since 1.0) as reactor implementation but takes full advantage of it, featuring a fast WSGI server based on libevent-http and resolving DNS queries through libevent-dns as opposed to using a thread pool like most other libraries do. (update: since 1.0 c-ares is used to make async DNS queries; threadpool is also an option.)

Like eventlet, it makes the callbacks and Deferreds unnecessary by using greenlets.

Check out the examples: concurrent download of multiple urls, long polling webchat.

Solution 4

A really interesting comparison of such frameworks was compiled by Nicholas Piël on his blog: it's well worth a read!

Solution 5

None of these solutions will avoid the fact that the GIL prevents CPU parallelism; they are just better ways of getting the I/O parallelism that you already have with threads. If you think you can do better I/O, by all means pursue one of these, but if your bottleneck is in processing the results, nothing here will help except for the multiprocessing module.
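A minimal sketch of that last point, assuming the bottleneck is a CPU-bound function: `multiprocessing.Pool` spreads the calls over separate processes, each with its own interpreter and GIL. The prime-counting workload and the names `count_primes` and `count_in_parallel` are hypothetical stand-ins for "processing the results".

```python
from multiprocessing import Pool


def count_primes(limit):
    """CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_in_parallel(limits, workers=2):
    """Run each count in its own process so the GIL does not serialize them."""
    with Pool(processes=workers) as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    # The guard matters on platforms that spawn rather than fork (e.g. Windows).
    print(count_in_parallel([10_000, 20_000]))
```

Whether this pays off depends on the workload: arguments and results are pickled between processes, so very small tasks can end up slower than running them serially.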

Author by

jkp

of Spotify {Stockholm|London} fame

Updated on August 12, 2020

Comments

  • jkp
    jkp almost 4 years

    A (long) while ago I wrote a web spider that I multithreaded so that requests could run concurrently. That was in my Python youth, in the days before I knew about the GIL and the associated woes it creates for multithreaded code (i.e., most of the time stuff just ends up serialized!)...

    I'd like to rework this code to make it more robust and perform better. There are basically two ways I could do this: I could use the new multiprocessing module in 2.6+ or I could go for a reactor / event-based model of some sort. I would rather do the latter since it's far simpler and less error-prone.

    So the question relates to what framework would be best suited to my needs. The following is a list of the options I know about so far:

    • Twisted: The granddaddy of Python reactor frameworks: seems complex and a bit bloated however. Steep learning curve for a small task.
    • Eventlet: From the guys at lindenlab. Greenlet based framework that's geared towards these kinds of tasks. I had a look at the code though and it's not too pretty: non-pep8 compliant, scattered with prints (why do people do this in a framework!?), API seems a little inconsistent.
    • PyEv: Immature, doesn't seem to be anyone using it right now though it is based on libevent so it's got a solid backend.
    • asyncore: From the stdlib: über low-level, seems like a lot of legwork involved just to get something off the ground.
    • tornado: Though this is a server-oriented product designed to serve dynamic websites, it does feature an async HTTP client and a simple ioloop. Looks like it could get the job done, but that's not what it was intended for. [edit: doesn't run on Windows unfortunately, which counts it out for me - it's a requirement for me to support this lame platform]

    Is there anything I have missed at all? Surely there must be a library out there that fits the sweet-spot of a simplified async networking library!

    [edit: big thanks to intgr for his pointer to this page. If you scroll to the bottom you will see there is a really nice list of projects that aim to tackle this task in one way or another. It seems actually that things have indeed moved on since the inception of Twisted: people now seem to favour a co-routine based solution rather than a traditional reactor / callback oriented one. The benefit of this approach is clearer, more direct code: I've certainly found in the past, especially when working with boost.asio in C++, that callback-based code can lead to designs that are hard to follow and relatively obscure to the untrained eye. Using co-routines allows you to write code that looks a little more synchronous at least. I guess now my task is to work out which one of these many libraries I like the look of and give it a go! Glad I asked now...]

    [edit: perhaps of interest to anyone who followed or stumbled on this question or cares about this topic in any sense: I found a really great writeup of the current state of the available tools for this job]

    • intgr
      intgr over 14 years
      Python is multithreaded, it just doesn't allow two threads to run Python code concurrently.
    • jkp
      jkp over 14 years
      @Intgr: indeed it is, so in theory since socket is a C module, if they are letting the GIL go before calling the underlying routine things might actually be concurrent. Even still, I think I want to go back to something single-threaded.
    • Denis Otkidach
      Denis Otkidach over 14 years
      I've learned much more from your question than from answers to it.
    • jkp
      jkp over 14 years
      @Denis: heh, thanks I guess! There have been some good pointers in the answers too, specifically intgr's. I knew about a lot of the options out there and I didn't just want the answers packed with those so I thought I'd go to the trouble of spelling out what I knew :)
    • Jean-Paul Calderone
      Jean-Paul Calderone over 14 years
      > people now seem to favour a co-routine based solution rather than a traditional reactor / callback oriented one This is not a sensible comparison. "co-routine based solutions" and "reactor oriented" solutions are orthogonal. (Ignoring the fact that Python does not have coroutines) Take a look at Twisted's inlineCallbacks to see how you can have the programming style you seem to prefer with a robust, mature networking layer that's not going to expose you to complex platform idiosyncrasies.
    • jkp
      jkp over 14 years
      @Jean-Paul Calderone: OK, I will look at inlineCallbacks, which are new to me. As I've said before, there is a steep learning curve with twisted, so where do I start for the kind of small task I want to perform? Anyway, I guess why some people see co-routines as a good alternative is summarised here: weightless.io/background but you are right: I'm sure any of the co-routine implementations for Python could be used with Twisted in theory. And yes, Python doesn't have co-routines out of the box, but you can add them easily enough.
    • intgr
      intgr over 14 years
      @Jean-Paul Calderone: You can implement coroutines in plain Python using generator functions (yes it's a hack but they are still coroutines).
    • L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳
      L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳ about 14 years
      I don't understand Twisted. It seems to want to implement every protocol in the world, oh yeah, and wants to work with any event loop in the world without the user setting it up (GTK, etc). Would have been nice if there was something asynchronous that just lets you stack protocols however you want.
    • Glyph
      Glyph about 13 years
      @Longpoke: Twisted is something asynchronous that just lets you stack protocols however you want :). I gave a talk about, among other things, how lightweight it is. See here pycon.tv/#/video/58
    • blueberryfields
      blueberryfields over 10 years
      Any chance for an update to this question and/or answers to reflect the current state-of-the-art?
    • schlamar
      schlamar over 10 years
      A few points to add: 1. Tornado runs very well on Windows. It's just not as performant and scalable because it uses select for the I/O multiplexing. But you should be able to get decent performance out of it with tornado-pyuv. 2. There is now asyncio in Python 3.3+ and its backport trollius, which allows any Tornado application to run in its event loop (Twisted will be supported soon).
  • jkp
    jkp over 14 years
    @intgr: great links. I had seen both of those before once upon a time; those are the sorts of things I was hoping to see fleshed out. +1
  • jkp
    jkp over 14 years
    @clemesha: maybe you are right, and it's not bloated, but it does feel like there is rather a bit too much to get my head around to do something simple. I understand async programming, I've worked in C++ with boost::asio so the concepts are not new, but it's all the gumph that comes with doing twisted stuff: it's a whole new world, much like django is for web stuff. Again, when I'm doing web stuff I work with lightweight WSGI code and plug together only what I need. Horses for courses I guess.
  • jkp
    jkp over 14 years
    @clemesha: erm, I took the plunge today to have a look: twisted weighs in at 20MB! Even the core is 12MB....if that isn't bloated, I'm not sure quite what is.
  • daf
    daf over 14 years
    The basic Twisted APIs are pretty small (reactor, deferred, protocol). Most of the Twisted code is async protocol implementations using those basics. "Bloat" is not a useful adjective here (or indeed in most cases). Twisted's size is reasonable for the amount of stuff it does.
  • clemesha
    clemesha over 14 years
    While I agree that article was an interesting read, I think it's worthwhile to consider the validity of the presented benchmarks. See the comments here: reddit.com/r/programming/comments/ahepg/…
  • Emil Ivanov
    Emil Ivanov over 14 years
    What's wrong with using multiple processes?
  • Peter Hansen
    Peter Hansen about 14 years
    @clemesha, while the point in that reddit page is worth noting, the benchmark was done on a dual core machine and likely was not suffering from the fatal flaw described. I suppose it's possible both the client and server ran on the same core, but it doesn't seem likely.
  • Adam Hupp
    Adam Hupp about 14 years
    Nothing at all, hence the suggestion to use the multiprocessing module.
  • Ben Ford
    Ben Ford over 13 years
    I used kamaelia for an app - it was extremely painful. IMHO there are other, better options for concurrency in python (most of which are mentioned above)
  • Martin Tournoij
    Martin Tournoij over 12 years
    I'll second gevent -- After reviewing many of the solutions, gevent worked very well for me. It allowed me to retain the better part of my existing program, and the changes that were required were trivial -- Best of all, if the code needs to be maintained in 3, 4, 5, ... years time, it still makes sense to anyone not familiar with gevent. The biggest showstopper for Twisted is the steep learning curve; this causes problems not just when implementing, but also further down the line during maintenance...
  • schlamar
    schlamar about 11 years
    If you ever looked up how the gtk reactor is implemented under Windows (hardcore polling every 10ms: twistedmatrix.com/trac/browser/trunk/twisted/internet/…), you wouldn't call that "mature"...
  • Gewthen
    Gewthen almost 11 years
    Looks like concurrence is a dead project, with the last update being four years ago.
  • Glyph
    Glyph almost 11 years
    Hi @schlamar. This nasty hack was implemented as a workaround for some pretty serious bugs in GTK+, back in the day when there was much less concern about power efficiency :). But, the beauty of Twisted is that we can have this bug once, fix it in the framework, and our users don't need to be concerned about it. Would you like to contribute a fix that addresses this problem and gets rid of (deprecates and then later removes) PortableGtkReactor?
  • Robert Siemer
    Robert Siemer over 10 years
    I am a big Python fan. – Check out “Javascript – The good parts” from Douglas Crockford (3, 4 videos). And peek at CoffeeScript. It turns out JS has things Python should have, except the Syntax, haha. CS tried to mitigate that, but is a little clumsy on that...
  • schlamar
    schlamar over 10 years
    @Glyph I added helpful advice on twistedmatrix.com/trac/ticket/4744#comment:2 if someone else wants to tackle this issue, because some of these issues still exist. BTW, you could have solved this much more efficiently by scheduling callbacks between the two event loops.
  • schlamar
    schlamar over 10 years
    While this is interesting and might be suitable for some tasks, using threads for networking performs badly (especially on Python due to the GIL). And this was exactly the question: an evented framework or multiprocessing. So your answer is clearly out of scope...
  • Bahadir Cambel
    Bahadir Cambel over 10 years
    The project is dead, and so is Hyves!
  • Joseph Sheedy
    Joseph Sheedy over 7 years
    A lot has happened since Python 2.5. asyncio in Python 3.5 is great.
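
For anyone arriving from the asyncio comments above, here is a minimal sketch of the coroutine style applied to a spider-like task. It assumes Python 3.7+ (for `asyncio.run`); `fetch` and `crawl` are hypothetical names, and `asyncio.sleep` stands in for a real network request.

```python
import asyncio


async def fetch(name, delay):
    """Stand-in for a network request; asyncio.sleep simulates waiting on I/O."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def crawl(jobs):
    """Start every fetch at once and collect results in the order given."""
    return await asyncio.gather(*(fetch(name, delay) for name, delay in jobs))


results = asyncio.run(crawl([("a", 0.02), ("b", 0.01)]))
print(results)  # ['a done', 'b done']
```

The code reads top to bottom like synchronous code, yet both waits overlap on a single thread. `asyncio.gather` returns results in argument order, regardless of which fetch finishes first.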