Python urllib2 with keep alive


Solution 1

Use the urlgrabber library. This includes an HTTP handler for urllib2 that supports HTTP 1.1 and keepalive:

>>> import urllib2
>>> from urlgrabber.keepalive import HTTPHandler
>>> keepalive_handler = HTTPHandler()
>>> opener = urllib2.build_opener(keepalive_handler)
>>> urllib2.install_opener(opener)
>>> 
>>> fo = urllib2.urlopen('http://www.python.org')

Note: you should use urlgrabber version 3.9.0 or earlier, as the keepalive module was removed in version 3.9.1.

There is a port of the keepalive module to Python 3 (see github.com/wikier/keepalive).

Solution 2

Try urllib3 which has the following features:

  • Re-use the same socket connection for multiple requests (HTTPConnectionPool and HTTPSConnectionPool) (with optional client-side certificate verification).
  • File posting (encode_multipart_formdata).
  • Built-in redirection and retries (optional).
  • Supports gzip and deflate decoding.
  • Thread-safe and sanity-safe.
  • Small and easy to understand codebase perfect for extending and building upon. For a more comprehensive solution, have a look at Requests.

Or, for a much more comprehensive solution, try Requests, which supports keep-alive from version 0.8.0 onward (by using urllib3 internally) and has the following features:

  • Extremely simple HEAD, GET, POST, PUT, PATCH, DELETE Requests.
  • Gevent support for Asynchronous Requests.
  • Sessions with cookie persistence.
  • Basic, Digest, and Custom Authentication support.
  • Automatic form-encoding of dictionaries.
  • A simple dictionary interface for request/response cookies.
  • Multipart file uploads.
  • Automatic decoding of Unicode, gzip, and deflate responses.
  • Full support for unicode URLs and domain names.
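
A minimal sketch of keep-alive with Requests, assuming the requests package is installed: a Session object holds a urllib3 connection pool, so successive requests to the same host can reuse one connection. The helper name fetch_all and its arguments are illustrative and not part of the library.

```python
import requests


def fetch_all(base_url, paths):
    """Fetch several paths from one host through a shared Session.

    fetch_all, base_url and paths are hypothetical names for this sketch.
    """
    results = []
    # The Session owns the connection pool; leaving the with-block closes it.
    with requests.Session() as session:
        for path in paths:
            resp = session.get(base_url + path)
            # Accessing .content reads the whole body, which lets the
            # connection go back to the pool for the next request.
            results.append((resp.status_code, resp.content))
    return results
```

Whether the socket is actually reused still depends on the server: if it answers with Connection: close, a new connection is opened for the next request.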

Solution 3

Or check out httplib's HTTPConnection.
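
As a minimal sketch of this approach (not taken from the original answer): a single HTTPConnection object keeps its socket open between requests, provided the server supports HTTP/1.1 persistent connections and each response is read completely before the next request is sent. The helper name fetch_many and its host, port, and paths arguments are illustrative assumptions. On Python 3 the httplib module is named http.client.

```python
try:
    from http.client import HTTPConnection  # Python 3
except ImportError:
    from httplib import HTTPConnection  # Python 2


def fetch_many(host, port, paths):
    """Send several GET requests over one HTTPConnection.

    fetch_many and its parameters are hypothetical names for this sketch.
    """
    conn = HTTPConnection(host, port)
    results = []
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            # The body must be read in full before the connection
            # can be used for another request.
            body = resp.read()
            results.append((resp.status, body))
    finally:
        conn.close()
    return results
```

If the server replies with Connection: close, the connection object simply reconnects on the next request, so the code still works but without the keep-alive benefit.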

Solution 4

Unfortunately, keepalive.py was removed from urlgrabber on 25 Sep 2009 by the following change, made after urlgrabber was changed to depend on pycurl (which supports keep-alive):

http://yum.baseurl.org/gitweb?p=urlgrabber.git;a=commit;h=f964aa8bdc52b29a2c137a917c72eecd4c4dda94

However, you can still get the last revision of keepalive.py here:

http://yum.baseurl.org/gitweb?p=urlgrabber.git;a=blob_plain;f=urlgrabber/keepalive.py;hb=a531cb19eb162ad7e0b62039d19259341f37f3a6

Solution 5

Note that urlgrabber does not entirely work with Python 2.6. I fixed the issues (I think) by making the following modifications in keepalive.py.

In keepalive.HTTPHandler.do_open(), remove this:

     if r.status == 200 or not HANDLE_ERRORS:
         return r

And insert this:

     if r.status == 200 or not HANDLE_ERRORS:
         # [speedplane] Must return an addinfourl object
         resp = urllib2.addinfourl(r, r.msg, req.get_full_url())
         resp.code = r.status
         resp.msg = r.reason
         return resp
Author by

ibz

Updated on July 09, 2022

Comments

  • ibz
    ibz almost 2 years

    How can I make a "keep alive" HTTP request using Python's urllib2?

  • btk
    btk about 13 years
    In the second line, it seems it should be from urlgrabber.keepalive import HTTPHandler
  • msanders
    msanders about 13 years
    Thanks @btk - I've now corrected the code accordingly. I've also added a note re which version of urlgrabber to use as per @jwatt's answer.
  • bgw
    bgw almost 13 years
  • msanders
    msanders over 12 years
    Thanks @PiPeep - I've added a link to your port in my answer.
  • Andriy Tylychko
    Andriy Tylychko over 12 years
    how to enable keep-alive for HTTPConnection? I tried adding Connection: Keep-Alive to both requests and response headers, but httplib still reconnects on each request
  • 2371
    2371 over 12 years
    Thanks but it would be nice if you explained what this fixed instead of that useless tagged comment.
  • 2371
    2371 over 12 years
    The original r and your resp are both <type 'instance'> and both have the same attributes. addinfourl says "class to add info() and geturl() methods to an open file." but the original already has info() and geturl(). Couldn't work out the benefit.
  • 2371
    2371 over 12 years
    This library has some issues with headers and lacks cookie support. You can fix it by copying from urllib2 and httplib but I'd recommend trying another library.
  • Jefferson Hudson
    Jefferson Hudson almost 10 years
    I am working on some NTLM authentication and the Requests NTLM library doesn't work correctly for it. However, the urllib2 NTLM library does work correctly. This question was therefore helpful to me.
  • Prof. Falken
    Prof. Falken almost 10 years
    @JeffersonHudson, I was not aware of that. You might have better luck with github.com/requests/requests-ntlm
  • Piotr Dobrogost
    Piotr Dobrogost over 9 years
    I have already proposed Requests in my answer posted over a year before this one...
  • Prof. Falken
    Prof. Falken over 9 years
    @PiotrDobrogost, fair enough, but what I propose is, let Requests be the default choice.
  • speedplane
    speedplane almost 8 years
    @bgw Does the Python 3 port also support python 2?
  • bgw
    bgw over 7 years
    @speedplane, I believe it does, however, instead of using that pastie link, you should use github.com/wikier/keepalive, which is more actively maintained. I've updated the post. The edit should be visible after it's peer reviewed.