Python urllib2 with keep alive
Solution 1
Use the urlgrabber library. This includes an HTTP handler for urllib2 that supports HTTP 1.1 and keepalive:
>>> import urllib2
>>> from urlgrabber.keepalive import HTTPHandler
>>> keepalive_handler = HTTPHandler()
>>> opener = urllib2.build_opener(keepalive_handler)
>>> urllib2.install_opener(opener)
>>>
>>> fo = urllib2.urlopen('http://www.python.org')
Note: you should use urlgrabber version 3.9.0 or earlier, as the keepalive module was removed in version 3.9.1.
There is a port of the keepalive module to Python 3.
Solution 2
Try urllib3, which has the following features:
- Re-use the same socket connection for multiple requests (HTTPConnectionPool and HTTPSConnectionPool) (with optional client-side certificate verification).
- File posting (encode_multipart_formdata).
- Built-in redirection and retries (optional).
- Supports gzip and deflate decoding.
- Thread-safe and sanity-safe.
- Small and easy-to-understand codebase, perfect for extending and building upon.
For a much more comprehensive solution, consider Requests, which has supported keep-alive since version 0.8.0 (by using urllib3 internally) and has the following features:
- Extremely simple HEAD, GET, POST, PUT, PATCH, DELETE Requests.
- Gevent support for Asynchronous Requests.
- Sessions with cookie persistence.
- Basic, Digest, and Custom Authentication support.
- Automatic form-encoding of dictionaries
- A simple dictionary interface for request/response cookies.
- Multipart file uploads.
- Automatic decoding of Unicode, gzip, and deflate responses.
- Full support for unicode URLs and domain names.
Solution 3
Or check out httplib's HTTPConnection.
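For illustration, here is a minimal sketch of connection reuse with HTTPConnection. It is written for Python 3, where httplib is named http.client. The throwaway local server, the handler class, and the helper name fetch_twice are assumptions made for this example and are not part of the original answer. The two points it tries to show are that the server must speak HTTP/1.1 and that each response has to be read completely before the next request is sent on the same connection.

```python
import http.client  # this module is called httplib in Python 2
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RecordingHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: records the client's source port per request."""

    # Persistent connections are only kept open when the server speaks HTTP/1.1.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # The same source port on every request means the same TCP connection.
        self.server.seen_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        # Content-Length lets the client know where the response ends.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_twice():
    """Send two GET requests over one HTTPConnection to a local server."""
    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)
    server.seen_ports = []
    threading.Thread(target=server.serve_forever, daemon=True).start()

    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    try:
        bodies = []
        for path in ("/first", "/second"):
            conn.request("GET", path)
            response = conn.getresponse()
            # Read the whole body before issuing the next request.
            bodies.append(response.read())
        return bodies, list(server.seen_ports)
    finally:
        conn.close()
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    bodies, ports = fetch_twice()
    print("bodies:", bodies)
    print("reused one connection:", len(set(ports)) == 1)
```

If the server answers with HTTP/1.0 or sends Connection: close, http.client closes the socket after the response and silently reconnects on the next request. That is one possible reason for seeing a new connection per request even when the client asks for keep-alive.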
Solution 4
Unfortunately keepalive.py was removed from urlgrabber on 25 Sep 2009 by the following change after urlgrabber was changed to depend on pycurl (which supports keep-alive):
http://yum.baseurl.org/gitweb?p=urlgrabber.git;a=commit;h=f964aa8bdc52b29a2c137a917c72eecd4c4dda94
However, you can still get the last revision of keepalive.py here:
Solution 5
Note that urlgrabber does not entirely work with Python 2.6. I fixed the issues (I think) by making the following modifications in keepalive.py.
In keepalive.HTTPHandler.do_open(), remove this:
    if r.status == 200 or not HANDLE_ERRORS:
        return r
And insert this:
    if r.status == 200 or not HANDLE_ERRORS:
        # [speedplane] Must return an addinfourl object
        resp = urllib2.addinfourl(r, r.msg, req.get_full_url())
        resp.code = r.status
        resp.msg = r.reason
        return resp
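To see what the wrapper provides, here is a small sketch under stated assumptions. It uses Python 3, where the class lives at urllib.response.addinfourl, and a BytesIO object stands in for the raw httplib response. The helper name wrap_response, the header value, and the example URL are hypothetical and only serve the illustration.

```python
import io
from email.message import Message
from urllib.response import addinfourl  # urllib2.addinfourl in Python 2


def wrap_response(body, status, reason, url):
    """Wrap a file-like body so it offers info(), geturl(), code and msg."""
    headers = Message()
    headers["Content-Type"] = "text/plain"  # assumed header for the example
    resp = addinfourl(io.BytesIO(body), headers, url)
    # Mirror the two assignments from the patch above.
    resp.code = status
    resp.msg = reason
    return resp


if __name__ == "__main__":
    resp = wrap_response(b"hello", 200, "OK", "http://example.invalid/")
    print(resp.geturl(), resp.code, resp.msg)
    print(resp.info()["Content-Type"])
    print(resp.read())
```

The wrapped object still reads like a file, while callers that expect a urlopen-style result can use geturl(), info(), code, and msg.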
ibz
Updated on July 09, 2022
Comments
-
ibz almost 2 years
How can I make a "keep alive" HTTP request using Python's urllib2?
-
btk about 13 years
In the second line, it seems it should be
from urlgrabber.keepalive import HTTPHandler
-
msanders about 13 years
Thanks @btk - I've now corrected the code accordingly. I've also added a note re which version of urlgrabber to use as per @jwatt's answer.
-
bgw almost 13 years
A quick port of it I made to Python 3. Hope it helps someone.
-
msanders over 12 years
Thanks @PiPeep - I've added a link to your port in my answer.
-
Andriy Tylychko over 12 years
How do I enable keep-alive for HTTPConnection? I tried adding Connection: Keep-Alive to both request and response headers, but httplib still reconnects on each request.
-
2371 over 12 years
Thanks, but it would be nice if you explained what this fixed instead of that useless tagged comment.
-
2371 over 12 years
The original r and your resp are both <type 'instance'> and both have the same attributes. addinfourl says "class to add info() and geturl() methods to an open file." but the original already has info() and geturl(). Couldn't work out the benefit.
-
2371 over 12 years
This library has some issues with headers and lacks cookie support. You can fix it by copying from urllib2 and httplib, but I'd recommend trying another library.
-
Jefferson Hudson almost 10 years
I am working on some NTLM authentication and the Requests NTLM library doesn't work correctly for it. However, the urllib2 NTLM library does work correctly. This question was therefore helpful to me.
-
Prof. Falken almost 10 years
@JeffersonHudson, I was not aware of that. You might have better luck with github.com/requests/requests-ntlm
-
Piotr Dobrogost over 9 years
I have already proposed Requests in my answer posted over a year before this one...
-
Prof. Falken over 9 years
@PiotrDobrogost, fair enough, but what I propose is, let Requests be the default choice.
-
speedplane almost 8 years
@bgw Does the Python 3 port also support Python 2?
-
bgw over 7 years
@speedplane, I believe it does. However, instead of using that pastie link, you should use github.com/wikier/keepalive, which is more actively maintained. I've updated the post. The edit should be visible after it's peer reviewed.