urllib2.urlopen will hang forever despite of timeout

10,667

Solution 1

Looks like you are experiencing the proxy issue. Here's a great explanation on how to workaround it: Trying to access the Internet using urllib2 in Python.

I've executed your code on my ubuntu with python 2.7.3 and haven't seen any errors.

Also, consider using requests:

import requests

response = requests.get("http://casacinema.eu/movie-film-Matrix+trilogy+123+streaming-6165.html", timeout=5)
print response.status_code

See also:

Solution 2

The original poster stated they did not understand why it would hang, but they also wanted a way to keep urllib.request.urlopen from hanging. I can not say how to keep it from hanging but if it helps someone this is why it can hang.

The Python-urllib/3.6 client is picky. It expects, for example, the server to return HTTP/1.1 200 OK not HTTP 200 OK. It also expects the server to close the connection when it sends connection: close in the headers.

The best way to diagnose this is to get the raw output of the server response and compare it with another server response that you know works. Then, if you must create a server and manipulate the response to determine exactly what difference is the cause. Perhaps, that can lead at least to change on the server and allow it to not hang.

Solution 3

Can try using socket.setdefaulttimeout(5) as alecxe suggested.

More details in urllib2 doc

Sockets and Layers

The Python support for fetching resources from the web is layered. urllib2 uses the httplib library, which in turn uses the socket library.

As of Python 2.3 you can specify how long a socket should wait for a response before timing out. This can be useful in applications which have to fetch web pages. By default the socket module has no timeout and can hang. Currently, the socket timeout is not exposed at the httplib or urllib2 levels. However, you can set the default timeout globally for all sockets using

import socket
import urllib2

# timeout in seconds
timeout = 10
socket.setdefaulttimeout(timeout)
Share:
10,667
Matteo Monti
Author by

Matteo Monti

Updated on June 04, 2022

Comments

  • Matteo Monti
    Matteo Monti almost 2 years

    Hope this is quite a simple question, but it's driving me crazy. I'm using Python 2.7.3 on an out of the box installation of ubuntu 12.10 server. I kept zooming on the problem until I got to this snippet:

    import urllib2
    x=urllib2.urlopen("http://casacinema.eu/movie-film-Matrix+trilogy+123+streaming-6165.html", timeout=5)
    

    It simply hangs forever, never goes on timeout. I'm evidently doing something wrong. Anybody could please help? Thank you very much indeed!

    Matteo