Python Requests hanging/freezing


Solution 1

Seems like setting a (read) timeout might help you.

Something along the lines of:

response = response.get(url, timeout=5)

(This will set both connect and read timeout to 5 seconds.)

In requests, unfortunately, neither connect nor read timeouts are set by default, even though the docs say it's good to set one:

Most requests to external servers should have a timeout attached, in case the server is not responding in a timely manner. By default, requests do not time out unless a timeout value is set explicitly. Without a timeout, your code may hang for minutes or more.

Just for completeness, the connect timeout is the number of seconds requests will wait for your client to establish a connection to a remote machine, and the read timeout is the number of seconds the client will wait between bytes sent from the server.
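
If you want different values for the two, requests also accepts a (connect, read) tuple, and a request that exceeds either limit raises requests.exceptions.Timeout, which you can catch and retry. A minimal sketch (the URL is just a placeholder):

import requests

url = "https://example.com"  # placeholder

try:
    # 5 s to establish the connection, 10 s allowed between bytes of the response
    response = requests.get(url, timeout=(5, 10))
except requests.exceptions.Timeout:
    # The connect or read limit was exceeded; this is the place to retry.
    print("request timed out")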

Solution 2

Patching the documented "send" function will fix this for all requests, even in many dependent libraries and SDKs. When patching libraries, be sure to patch supported/documented functions; otherwise you may wind up silently losing the effect of your patch.

import requests

DEFAULT_TIMEOUT = 180  # seconds; see the TCP discussion below

# Keep a reference to the original, documented send method.
old_send = requests.Session.send

def new_send(*args, **kwargs):
    # Only fill in a timeout when the caller didn't set one explicitly.
    if kwargs.get("timeout", None) is None:
        kwargs["timeout"] = DEFAULT_TIMEOUT
    return old_send(*args, **kwargs)

requests.Session.send = new_send
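
A quick sanity check of the patch (example.com is just a placeholder): calls that don't pass a timeout now pick up the default, while an explicit timeout= is left untouched, because new_send only fills in the value when it is missing:

requests.get("https://example.com")             # effectively timeout=180 now
requests.get("https://example.com", timeout=5)  # explicit value still wins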

The effects of not having any timeout are quite severe, while adding a default timeout can almost never break anything, because TCP itself has timeouts as well.

On Windows the default TCP timeout is 240 seconds, and the TCP RFC recommends a minimum of 100 seconds for RTO * retry count. Somewhere in that range is a safe default.

Solution 3

To set the timeout globally instead of specifying it in every request, you can override requests.adapters.TimeoutSauce, the class requests uses internally to hold the connect and read timeouts:


import os

import requests
from requests.adapters import TimeoutSauce

REQUESTS_TIMEOUT_SECONDS = float(os.getenv("REQUESTS_TIMEOUT_SECONDS", 5))

class CustomTimeout(TimeoutSauce):
    def __init__(self, *args, **kwargs):
        # Fill in defaults only when no explicit timeout was given.
        if kwargs.get("connect") is None:
            kwargs["connect"] = REQUESTS_TIMEOUT_SECONDS
        if kwargs.get("read") is None:
            kwargs["read"] = REQUESTS_TIMEOUT_SECONDS
        super().__init__(*args, **kwargs)

# Set it globally, instead of specifying ``timeout=..`` kwarg on each call.
requests.adapters.TimeoutSauce = CustomTimeout

sess = requests.Session()
sess.get(...)
sess.post(...)
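
For what it's worth, this works because requests' HTTPAdapter wraps whatever timeout value it receives in TimeoutSauce: when a call passes no timeout, connect and read both arrive as None and the subclass above fills in the default, while an explicit timeout= produces non-None values and is left alone. So per-call overrides keep working (placeholder URL):

sess.get("https://example.com", timeout=30)  # explicit timeout takes precedence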
Author: Hobbit36, updated on August 11, 2022

Comments

  • Hobbit36, over 1 year ago

    I'm using the requests library to get a lot of webpages from somewhere. Here's the pertinent code:

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry  # or requests.packages.urllib3.util.retry

    response = requests.Session()
    retries = Retry(total=5, backoff_factor=.1)
    response.mount('http://', HTTPAdapter(max_retries=retries))
    response = response.get(url)
    

    After a while it just hangs/freezes (never on the same webpage) while getting the page. Here's the traceback when I interrupt it:

    File "/Users/Student/Hockey/Scrape/html_pbp.py", line 21, in get_pbp
      response = r.read().decode('utf-8')
    File "/anaconda/lib/python3.6/http/client.py", line 456, in read
      return self._readall_chunked()
    File "/anaconda/lib/python3.6/http/client.py", line 566, in _readall_chunked
      value.append(self._safe_read(chunk_left))
    File "/anaconda/lib/python3.6/http/client.py", line 612, in _safe_read
      chunk = self.fp.read(min(amt, MAXAMOUNT))
    File "/anaconda/lib/python3.6/socket.py", line 586, in readinto
      return self._sock.recv_into(b)
    KeyboardInterrupt
    

    Does anybody know what could be causing it? Or (more importantly) does anybody know a way to stop it if it takes more than a certain amount of time so that I could try again?