Why doesn't requests.get() return? What is the default timeout that requests.get() uses?
Solution 1
What is the default timeout that get uses?
The default timeout is None, which means it will wait (hang) until the connection is closed.
Just specify a timeout value, like this:
r = requests.get(
'http://www.example.com',
proxies={'http': '222.255.169.74:8080'},
timeout=5
)
Solution 2
From the requests documentation:
You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter:
>>> requests.get('http://github.com', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)
Note:
timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds).
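Since a timeout raises requests.exceptions.Timeout (a subclass of RequestException), it is worth wrapping the call so a dead server cannot hang the program. A minimal sketch, assuming a hypothetical helper name and treating any failure as "no answer":

```python
import requests

def fetch(url, timeout=5):
    """Return the HTTP status code, or None if the request fails or times out."""
    try:
        r = requests.get(url, timeout=timeout)
        return r.status_code
    except requests.exceptions.Timeout:
        return None  # no bytes received for `timeout` seconds
    except requests.exceptions.RequestException:
        return None  # any other connection-level failure
```
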
It happens a lot to me that requests.get() takes a very long time to return even if the timeout is 1 second. There are a few ways to overcome this problem:
1. Use the TimeoutSauce internal class
From: https://github.com/kennethreitz/requests/issues/1928#issuecomment-35811896
import requests
from requests.adapters import TimeoutSauce

class MyTimeout(TimeoutSauce):
    def __init__(self, *args, **kwargs):
        if kwargs['connect'] is None:
            kwargs['connect'] = 5
        if kwargs['read'] is None:
            kwargs['read'] = 5
        super(MyTimeout, self).__init__(*args, **kwargs)

requests.adapters.TimeoutSauce = MyTimeout
This code should cause us to set the read timeout as equal to the connect timeout, which is the timeout value you pass on your Session.get() call. (Note that I haven't actually tested this code, so it may need some quick debugging, I just wrote it straight into the GitHub window.)
2. Use a fork of requests from kevinburke: https://github.com/kevinburke/requests/tree/connect-timeout
From its documentation: https://github.com/kevinburke/requests/blob/connect-timeout/docs/user/advanced.rst
If you specify a single value for the timeout, like this:
r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:
r = requests.get('https://github.com', timeout=(3.05, 27))
NOTE: The change has since been merged to the main Requests project.
3. Use eventlet or signal, as already mentioned in the similar question:
Timeout for python requests.get entire response
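For reference, a minimal sketch of the signal approach (Unix only, and only usable from the main thread; the helper name run_with_hard_timeout is my own). Unlike requests' timeout parameter, SIGALRM bounds the entire call, including a slow trickle of response bytes:

```python
import signal

class HardTimeout(Exception):
    pass

def _raise_timeout(signum, frame):
    raise HardTimeout()

def run_with_hard_timeout(func, seconds):
    """Run func() but abort with HardTimeout after `seconds` (whole seconds)."""
    old_handler = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.alarm(seconds)
    try:
        return func()
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)

# e.g. run_with_hard_timeout(lambda: requests.get(url), 10)
```
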
Solution 3
I wanted a default timeout easily added to a bunch of code (assuming that a timeout solves your problem).
This is the solution I picked up from a ticket submitted to the repository for Requests.
credit: https://github.com/kennethreitz/requests/issues/2011#issuecomment-477784399
The solution is the last couple of lines here, but I show more code for better context. I like to use a session for retry behaviour.
import requests
import functools
from requests.adapters import HTTPAdapter, Retry
def requests_retry_session(
retries=10,
backoff_factor=2,
status_forcelist=(500, 502, 503, 504),
session=None,
) -> requests.Session:
session = session or requests.Session()
retry = Retry(
total=retries,
read=retries,
connect=retries,
backoff_factor=backoff_factor,
status_forcelist=status_forcelist,
)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)
# set default timeout
for method in ('get', 'options', 'head', 'post', 'put', 'patch', 'delete'):
setattr(session, method, functools.partial(getattr(session, method), timeout=30))
return session
Then you can do something like this:
requests_session = requests_retry_session()
r = requests_session.get(url=url,...
Solution 4
I reviewed all the answers and came to the conclusion that the problem still exists. On some sites requests may hang indefinitely, and using multiprocessing seems to be overkill. Here's my approach (Python 3.5+):
import asyncio
import aiohttp
async def get_http(url):
async with aiohttp.ClientSession(conn_timeout=1, read_timeout=3) as client:
try:
async with client.get(url) as response:
content = await response.text()
return content, response.status
except Exception:
pass
loop = asyncio.get_event_loop()
task = loop.create_task(get_http('http://example.com'))
loop.run_until_complete(task)
result = task.result()
if result is not None:
content, status = task.result()
if status == 200:
print(content)
UPDATE
If you receive a deprecation warning about using conn_timeout and read_timeout, check near the bottom of THIS reference for how to use the ClientTimeout data structure. One simple way to apply this data structure per the linked reference to the original code above would be:
async def get_http(url):
timeout = aiohttp.ClientTimeout(total=60)
async with aiohttp.ClientSession(timeout=timeout) as client:
try:
etc.
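Putting that together, the whole coroutine from above might look like this (an untested sketch against the aiohttp 3.x API; total=60 is an arbitrary overall cap, while connect and sock_read mirror the 1s/3s values from the original code):

```python
import aiohttp

async def get_http(url):
    # ClientTimeout replaces the deprecated conn_timeout/read_timeout arguments.
    timeout = aiohttp.ClientTimeout(total=60, connect=1, sock_read=3)
    async with aiohttp.ClientSession(timeout=timeout) as client:
        try:
            async with client.get(url) as response:
                content = await response.text()
                return content, response.status
        except Exception:
            pass  # caller sees None, as in the original code
```
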
Solution 5
In my case, the reason "requests.get never returns" is that requests.get() attempts to connect to the host's resolved IPv6 address first. If something goes wrong with that IPv6 connection and it gets stuck, it retries the IPv4 address only if I explicitly set timeout=&lt;N seconds&gt; and the timeout is hit.
My solution is monkey-patching the Python socket module to ignore IPv6 (or IPv4 if IPv4 is not working); either this answer or this answer works for me.
You might wonder why the curl command works: curl connects over IPv4 without waiting for IPv6 to complete. You can trace the socket syscalls with the command strace -ff -e network -s 10000 -- curl -vLk '&lt;your url&gt;'. For Python, use strace -ff -e network -s 10000 -- python3 &lt;your python script&gt;.
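As a sketch of the monkey-patch idea mentioned above (the wrapper name is my own): replace socket.getaddrinfo with a version that forces AF_INET, so name resolution returns only IPv4 addresses and the connect phase never stalls on a broken IPv6 route.

```python
import socket

_orig_getaddrinfo = socket.getaddrinfo

def _ipv4_only_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
    # Ignore the requested address family and resolve IPv4 addresses only.
    return _orig_getaddrinfo(host, port, socket.AF_INET, type, proto, flags)

socket.getaddrinfo = _ipv4_only_getaddrinfo
```

Keeping a reference to the original function makes it easy to restore later with socket.getaddrinfo = _orig_getaddrinfo.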
Updated on May 06, 2021
Comments
-
Nawaz about 3 years
In my script, requests.get never returns:

import requests

print("requesting..")

# This call never returns!
r = requests.get(
    "http://www.some-site.com",
    proxies={'http': '222.255.169.74:8080'},
)

print(r.ok)
What could be the possible reason(s)? Any remedy? What is the default timeout that get uses?
-
Nawaz almost 11 years
@user2357112: Does it matter? I doubt it.
-
user2357112 almost 11 years
It definitely matters. If you provide the URL you're trying to access and the proxy you're trying to use, we can see what happens when we try to send similar requests.
-
Nawaz almost 11 years
@user2357112: Alright. Edited the question.
-
Ian Stapleton Cordasco almost 11 years
Your proxy is also incorrect. You must specify it like so: proxies={'http': 'http://222.255.169.74:8080'}. That could be why it isn't completing without a timeout.
-
-
Nawaz almost 11 years
I think you're right. None means infinite (or "wait until the connection is closed"). If I pass a timeout myself, it returns!
-
User almost 10 years
You never answered what the default is
-
jaapz over 9 years
@User timeout works just as fine with https as it does with http
-
DDay almost 7 years
Quote: "You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter. Nearly all production code should use this parameter in nearly all requests. Failure to do so can cause your program to hang indefinitely. Note: timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out."
-
wordsforthewise over 6 years
This seems really hard to find in the docs by googling or otherwise. Anyone know where this shows up in the docs?
-
ron rothman over 6 years
@wordsforthewise docs.python-requests.org/en/master/user/quickstart/#timeouts
-
wordsforthewise over 6 years
Thanks, doing print(requests.request.__doc__) in IPython is more of what I was looking for though. I was wondering what other optional arguments to request.get() there were.
-
Alex Polekha over 6 years
@Nawaz Python 3.5+. Thank you for the question; I updated the answer with the Python version. It's legal Python code. Please take a look at the aiohttp documentation: aiohttp.readthedocs.io/en/stable/index.html
-
Sinan Çetinkaya over 5 years
Code has a typo: import requests &lt;new line here&gt; from requests.adapters import TimeoutSauce
-
Ehsan88 over 4 years
Isn't this a bad design?!
-
ron rothman about 4 years
@Ehsan88 Huh? No. What are you talking about?
-
ron rothman about 4 years
@Ehsan88 Even with the timeout parameter?
-
Ehsan88 about 4 years
@ronrothman I'm just saying that it would make sense if the request assumed a default timeout like 30s instead of waiting forever when no timeout is provided. Just like other request libraries.
-
ron rothman about 4 years
@Ehsan88 I see, thanks for clarifying. Your comment just says "this," which makes it sound like my answer is a bad design; when what you really mean is that requests.get is a bad design.
-
Thom Ives about 4 years
This solved my issues when other methods would not. Py 3.7. Due to deprecations, had to use ... timeout = aiohttp.ClientTimeout(total=60) async with aiohttp.ClientSession(timeout=timeout) as client:
-
smm about 3 years
Docs for timeout: docs.python-requests.org/en/master/user/advanced/#timeouts
-
André C. Andersen almost 2 years
Docs seem to have moved; the above domains fail. New location: requests.readthedocs.io/en/latest/user/advanced/#timeouts