ReadTimeout: HTTPSConnectionPool(host='', port=443): Read timed out. (read timeout=10)


Solution 1

Increasing the timeout helped me; I set it to 120 seconds right away. It turned out that the response from the server arrives within about 40 seconds.
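
A minimal sketch of that change (the URL and headers below are placeholders, not from the original code):

import requests

url = 'https://example.com'              # placeholder URL
headers = {'User-Agent': 'Mozilla/5.0'}  # placeholder headers

# give the server up to 120 seconds to respond instead of 10
page_one = requests.get(url, headers=headers, timeout=120)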

Solution 2

Why do you have the timeout parameter in there? I would just eliminate it. The reason you get that error is that you set it to 10, which says: if you don't receive a response from the server within 10 seconds, raise an error. So it's not necessarily the server cutting you off. If no timeout is specified explicitly, requests does not time out (at least not on your end).

page_one = requests.get(url, headers=headers)  # <-- don't use the timeout parameter
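
If you do keep a timeout, another option (a sketch, not part of the answer above) is to catch the exception so a slow response does not crash the whole script:

import requests

url = 'https://example.com'              # placeholder URL
headers = {'User-Agent': 'Mozilla/5.0'}  # placeholder headers

try:
    page_one = requests.get(url, headers=headers, timeout=10)
except requests.exceptions.ReadTimeout:
    # the server did not answer within 10 seconds: log, skip, or retry here
    page_one = None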

Solution 3

This exception can occur because of the timeout or because of the available memory:

  • The response from the server takes longer than the specified timeout. To solve this, set a higher timeout.
  • The file you are trying to read is large and the socket buffer is not big enough to handle it. You can try increasing the buffer size based on your machine's capacity, as in the snippet below.
        import socket

        from urllib3.connection import HTTPConnection

        # enlarge the send and receive socket buffers to roughly 1 MB
        HTTPConnection.default_socket_options = (
            HTTPConnection.default_socket_options + [
                (socket.SOL_SOCKET, socket.SO_SNDBUF, 1000000),  # 1 MB in bytes
                (socket.SOL_SOCKET, socket.SO_RCVBUF, 1000000),
            ]
        )
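
This should work because requests is built on urllib3, whose connections are created from HTTPConnection, so connections opened after the assignment should pick up the larger buffers. Run the snippet once, before the script makes any requests.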


Updated on May 28, 2021

Comments

  • JB_ almost 3 years

    I'm doing some web scraping on a site and sometimes, when I run the script, I get this error:

    ReadTimeout: HTTPSConnectionPool(host='...', port=443): Read timed out. (read timeout=10)
    

    My code:

    import requests
    from time import sleep
    from bs4 import BeautifulSoup

    url = 'mysite.com'
    all_links_page = []
    # getHeaders() is a helper defined elsewhere in my script
    page_one = requests.get(url, headers=getHeaders(), timeout=10)
    sleep(2)
    if page_one.status_code == requests.codes.ok:
        soup_one = BeautifulSoup(page_one.content.decode('utf-8'), 'lxml')
        page_links_one = soup_one.select("ul.product_list")

        for links_one in page_links_one:
            for li in links_one.select("li"):
                all_links_page.append(li.a.get("href").strip())

    The answers I found were not satisfactory.

  • JB_ over 4 years
    Am I using this parameter to prevent the site from blocking me, or am I wrong about that?
  • wishmaster over 4 years
    I believe it is always better to set a timeout; a server can keep a request hanging for quite a while, especially if it suspects a bot. Storing the link and requesting it again later, or retrying through a proxy, might solve it (see the retry sketch after these comments).
  • chitown88 almost 2 years
    @wishmaster that's a good point. Probably better to increase the timeout parameter here, then.
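
A rough sketch of the retry idea from the comments: if a link times out, back off and ask again, and let the caller store links that still fail for a later run. The helper name fetch_with_retry and its parameters are illustrative, not part of the original code:

import requests
from time import sleep

def fetch_with_retry(url, headers, retries=3, timeout=120):
    # try a few times, waiting a little longer after each timeout
    for attempt in range(retries):
        try:
            return requests.get(url, headers=headers, timeout=timeout)
        except requests.exceptions.ReadTimeout:
            sleep(5 * (attempt + 1))
    return None  # caller can store the link and request it again later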