requests library https get via proxy leads to error

The answer is that the HTTP case is bugged. The expected behaviour in that case is the same as the HTTPS case: that is, you provide your authentication credentials in the proxy URL.

The reason the header option doesn't work for HTTPS is that HTTPS via proxies is totally different from HTTP via proxies. When you route an HTTP request via a proxy, you essentially just send a standard HTTP request to the proxy, with a request line that names a totally different host, like this:

GET http://www.google.com/ HTTP/1.1
Host: www.google.com

The proxy then basically forwards this on.
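In requests terms, that corresponds to passing a proxies dict for the http scheme, roughly like this sketch (the proxy address here is a made-up placeholder):

import requests

# Hypothetical proxy address
proxies = {'http': 'http://proxy.example.com:3128'}

# For plain HTTP, requests sends the full-URL request line shown above
# to the proxy, which forwards it on to www.google.com
r = requests.get('http://www.google.com', proxies=proxies)
print(r.status_code)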

For HTTPS that can't possibly work, because you need to negotiate an SSL connection with the remote server. Rather than doing anything like the HTTP case, you use the CONNECT verb: the proxy server connects to the remote end on behalf of the client, and from then on just proxies the TCP data.
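For example, the tunnel to the host above is opened with a request along these lines, and this CONNECT request is the only thing an authenticating proxy gets to inspect (the port and <token> are placeholders):

CONNECT www.google.com:443 HTTP/1.1
Host: www.google.com:443
Proxy-Authorization: Basic <token>

Once the proxy answers with a 200, everything that follows is just the raw TLS traffic between client and server.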

When you attach a Proxy-Authorization header to the HTTPS request, we don't put it on the CONNECT message; we put it on the tunnelled HTTPS request. This means the proxy never sees it, so it refuses your connection. We special-case the authentication information in the proxy URL to make sure it attaches the header correctly to the CONNECT message.
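In other words, the workaround for HTTPS is to embed the credentials in the proxy URL so that they end up on the CONNECT request. A minimal sketch, with placeholder credentials and proxy address:

import requests

# Placeholder credentials and proxy address
proxies = {
    'http': 'http://someuser:somepass@proxy.example.com:3128',
    'https': 'http://someuser:somepass@proxy.example.com:3128',
}

# requests/urllib3 pull the credentials out of the URL and send them as a
# Proxy-Authorization header on the CONNECT request, which the proxy can see
r = requests.get('https://www.google.com', proxies=proxies)
print(r.status_code)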

Requests and urllib3 are currently in discussion about the right place for this bug fix to go; there is a GitHub issue tracking it. I expect that the fix will be in the next Requests release.

Comments

  • mingxiao almost 2 years

    I'm trying to send a simple GET request via a proxy. I set both the 'Proxy-Authorization' and 'Authorization' headers; I don't think I needed the 'Authorization' header, but I added it anyway.

    import requests, base64

    # The proxy itself is configured outside this snippet
    # (e.g. via environment variables)
    URL = 'https://www.google.com'
    sess = requests.Session()
    user = 'someuser'
    password = 'somepass'
    # Base64-encode "user:password" for HTTP Basic auth
    token = base64.encodestring('%s:%s' % (user, password)).strip()
    sess.headers.update({'Proxy-Authorization': 'Basic %s' % token})
    sess.headers['Authorization'] = 'Basic %s' % token
    resp = sess.get(URL)
    

    I get the following error:

    requests.packages.urllib3.exceptions.ProxyError: Cannot connect to proxy. Socket error: Tunnel connection failed: 407 Proxy Authentication Required.
    

    However, when I change the URL to plain http://www.google.com, it works fine.

    Do proxies use Basic, Digest, or some other sort of authentication for HTTPS? Is it proxy-server specific? How do I find that out? I need to achieve this using the requests library.

    UPDATE

    It seems that with HTTP requests we have to pass in a Proxy-Authorization header, but with HTTPS requests we need to put the username and password into the proxy URL:

    #HTTP
    import requests, base64
    URL = 'http://www.google.com'
    user = <username>
    password = <password>
    proxies = {'http': 'http://<IP>:<PORT>'}
    token = base64.encodestring('%s:%s' % (user, password)).strip()
    myheader = {'Proxy-Authorization': 'Basic %s' % token}
    r = requests.get(URL, proxies=proxies, headers=myheader)
    print r.status_code # 200
    
    
    #HTTPS
    import requests
    URL = 'https://www.google.com'
    user = <username>
    password = <password>
    proxy = {'https': 'http://<user>:<password>@<IP>:<PORT>'}
    r = requests.get(URL, proxies=proxy)
    print r.status_code  # 200
    

    When sending an HTTP request, if I leave out the header and pass in a proxy formatted with user/pass, I get a 407 response.

    When sending an HTTPS request, if I pass in the header and leave the proxy URL unformatted, I get the ProxyError mentioned earlier.

    I am using requests 2.0.0 and a Squid caching proxy. Why doesn't the header option work for HTTPS? Why doesn't the formatted proxy URL work for HTTP?