Proxies with Python 'Requests' module


Solution 1

The proxies dict syntax is {"protocol": "scheme://ip:port", ...}. With it you can specify different (or the same) proxies for requests made over the http, https, and ftp protocols:

http_proxy  = "http://10.10.1.10:3128"
https_proxy = "https://10.10.1.11:1080"
ftp_proxy   = "ftp://10.10.1.10:3128"

proxies = { 
              "http"  : http_proxy, 
              "https" : https_proxy, 
              "ftp"   : ftp_proxy
            }

r = requests.get(url, headers=headers, proxies=proxies)

Deduced from the requests documentation:

Parameters:
method – method for the new Request object.
url – URL for the new Request object.
...
proxies – (optional) Dictionary mapping protocol to the URL of the proxy.
...


On Linux you can also do this via the HTTP_PROXY, HTTPS_PROXY, and FTP_PROXY environment variables. Include the scheme in the value, not just host:port:

export HTTP_PROXY=http://10.10.1.10:3128
export HTTPS_PROXY=https://10.10.1.11:1080
export FTP_PROXY=ftp://10.10.1.10:3128

On Windows:

set http_proxy=http://10.10.1.10:3128
set https_proxy=https://10.10.1.11:1080
set ftp_proxy=ftp://10.10.1.10:3128
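To sketch how these variables are picked up (same placeholder addresses as above), the standard library's urllib.request.getproxies() reads them, and requests consults the same source when no proxies argument is given:

```python
import os
import urllib.request

# Set the variables for this process only (equivalent to `export`).
# Note the explicit scheme, which requests >= 2.0 requires.
os.environ["HTTP_PROXY"] = "http://10.10.1.10:3128"
os.environ["HTTPS_PROXY"] = "https://10.10.1.11:1080"

# getproxies() returns a dict in exactly the form requests expects.
proxies = urllib.request.getproxies()
print(proxies["http"])   # http://10.10.1.10:3128
print(proxies["https"])  # https://10.10.1.11:1080
```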

Solution 2

You can refer to the proxies section of the Requests documentation.

If you need to use a proxy, you can configure individual requests with the proxies argument to any request method:

import requests

proxies = {
  "http": "http://10.10.1.10:3128",
  "https": "https://10.10.1.10:1080",
}

requests.get("http://example.org", proxies=proxies)

To use HTTP Basic Auth with your proxy, use the http://user:[email protected]/ syntax:

proxies = {
    "http": "http://user:[email protected]:3128/"
}
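If the username or password contains reserved characters such as @ or /, they must be percent-encoded or the URL will parse incorrectly. A minimal sketch with the standard library's urllib.parse.quote (the credentials here are made up):

```python
from urllib.parse import quote

# Hypothetical credentials containing reserved characters.
user = "user"
password = "p@ss/word"

# Percent-encode so the '@' and '/' don't break URL parsing.
proxy_url = f"http://{quote(user, safe='')}:{quote(password, safe='')}@10.10.1.10:3128/"
print(proxy_url)  # http://user:p%40ss%2Fword@10.10.1.10:3128/

proxies = {"http": proxy_url}
```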

Solution 3

I have found that urllib has some really good code to pick up the system's proxy settings, and it happens to be in the correct form to use directly. You can use it like this:

import urllib.request

...
r = requests.get('http://example.org', proxies=urllib.request.getproxies())

It works really well and urllib knows about getting Mac OS X and Windows settings as well.

Solution 4

The accepted answer was a good start for me, but I kept getting the following error:

AssertionError: Not supported proxy scheme None

The fix was to specify the scheme (http://) in the proxy URL:

http_proxy  = "http://194.62.145.248:8080"
https_proxy  = "https://194.62.145.248:8080"
ftp_proxy   = "ftp://10.10.1.10:3128"

proxyDict = {
              "http"  : http_proxy,
              "https" : https_proxy,
              "ftp"   : ftp_proxy
            }

I'd be interested as to why the original works for some people but not me.

Edit: I see the main answer is now updated to reflect this :)
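A tiny helper (hypothetical, not part of requests) can normalize scheme-less proxy URLs before building the dict, which avoids this error entirely:

```python
def ensure_scheme(proxy_url, default="http"):
    # Hypothetical helper: prepend a scheme when one is missing,
    # since requests >= 2.0 rejects scheme-less proxy URLs.
    if "://" not in proxy_url:
        return f"{default}://{proxy_url}"
    return proxy_url

print(ensure_scheme("194.62.145.248:8080"))          # http://194.62.145.248:8080
print(ensure_scheme("https://194.62.145.248:8080"))  # unchanged
```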

Solution 5

If you'd like to persist cookies and session data, you'd best do it like this:

import requests

proxies = {
    'http': 'http://user:[email protected]:3128',
    'https': 'https://user:[email protected]:3128',
}

# Create the session and set the proxies.
s = requests.Session()
s.proxies = proxies

# Make the HTTP request through the session.
r = s.get('http://www.showmemyip.com/')
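Session-level proxies act as defaults; a proxies argument passed to an individual call still overrides them for that one request. A sketch using the same placeholder addresses as above:

```python
import requests

# Session-wide defaults (same hypothetical addresses as above).
s = requests.Session()
s.proxies.update({
    'http': 'http://user:[email protected]:3128',
    'https': 'https://user:[email protected]:3128',
})
print(s.proxies['http'])  # http://user:[email protected]:3128

# A per-request value would override the session default for that call only:
# r = s.get('http://example.org', proxies={'http': 'http://10.10.1.99:8080'})
```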
Author: Piotr Dobrogost (se2021 at p.dobrogost.net)

Updated on July 17, 2022

Comments

  • Piotr Dobrogost
    Piotr Dobrogost almost 2 years

    Just a short, simple one about the excellent Requests module for Python.

    I can't seem to find in the documentation what the variable 'proxies' should contain. When I send it a dict with a standard "IP:PORT" value it rejected it asking for 2 values. So, I guess (because this doesn't seem to be covered in the docs) that the first value is the ip and the second the port?

    The docs mention this only:

    proxies – (optional) Dictionary mapping protocol to the URL of the proxy.

    So I tried this... what should I be doing?

    proxy = { ip: port}
    

    and should I convert these to some type before putting them in the dict?

    r = requests.get(url,headers=headers,proxies=proxy)
    
  • chown
    chown over 12 years
    @cigar I knew because urllib2 uses the exact same format for their proxies dict, and when I saw docs.python-requests.org/en/latest/api/#module-requests say "proxies – (optional) Dictionary mapping protocol to the URL of the proxy.", I knew right away.
  • Admin
    Admin over 12 years
    ahhh i see, never used proxies with urllib2 because of the advice to get rid of it obtained from here, replaced 2 pages of code with 8 lines :/ re:shoulder :))) great stay here, you have already saved me hours in total! if you ever need any help with music gimme a shout, that i can give advice on, otherwise cant think of way to repay other than massive thanks or cups of tea!
  • dzen
    dzen over 12 years
    It seems requests and moreover urllib3 can't do a CONNECT when using a proxy :(
  • chown
    chown over 12 years
    @dzen I have not yet used urllib3 so I'll have to look into that. Thanks for the heads up.
  • dzen
    dzen over 12 years
    requests is a wrapper around urllib3, which is bundled into this module. github.com/kennethreitz/requests/tree/develop/requests/packages/…
  • Jay
    Jay about 10 years
    changed with 2.0.0: Proxy URLs now must have an explicit scheme. A MissingSchema exception will be raised if they don't.
  • Jay
    Jay about 10 years
    @chown the syntax changed with requests 2.0.0. You'll need to add a scheme to the url: docs.python-requests.org/en/latest/user/advanced/#proxies It'd be nice if you could add this to your answer here
  • Johannes Charra
    Johannes Charra over 9 years
    @Jay: I added the URL schema.
  • Jonas Lejon
    Jonas Lejon over 9 years
    Does it work without a proxy? Some of our users have a proxy and some don't.
  • Shravan
    Shravan over 8 years
    @jonasl Yes, it does work even when there's no system proxy defined. In that case, it's just an empty dict.
  • loretoparisi
    loretoparisi about 8 years
    This will not work for socks5 proxy: 'http' : "socks5://myproxy:9191",
  • jrwren
    jrwren over 7 years
    Does it include no_proxy and does requests respect no_proxy? Nevermind, it seems there are solutions: github.com/kennethreitz/requests/issues/879
  • Zahra
    Zahra about 7 years
    getting err: module 'urllib' has no attribute 'getproxies'
  • oliche
    oliche about 7 years
    Greenish: urllib.request.getproxies()
  • jamshid
    jamshid almost 6 years
    Those are bad examples of the linux environment variables HTTP_PROXY and HTTPS_PROXY. The protocol should always be included (not just host:port), and either proxy can itself be "https" or "http". HTTPS_PROXY=myhttpsproxy:8080 is valid, it just means proxy "https" requests using myhttpsproxy:8080 instead of the value of HTTP_PROXY. If you don't define HTTPS_PROXY linux apps typically use CONNECT over the HTTP_PROXY.
  • rleelr
    rleelr almost 5 years
    @Zahra try urllib2.getproxies()
  • MasayoMusic
    MasayoMusic almost 5 years
    What if you want multiple proxies per protocol? Currently you just have one for each.
  • the_economist
    the_economist over 3 years
    @Zahra: use import urllib.request and afterwards urllib.request.getproxies(). Source: stackoverflow.com/questions/37042152/…
  • shekhar chander
    shekhar chander about 3 years
    Do they allow unlimited scraping?
  • MrKsn
    MrKsn about 3 years
    Not only protocol, but host is possible as well: proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}
  • bfontaine
    bfontaine over 2 years
    This has nothing to do with OP's question
  • Simplecode
    Simplecode over 2 years
    Do we have to send the "Proxy-Connection: Keep-alive" header manually in the python requests ?
  • QUEEN
    QUEEN about 2 years
    @chown I tried out the exact same code with my IP and port number but it still blocks the website that I use to scrape data from(craiglist.com). Any idea about this?
  • QUEEN
    QUEEN about 2 years
    @chown I replaced the 'http_proxy ' with my IP address and the port as 8888 because that's where my localhost was running. Should the port values like '3128,1080' be the same for all devices? What about the IP addresses then? if the website gets hits from the same IPs every time, it will surely block!
  • t3chb0t
    t3chb0t almost 2 years
    I like this last resort solution that no one else mentioned here. It just saved my day as there was no other way of passing proxy settings to a 3rd party library I'm using.