Python Requests - managing cookies

Solution 1

I had a similar problem and found help in this question. The session jar was empty and to actually get the cookie I needed to use a session.

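import requests

session = requests.Session()
# user and password must be defined beforehand; the session stores any cookies the server sets
p = session.post("http://example.com", data={'user': user, 'password': password})
print('headers', p.headers)
print('cookies', requests.utils.dict_from_cookiejar(session.cookies))
print('html', p.text)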
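
To see what the cookie dump in the snippet above looks like without contacting a server, here is a minimal offline sketch. The cookie name, value, and domain are made up for illustration; in real use the server's login response would populate `session.cookies`.

```python
import requests
from requests.utils import dict_from_cookiejar

session = requests.Session()

# Hypothetical cookie standing in for one a server would set after login.
session.cookies.set("sessionid", "abc123", domain="example.com", path="/")

# Flatten the session's cookie jar into a plain name -> value dict.
as_dict = dict_from_cookiejar(session.cookies)
print(as_dict)
```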

Solution 2

You should be reusing the whole session object, not the associated cookiejar. Use self.s for all requests you make.

If your requests are still failing when reusing the session, they will be failing for a different reason, not because you are not properly returning cookies.
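
As a rough illustration of why reusing the session matters, the sketch below prepares a request through a session and compares it with one prepared on its own, without sending anything over the network. The cookie and the URL are placeholders, not values from the question.

```python
import requests

s = requests.Session()

# Hypothetical cookie, as if an earlier login response had set it.
s.cookies.set("sessionid", "abc123", domain="example.com", path="/")

req = requests.Request("GET", "http://example.com/protected")

# Preparing through the session merges the session's cookies into the request.
prepared = s.prepare_request(req)
cookie_header = prepared.headers.get("Cookie")

# Preparing the same request outside the session attaches no cookies.
bare = req.prepare()

print(cookie_header)
print("Cookie" in bare.headers)
```

Every call made through `s` (`s.get`, `s.post`, and so on) goes through the same merging step, which is why passing the jar around by hand is unnecessary.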

Note that if you need to use auth=('username', 'password'), then the login uses HTTP authentication, not cookies. You need to pass in the same authentication for all calls. The requests session can do that for you too, by setting it once on the session:

s = requests.Session()
s.auth = ('username', 'password')
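
A small offline check of session-level credentials, assuming the `('username', 'password')` tuple means HTTP Basic auth (which is how requests treats a plain tuple). The URL is a placeholder and nothing is actually sent.

```python
import base64
import requests

s = requests.Session()
s.auth = ("username", "password")  # applied to every request made through s

# Prepare (but do not send) a request to inspect the headers the session adds.
prepared = s.prepare_request(requests.Request("GET", "http://example.com/page"))
auth_header = prepared.headers["Authorization"]

# Basic auth is "Basic " followed by base64("username:password").
expected = "Basic " + base64.b64encode(b"username:password").decode("ascii")
print(auth_header == expected)
```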

If, however, the login page is a form with a username and password field, you'll need to call the form target instead. Check if the form is POST or GET, and check the fieldnames:

s.post(loginTarget, data={usernamefield: username, passwordfield: password, otherfield: othervalue})

and not use HTTP authentication at all.
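
The form-based flow might look roughly like the sketch below. The login URL, the field names `user` and `password`, and both helper functions are hypothetical; the real names have to be read from the site's login form. Only the request-building part runs here, so no network access is needed.

```python
import requests


def build_login_request(login_url, fields):
    """Prepare (without sending) the POST a login form would submit."""
    return requests.Request("POST", login_url, data=fields).prepare()


def login_then_fetch(session, login_url, fields, page_url):
    """POST the form fields to the login target, then GET the page with the same session."""
    resp = session.post(login_url, data=fields)
    resp.raise_for_status()
    # Any cookies set by the login response are now stored on the session.
    return session.get(page_url)


# Inspect what would be sent; the field names are assumptions.
prepared = build_login_request(
    "http://example.com/login",
    {"user": "alice", "password": "secret"},
)
print(prepared.method, prepared.headers["Content-Type"])
print(prepared.body)
```

Against a real site, `login_then_fetch(requests.Session(), login_url, fields, page_url)` would perform the POST followed by the GET, with no `auth=` argument involved.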

Author: Jay Gattuso

I'm not a coder by any stretch of the imagination. I'm trying to learn, solo, and I always really appreciate any help!

Updated on October 02, 2020

Comments

  • Jay Gattuso
    Jay Gattuso over 3 years

    I'm trying to get some content automatically from a site using requests (and bs4)

    I have a script that gets a cookie:

    def getCookies(self):
        username = 'username'
        password = 'password'
        URL = 'logonURL'
        r = requests.get(URL, auth=('username', 'password'))
        cookies = r.cookies
    

    dump of the cookies looks like:

    <<class 'requests.cookies.RequestsCookieJar'>[<Cookie ASP.NET_SessionId=yqokjr55ezarqbijyrwnov45 for URL.com/>, <Cookie BIGipServerPE_Journals.lww.com_80=1440336906.20480.0000 for URL.com/>, <Cookie JournalsLockCookie=id=a5720750-3f20-4207-a500-93ae4389213c&ip=IP address for URL.com/>]>
    

    But when I pass the cookie object to the next URL:

     soup = Soup(s.get(URL, cookies = cookies).content)
    

    It's not working out. I can see by dumping the soup that I'm not giving the web server my credentials properly.

    I tried running a requests session:

    def getCookies(self):
        self.s = requests.session()
        username = 'username'
        password = 'password'
        URL = 'logURL'
        r = self.s.get(URL, auth=('username', 'password'))
    

    and I get the same result: no joy.

    I looked at the header via liveHttp in FF when I visit the 2nd page, and see a very different form:

    Cookie: WT_FPC=id=264b0aa85e0247eb4f11355304127862:lv=1355317068013:ss=1355314918680; UserInfo=Username=username; BIGipServerPE_Journals.lww.com_80=1423559690.20480.0000; PlatformAuthCookie=true; Institution=ReferrerUrl=http://logonURL.com/?wa=wsignin1.0&wtrealm=urn:adis&wctx=http://URL.com/_layouts/Authenticate.aspx?Source=%252fpecnews%252ftoc%252f2012%252f06440&token=method|ExpireAbsolute; counterSessionGuidId=6e2bd57f-b6da-4dd4-bcb0-742428e08b5e; MyListsRefresh=12/13/2012 12:59:04 AM; ASP.NET_SessionId=40a04p45zppozc45wbadah45; JournalsLockCookie=id=85d1f38f-dcbb-476a-bc2e-92f7ac1ae493&ip=10.204.217.84; FedAuth=77u/PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz48U2VjdXJpdHlDb250ZXh0VG9rZW4gcDE6SWQ9Il9mMGU5N2M3Zi1jNzQ5LTQ4ZjktYTUxNS1mODNlYjJiNGNlYzUtNEU1MDQzOEY0RTk5QURCNDFBQTA0Mjc0RDE5QzREMEEiIHhtbG5zOnAxPSJodHRwOi8vZG9jcy5vYXNpcy1vcGVuLm9yZy93c3MvMjAwNC8wMS9vYXNpcy0yMDA0MDEtd3NzLXdzc2VjdXJpdHktdXRpbGl0eS0xLjAueHNkIiB4bWxucz0iaHR0cDovL2RvY3Mub2FzaXMtb3Blbi5vcmcvd3Mtc3gvd3Mtc2VjdXJlY29udmVyc2F0aW9uLzIwMDUxMiI+PElkZW50aWZpZXI+dXJuOnV1aWQ6ZjJmNGY5MGItMmE4Yy00OTdlLTkwNzktY2EwYjM3MTBkN2I1PC9JZGVudGlmaWVyPjxJbnN0YW5jZT51cm46dXVpZDo2NzMxN2U5Ny1lMWQ3LTQ2YzUtOTg2OC05ZGJhYjA3NDkzOWY8L0luc3RhbmNlPjwvU2VjdXJpdHlDb250ZXh0VG9rZW4+
    

    I have redacted the username, password, and URLS from the question for obvious reasons.

    Am I missing something obvious? Is there a different / proper way to capture the cookie? The current method I'm using is not working.

    EDIT:

    This is a standalone version of the sessioned code:

    s = requests.session()
    username = 'username'
    password = 'password'
    URL = 'logonURL.aspx'
    r = s.get(URL, auth=('username', 'password'))
    URL = r"URL.aspx"
    soup = Soup(s.get(URL).content)
    

    Reading a dump of the soup, I can see in the HTML that it's telling me I don't have access. This string only appears in the browser when you're not logged in.

    • voithos
      voithos over 11 years
      Are you sure that the authentication process is in fact a GET request, and not a POST?
    • Jay Gattuso
      Jay Gattuso over 11 years
      A very good question. How would I tell? (liveHTTP says: GET /ig_res/default/ig_dialogwindow.css HTTP/1.1)
    • voithos
      voithos over 11 years
      If you're trying to "login" to a site, try logging in using a normal browser, and observe the requests that are sent to the server. If you've got Chrome, you can use the built-in developer tools. If you're using Firefox, you can use Firebug. As a side note, most form-based submissions (e.g. a login form) are POST requests.
    • voithos
      voithos over 11 years
      No, that's a CSS (stylesheet) file. Try just using a POST and see what happens. As in, r = s.post(URL, auth=('username', 'password')) in your session-based request.
    • Jay Gattuso
      Jay Gattuso over 11 years
      I logged out, then logged in again. The first header I see is a GET /ig_res/default/ig_dialogwindow.css HTTP/1.1 that has an object called Cookie: WT_FPC=id= etc. It's the same string I offered previously. I guess that makes it a POST header...
    • Jay Gattuso
      Jay Gattuso over 11 years
      OK. I tried that, same thing. "You currently do not have access to this article"
    • voithos
      voithos over 11 years
      No, it doesn't. A post request begins with POST. What you are seeing is a GET request for the stylesheet (your browser is asking the server for the stylesheet, so that it can display the page nicely).
    • voithos
      voithos over 11 years
      Are you trying to POST to the page that you want to retrieve? You need to POST to the login URL first, then GET the page that you need (using the same session).
    • Jay Gattuso
      Jay Gattuso over 11 years
      Aye, I POSTed the credentials. I'll hit it again, just to make sure! Yup. No joy. Thanks though! I appreciate your pointers.
  • Jay Gattuso
    Jay Gattuso over 11 years
    I added a standalone example of the sessioned version. Is that correct?