Logging into a SAML/Shibboleth-authenticated server using Python

Solution 1

Basically, what you have to understand is the workflow behind a SAML authentication process. Unfortunately, I have not found any document that really explains what the browser does when accessing a SAML-protected website.

Maybe you should take a look at something like this: http://www.docstoc.com/docs/33849977/Workflow-to-Use-Shibboleth-Authentication-to-Sign and, obviously, at this: http://en.wikipedia.org/wiki/Security_Assertion_Markup_Language. In particular, focus your attention on this scheme:

enter image description here

Since the documentation was so poor, what I did to understand how SAML works was to write down (yes, writing, on paper) every step the browser performed, from the first to the last. I used Opera, configured not to follow redirects automatically (300, 301, 302 response codes, and so on) and with JavaScript disabled. Then I wrote down all the cookies the server sent me, which request did what, and for what reason.

Maybe it was way too much effort, but this way I was able to write a Java library that is suited to the job, and incredibly fast and efficient too. Maybe someday I will release it publicly...

What you should understand is that, in a SAML login, there are two actors playing: the IDP (identity provider), and the SP (service provider).

A. FIRST STEP: the user agent requests the resource from the SP

I'm quite sure that you reached the link you reference in your question from another page, by clicking on something like "Access to the protected website". If you pay closer attention, you'll notice that the link you followed is not the one on which the authentication form is displayed. That's because following the link to the SP, which then redirects you to the IDP, is itself a step in the SAML flow. The first step, actually. It lets the IDP determine who is asking for authentication and for which resource. So, basically, you need to make a request to the link you followed to reach the web form, and keep the cookies it sets. What you won't see in the browser is the SAMLRequest string, encoded into the 302 redirect behind that link, which is sent to the IDP when the connection is made.

I think that's the reason why you can't mechanize the whole process. You simply connected to the form directly, without this initial identification step having been done!
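
To see what the first step actually carries, you can decode the SAMLRequest found in the redirect. The sketch below assumes the SP uses the HTTP-Redirect binding, where the request is DEFLATE-compressed, base64-encoded and placed in the query string; some deployments use a POST form or a discovery page instead. The function name `decode_saml_request` is made up for illustration.

```python
import base64
import zlib
from urllib.parse import urlsplit, parse_qs


def decode_saml_request(location):
    """Return the AuthnRequest XML carried in a redirect URL, or None if absent."""
    query = parse_qs(urlsplit(location).query)
    values = query.get("SAMLRequest")
    if not values:
        return None
    compressed = base64.b64decode(values[0])
    # The HTTP-Redirect binding uses raw DEFLATE (no zlib header), hence wbits=-15.
    return zlib.decompress(compressed, -15).decode("utf-8")
```

With the requests library, the redirect target can be read without following it, for example `requests.get(url, allow_redirects=False).headers["Location"]`, and then passed to this function.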

B. SECOND STEP: filling the form, and submitting it

This one is easy, but be careful! The cookies that are set now are not the same as the cookies above: you are now connected to an entirely different website. That's the reason why SAML is used: different websites, same credentials. So you may want to store the authentication cookies provided by a successful login in a separate variable. The IDP now sends you back a response to the SAMLRequest: the SAMLResponse. You have to find it in the source code of the page on which the login ends. In fact, this page is one big form containing the response, with some JavaScript that automatically submits it when the page loads. You have to fetch the source of that page, parse it, discard the HTML you don't need, and extract the SAMLResponse value (base64-encoded, and its assertion may additionally be encrypted).
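
The parsing part of this step can be sketched with the standard library alone. This assumes the page the IDP returns after login contains a single auto-submitting form whose hidden inputs include SAMLResponse (and often RelayState); the names `AutoSubmitFormParser` and `extract_saml_form` are illustrative, not part of any library.

```python
from html.parser import HTMLParser


class AutoSubmitFormParser(HTMLParser):
    """Collect the first form's action URL and its named input fields."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Unnamed inputs (e.g. the submit button) are not sent by a browser.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def extract_saml_form(html):
    """Return (action_url, fields) for the first form in the page source."""
    parser = AutoSubmitFormParser()
    parser.feed(html)
    parser.close()
    return parser.action, parser.fields
```

The parser also undoes HTML character references in attribute values, which matters because some IDPs write the action URL with entities such as `&#x3a;`. If you want to look at the response itself, `base64.b64decode(fields["SAMLResponse"])` gives the XML.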

C. THIRD STEP: sending back the response to the SP

Now you're ready to finish the procedure. You have to send (via POST, since you're emulating a form) the SAMLResponse obtained in the previous step to the SP. In return, the SP sets the cookies needed to access the protected content you want.
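
As a rough sketch of this last step, the form fields taken from the IDP page can be turned into a POST request for the SP. The helper name `build_acs_post` and its parameters are assumptions for illustration; it only builds the request, so that you can inspect it before sending.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def build_acs_post(idp_page_url, action, fields):
    """Build (but do not send) the POST that delivers the SAMLResponse to the SP."""
    # The form action may be relative, so resolve it against the page URL.
    target = urljoin(idp_page_url, action)
    # urlencode percent-encodes '+', '/' and '=' in the base64 value.
    body = urlencode(fields).encode("ascii")
    return Request(
        target,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Send it through an opener that shares the cookie jar used in the first step, e.g. `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar)).open(req)`, so the SP can match the response to the session it started. With requests, `session.post(action, data=fields)` on the same `Session` does the equivalent.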

Aaaaand, you're done!

Again, I think the most valuable thing you can do is use Opera to analyze ALL the redirects SAML performs, and then replicate them in your code. It's not that difficult; just keep in mind that the IDP is an entirely different site from the SP.

Solution 2

Selenium with the headless PhantomJS WebKit browser will be your best bet to log in to Shibboleth, because it handles cookies and even JavaScript for you.

Installation:

$ pip install selenium
$ brew install phantomjs

from selenium import webdriver
from selenium.webdriver.support.ui import Select # for <SELECT> HTML form

driver = webdriver.PhantomJS()
# On Windows, use: webdriver.PhantomJS(r'C:\phantomjs-1.9.7-windows\phantomjs.exe')

# Service selection
# Here I had to select my school among others 
driver.get("http://ent.unr-runn.fr/uPortal/")
select = Select(driver.find_element_by_name('user_idp'))
select.select_by_visible_text('ENSICAEN')
driver.find_element_by_id('IdPList').submit()

# Login page (https://cas.ensicaen.fr/cas/login?service=https%3A%2F%2Fshibboleth.ensicaen.fr%2Fidp%2FAuthn%2FRemoteUser)
# Fill the login form and submit it
driver.find_element_by_id('username').send_keys("myusername")
driver.find_element_by_id('password').send_keys("mypassword")
driver.find_element_by_id('fm1').submit()

# Now connected to the home page
# Click on 3 links in order to reach the page I want to scrape
driver.find_element_by_id('tabLink_u1240l1s214').click()
driver.find_element_by_id('formMenu:linknotes1').click()
driver.find_element_by_id('_id137Pluto_108_u1240l1n228_50520_:tabledip:0:_id158Pluto_108_u1240l1n228_50520_').click()

# Select and print an interesting element by its ID
page = driver.find_element_by_id('_id111Pluto_108_u1240l1n228_50520_:tableel:tbody_element')
print(page.text)

Note:

  • during development, use Firefox to preview what you are doing: driver = webdriver.Firefox()
  • this script is provided as-is with the real links, so you can compare each line of code with the actual source code of the pages (up to the login, at least).

Solution 3

Extending the answer from Stéphane Bruckert above, once you have used Selenium to get the auth cookies, you can still switch to requests if you want to:

import requests
cook = {i['name']: i['value'] for i in driver.get_cookies()}
driver.quit()
r = requests.get("https://protected.ac.uk", cookies=cook)
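
The dict above keeps only names and values. If you want to keep each cookie's scope as well, one possible approach is to carry the domain, path, secure flag and expiry over too. The helper below is a hypothetical sketch (the name `to_requests_cookie_args` is made up); it only reshapes the dicts returned by `driver.get_cookies()` and does not itself depend on Selenium or requests.

```python
def to_requests_cookie_args(selenium_cookies):
    """Turn driver.get_cookies() dicts into (name, value, kwargs) triples."""
    triples = []
    for cookie in selenium_cookies:
        kwargs = {}
        # Copy only the scoping attributes that are actually present.
        for key in ("domain", "path", "secure"):
            if key in cookie:
                kwargs[key] = cookie[key]
        # Selenium calls the expiry time "expiry"; requests calls it "expires".
        if "expiry" in cookie:
            kwargs["expires"] = cookie["expiry"]
        triples.append((cookie["name"], cookie["value"], kwargs))
    return triples
```

Each triple can then be loaded with `session.cookies.set(name, value, **kwargs)` on a `requests.Session`, so the cookies are only sent to the host and path they were issued for.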

Solution 4

You can find here a more detailed description of the Shibboleth authentication process.

Solution 5

Though this is already answered, hopefully this helps someone. I had the task of downloading files from a SAML website and got help from Stéphane Bruckert's answer.

If a headless browser is used, wait times need to be specified at each redirection during login. Once the browser was logged in, I took its cookies and used them with the requests module to download the files. Got help from this.

This is what my code looks like:

import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

things_to_download = ['a', 'b', 'c', 'd', 'e', 'f']  # placeholders for the values that change in the URL
options = Options()
options.headless = False
driver = webdriver.Chrome('D:/chromedriver.exe', options=options)
driver.get('https://website.to.downloadfrom.com/')
driver.find_element_by_id('username').send_keys("Your_username")  # the IDs differ between websites/forms
driver.find_element_by_id('password').send_keys("Your_password")
driver.find_element_by_id('logOnForm').submit()

# Copy the browser's cookies into a requests session once, after login
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

for thing in things_to_download:
    # Use the loop variable, not the whole list, to build the URL and file name
    response = session.get('https://website.to.downloadfrom.com/bla/blabla/' + str(thing))
    with open('Downloaded_stuff/' + str(thing) + '.pdf', 'wb') as f:
        f.write(response.content)  # save the file
driver.close()
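
The waiting between login redirects mentioned above can be done by polling a condition instead of sleeping for a fixed time. Selenium ships its own `WebDriverWait` for this; the sketch below is a library-independent version of the same idea, and the name `wait_until` and its default timings are assumptions.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)
```

For example, `wait_until(lambda: driver.current_url.startswith('https://website.to.downloadfrom.com/'))` placed after the form submission would block until the redirect chain has landed back on the target site, and only then would the cookies be copied.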

Author: David Perlaza

Updated on April 28, 2022

Comments

  • David Perlaza
    David Perlaza about 2 years

    I'm trying to log in to my university's server via Python, but I'm entirely unsure of how to go about generating the appropriate HTTP POSTs, creating the keys and certificates, and handling other parts of the process I may be unfamiliar with that are required to comply with the SAML spec. I can log in with my browser just fine, but I'd like to be able to log in and access other content on the server using Python.

    For reference, here is the site

    I've tried logging in by using mechanize (selecting the form, populating the fields, clicking the submit button control via mechanize.Browser.submit(), etc.) to no avail; the login site gets spat back each time.

    At this point, I'm open to implementing a solution in whichever language is most suitable to the task. Basically, I want to programmatically log in to a SAML-authenticated server.

  • Stéphane Bruckert
    Stéphane Bruckert almost 10 years
    Thanks for this information. Would you please share your whole script? It would save me a lot of time!
  • chladni
    chladni almost 10 years
    I added the script to my post. That may get you started. However, Shibb login can vary. Compare with the results you see using a browser. You may want to turn off JavaScript to simplify what your browser does behind the scenes.
  • Stéphane Bruckert
    Stéphane Bruckert almost 10 years
    Thank you so much. Gotta throw myself into it. I will keep you informed.
  • chladni
    chladni almost 10 years
    Let me know how it goes. There may be an intermediate step between providing username and password.
  • Stéphane Bruckert
    Stéphane Bruckert almost 10 years
    I tried my best but I couldn't get the right cookies (compared to the ones shown in Live HTTP Headers). I finally managed to log in and get the data I needed using Selenium with PhantomJS. It was so much easier! Anyway, thank you very much for your kind help.
  • chladni
    chladni almost 10 years
    Stéphane, sorry it did not work in Python. I appreciate hearing that Selenium worked for this. I need to start using that tool for automated testing, and would like to build Shibboleth testing into my suite. Any links on making Selenium work with Shib would be helpful for me.
  • Stéphane Bruckert
    Stéphane Bruckert almost 10 years
    Here it is: stackoverflow.com/a/23929939/1515819. It's really easy with Selenium! I think you will have no problem doing what you want. If you need help, I'm here.
  • chladni
    chladni almost 10 years
    Stéphane: Thanks for the Selenium example; looking forward to working with Selenium + Python.
  • turbo
    turbo almost 10 years
    Hi Stéphane, I have implemented this snippet in Java, and it functions correctly if I use the Firefox web driver. However, when I use the HTML driver, it returns the page text of one of the authentication redirects. Is there a way to tell the driver to not grab the page source until it hits a certain URL, or to add some sort of delay? Thanks for any advice you can offer.
  • Stéphane Bruckert
    Stéphane Bruckert almost 10 years
    I would say this is the reason why I like to use browser drivers like Firefox or Chrome. Why not settle for that? I don't know the Java API or its use with the HTML driver, so if you really want to achieve this I suggest you create a new question, because this is a bit off-topic here. Good luck!
  • turbo
    turbo almost 10 years
    I'm doing some proof of concept for what will become a mobile application, probably not using Selenium in the end, but this makes it easier to test. Thank you anyway!
  • bjw
    bjw over 9 years
    Once you have the auth cookies you can also switch to requests (which is faster and easier to use IME)... I've added a snippet below...
  • roiberg
    roiberg almost 8 years
    Hi Gianluca, I came across a similar problem and would very much like to avoid having to implement the same library. Any chance you release it publicly?
  • draysams
    draysams over 4 years
    This is the code for this: stackoverflow.com/a/58598520/7831858. Thanks for the help @Gianluca. This post helped me figure the SAML login out.