Scraping a website with Python 3 that requires a login

Solution 1

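I see you're using requests. If the site uses HTTP authentication (for example HTTP Basic), you can pass the credentials through the auth argument:

import requests
page = requests.get("http://localhost:8080/login?from=%2F", auth=('username', 'password'))

Note that this sends the credentials as an HTTP Basic Authorization header. It does not fill in or submit an HTML login form.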

Hope this helps! You can read more about authentication here: http://docs.python-requests.org/en/master/user/authentication/
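
The page in the question shows an HTML form with j_username and j_password fields, so a form POST inside a requests.Session may be what is needed instead of auth=. Below is a minimal sketch of that approach, not a definitive implementation. It assumes the login page wraps those inputs in a <form> element whose action attribute gives the submit URL; the helper names build_login_request and login are hypothetical, and how a successful login is detected depends on the site.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

LOGIN_URL = "http://localhost:8080/login?from=%2F"  # URL taken from the question

# Input types a browser would not send for an untouched login form.
SKIPPED_TYPES = {"checkbox", "radio", "submit", "button", "image"}


def build_login_request(html, page_url, username, password):
    """Return (post_url, payload) for the form that holds the password field.

    Assumption: the field names are j_username and j_password, as in the
    HTML shown in the question.
    """
    soup = BeautifulSoup(html, "html.parser")
    password_input = soup.find("input", {"type": "password"})
    if password_input is None:
        raise ValueError("no password field found on the page")
    form = password_input.find_parent("form")
    if form is None:
        raise ValueError("password field is not inside a <form>")

    # Keep hidden fields (they may carry tokens the server expects).
    payload = {}
    for field in form.find_all("input"):
        name = field.get("name")
        if not name or field.get("type") in SKIPPED_TYPES:
            continue
        payload[name] = field.get("value", "")
    payload["j_username"] = username
    payload["j_password"] = password

    # Resolve a relative action against the page URL; fall back to the page itself.
    post_url = urljoin(page_url, form.get("action") or page_url)
    return post_url, payload


def login(username, password, login_url=LOGIN_URL):
    """Fetch the login page, submit the form, and return the logged-in session."""
    session = requests.Session()  # keeps cookies between requests
    page = session.get(login_url)
    page.raise_for_status()
    post_url, payload = build_login_request(page.text, page.url, username, password)
    response = session.post(post_url, data=payload)
    response.raise_for_status()
    return session
```

Calling login("admin", "password") would return a session whose cookies can be reused for later session.get(...) calls. Whether the login actually succeeded has to be checked against the site's own response, for example by looking for a logout link.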

Solution 2

With MechanicalSoup, you first need to select the form you want to fill in and submit. If the page has only one form, use:

browser.select_form()

Then, after filling in the form, you need to submit it:

browser.submit_selected()

You may read the (newly written) MechanicalSoup tutorial or look at examples such as logging in to GitHub with MechanicalSoup.

Author: Josh

Updated on June 04, 2022

Comments

  • Josh
    Josh almost 2 years

    I have a question about authentication when scraping. Using BeautifulSoup:

    # import requests and BeautifulSoup
    import requests
    from bs4 import BeautifulSoup

    # fetch the login page
    page = requests.get("http://localhost:8080/login?from=%2F")
    # parse the returned HTML
    soup = BeautifulSoup(page.content, 'html.parser')
    print(soup.prettify())
    

    This is the part of the output that I think is important:

     <table>
       <tr>
        <td>
         User:
        </td>
        <td>
         <input autocapitalize="off" autocorrect="off" id="j_username" name="j_username" type="text"/>
        </td>
       </tr>
       <tr>
        <td>
         Password:
        </td>
        <td>
         <input name="j_password" type="password"/>
        </td>
       </tr>
       <tr>
        <td align="right">
         <input id="remember_me" name="remember_me" type="checkbox"/>
        </td>
        <td>
         <label for="remember_me">
          Remember me on this computer
         </label>
        </td>
       </tr>
      </table>
    

    This scrapes the page fine, but the site requires a login. Here I am using the MechanicalSoup library:

    import mechanicalsoup
    
    browser = mechanicalsoup.StatefulBrowser()
    browser.open("http://localhost:8080/login?from=%2F")
    browser.get_url()
    browser.get_current_page()
    browser.get_current_page().find_all('form')
    browser["j_username"] = "admin"
    browser["j_password"] = "password"
    browser.launch_browser()
    

    However, it still won't let me log in.

    Has anyone used a scraping tool for Python 3 that can scrape a site that requires authentication?

    • Mitchell van Zuylen, over 6 years ago
      You could look into using something like Selenium. It lets you interact with the website to do things like logging in.
    • Lupanoide, over 6 years ago
      Have you tried the requests methods for auth? docs.python-requests.org/en/master/user/authentication