Scraping a website with python 3 that requires login
Solution 1
I see you're using requests. If the server uses HTTP authentication (for example Basic auth), you can pass the credentials with the auth argument:
import requests
# auth=(user, password) sends the credentials as an HTTP Basic Authorization header
page = requests.get("http://localhost:8080/login?from=%2F", auth=('username', 'password'))
Hope this helps! You can read more about authentication here: http://docs.python-requests.org/en/master/user/authentication/
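The auth tuple only helps if the server actually asks for HTTP authentication. It does not fill in an HTML login form. One way to see what it does, without sending anything over the network, is to build the request and inspect its headers. This is a minimal sketch using the public requests.Request / prepare() API; the helper name prepare_basic_auth_request is hypothetical.

```python
import requests

LOGIN_URL = "http://localhost:8080/login?from=%2F"


def prepare_basic_auth_request(url: str, username: str,
                               password: str) -> requests.PreparedRequest:
    """Build (but do not send) a GET request carrying HTTP Basic credentials."""
    # auth=(user, pass) is shorthand for requests.auth.HTTPBasicAuth(user, pass)
    request = requests.Request("GET", url, auth=(username, password))
    return request.prepare()


prepared = prepare_basic_auth_request(LOGIN_URL, "username", "password")
# The credentials travel as a base64-encoded Authorization header
print(prepared.headers["Authorization"])
# Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```

To actually send it, requests.Session().send(prepared) works, or simply call requests.get as in the answer. If the server ignores this header and still shows the login page, it is probably expecting a form submission instead.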
Solution 2
With MechanicalSoup, you first need to select the form you want to fill in and submit. If the page has only one form, use:
browser.select_form()
Then, after filling in the form, you need to submit it:
browser.submit_selected()
You may read the (newly written) MechanicalSoup tutorial or look at examples such as logging in to GitHub with MechanicalSoup.
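Putting those steps together, a minimal login helper might look like the sketch below. It assumes the login page has a single form with the j_username and j_password fields shown in the question; the function name login is hypothetical. mechanicalsoup is imported only for type hints, so the helper works with any object offering the same methods.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only; no runtime import of mechanicalsoup here
    import mechanicalsoup
    import requests


def login(browser: mechanicalsoup.StatefulBrowser, login_url: str,
          username: str, password: str) -> requests.Response:
    """Open the login page, fill in its (only) form and submit it."""
    browser.open(login_url)
    # A form must be selected before fields can be set; with a single form
    # on the page, the default selector is enough.
    browser.select_form()
    # Field names taken from the login page HTML in the question
    browser["j_username"] = username
    browser["j_password"] = password
    # Submitting is what sends the credentials; launch_browser() only opens
    # the current page in a desktop web browser for debugging.
    return browser.submit_selected()
```

Typical use would be browser = mechanicalsoup.StatefulBrowser() followed by login(browser, "http://localhost:8080/login?from=%2F", "admin", "password"). Afterwards, browser.get_url() shows where the server redirected you, which can hint at whether the login succeeded.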
Josh
Updated on June 04, 2022
Comments
-
Josh almost 2 years
Just a question about authentication when scraping. Using BeautifulSoup:
# importing the requests lib
import requests
from bs4 import BeautifulSoup

# specifying the page
page = requests.get("http://localhost:8080/login?from=%2F")

# parsing the returned HTML
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())
Here is the part of the output that I think is important:
<table>
  <tr>
    <td> User: </td>
    <td> <input autocapitalize="off" autocorrect="off" id="j_username" name="j_username" type="text"/> </td>
  </tr>
  <tr>
    <td> Password: </td>
    <td> <input name="j_password" type="password"/> </td>
  </tr>
  <tr>
    <td align="right"> <input id="remember_me" name="remember_me" type="checkbox"/> </td>
    <td> <label for="remember_me"> Remember me on this computer </label> </td>
  </tr>
</table>
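To see which field names a login form expects, you can list the named input elements. A short sketch with BeautifulSoup, run on an abbreviated copy of the markup above; the helper input_fields is hypothetical.

```python
from __future__ import annotations

from bs4 import BeautifulSoup

# Abbreviated copy of the login-page markup from the question
LOGIN_HTML = """
<table>
  <tr><td>User:</td><td><input id="j_username" name="j_username" type="text"/></td></tr>
  <tr><td>Password:</td><td><input name="j_password" type="password"/></td></tr>
  <tr><td align="right"><input id="remember_me" name="remember_me" type="checkbox"/></td></tr>
</table>
"""


def input_fields(html: str) -> dict[str, str]:
    """Map each named <input> to its type attribute (default "text")."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        tag["name"]: tag.get("type", "text")
        for tag in soup.find_all("input")
        if tag.has_attr("name")
    }


print(input_fields(LOGIN_HTML))
# {'j_username': 'text', 'j_password': 'password', 'remember_me': 'checkbox'}
```

These names (j_username, j_password) are the keys a form submission has to use, whichever library ends up sending it.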
This scrapes the page fine, but the site requires a login. Here I am using the mechanicalsoup library:
import mechanicalsoup

browser = mechanicalsoup.StatefulBrowser()
browser.open("http://localhost:8080/login?from=%2F")
browser.get_url()
browser.get_current_page()
browser.get_current_page().find_all('form')
browser["j_username"] = "admin"
browser["j_password"] = "password"
browser.launch_browser()
However, it still won't let me log in.
Has anyone used a Python 3 scraping tool that can handle a site requiring authentication?
-
Mitchell van Zuylen over 6 years
You could look into using something like Selenium. It lets you interact with the website, for example to log in.
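A minimal Selenium sketch of that idea, assuming Selenium 4 and the j_username/j_password field names from the question. The helper selenium_login is hypothetical; the driver (e.g. selenium.webdriver.Firefox()) is passed in, so creating it and installing a browser driver is left to the caller.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only; no runtime dependency on selenium here
    from selenium.webdriver.remote.webdriver import WebDriver


def selenium_login(driver: WebDriver, login_url: str,
                   username: str, password: str) -> None:
    """Fill in and submit the login form in a real browser session."""
    driver.get(login_url)
    # "name" is the value of Selenium's By.NAME locator strategy
    driver.find_element("name", "j_username").send_keys(username)
    password_field = driver.find_element("name", "j_password")
    password_field.send_keys(password)
    # submit() submits the form that contains this element
    password_field.submit()
```

After selenium_login returns, driver.page_source holds the HTML of the page the browser landed on, which can be handed to BeautifulSoup as in the question.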
-
Lupanoide over 6 years
Have you tried the requests methods for auth? docs.python-requests.org/en/master/user/authentication
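The methods on that page (Basic, Digest and custom auth) work at the HTTP level. Because the login page in the question contains j_username/j_password input fields, it most likely uses an HTML form login instead. In that case a common alternative is to POST the form with a requests.Session, which keeps the session cookie for later requests. A sketch under those assumptions: build_form_submission and form_login are hypothetical helpers, and the form's action URL is read from the page rather than guessed.

```python
from __future__ import annotations

from typing import TYPE_CHECKING
from urllib.parse import urljoin

from bs4 import BeautifulSoup

if TYPE_CHECKING:  # type hints only; requests is not imported at runtime here
    import requests


def build_form_submission(page_url: str, html: str,
                          credentials: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Return (action_url, payload) for the first <form> on a login page."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form")
    if form is None:
        raise ValueError("no <form> found on the login page")
    # An empty or missing action means "submit to the page's own URL"
    action_url = urljoin(page_url, form.get("action") or "")
    payload = {}
    for tag in form.find_all("input"):
        name = tag.get("name")
        if not name:
            continue
        # Browsers only send checkboxes/radio buttons that are checked
        if tag.get("type") in ("checkbox", "radio") and not tag.has_attr("checked"):
            continue
        # Keep pre-filled values such as hidden CSRF tokens
        payload[name] = tag.get("value", "")
    payload.update(credentials)
    return action_url, payload


def form_login(session: requests.Session, login_url: str,
               credentials: dict[str, str]) -> requests.Response:
    """GET the login page, then POST its form; cookies stay in `session`."""
    page = session.get(login_url)
    page.raise_for_status()
    # page.url is the final URL after redirects, used to resolve the action
    action_url, payload = build_form_submission(page.url, page.text, credentials)
    response = session.post(action_url, data=payload)
    response.raise_for_status()
    return response
```

For example, session = requests.Session() followed by form_login(session, "http://localhost:8080/login?from=%2F", {"j_username": "admin", "j_password": "password"}); later session.get(...) calls then reuse the same cookies. Sites that build the login request with JavaScript may still need MechanicalSoup's or Selenium's approach.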