How to parse a website using BeautifulSoup
The problem is not BeautifulSoup but the server, which needs more information in the request before it gives you access to this page. As it stands, the server sends back JavaScript code that redirects you to the login page.
You need to send a User-Agent header to get this page.
You can open http://httpbin.org/get to see the User-Agent that your browser sends.
import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent; without it the server returns the JavaScript redirect
headers = {'User-Agent': 'Mozilla/5.0'}
url = "https://linkedin.com/company/1005"

r = requests.get(url, headers=headers)
print(r.text)  # raw HTML returned by the server

soup = BeautifulSoup(r.text, 'html.parser')
print(soup.prettify())
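To see why the header makes a difference, it can help to compare the User-Agent that requests sends by default with the one set above. The following is only a sketch: the helper name prepared_user_agent is made up for illustration, and it builds the request without sending it, so nothing is fetched from the server.

```python
import requests

URL = "https://linkedin.com/company/1005"


def prepared_user_agent(session: requests.Session, url: str) -> str:
    """Return the User-Agent this session would send to url (hypothetical helper).

    The request is only prepared, never sent, so no network access happens.
    """
    prepared = session.prepare_request(requests.Request("GET", url))
    return prepared.headers["User-Agent"]


# A fresh session uses the library default, which identifies itself as python-requests.
default_session = requests.Session()
default_ua = prepared_user_agent(default_session, URL)

# Overriding the session header replaces the default for every request made with it.
browser_session = requests.Session()
browser_session.headers.update({"User-Agent": "Mozilla/5.0"})
browser_ua = prepared_user_agent(browser_session, URL)

print("default:", default_ua)
print("custom: ", browser_ua)
```

The default value starts with python-requests/, which a server can easily tell apart from a browser. Setting the header on a Session, as here, is one option when several pages are fetched; passing headers=... to requests.get, as in the answer, works the same way for a single request.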
Author: Sohaib Asif
Updated on June 04, 2022
Comments
Sohaib Asif almost 2 years
I am new to web scraping and I want to get the HTML of the page. But when I run the program, the HTML comes back empty and the console shows only JavaScript.
from bs4 import BeautifulSoup
import requests
import urllib

url = "https://linkedin.com/company/1005"
r = requests.get(url)
html_content = r.text
soup = BeautifulSoup(html_content, 'html.parser')
print(soup.prettify())
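One rough way to notice this situation in code, rather than by reading the console output, is to check whether the page contains any visible text once the script tags are removed. This is a sketch under assumptions: the helper visible_text and both sample HTML strings are made up for illustration and are not the actual response from the site.

```python
from bs4 import BeautifulSoup


def visible_text(html: str) -> str:
    """Return the text of a page with script, style and noscript content removed."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()  # drop the tag and everything inside it
    return soup.get_text(separator=" ", strip=True)


# Hypothetical sample: a page that contains nothing but a JavaScript redirect.
redirect_page = (
    "<html><head><script>window.location.href = '/login';</script></head>"
    "<body></body></html>"
)

# Hypothetical sample: a page with ordinary content.
normal_page = "<html><body><h1>Company</h1><p>About us</p></body></html>"

for name, html in [("redirect", redirect_page), ("normal", normal_page)]:
    text = visible_text(html)
    if text:
        print(name, "-> has content:", text)
    else:
        print(name, "-> no visible text, probably a script-only page")
```

An empty result does not prove that the server blocked the request, but it is a cheap signal that the response should be inspected before parsing it further.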