How to parse a website using BeautifulSoup


The problem is not BeautifulSoup but the server, which needs more information in the request before it gives you access to this page. Right now it sends JavaScript code that redirects you to the login page.

You need to send a User-Agent header to get this page.

You can open http://httpbin.org/get in your browser to see which User-Agent your browser sends.
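To see what the server receives from your script, you can also inspect the request locally. The sketch below, assuming `requests` is installed, builds the same request that `requests.get()` would send (through a `Session`), without sending it, and prints its User-Agent. `prepare_get` is a hypothetical helper name used only for illustration.

```python
import requests

# Hypothetical helper: build (but do not send) a GET request the way
# requests.get() does, so we can inspect the headers the server would see.
def prepare_get(url, user_agent=None):
    headers = {"User-Agent": user_agent} if user_agent else {}
    request = requests.Request("GET", url, headers=headers)
    with requests.Session() as session:  # a Session carries requests' default headers
        # prepare_request() merges the session defaults with our headers;
        # headers we pass explicitly take precedence.
        return session.prepare_request(request)

url = "https://linkedin.com/company/1005"

default_req = prepare_get(url)
browser_req = prepare_get(url, "Mozilla/5.0")

print(default_req.headers["User-Agent"])  # python-requests/<installed version>
print(browser_req.headers["User-Agent"])  # Mozilla/5.0
```

Without a custom header the server sees `python-requests/...`, which is easy to tell apart from a browser.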

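import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent; without it the server returns
# JavaScript that redirects to the login page.
headers = {'User-Agent': 'Mozilla/5.0'}

url = "https://linkedin.com/company/1005"

r = requests.get(url, headers=headers)
print(r.text)  # raw HTML returned by the server

# Parse the HTML and print it with indentation
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.prettify())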
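If you want to check whether you received the real page or only the JavaScript redirect, a rough heuristic is to look for `<script>` tags combined with almost no visible text. This is a sketch under assumptions: the name `looks_like_js_redirect`, the 50-character threshold and the sample pages are made up for illustration and are not based on how LinkedIn actually builds its pages.

```python
from bs4 import BeautifulSoup

def looks_like_js_redirect(html, min_text=50):
    """Heuristic (assumption): page has <script> tags but little visible text."""
    soup = BeautifulSoup(html, "html.parser")
    has_script = soup.find("script") is not None
    # Remove tags whose contents are not visible page text
    for tag in soup(["script", "noscript", "style"]):
        tag.decompose()
    visible_text = soup.get_text(strip=True)
    return has_script and len(visible_text) < min_text

# Hypothetical sample responses
js_page = ("<html><head><script>window.location.href = '/login';</script>"
           "</head><body></body></html>")
normal_page = ("<html><body><h1>Example Corp</h1>"
               "<p>Example Corp builds software for teams around the world.</p>"
               "</body></html>")

print(looks_like_js_redirect(js_page))      # True
print(looks_like_js_redirect(normal_page))  # False
```

In practice you would call it as `looks_like_js_redirect(r.text)` after the request above.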
Author: Sohaib Asif

Updated on June 04, 2022

Comments

  • Sohaib Asif (almost 2 years ago)

    I am new to web scraping and I want to get the HTML of the page. But when I run the program, the HTML is empty and the console shows JavaScript.

    from bs4 import BeautifulSoup
    import requests
    
    url = "https://linkedin.com/company/1005"
    
    # No headers are sent here, so requests uses its default User-Agent
    r = requests.get(url)
    html_content = r.text
    soup = BeautifulSoup(html_content, 'html.parser')
    print(soup.prettify())
    

    error