Scraper in Python gives "Access Denied"


Solution 1

As mentioned in the comments, you need to specify a User-Agent that the site accepts and pass it in the request headers:

import requests

def extract_source(url):
    # Send a browser-like User-Agent instead of the default one requests uses
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}
    source = requests.get(url, headers=headers).text
    return source
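Why the header matters: if you don't set one, requests identifies itself with its default `python-requests/<version>` User-Agent, which some sites refuse. Here is a minimal offline sketch that prepares a request without sending it, so you can inspect the User-Agent that would go out. The helper name `build_request` is hypothetical:

```python
import requests


def build_request(url, user_agent=None):
    """Prepare (but do not send) a GET request. `build_request` is a hypothetical helper."""
    with requests.Session() as session:
        if user_agent is not None:
            # Override the session-level default User-Agent header.
            session.headers["User-Agent"] = user_agent
        # prepare_request merges the session's default headers into the request.
        return session.prepare_request(requests.Request("GET", url))


url = "https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585"
print(build_request(url).headers["User-Agent"])  # default, e.g. python-requests/2.x
print(build_request(url, "Mozilla/5.0").headers["User-Agent"])  # Mozilla/5.0
```

Setting the header on a `Session` rather than on each `requests.get` call means every request made through that session sends it.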

Solution 2

Try this:

import bs4
import requests

def extract_source(url):
    agent = {"User-Agent": "Mozilla/5.0"}
    source = requests.get(url, headers=agent).text
    return source

def extract_data(source):
    # Explicitly choose the lxml parser (requires the lxml package to be installed)
    soup = bs4.BeautifulSoup(source, 'lxml')
    names = soup.find_all('title')
    for i in names:
        print(i)

extract_data(extract_source('https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585'))
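Even with a User-Agent set, a site may still serve a block page, and the code above would just print its title. A small check lets the scraper detect this. The name `is_access_denied` is hypothetical, and the check assumes that the block page's `<title>` text is exactly "Access Denied", as in the question:

```python
import bs4


def is_access_denied(source):
    """Return True if the HTML looks like the "Access Denied" block page.

    Assumes the block page's <title> text is "Access Denied" (as seen in the question).
    """
    soup = bs4.BeautifulSoup(source, "html.parser")
    title = soup.title  # first <title> tag, or None if there is none
    return title is not None and title.get_text(strip=True) == "Access Denied"


# Sample HTML for illustration only.
blocked = "<html><head><title>Access Denied</title></head><body></body></html>"
normal = "<html><head><title>Saree Retailers</title></head><body></body></html>"
print(is_access_denied(blocked))  # True
print(is_access_denied(normal))   # False
```

In `extract_source`, you could call this on the fetched text and raise an exception when it returns True, rather than parsing the block page as if it were real data.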

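I passed 'lxml' as the parser explicitly to avoid potential parsing problems. If you don't specify a parser, BeautifulSoup emits a warning and uses whichever parser it considers best among those installed, so results can differ between machines.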
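Because lxml is a separate install, one option is to prefer it and fall back to Python's built-in `html.parser` when it is missing. This sketch does that and returns the title texts instead of printing the tags. The function name `parse_titles` is hypothetical:

```python
import bs4


def parse_titles(source, parser="lxml"):
    """Return the text of every <title> tag, preferring `parser` if it is installed."""
    try:
        soup = bs4.BeautifulSoup(source, parser)
    except bs4.FeatureNotFound:
        # Requested parser is not available; use the parser bundled with Python.
        soup = bs4.BeautifulSoup(source, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.find_all("title")]


html = "<html><head><title>Access Denied</title></head></html>"
print(parse_titles(html))  # ['Access Denied']
```

Returning a list makes the result easier to test or post-process than printing inside the loop.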

Author: duca

Updated on June 29, 2022

Comments

  • duca (almost 2 years ago)

    I'm trying to write a scraper in Python to extract information from a page, such as the titles of the offers that appear on this page:
    https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585

    So far I'm using this code:

    import bs4
    import requests
    
    def extract_source(url):
        source=requests.get(url).text
        return source
    
    def extract_data(source):
        soup=bs4.BeautifulSoup(source)
        names=soup.findAll('title')
        for i in names:
            print(i)
    
    extract_data(extract_source('https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585'))
    

    But when I run this code, I get this instead of the offer titles:

    <title>Access Denied</title>
    

    What can I do to solve this?