Access denied while scraping

Solution 1

import requests
from bs4 import BeautifulSoup as BS

url = 'https://www.size.co.uk/'
# Send a browser-like User-Agent; the default one used by the library is rejected by this site
agent = {"User-Agent":'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
page = requests.get(url, headers=agent)
print(BS(page.content, 'lxml'))
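
The same idea can be sketched with only the standard library, which is closer to the `urlopen` call in the question. This is a minimal sketch, assuming Python 3; `build_request` and `fetch` are hypothetical helper names, and the User-Agent string is simply the one from the answer above. Whether a given site accepts the request is not guaranteed.

```python
import urllib.request

# Assumption: a browser-like User-Agent string, copied from the answer above.
BROWSER_UA = ('Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36')


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request object that carries a browser-like User-Agent."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch(url):
    """Download the page body as bytes (performs a real network request)."""
    with urllib.request.urlopen(build_request(url), timeout=10) as response:
        return response.read()
```

Calling `fetch('https://www.size.co.uk/')` returns the raw bytes, which can then be passed to `BS(..., 'lxml')` as in the snippet above.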

Solution 2

Try this:

  import requests

  url = 'https://www.size.co.uk/'  # same URL as in Solution 1
  headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}
  source = requests.get(url, headers=headers).text
  print(source)
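
Because the block page in the question is returned as ordinary HTML, a script may want to check for it before parsing further. The following is a rough sketch using only the standard library; `TitleParser` and `is_access_denied` are hypothetical names, and the check assumes the block page uses the exact title "Access Denied" shown in the question's output.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ''

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def is_access_denied(html):
    """Return True if the page title is 'Access Denied' (case-insensitive)."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip().lower() == 'access denied'
```

For example, `is_access_denied(source)` could be called on the text fetched above to decide whether to retry with different headers or stop.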
Author by

V.Anh

Updated on June 04, 2022

Comments

  • V.Anh, almost 2 years ago

    I want to write a script that goes to https://www.size.co.uk/featured/footwear/ and scrapes the content, but when I run the script I get an "Access Denied" page. Here is the code:

    from urllib import urlopen
    from bs4 import BeautifulSoup as BS
    url = urlopen('https://www.size.co.uk/')
    print BS(url, 'lxml')
    

    The output is

    <html><head>
    <title>Access Denied</title>
    </head><body>
    <h1>Access Denied</h1>
    
    You don't have permission to access "http://www.size.co.uk/" on this server.
    <p>
    Reference #18.6202655f.1498945327.11002828
    </p></body>
    </html>
    

    When I try it with other websites, the code works fine. When I use Selenium on this site, nothing happens, but I still want to know how to get past this error without using Selenium. When I use Selenium on a different website, such as http://www.footpatrol.co.uk/shop, I get the same Access Denied error. Here is the code for footpatrol:

    from selenium import webdriver
    from bs4 import BeautifulSoup as BS
    
    # Raw string so the backslashes in the Windows path are taken literally
    driver = webdriver.PhantomJS(r'C:\Users\V\Desktop\PY\web_scrape\phantomjs.exe')
    driver.get('http://www.footpatrol.com')
    pageSource = driver.page_source
    soup = BS(pageSource, 'lxml')
    print soup
    

    Output is:

    <html><head>
    <title>Access Denied</title>
    </head><body>
    <h1>Access Denied</h1>
    
    You don't have permission to access "http://www.footpatrol.co.uk/" on this 
    server.<p>
    Reference #18.6202655f.1498945644.110590db
    
    
    </p></body></html>