How to get round the HTTP Error 403: Forbidden with urllib.request using Python 3
Solution 1
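This is probably due to mod_security or a similar server-side filter that rejects requests from non-browser clients. By default urllib identifies itself with a Python-urllib User-Agent, so you need to send a browser-like User-Agent header instead.
Here is your code, corrected: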
import urllib.request
url = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"
# Build the request with a browser-like User-Agent instead of the urllib default
request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
infile = urllib.request.urlopen(request).read()
data = infile.decode('ISO-8859-1')  # Decode the raw bytes as ISO-8859-1 text
print(data)  # Print the data to the screen
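To see why the header matters, it can help to compare what urllib sends by default with what the corrected request sends. The following is a minimal sketch that makes no network call; `describe_user_agents` is a hypothetical helper name and `http://www.example.com/` is only a placeholder URL.

```python
import urllib.request


def describe_user_agents(url, browser_agent="Mozilla/5.0"):
    """Return (default_agent, spoofed_agent) without contacting the server."""
    # build_opener() carries the headers urllib adds when none are supplied.
    default_headers = dict(urllib.request.build_opener().addheaders)
    default_agent = default_headers["User-agent"]
    # Request stores header names capitalised, hence "User-agent" on lookup.
    request = urllib.request.Request(url, headers={"User-Agent": browser_agent})
    return default_agent, request.get_header("User-agent")


# Placeholder URL (an assumption for illustration); nothing is fetched here.
default_agent, spoofed_agent = describe_user_agents("http://www.example.com/")
print(default_agent)  # the Python-urllib identifier that filters tend to reject
print(spoofed_agent)  # the browser-like value sent by the corrected code
```

A server that filters on the User-Agent sees the first value for a plain `urlopen(url)` call and the second for the corrected request.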
Next, you can use BeautifulSoup to scrape the HTML.
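Since the question asks for standard modules only, `html.parser` from the standard library is one possible substitute for BeautifulSoup. The sketch below collects the text of table cells; the class name `CellTextParser`, the helper `extract_cells`, and the sample markup are all made up for illustration and do not reflect the real structure of the LSE page.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text found inside every <td> and <th> element."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell = True
            self.cells.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data  # append text to the current cell


def extract_cells(html_text):
    """Return the stripped text of each table cell in html_text."""
    parser = CellTextParser()
    parser.feed(html_text)
    parser.close()
    return [cell.strip() for cell in parser.cells]


# Invented sample markup, standing in for the decoded page text.
sample = "<table><tr><th>Name</th><td> 123.4 </td></tr></table>"
print(extract_cells(sample))
```

In practice you would pass the decoded `data` string from the code above to `extract_cells` and then pick out the cells you need.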
Solution 2
It seems you are being rate limited. Try adding a sleep between attempts and retrying. For example:
import urllib.error
import urllib.request
from time import sleep

LSE_URL = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"
WAIT_PERIOD = 15  # seconds to wait between retries


def stock_data_reader():
    stock_data = get_stock_data()
    while not stock_data:
        sleep(WAIT_PERIOD)  # sleep for a while until the next retry
        stock_data = get_stock_data()
    print(stock_data)  # do something with the stock data


def get_stock_data():
    try:
        infile = urllib.request.urlopen(LSE_URL)  # Open the URL
    except urllib.error.HTTPError as http_err:
        print("Error: %s" % http_err)
        return None
    else:
        # Read the content as a string decoded with ISO-8859-1
        data = infile.read().decode('ISO-8859-1')
        return data


stock_data_reader()
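The two suggestions can also be combined: send a browser-like User-Agent and retry with a growing pause, but stop after a fixed number of attempts. This is only a sketch under assumptions. `fetch_with_retries`, its parameters, and `make_flaky_opener` are hypothetical names, the doubling delay is a swapped-in variation on the fixed 15 second wait above, and the demo uses a fake opener and a placeholder URL so that it runs without network access.

```python
import email.message
import io
import time
import urllib.error
import urllib.request


def fetch_with_retries(url, opener=urllib.request.urlopen, attempts=4,
                       first_wait=1.0, sleep=time.sleep):
    """Fetch url with a browser-like User-Agent, retrying on HTTP 403.

    Returns the decoded page text, or None if the fetch does not succeed.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    wait = first_wait
    for attempt in range(1, attempts + 1):
        try:
            with opener(request) as response:
                return response.read().decode("ISO-8859-1")
        except urllib.error.HTTPError as http_err:
            # Give up on other HTTP errors, or once the attempts are used up.
            if http_err.code != 403 or attempt == attempts:
                print("Giving up: %s" % http_err)
                return None
            print("Attempt %d failed (%s); retrying in %.1fs"
                  % (attempt, http_err, wait))
            sleep(wait)
            wait *= 2  # double the pause before the next attempt
    return None


def make_flaky_opener(failures):
    """Hypothetical stand-in for urlopen that raises 403 `failures` times."""
    remaining = [failures]

    def opener(request):
        if remaining[0] > 0:
            remaining[0] -= 1
            raise urllib.error.HTTPError(
                request.full_url, 403, "Forbidden",
                email.message.Message(), io.BytesIO(b""))
        return io.BytesIO(b"<html>ok</html>")

    return opener


# Offline demo: two simulated 403 responses, then success.
waits = []  # records the requested pauses instead of actually sleeping
page = fetch_with_retries("http://www.example.com/",
                          opener=make_flaky_opener(2), sleep=waits.append)
print(page)
print(waits)
```

For real use you would call `fetch_with_retries(LSE_URL)` with the default `opener` and `sleep`. Capping the attempts avoids looping forever if the server keeps refusing the request.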
Author: Admin
Updated on June 04, 2022
Hi, not every time, but sometimes when trying to access the LSE page I am thrown the ever-annoying HTTP Error 403: Forbidden message.
Does anyone know how I can overcome this issue using only standard Python modules (so sadly no Beautiful Soup)?
import urllib.request
url = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"
infile = urllib.request.urlopen(url)  # Open the URL
data = infile.read().decode('ISO-8859-1')  # Read the content as string decoded with ISO-8859-1
print(data)  # Print the data to the screen
However, every now and then this is the error I am shown:
Traceback (most recent call last):
  File "/home/ubuntu/workspace/programming_practice/Assessment/Summative/removingThe403Error.py", line 5, in <module>
    webpage = urlopen(req).read().decode('ISO-8859-1')
  File "/usr/lib/python3.4/urllib/request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 469, in open
    response = meth(req, response)
  File "/usr/lib/python3.4/urllib/request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.4/urllib/request.py", line 507, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Process exited with code: 1
Link to a list of all the modules that are okay: https://docs.python.org/3.4/py-modindex.html
Many thanks in advance.