Python get raises HTTPError 400 Client Error, but after manually accessing URL, get works temporarily
HTTPError: 400 Client Error: Bad Request
means the server rejected the request as malformed. I suspect the server checks some headers in the HTTP request, for example the User-Agent.
So I tried setting the User-Agent header to mimic Firefox:
# No User-Agent
>>> from requests import get
>>> _get = get('http://stats.nba.com/stats/playergamelog', params={'PlayerID': 203082, 'Season':'2015-16', 'SeasonType':'Regular Season'})
>>> _get.raise_for_status()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\requests\models.py", line 840, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://stats.nba.com/stats/playergamelog?PlayerID=203082&Season=2015-16&SeasonType=Regular+Season
# This time, set user-agent to mimic a desktop browser
>>> headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'}
>>> _get = get('http://stats.nba.com/stats/playergamelog', params={'PlayerID': 203082, 'Season':'2015-16', 'SeasonType':'Regular Season'}, headers=headers)
>>> _get.raise_for_status()
>>>
# no error
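To avoid repeating the header on every call, the same fix can be sketched with a requests Session that carries the User-Agent by default. This is only a minimal sketch in Python 3: the names BROWSER_UA, make_session, build_gamelog_request and fetch_gamelog are made up for illustration, and a browser-like User-Agent is an assumption about what the server checks, not a documented requirement. Preparing the request without sending it lets you inspect the final URL and headers first.

```python
import requests

# Hypothetical names for illustration; only the URL, the parameters and the
# User-Agent string come from the session shown above.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) "
              "Gecko/20100101 Firefox/43.0")
GAMELOG_URL = "http://stats.nba.com/stats/playergamelog"


def make_session(user_agent=BROWSER_UA):
    """Return a Session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def build_gamelog_request(session, player_id, season="2015-16",
                          season_type="Regular Season"):
    """Prepare (but do not send) the request so it can be inspected."""
    request = requests.Request(
        "GET",
        GAMELOG_URL,
        params={"PlayerID": player_id, "Season": season,
                "SeasonType": season_type},
    )
    # prepare_request merges the session's default headers into the request.
    return session.prepare_request(request)


def fetch_gamelog(session, player_id):
    """Send the prepared request; raises HTTPError on a 4xx/5xx response."""
    prepared = build_gamelog_request(session, player_id)
    response = session.send(prepared, timeout=10)
    response.raise_for_status()
    return response.json()
```

Calling fetch_gamelog(make_session(), 203082) would send the request. Whether the server accepts it still depends on whatever checks it applies at that moment.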
The reason it works after you visit the URL in a browser is caching.
According to Alastair McCormack, stats.nba.com is fronted by the Akamai CDN, so the caching is probably happening at the edge, "varied" by the query string/URI rather than by extraneous headers. Once a valid response has been produced for that URI, it is cached by the CDN edge node serving that client.
So when you run the code after visiting the URL in your browser, the CDN returns the cached response, and no 400 is raised. Once the cached entry expires, the request reaches the origin server again and the error can come back.
andingo
Updated on June 24, 2022
When I run this code in iPython (Python 2.7):
from requests import get
_get = get('http://stats.nba.com/stats/playergamelog', params={'PlayerID': 203083, 'Season':'2015-16', 'SeasonType':'Regular Season'})
print _get.url
_get.raise_for_status()
_get.json()
I am getting:
http://stats.nba.com/stats/playergamelog?PlayerID=203083&Season=2015-16&SeasonType=Regular+Season
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-5-8f8343b2c4cd> in <module>()
      1 _get = get('http://stats.nba.com/stats/playergamelog', params={'PlayerID': 203083, 'Season':'2015-16', 'SeasonType':'Regular Season'})
      2 print _get.url
----> 3 _get.raise_for_status()
      4 _get.json()

/Library/Python/2.7/site-packages/requests/models.pyc in raise_for_status(self)
    849
    850         if http_error_msg:
--> 851             raise HTTPError(http_error_msg, response=self)
    852
    853     def close(self):

HTTPError: 400 Client Error: Bad Request
However, if I go to the URL in my browser, it works. Then, when I come back to the code and run it again after manually visiting the URL in my browser (Chrome, the same browser iPython is running in), the code runs with no error. However, it may go back to raising the error on subsequent executions.
This code has worked for me hundreds if not thousands of times with no issue. How do I fix this error?
Thanks.