Python get raises HTTPError 400 Client Error, but after manually accessing URL, get works temporarily


HTTPError: 400 Client Error: Bad Request means the server rejected the request as invalid. I suspect the server checks certain headers in the HTTP request, for example User-Agent.

So I tried setting the User-Agent header to mimic Firefox:

# No User-Agent
>>> from requests import get
>>> _get = get('http://stats.nba.com/stats/playergamelog', params={'PlayerID': 203082, 'Season':'2015-16', 'SeasonType':'Regular Season'})
>>> _get.raise_for_status()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\requests\models.py", line 840, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://stats.nba.com/stats/playergamelog?PlayerID=203082&Season=2015-16&SeasonType=Regular+Season

# This time, set user-agent to mimic a desktop browser
>>> headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'}
>>> _get = get('http://stats.nba.com/stats/playergamelog', params={'PlayerID': 203082, 'Season':'2015-16', 'SeasonType':'Regular Season'}, headers=headers)
>>> _get.raise_for_status()
>>>
# no error
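
If you make many calls, setting the header once on a session avoids repeating it. The following is only a sketch: the helper name build_gamelog_request is hypothetical, the User-Agent string is the one from the session above, and the request is prepared without being sent so you can inspect exactly what would go over the wire. Whether the server accepts this header today is an assumption.

```python
import requests

# Same User-Agent string as in the interactive session above.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"


def build_gamelog_request(session, player_id, season):
    """Hypothetical helper: prepare (but do not send) a playergamelog request."""
    request = requests.Request(
        "GET",
        "http://stats.nba.com/stats/playergamelog",
        params={"PlayerID": player_id, "Season": season, "SeasonType": "Regular Season"},
    )
    # prepare_request merges the session's default headers into the request.
    return session.prepare_request(request)


session = requests.Session()
session.headers.update({"User-Agent": BROWSER_UA})  # applied to every request on this session

prepared = build_gamelog_request(session, 203082, "2015-16")
print(prepared.headers["User-Agent"])
print(prepared.url)
# To actually send it:
#   response = session.send(prepared, timeout=10)
#   response.raise_for_status()
```

Printing the prepared request is a quick way to confirm that the header is really attached before blaming the server.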

The reason it works after you visit the URL in a browser is caching.

According to Alastair McCormack, stats.nba.com is fronted by the Akamai CDN, so the caching is probably happening at the edge, "varied" by the query string/URI rather than extraneous headers. Once a valid response has been returned for that URI, it is cached by the CDN edge node serving that client.

So when you run the code after visiting the URL in a browser, the CDN returns the cached response, and no 400 is raised.
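
To make that explanation concrete, here is a toy model of an edge cache keyed by URL only. Everything in it is an assumption for illustration: the origin's User-Agent rule, the EdgeCache class, and the 60-second expiry are invented, and this is not how Akamai or stats.nba.com is actually configured. It only shows why a script could succeed right after a browser visit and fail again once the cached entry expires.

```python
# Toy model only: the origin rule, class name, and TTL are assumptions,
# not the real behavior of Akamai or stats.nba.com.

def origin(url, headers):
    """Hypothetical origin: rejects requests lacking a browser-like User-Agent."""
    if "Mozilla" in headers.get("User-Agent", ""):
        return 200
    return 400


class EdgeCache:
    """Caches successful responses keyed by URL only, ignoring request headers."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._store = {}  # url -> (status, expiry time)

    def get(self, url, headers, now):
        cached = self._store.get(url)
        if cached is not None and now < cached[1]:
            return cached[0]  # cache hit: the origin never sees these headers
        status = origin(url, headers)
        if status == 200:
            self._store[url] = (status, now + self.ttl_seconds)
        return status


url = "http://stats.nba.com/stats/playergamelog?PlayerID=203083"
edge = EdgeCache(ttl_seconds=60)
script_headers = {"User-Agent": "python-requests"}
browser_headers = {"User-Agent": "Mozilla/5.0"}

before_visit = edge.get(url, script_headers, now=0)    # script alone: rejected
browser_visit = edge.get(url, browser_headers, now=1)  # browser populates the cache
after_visit = edge.get(url, script_headers, now=2)     # script is served from the cache
after_expiry = edge.get(url, script_headers, now=100)  # entry expired: rejected again
print(before_visit, browser_visit, after_visit, after_expiry)  # 400 200 200 400
```

Relying on the cache is fragile for exactly this reason, so sending a suitable User-Agent from the script is the more dependable fix.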

Author by

andingo

Updated on June 24, 2022

Comments

  • andingo
    andingo almost 2 years

    When I run this code in iPython (Python 2.7):

    from requests import get
    _get = get('http://stats.nba.com/stats/playergamelog', params={'PlayerID': 203083, 'Season':'2015-16', 'SeasonType':'Regular Season'})
    print _get.url
    _get.raise_for_status()
    _get.json()
    

    I am getting:

    http://stats.nba.com/stats/playergamelog?PlayerID=203083&Season=2015-16&SeasonType=Regular+Season
    ---------------------------------------------------------------------------
    HTTPError                                 Traceback (most recent call last)
    <ipython-input-5-8f8343b2c4cd> in <module>()
          1 _get = get('http://stats.nba.com/stats/playergamelog', params={'PlayerID': 203083, 'Season':'2015-16', 'SeasonType':'Regular Season'})
          2 print _get.url
    ----> 3 _get.raise_for_status()
          4 _get.json()
    
    /Library/Python/2.7/site-packages/requests/models.pyc in raise_for_status(self)
        849 
        850         if http_error_msg:
    --> 851             raise HTTPError(http_error_msg, response=self)
        852 
        853     def close(self):
    
    HTTPError: 400 Client Error: Bad Request
    

    However, if I go to the URL in my browser, it works. Then, when I come back to the code and run it again after manually visiting the URL in my browser (Chrome, which iPython is running in), the code runs with no error. However, it may go back to raising the error in subsequent executions.

    This code has worked for me hundreds if not thousands of times with no issue. How do I fix this error?

    Thanks.