urllib2.urlopen() vs urllib.urlopen() - urllib2 throws 404 while urllib works! WHY?

21,035

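That URL does indeed return a 404 status, but the response still carries a lot of HTML content. urllib2 is (correctly) treating the 404 as an error condition and raising HTTPError, whereas urllib.urlopen hands back the response regardless of its status code. You can recover the content of that site's 404 page like so: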

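import urllib2

url = 'http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/'
try:
    print urllib2.urlopen(url).read()
except urllib2.HTTPError, e:
    print e.code       # numeric status, 404 here
    print e.msg        # reason phrase, "Not Found" here
    print e.headers    # response headers sent with the error
    print e.fp.read()  # body of the site's 404 page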
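
If you are on Python 3, where urllib2's functionality lives in urllib.request and urllib.error, the same recovery might look roughly like this. It is only a sketch: NotFoundHandler, PAGE, fetch and demo are made-up names, and a throwaway local server stands in for the real site so the snippet runs without network access.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Sorry, page not found</body></html>"  # hypothetical 404 body


class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site whose 404 page carries HTML."""

    def do_GET(self):
        self.send_response(404)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(url):
    """Return (status, body) whether or not the server reports an error."""
    # An empty ProxyHandler only makes the loopback request ignore any
    # proxy environment variables; otherwise this behaves like urlopen().
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as e:
        # HTTPError doubles as a response object, so the body is still readable.
        return e.code, e.read()


def demo():
    server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("http://127.0.0.1:%d/missing" % server.server_port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = demo()
    print(status)
    print(body.decode())
```

Against a real site you would only need fetch(); the local server is there purely to make the example reproducible.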
Admin
Author by

Admin

Updated on July 09, 2022

Comments

  • Admin
    Admin almost 2 years
    import urllib
    
    print urllib.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/').read()
    

    The above script works and returns the expected results while:

    import urllib2
    
    print urllib2.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/').read()
    

    throws the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.5/urllib2.py", line 124, in urlopen
        return _opener.open(url, data)
      File "/usr/lib/python2.5/urllib2.py", line 387, in open
        response = meth(req, response)
      File "/usr/lib/python2.5/urllib2.py", line 498, in http_response
        'http', request, response, code, msg, hdrs)
      File "/usr/lib/python2.5/urllib2.py", line 425, in error
        return self._call_chain(*args)
      File "/usr/lib/python2.5/urllib2.py", line 360, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.5/urllib2.py", line 506, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    urllib2.HTTPError: HTTP Error 404: Not Found
    

    Does anyone know why this is? I'm running this from a laptop on my home network with no proxy settings - just straight from my laptop to the router and then out to the web.
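
The difference comes down to how each module treats an error status. Python 2's urllib.urlopen returns the response whatever the status code, while urllib2 passes error responses to its error handlers, and the default one raises HTTPError (the http_error_default frame at the bottom of the traceback). The sketch below reproduces the old urllib behaviour in Python 3 by swapping in a handler that returns error responses untouched. Always404, PassThroughErrors, open_like_old_urllib and demo_passthrough are hypothetical names, and the local server is an assumption made so the example runs offline.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Always404(BaseHTTPRequestHandler):
    """Hypothetical local server: every GET is a 404 with an HTML body."""

    def do_GET(self):
        body = b"<html>custom 404 page</html>"
        self.send_response(404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # suppress per-request logging


class PassThroughErrors(urllib.request.HTTPErrorProcessor):
    """Hypothetical handler: hand error responses back instead of raising."""

    def http_response(self, request, response):
        return response

    https_response = http_response


def open_like_old_urllib(url):
    # Passing a subclass of HTTPErrorProcessor makes build_opener leave out
    # the default one, so no HTTPError is raised for the 404.
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # ignore proxy environment variables
        PassThroughErrors,
    )
    with opener.open(url) as resp:
        return resp.status, resp.read()


def demo_passthrough():
    server = HTTPServer(("127.0.0.1", 0), Always404)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return open_like_old_urllib("http://127.0.0.1:%d/" % server.server_port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo_passthrough())
```

With the pass-through handler the caller has to check the status itself, which is exactly the check the original urllib script was silently skipping.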