Getting a file from an authenticated site (with python urllib, urllib2)


Solution 1

These days, urllib is generally passed over in favor of Requests, a third-party library.

If the site uses HTTP Basic authentication, this should do what you want:

    import requests
    from requests.auth import HTTPBasicAuth

    theurl = 'myLink_queriedResult/result.xls'
    username = 'myUsername'
    password = 'myPassword'

    r = requests.get(theurl, auth=HTTPBasicAuth(username, password))

The Requests documentation has more information on authentication.
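
If installing Requests is not an option, roughly the same request can be sketched with the standard library alone. One difference worth knowing: `HTTPBasicAuth` in Requests sends the credentials with the first request, whereas urllib's `HTTPBasicAuthHandler` only sends them after the server answers with a 401 challenge, so a site that responds with a login page instead may never see them. The sketch below is Python 3 (`urllib.request` replaces `urllib2`), assumes the site really accepts HTTP Basic authentication, and uses the hypothetical helper names `build_basic_auth_request` and `download`.

```python
import base64
import urllib.request


def build_basic_auth_request(url, username, password):
    """Return a Request that carries a Basic Authorization header up front."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def download(url, username, password, destination):
    """Fetch url with preemptive Basic auth and write the body to destination."""
    request = build_basic_auth_request(url, username, password)
    with urllib.request.urlopen(request) as response, open(destination, "wb") as out:
        out.write(response.read())
```

With the placeholders from the question, a call might look like `download(theurl, username, password, "C:\\test.xls")`. If the saved file still contains the login page, the site probably does not use Basic authentication at all.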

Solution 2

With Python 3, you can try the following, which uses NTLM authentication and loads the result into pandas:

    import requests
    # Import the authentication method the site requires (NTLM here)
    from requests_ntlm import HttpNtlmAuth
    import pandas as pd
    from io import BytesIO

    # pandas needs the xlrd package installed to read .xls files
    r = requests.get("http://example.website", auth=HttpNtlmAuth('acc', 'password'))
    # Wrap the downloaded bytes in a file-like object so pandas can parse them
    xd = pd.read_excel(BytesIO(r.content))
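
Because a failed login typically yields an HTML page rather than a spreadsheet, it may help to check the downloaded bytes before handing them to pandas. This is a minimal sketch that relies on legacy .xls files beginning with the OLE2 compound-file signature and .xlsx files being ZIP archives. `looks_like_excel` is a hypothetical helper name, not part of any library.

```python
# Signatures found at the start of genuine Excel files
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # legacy .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"  # .xlsx (ZIP container)


def looks_like_excel(data):
    """Return True if data starts with an .xls or .xlsx file signature."""
    return data.startswith(OLE2_MAGIC) or data.startswith(ZIP_MAGIC)
```

In the snippet above, this could be used as `if not looks_like_excel(r.content): raise ValueError("response is not an Excel file")`. Note that some sites export HTML or tab-separated text under an .xls name, in which case this check would reject a download that is actually valid.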

Ref:

  1. https://medium.com/ibm-data-science-experience/excel-files-loading-from-object-storage-python-a54a2cbf4609

  2. http://www.python-requests.org/en/latest/user/authentication/#basic-authentication

  3. Pandas read_csv from url
Author by

Admin

Updated on June 04, 2022

Comments

  • Admin
    Admin almost 2 years

    I'm trying to download a queried Excel file from a site. When I enter the direct link, it leads to a login page, and once I've entered my username and password, the Excel file downloads automatically. I am trying to avoid installing any additional module that is not part of standard Python (this script will run on a "standardized machine", and it won't work if the module is not installed).

    I've tried the following, but the saved Excel file contains the login page itself :-|

    import urllib
    
    url = "myLink_queriedResult/result.xls"
    urllib.urlretrieve(url,"C:\\test.xls")
    

    So then I looked into using urllib2 with password authentication, but now I'm stuck.

    I have the following code:

    import urllib2
    import urllib
    
    theurl = 'myLink_queriedResult/result.xls'
    username = 'myName'
    password = 'myPassword'
    
    passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, theurl, username, password)
    
    authhandler = urllib2.HTTPBasicAuthHandler(passman)
    opener = urllib2.build_opener(authhandler)
    urllib2.install_opener(opener)
    pagehandle = urllib2.urlopen(theurl)
    pagehandle.read()  # but it still seems to contain only the login page
    

    Appreciate any advice in advance. :)
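
The behavior described in the question (a redirect to a login page, then an automatic download) suggests a form-based login with a session cookie rather than HTTP Basic authentication, which would explain why both attempts return the login page. A standard-library sketch of that flow in Python 3 follows; under Python 2 the same pieces live in `urllib2`, `cookielib`, and `urllib`. The login URL, the form field names `username` and `password`, and the helper names are all assumptions; the real values would have to be read from the site's login form.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_session_opener():
    """Return an opener that stores cookies and resends them on later requests."""
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )
    return opener, cookie_jar


def encode_login_form(username, password,
                      user_field="username", pass_field="password"):
    """URL-encode the login form body; the field names are assumed defaults."""
    form = {user_field: username, pass_field: password}
    return urllib.parse.urlencode(form).encode("utf-8")


def download_after_login(login_url, file_url, username, password, destination):
    """POST the credentials, then fetch the file with the resulting cookies."""
    opener, _ = build_session_opener()
    # Passing data makes this a POST; the session cookie lands in the jar.
    with opener.open(login_url, data=encode_login_form(username, password)):
        pass
    with opener.open(file_url) as response, open(destination, "wb") as out:
        out.write(response.read())
```

Login forms often include extra hidden fields (for example a CSRF token), so this sketch may need to fetch and parse the login page first before it works against a particular site.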