How to read SharePoint Online (Office365) Excel files in Python with Work or School Account?

57,719

Solution 1

As suggested by Niels V try using the Office365-REST-Python-Client.

The client implements the Sharepoint REST API. Here's an example of what you are trying to do:

from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File

url = 'https://yoursharepointsite.com/sites/documentsite'
username = 'yourusername'
password = 'yourpassword'
relative_url = '/sites/documentsite/Documents/filename.xlsx'

This section is straight from the github README.md using the ClientContext approach and gets you authenticated on your SharePoint server

ctx_auth = AuthenticationContext(url)
if ctx_auth.acquire_token_for_user(username, password):
  ctx = ClientContext(url, ctx_auth)
  web = ctx.web
  ctx.load(web)
  ctx.execute_query()
  print "Web title: {0}".format(web.properties['Title'])

else:
  print ctx_auth.get_last_error()

If you just want to download the file then using File.open_binary() all you need is:

filename = 'filename.xlsx'
with open(filename, 'wb') as output_file:
    response = File.open_binary(ctx, relative_url)
    output_file.write(response.content)

However if you want to analyze the contents of the file you can download the file to memory then directly use Pandas or your python '.xlsx' tool of choice:

import io
import pandas as pd

response = File.open_binary(ctx, relative_url)

#save data to BytesIO stream
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0) #set file object to start

#read file into pandas dataframe
df = pd.read_excel(bytes_file_obj)

You can take it from here. I hope this helps!

Solution 2

To read the file from the command line, you can do the following:

curl -O -L --ntlm  --user username:password "https://yoursharepointsite.com/sites/documentsite/sites/documentsite/Documents/filename.xlsx"

The simplest method, for automating this with python, is based on request_nmtl:

conda install requests_ntlm --channel conda-forge

Code to download filename.xlsx from Sharepoint (python 3):

# Paste here the path to your file on sharepoint
url = 'https://yoursharepointsite.com/sites/documentsite/sites/documentsite/Documents/filename.xlsx'

import getpass

domain = 'ADMIN' # adapt to your domain to in which the user exists
user = getpass.getuser() 
pwd = getpass.getpass(prompt='What is your windows AD password?')

import requests
from requests_ntlm import HttpNtlmAuth
from urllib.parse import unquote
from pathlib import Path

filename = unquote(Path(url).name)

resp = requests.get(url, auth=HttpNtlmAuth(f'{domain}\\{user}', pwd ))
open(filename, 'wb').write(resp.content)
Share:
57,719
hsluoyz
Author by

hsluoyz

Updated on October 23, 2021

Comments

  • hsluoyz
    hsluoyz over 2 years

    I am a university student and I have registered as an Office 365 Education user via my university Email address. I usually log into https://www.office.com with my Email account: [email protected]. The path to my profile is like: https://abcedu-my.sharepoint.com/personal/alice_abc_edu

    I have an Excel (.xlsx) file in my Office 365. And I want to use Python to programmatically access (or download) the Excel file. I have googled about some solutions. But most of them require a NTLM credential. But I only have my Email account and password. I don't know my NTLM credential. Is it [email protected] or alice_abc_edu? Or the Email username and NTLM are totally different authentication ways. And I can't use NTLM?

    It seems that my Email address that is used to log in is officially called Work or School Account or Azure Active Directory Credential. But I don't know how to use such an account to realize my requirement? Moreover, I need to do it in Python. RESTful would also be OK. But I just got stuck in the first authentication step. Thanks!

    I have followed the Microsoft Graph tutorial here and it told me to register a Python app. Then I got a App ID and App Secret. But when I use the official python-sample-send-mail

    """send-email sample for Microsoft Graph"""
    # Copyright (c) Microsoft. All rights reserved. Licensed under the MIT license.
    # See LICENSE in the project root for license information.
    import base64
    import mimetypes
    import os
    import pprint
    import uuid
    
    import flask
    from flask_oauthlib.client import OAuth
    
    import config
    
    APP = flask.Flask(__name__, template_folder='static/templates')
    APP.debug = True
    APP.secret_key = 'development'
    OAUTH = OAuth(APP)
    MSGRAPH = OAUTH.remote_app(
        'microsoft',
        consumer_key=config.CLIENT_ID,
        consumer_secret=config.CLIENT_SECRET,
        request_token_params={'scope': config.SCOPES},
        base_url=config.RESOURCE + config.API_VERSION + '/',
        request_token_url=None,
        access_token_method='POST',
        access_token_url=config.AUTHORITY_URL + config.TOKEN_ENDPOINT,
        authorize_url=config.AUTHORITY_URL + config.AUTH_ENDPOINT)
    
    @APP.route('/')
    def homepage():
        """Render the home page."""
        return flask.render_template('homepage.html')
    
    @APP.route('/login')
    def login():
        """Prompt user to authenticate."""
        flask.session['state'] = str(uuid.uuid4())
        return MSGRAPH.authorize(callback=config.REDIRECT_URI, state=flask.session['state'])
    
    @APP.route('/login/authorized')
    def authorized():
        """Handler for the application's Redirect Uri."""
        if str(flask.session['state']) != str(flask.request.args['state']):
            raise Exception('state returned to redirect URL does not match!')
        response = MSGRAPH.authorized_response()
        flask.session['access_token'] = response['access_token']
        return flask.redirect('/mailform')
    
    @APP.route('/mailform')
    def mailform():
        """Sample form for sending email via Microsoft Graph."""
    
        # read user profile data
        user_profile = MSGRAPH.get('me', headers=request_headers()).data
        user_name = user_profile['displayName']
    
        # get profile photo
        photo_data, _, profile_pic = profile_photo(client=MSGRAPH, save_as='me')
        # save photo data as config.photo for use in mailform.html/mailsent.html
        if profile_pic:
            config.photo = base64.b64encode(photo_data).decode()
        else:
            profile_pic = 'static/images/no-profile-photo.png'
            with open(profile_pic, 'rb') as fhandle:
                config.photo = base64.b64encode(fhandle.read()).decode()
    
        # upload profile photo to OneDrive
        upload_response = upload_file(client=MSGRAPH, filename=profile_pic)
        if str(upload_response.status).startswith('2'):
            # create a sharing link for the uploaded photo
            link_url = sharing_link(client=MSGRAPH, item_id=upload_response.data['id'])
        else:
            link_url = ''
    
        body = flask.render_template('email.html', name=user_name, link_url=link_url)
        return flask.render_template('mailform.html',
                                     name=user_name,
                                     email=user_profile['userPrincipalName'],
                                     profile_pic=profile_pic,
                                     photo_data=config.photo,
                                     link_url=link_url,
                                     body=body)
    
    @APP.route('/send_mail')
    def send_mail():
        """Handler for send_mail route."""
        profile_pic = flask.request.args['profile_pic']
    
        response = sendmail(client=MSGRAPH,
                            subject=flask.request.args['subject'],
                            recipients=flask.request.args['email'].split(';'),
                            body=flask.request.args['body'],
                            attachments=[flask.request.args['profile_pic']])
    
        # show results in the mailsent form
        response_json = pprint.pformat(response.data)
        response_json = None if response_json == "b''" else response_json
        return flask.render_template('mailsent.html',
                                     sender=flask.request.args['sender'],
                                     email=flask.request.args['email'],
                                     profile_pic=profile_pic,
                                     photo_data=config.photo,
                                     subject=flask.request.args['subject'],
                                     body_length=len(flask.request.args['body']),
                                     response_status=response.status,
                                     response_json=response_json)
    
    @MSGRAPH.tokengetter
    def get_token():
        """Called by flask_oauthlib.client to retrieve current access token."""
        return (flask.session.get('access_token'), '')
    
    def request_headers(headers=None):
        """Return dictionary of default HTTP headers for Graph API calls.
        Optional argument is other headers to merge/override defaults."""
        default_headers = {'SdkVersion': 'sample-python-flask',
                           'x-client-SKU': 'sample-python-flask',
                           'client-request-id': str(uuid.uuid4()),
                           'return-client-request-id': 'true'}
        if headers:
            default_headers.update(headers)
        return default_headers
    
    def profile_photo(*, client=None, user_id='me', save_as=None):
        """Get profile photo.
    
        client  = user-authenticated flask-oauthlib client instance
        user_id = Graph id value for the user, or 'me' (default) for current user
        save_as = optional filename to save the photo locally. Should not include an
                  extension - the extension is determined by photo's content type.
    
        Returns a tuple of the photo (raw data), content type, saved filename.
        """
        endpoint = 'me/photo/$value' if user_id == 'me' else f'users/{user_id}/$value'
        photo_response = client.get(endpoint)
        if str(photo_response.status).startswith('2'):
            # HTTP status code is 2XX, so photo was returned successfully
            photo = photo_response.raw_data
            metadata_response = client.get(endpoint[:-7]) # remove /$value to get metadata
            content_type = metadata_response.data.get('@odata.mediaContentType', '')
        else:
            photo = ''
            content_type = ''
    
        if photo and save_as:
            extension = content_type.split('/')[1]
            if extension == 'pjpeg':
                extension = 'jpeg' # to correct known issue with content type
            filename = save_as + '.' + extension
            with open(filename, 'wb') as fhandle:
                fhandle.write(photo)
        else:
            filename = ''
    
        return (photo, content_type, filename)
    
    def sendmail(*, client, subject=None, recipients=None, body='',
                 content_type='HTML', attachments=None):
        """Helper to send email from current user.
    
        client       = user-authenticated flask-oauthlib client instance
        subject      = email subject (required)
        recipients   = list of recipient email addresses (required)
        body         = body of the message
        content_type = content type (default is 'HTML')
        attachments  = list of file attachments (local filenames)
    
        Returns the response from the POST to the sendmail API.
        """
    
        # Verify that required arguments have been passed.
        if not all([client, subject, recipients]):
            raise ValueError('sendmail(): required arguments missing')
    
        # Create recipient list in required format.
        recipient_list = [{'EmailAddress': {'Address': address}}
                          for address in recipients]
    
        # Create list of attachments in required format.
        attached_files = []
        if attachments:
            for filename in attachments:
                b64_content = base64.b64encode(open(filename, 'rb').read())
                mime_type = mimetypes.guess_type(filename)[0]
                mime_type = mime_type if mime_type else ''
                attached_files.append( \
                    {'@odata.type': '#microsoft.graph.fileAttachment',
                     'ContentBytes': b64_content.decode('utf-8'),
                     'ContentType': mime_type,
                     'Name': filename})
    
        # Create email message in required format.
        email_msg = {'Message': {'Subject': subject,
                                 'Body': {'ContentType': content_type, 'Content': body},
                                 'ToRecipients': recipient_list,
                                 'Attachments': attached_files},
                     'SaveToSentItems': 'true'}
    
        # Do a POST to Graph's sendMail API and return the response.
        return client.post('me/microsoft.graph.sendMail',
                           headers=request_headers(),
                           data=email_msg,
                           format='json')
    
    def sharing_link(*, client, item_id, link_type='view'):
        """Get a sharing link for an item in OneDrive.
    
        client    = user-authenticated flask-oauthlib client instance
        item_id   = the id of the DriveItem (the target of the link)
        link_type = 'view' (default), 'edit', or 'embed' (OneDrive Personal only)
    
        Returns the sharing link.
        """
        endpoint = f'me/drive/items/{item_id}/createLink'
        response = client.post(endpoint,
                               headers=request_headers(),
                               data={'type': link_type},
                               format='json')
    
        if str(response.status).startswith('2'):
            # status 201 = link created, status 200 = existing link returned
            return response.data['link']['webUrl']
    
    def upload_file(*, client, filename, folder=None):
        """Upload a file to OneDrive for Business.
    
        client  = user-authenticated flask-oauthlib client instance
        filename = local filename; may include a path
        folder = destination subfolder/path in OneDrive for Business
                 None (default) = root folder
    
        File is uploaded and the response object is returned.
        If file already exists, it is overwritten.
        If folder does not exist, it is created.
    
        API documentation:
        https://developer.microsoft.com/en-us/graph/docs/api-reference/v1.0/api/driveitem_put_content
        """
        fname_only = os.path.basename(filename)
    
        # create the Graph endpoint to be used
        if folder:
            # create endpoint for upload to a subfolder
            endpoint = f'me/drive/root:/{folder}/{fname_only}:/content'
        else:
            # create endpoint for upload to drive root folder
            endpoint = f'me/drive/root/children/{fname_only}/content'
    
        content_type, _ = mimetypes.guess_type(fname_only)
        with open(filename, 'rb') as fhandle:
            file_content = fhandle.read()
    
        return client.put(endpoint,
                          headers=request_headers({'content-type': content_type}),
                          data=file_content,
                          content_type=content_type)
    
    if __name__ == '__main__':
        APP.run()
    

    It gave me an error:

    AADSTS65005: Using application 'My Python App' is currently not supported for your organization abc.edu because it is in an unmanaged state. An administrator needs to claim ownership of the company by DNS validation of abc.edu before the application My Python App can be provisioned. Request ID: 9a4874e0-7f8f-4eff-b6f9-9834765d8780, Timestamp: 01/25/2018 13:51:10 Trace ID: 8d1cc38e-3b5e-4bf1-a003-bda164e00b00 Correlation ID: 2033267e-98ec-4eb1-91e9-c0530ef97fb1 Timestamp: 2018-01-25 13:51:10Z&state=d94af98c-92d9-4016-b3da-afd8e8974f4b HTTP/1.1

    So it seems that the IT admin of my university doesn't enable the functionality of connecting an App with Microsoft Graph. But is this way the only way? I already have the valid Email account and password. I think there must be a way for me to log in Office 365 programmatically directly with my credential?

  • zwornik
    zwornik about 4 years
    When I run it as You suggested, I got an error: "NameError: name 'Path' is not defined". Why is "unquote" needed? I see it comes from library to parse URL. In my case Excel file is on Sharepoint but URL does not end with {file name} + {.xlsx}. Ending is some sort of random numbers.
  • cast42
    cast42 about 4 years
    I forgot to import Path, I correct this in the edit by adding "from pathlib import Path" The unquote is needed for special characters in the filename. Suppose the document is called "my report.xlsx". In the url this will show up as "my%20report.xlsx". Hence the file that you write to your filesystem will have as name "my%20report.xlsx". Most people prefer to see "my report.xslx", hence the unquote function to translates the %20 code for a space back to a real space in the filename.
  • zwornik
    zwornik about 4 years
    Thanks for additional info. Excel file is now being downloaded but it is corrupted or incomplete. I ma not able to open it.
  • cast42
    cast42 about 4 years
    How does the url looks like? Does it end with .xlsx or does it end with .xlsx?web=... or something. In latter case, the Python code downloads the webpage instead of the xlsx file.
  • zwornik
    zwornik about 4 years
    URL ends exactly like that: .../documents/Track_Changes_Testing.xlsx
  • cast42
    cast42 almost 4 years
    If you paste the url in a browser, does it download the .xlsx file ? If not, the URL is wrong. If you get a webpage, that what you probably downloaded as Excel file and could not open.
  • zwornik
    zwornik almost 4 years
    URL used in browser immediatley downloads Excel file . Dowloaded file is OK
  • Admin
    Admin almost 4 years
    I found I needed to pass just the host to AuthenticationContext, but the site to the ClientContext, i.e., ctx_auth = AuthenticationContext('https://yoursharepointsite.com') but ctx = ClientContext(url, ctx_auth)
  • yingnan liu
    yingnan liu almost 4 years
    @Dan , this looks great! I have a quick question on "url = 'yoursharepointsite.com/sites/documentsite' ", when I put my the url for my company's sharepoint, I got an error "AADSTS53003: Access has been blocked by Conditional Access policies. The access policy does not allow token issuance.". Does it mean I need to talk to my company's IT to unblock it ? thanks !!!
  • cast42
    cast42 about 3 years
    Does the curl command downloads your file correctly?
  • cookies
    cookies about 3 years
    Starting from version 2.2.0 File module is expected to be imported like this: from office365.sharepoint.files.file import File should this answer be edited to reflect version change? github reference
  • MarioTheHedgehog
    MarioTheHedgehog about 3 years
    thanks for .write(resp.content). .write(resp.text) was driving me crazy.
  • Erik Johnsson
    Erik Johnsson over 2 years
    Got ValueError: An error occurred while retrieving token from XML response: AADSTS50059: No tenant-identifying information found in either the request or implied by any provided credentials.