Download an Excel file and read its content with Azure Functions

Solution 1

The Open XML SDK works fine in an Azure Function; I tested it on my side. Here is the full code.

#r "DocumentFormat.OpenXml.dll"
#r "WindowsBase.dll"

using System.Net;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static HttpResponseMessage Run(HttpRequestMessage req, TraceWriter log)
{
    log.Info($"C# HTTP trigger function processed a request. RequestUri={req.RequestUri}");

    WebClient client = new WebClient();

    byte[] buffer = client.DownloadData("http://amor-webapp-test.azurewebsites.net/Content/hello.xlsx");
    MemoryStream stream = new MemoryStream();
    stream.Write(buffer, 0, buffer.Length);
    stream.Position = 0;
    using (SpreadsheetDocument doc = SpreadsheetDocument.Open(stream, false))
    {
        WorkbookPart workbookPart = doc.WorkbookPart;
        SharedStringTablePart sstpart = workbookPart.GetPartsOfType<SharedStringTablePart>().First();
        SharedStringTable sst = sstpart.SharedStringTable;

        WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
        Worksheet sheet = worksheetPart.Worksheet;

        var cells = sheet.Descendants<Cell>();
        var rows = sheet.Descendants<Row>();

        log.Info(string.Format("Row count = {0}", rows.LongCount()));
        log.Info(string.Format("Cell count = {0}", cells.LongCount()));

        // One way: go through each cell in the sheet
        foreach (Cell cell in cells)
        {
            if ((cell.DataType != null) && (cell.DataType == CellValues.SharedString))
            {
                int ssid = int.Parse(cell.CellValue.Text);
                string str = sst.ChildElements[ssid].InnerText;
                log.Info(string.Format("Shared string {0}: {1}", ssid, str));
            }
            else if (cell.CellValue != null)
            {
                log.Info(string.Format("Cell contents: {0}", cell.CellValue.Text));
            }
        }
    }

    return req.CreateResponse(HttpStatusCode.OK, "Hello ");
}

To use Open XML, please make sure you have created a bin folder under your function folder and uploaded DocumentFormat.OpenXml.dll and WindowsBase.dll to it.

"File contains corrupted data".

Have you tried another Excel file to check whether the issue is specific to this one file? I suggest you create a new, simple Excel file and test your code again.

"It didn't work on my file with the same "File contains corrupted data" message. "

I downloaded your Excel file and found that it is in the older .xls format. The Open XML SDK only reads the ZIP-based .xlsx format, which is why it reports the file as corrupted.

To fix the exception, you could convert the file to the newer .xlsx format or choose another Excel parsing library. ExcelDataReader can read both .xls and .xlsx files. You can install it from NuGet by searching for 'ExcelDataReader'. The following sample code parses an .xls file. I tested it in an Azure Function and it worked fine.

#r "Excel.dll"
#r "System.Data"

using System.Net;
using System.IO;
using Excel;
using System.Data;

public static HttpResponseMessage Run(HttpRequestMessage req, TraceWriter log)
{
    log.Info($"C# HTTP trigger function processed a request. RequestUri={req.RequestUri}");

    WebClient client = new WebClient();

    byte[] buffer = client.DownloadData("http://amor-webapp-test.azurewebsites.net/Content/abcdefg.xls");
    MemoryStream stream = new MemoryStream();
    stream.Write(buffer, 0, buffer.Length);
    stream.Position = 0;

    IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(stream);

    DataSet result = excelReader.AsDataSet();

    for (int i = 0; i < result.Tables.Count; i++)
    {
        log.Info(result.Tables[i].TableName +" has " + result.Tables[i].Rows.Count + " rows.");
    }

    return req.CreateResponse(HttpStatusCode.OK, "Hello ");
}

Please add the "Excel.dll" file to the bin folder of your function before running the code above.
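
Since the "File contains corrupted data" error came from handing an .xls file to Open XML, it may help to look at the first bytes of the download before picking a parser. The following is only a sketch: the class and method names (ExcelFormatSniffer, Detect) are hypothetical, and the check only identifies the container. A ZIP header also matches other Office Open XML files such as .docx, and the OLE2 header also matches other legacy Office files such as .doc.

```csharp
using System;

public static class ExcelFormatSniffer
{
    // ZIP local-file header: .xlsx files are ZIP packages.
    private static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    // OLE2 compound-file header: legacy .xls files use this container.
    private static readonly byte[] Ole2Magic = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Returns "xlsx", "xls", or "unknown" based on the leading bytes only.
    public static string Detect(byte[] buffer)
    {
        if (StartsWith(buffer, ZipMagic)) return "xlsx";
        if (StartsWith(buffer, Ole2Magic)) return "xls";
        return "unknown";
    }

    // True when data begins with every byte of prefix, in order.
    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

In the functions above, calling ExcelFormatSniffer.Detect(buffer) right after DownloadData would let the code choose between SpreadsheetDocument.Open and ExcelReaderFactory.CreateBinaryReader, or log a clear message for anything else.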

Solution 2

If you do need to save a temporary file, Azure Functions has a %TEMP% environment variable with a path to a temporary folder. This folder is local to the VM that runs your function and will not be persisted.
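
As a rough sketch of the temporary-file approach, the helper below writes downloaded bytes to a uniquely named file in the temp folder. TempFileSketch and SaveToTemp are hypothetical names, and the caller is assumed to delete the file when done.

```csharp
using System;
using System.IO;

public static class TempFileSketch
{
    // Writes the bytes to a uniquely named file under the temp folder
    // and returns the full path of that file.
    public static string SaveToTemp(byte[] content, string extension)
    {
        // Path.GetTempPath() resolves the temp directory; on Windows it
        // consults the TMP and TEMP environment variables.
        string fileName = Guid.NewGuid().ToString("N") + extension;
        string path = Path.Combine(Path.GetTempPath(), fileName);
        File.WriteAllBytes(path, content);
        return path;
    }
}
```

The returned path could then be passed to SpreadsheetDocument.Open(path, false), followed by File.Delete(path) once the document has been read.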

However, saving the file locally or in Azure Files is unnecessary. You should be able to get the stream from the response to your GET request and pass it straight to Open XML.

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(originalExcelUrl);
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream stream = response.GetResponseStream()) 
{
    var doc = SpreadsheetDocument.Open(stream, true);
    // etc
}
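
A commenter on this thread reports that passing the response stream directly fails with "Cannot operate on a stream that does not support seeking". One possible workaround, sketched here with a hypothetical helper (StreamBuffering.ToSeekable), is to copy the response into a MemoryStream first. This still avoids writing a file, at the cost of holding the whole workbook in memory.

```csharp
using System.IO;

public static class StreamBuffering
{
    // Copies any readable stream into a MemoryStream, which supports seeking,
    // and rewinds it so a reader can start from the first byte.
    public static MemoryStream ToSeekable(Stream source)
    {
        var buffered = new MemoryStream();
        source.CopyTo(buffered);
        buffered.Position = 0;
        return buffered;
    }
}
```

Inside the using block above, the stream from GetResponseStream() could be passed through StreamBuffering.ToSeekable(stream), and the result opened with SpreadsheetDocument.Open(seekable, false) for read-only access.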

Author by

donquijote

Experienced multi-disciplinary scientist. Most of my background is in statistics, engineering and economics. With appreciation for quality I ended up preferring to develop my own code and this has attracted me to learn more computer science. Most of my development experience is with OOP in .NET framework. I have also used Matlab extensively for my quantitative applications. I have developed objective-C iOS apps and asp.net MVC web apps.

Updated on June 04, 2022

Comments

  • donquijote
    donquijote almost 2 years

    I am trying to write a C# Azure Function to download and open an excel file using the OpenXml-SDK.

    Office Interop doesn't work here because office is not available to the Azure Function.

    I am trying to use OpenXml-SDK to open and read the file which seems to require a path to the saved file and not the url or a Stream downloaded from the remote url.

    Since I don't know of a way to temporarily store the Excel file in Azure Functions, I used Azure File Storage.

    I uploaded the Excel file from the URL to Azure File Storage; however, I cannot open it with the OpenXML-SDK.

    I verified that the Excel file in Azure File Storage is intact; however, when I try to open an OpenXML SpreadsheetDocument from a MemoryStream, I get an error indicating the file is corrupt.

    If I try to open the SpreadsheetDocument passing the file Uri (https://docs.microsoft.com/en-us/azure/storage/storage-dotnet-how-to-use-files#develop-with-file-storage) then the address passes the 260 character limit.

    I'm open to using a library other than OpenXML and ideally I would prefer not to have to store the excel file.

  • donquijote
    donquijote about 7 years
    Thanks, I tried the code you pasted, but it crashed. I used var doc = SpreadsheetDocument.Open(stream, false); ("true" didn't work given the originalExcelUrl is a link in a website). The error message was: "Cannot operate on a stream that does not support seeking". I agree I would prefer not to store the file on a temp location. Have you seen this error before? Very appreciated
  • donquijote
    donquijote about 7 years
    With the Temp variable approach I was able to store the file locally, but then trying to open in OpenXML I would get the error: "File contains corrupted data". However I see the file in the local Temp folder is an excel file I can open ok. I had also been able to open&read the file when I used office interop COM locally. In this case I just used webclient.DownloadFile(theRemoteUrl, theLocalTempFile) and then SpreadsheetDocument.Open(theLocalTempFile, false) Thanks again!
  • donquijote
    donquijote about 7 years
    Thanks for the detailed code. It didn't work on my file with the same "File contains corrupted data" message. At this point the Azure Function aspect of my issue is clearly solved. I had been able to download the file and open it in Excel. I had also been able to download, read and parse it correctly with Office interop COM locally. At this point the question may be with OpenXML. It would be great if you could test with my file: www2.nationalgrid.com/WorkArea/DownloadAsset.aspx?id=8589936879 or maybe if you would recommend an alternative to OpenXML. Thanks vm
  • Amor
    Amor about 7 years
    Thanks for your feedback. I updated my reply based on your comment.
  • Wilko van der Veen
    Wilko van der Veen over 6 years
    How about fonts which are not installed for this?