Downloading the contents of an Azure blob as a text string takes too long

18,711

To speed up the process, instead of reading the entire file in one go, you could read it in chunks. Take a look at the DownloadRangeToStream method.

Essentially, you first create an empty 30 MB file (the size of your blob). Then you download 1 MB chunks (or whatever size you see fit) in parallel using the DownloadRangeToStream method. As each chunk finishes downloading, you write its contents at the corresponding offset in the file.
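A minimal sketch of the chunked, parallel approach, using plain .NET types only. The `readRange(offset, length)` delegate is a hypothetical stand-in for one ranged read; with the classic storage SDK from the question it could wrap `DownloadRangeToStream` into a fresh `MemoryStream` per call, with the blob length taken from `Properties.Length` after `FetchAttributes()`, but that wiring and its thread-safety are assumptions not tested here. The sketch also assembles the chunks in a pre-allocated byte array rather than the pre-created file described above, since the question ultimately wants a string. `ChunkedDownload`, `ComputeRanges` and `DownloadParallel` are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class ChunkedDownload
{
    // Split [0, totalLength) into consecutive (Offset, Length) ranges of at most chunkSize bytes.
    public static List<(long Offset, int Length)> ComputeRanges(long totalLength, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        var ranges = new List<(long Offset, int Length)>();
        for (long offset = 0; offset < totalLength; offset += chunkSize)
        {
            int length = (int)Math.Min(chunkSize, totalLength - offset);
            ranges.Add((offset, length));
        }
        return ranges;
    }

    // readRange(offset, length) is a HYPOTHETICAL stand-in for one ranged blob read
    // (e.g. a wrapper around DownloadRangeToStream); it must return exactly `length` bytes.
    public static byte[] DownloadParallel(long totalLength, int chunkSize, int maxParallelism,
        Func<long, int, byte[]> readRange)
    {
        // Pre-allocate the whole result, playing the role of the empty 30 MB file.
        var buffer = new byte[totalLength];
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        Parallel.ForEach(ComputeRanges(totalLength, chunkSize), options, range =>
        {
            byte[] chunk = readRange(range.Offset, range.Length);
            if (chunk.Length != range.Length)
                throw new IOException($"Expected {range.Length} bytes at offset {range.Offset}, got {chunk.Length}.");
            // Each range targets a disjoint slice of the buffer, so no locking is needed.
            Array.Copy(chunk, 0L, buffer, range.Offset, range.Length);
        });
        return buffer;
    }
}
```

Once the buffer is filled, decode it once, e.g. with `Encoding.UTF8.GetString(buffer)`. Decoding the reassembled bytes rather than each chunk separately avoids splitting a multi-byte UTF-8 character across a chunk boundary. For a 30 MB blob and 1 MB chunks, `ComputeRanges` yields 30 ranges. An exception thrown by `readRange` surfaces from `Parallel.ForEach` wrapped in an `AggregateException`.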

I answered a similar question on Stack Overflow a few days ago: "StorageException when downloading a large file over a slow network". Take a look at my answer there. In that answer the chunks are downloaded sequentially, but it should give you an idea of how to implement a chunked download.
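For comparison, a sequential variant that writes each chunk to a destination stream (for example a `FileStream`) in order might look like the sketch below. It is not the code from the linked answer: `readRange` is again a hypothetical delegate standing in for one ranged blob read, and `SequentialChunkedDownload`/`DownloadInChunks` are illustrative names.

```csharp
using System;
using System.IO;

public static class SequentialChunkedDownload
{
    // Fetch [0, totalLength) one chunk at a time and append each chunk to `destination`.
    // readRange(offset, length) is a HYPOTHETICAL ranged read that must return exactly `length` bytes.
    public static void DownloadInChunks(long totalLength, int chunkSize,
        Func<long, int, byte[]> readRange, Stream destination)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        for (long offset = 0; offset < totalLength; offset += chunkSize)
        {
            int length = (int)Math.Min(chunkSize, totalLength - offset);
            byte[] chunk = readRange(offset, length);
            if (chunk.Length != length)
                throw new IOException($"Expected {length} bytes at offset {offset}, got {chunk.Length}.");
            // Chunks arrive in order, so a plain append keeps them at the right position.
            destination.Write(chunk, 0, chunk.Length);
        }
    }
}
```

Because each request covers only one chunk, a failed request could be retried for that chunk alone instead of restarting the whole download; that retry logic is left out of this sketch.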

Author: Rohit

Updated on June 14, 2022

Comments

  • Rohit
    Rohit almost 2 years

    I am developing an application that:

    1. Uploads a .CSV file to Azure Blob Storage from my local machine using a simple HTTP web page (REST methods)

    2. Once the .CSV file is uploaded, fetches the stream in order to update my database

    The .CSV file is around 30 MB. It takes 2 minutes to upload to the blob, but 30 minutes to read the stream. Can you please provide suggestions to improve the speed? Here is the code snippet I use to read the stream from the file, based on: https://azure.microsoft.com/en-in/documentation/articles/storage-dotnet-how-to-use-blobs/

    using System.IO;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    public string GetReadData(string filename)
    {
        // Retrieve storage account from connection string.
        CloudStorageAccount storageAccount = CloudStorageAccount.Parse(System.Web.Configuration.WebConfigurationManager.AppSettings["StorageConnectionString"]);

        // Create the blob client.
        CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

        // Retrieve reference to a previously created container.
        CloudBlobContainer container = blobClient.GetContainerReference(System.Web.Configuration.WebConfigurationManager.AppSettings["BlobStorageContainerName"]);

        // Retrieve reference to a blob named "filename"
        CloudBlockBlob blockBlob2 = container.GetBlockBlobReference(filename);

        // Download the whole blob into memory in one call, then decode it as UTF-8.
        string text;
        using (var memoryStream = new MemoryStream())
        {
            blockBlob2.DownloadToStream(memoryStream);
            text = System.Text.Encoding.UTF8.GetString(memoryStream.ToArray());
        }

        return text;
    }
    
  • Zhaoxing Lu
    Zhaoxing Lu over 8 years
    Gaurav has already answered the question perfectly, but I'd still suggest not running your application on your local machine. :) Honestly, taking 30 minutes to download a 30 MB file in a single thread is terrible. Please consider moving your application into Azure (Web Role/Worker Role/Virtual Machine) or somewhere with a better network environment.
  • GFoley83
    GFoley83 about 8 years
    @GauravMantri, could you provide an example of how you would download in parallel, please? I see you can set ParallelOperationThreadCount in BlobRequestOptions, but this only works for uploads. I can't find any code on the topic. Thanks!
  • Gaurav Mantri
    Gaurav Mantri about 8 years
    @GFoley ... Did you take a look at the sample code I posted here: stackoverflow.com/questions/31128977/…?
  • GFoley83
    GFoley83 about 8 years
    I did, and upvoted it too. The problem is (as you mentioned above) that it only demos how to download the chunks in sequence, not in parallel.
  • Gaurav Mantri
    Gaurav Mantri about 8 years
    Ah... I see (and thanks for upvoting :)). Can you please post a new question? I will try to hack together some code for you and provide it as an answer.