Read a big Excel document

12,486

Solution 1

Here is a method that relies on using Microsoft.Office.Interop.Excel.

Please Note: The Excel file I used had only one column with data with 50,000 entries.

1) Open the file with Excel, save it as csv, and close Excel.

2) Use StreamReader to quickly read the data.

3) Split the data on carriage return line feed and add it to a string list.

4) Delete the csv file I created.

I used System.Diagnostics.StopWatch to time the execution and it took 1.5568 seconds for the function to run.

public static List<string> ExcelReader( string fileLocation )
{                       
    Microsoft.Office.Interop.Excel.Application excel = new Application();
    Microsoft.Office.Interop.Excel.Workbook workBook =
        excel.Workbooks.Open(fileLocation);
    workBook.SaveAs(
        fileLocation + ".csv",
        Microsoft.Office.Interop.Excel.XlFileFormat.xlCSVWindows
    );
    workBook.Close(true);
    excel.Quit();
    List<string> valueList = null;
    using (StreamReader sr = new StreamReader(fileLocation + ".csv")) {
        string content = sr.ReadToEnd();
        valueList = new List<string>(
            content.Split(
                new string[] {"\r\n"},
                StringSplitOptions.RemoveEmptyEntries
            )
        );
    }
    new FileInfo(fileLocation + ".csv").Delete();
    return valueList;
}

Resources:

http://www.codeproject.com/Articles/5123/Opening-and-Navigating-Excel-with-C

How to split strings on carriage return with C#?

Solution 2

Can you put your code for reading 50000 records using OLEDb provider. I have tried doing that, it took 4-5 seconds to read 50000 records with 3 columns. I have done in following way, just have a look, it may help you out. :)

       // txtPath.Text is the path to the excel file
        string conString = @"Provider=Microsoft.ACE.OLEDB.12.0;" + "Data Source=" + txtPath.Text + ";" + "Extended Properties=" + "\"" + "Excel 12.0;HDR=YES;" + "\"";

        OleDbConnection oleCon = new OleDbConnection(conString);

        OleDbCommand oleCmd = new OleDbCommand("SELECT field1, field2, field3 FROM [Sheet1$]", oleCon);

        DataTable dt = new DataTable();

        oleCon.Open(); 
        dt.Load(oleCmd.ExecuteReader());
        oleCon.Close();

If you can put your code here, so that I can try to rectify. :)

Solution 3

OLEDB will always take more time.

SQL Server 2005/2008 will make it faster.

For OLEDB connections, it takes 7 records per seconds while

For SQLServer , it takes 70 records per seconds.

There requires not much time in reading comma separated files, but time is required to insert the data.

I have literally experienced this thing.

Share:
12,486
MHeads
Author by

MHeads

Updated on July 19, 2022

Comments

  • MHeads
    MHeads almost 2 years

    I want to know what is the fastest way to read cells in Excel. I have an Excel file that contains 50000 rows and I wanna know how to read it fast. I just need to read the first column and with oledb connection it takes me like 15 seconds. Is there a faster way?

    Thanks

    • rene
      rene about 11 years
      Is 14 seconds fast enough? Can you skip oledb and convert the sheet to a csv fileand then read the lines from the file? How does your oledb query look like? Has that cell a lot of data? Is it excel OpenXml (aka xlsx)?
    • MHeads
      MHeads about 11 years
      Sorry the excel document is already in .csv.
    • Philipp Aumayr
      Philipp Aumayr about 11 years
      if it is already in csv, use a cvs reader: see here codeproject.com/Articles/9258/A-Fast-CSV-Reader
  • Belogix
    Belogix about 11 years
    How does this help? OP says they want to import CSV and not comparing with SQL Server?
  • Freelancer
    Freelancer about 11 years
    @Belogix I said, reading requires no time, time is required in inserting that file to database, I am already working on Stock exchange related project wherein i have to deal with large trade files in .csv format and i am doing same operation. Thats why i said it.
  • Sid Holland
    Sid Holland about 11 years
    @Freelancer Granted, but OP made no implication that this would be inserted into a database. Just wants to "read cells in Excel". I don't see how SQLServer even comes into the equation.
  • Bubo
    Bubo over 10 years
    Just wanted to say thank you for this, drastically improved my program
  • Chinh Phan
    Chinh Phan over 8 years
    For a excel file with 50k lines and they want to pick just the first column, this method has been beaten by copying to csv file and using stream reader. But if they want to pick few more columns in a excel file with a lot of columns. I believe people will come with this one.
  • mchar
    mchar almost 7 years
    I think that you can equally use this code string line = "" line = sr.ReadLine() instead of worrying about the carriage return line feed