How to force ADO.Net to use only the System.String DataType in the readers TableSchema

11,337

Solution 1

As you have discovered, OLEDB uses Jet which is limited in the manner in which it can be tweaked. If you are set on using an OleDbConnection to read from an Excel file, then you need to set the HKLM\...\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows value to zero so that the system will scan the entire resultset.

That said, if you are open to using an alternative engine to read from an Excel file, you might consider trying the ExcelDataReader. It reads all columns as strings but will let you use dataReader.Getxxx methods to get typed values. Here's a sample that fills a DataSet:

DataSet result;
const string path = @"....\Test.xlsx";
using ( var fileStream = new FileStream( path, FileMode.Open, FileAccess.Read ) )
{
    using ( var excelReader = ExcelReaderFactory.CreateOpenXmlReader( fileStream ) )
    {
        excelReader.IsFirstRowAsColumnNames = true;
        result = excelReader.AsDataSet();
    }
}

Solution 2

Note for 64bit OS it is here:

My Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel

Solution 3

Check out the final answer on this page.


Just noticed the page you refer to says the same thing ...


Update:

The problem seems to be with the JET engine itself and not ADO. Once JET decides on the type, it sticks to it. Anything done after that has no effect; like casting the values to string in the SQL (e.g. Cstr([Column])) just results in an empty string being returned.

At this point (if there are no other answers) I'd opt for other methods: modifying the spreadsheet; modifying registry (not ideal since you will be messing with the settings for every other app the uses JET); Excel automation or a third party component that does not use JET.

If Automation option is to slow then maybe just use it to save the spreadsheet in a different format which is easier to handle.

Share:
11,337
Keith Sirmons
Author by

Keith Sirmons

OO VB/C#, learning SharePoint Developer.

Updated on June 05, 2022

Comments

  • Keith Sirmons
    Keith Sirmons about 2 years

    I am using an OleDbConnection to query an Excel 2007 Spreadsheet. I want force the OleDbDataReader to use only string as the column datatype.

    The system is looking at the first 8 rows of data and inferring the data type to be Double. The problem is that on row 9 I have a string in that column and the OleDbDataReader is returning a Null value since it could not be cast to a Double.

    I have used these connection strings:

    Provider=Microsoft.ACE.OLEDB.12.0;Data Source="ExcelFile.xlsx";Persist Security Info=False;Extended Properties="Excel 12.0;IMEX=1;HDR=No"

    Provider=Microsoft.Jet.OLEDB.4.0;Data Source="ExcelFile.xlsx";Persist Security Info=False;Extended Properties="Excel 8.0;HDR=No;IMEX=1"

    Looking at the reader.GetSchemaTable().Rows[7].ItemArray[5], it's dataType is Double.

    Row 7 in this schema correlates with the specific column in Excel I am having issues with. ItemArray[5] is its DataType column

    Is it possible to create a custom TableSchema for the reader so when accessing the ExcelFiles, I can treat all cells as text instead of letting the system attempt to infer the datatype?


    I found some good info at this page: Tips for reading Excel spreadsheets using ADO.NET

    The main quirk about the ADO.NET interface is how datatypes are handled. (You'll notice I've been carefully avoiding the question of which datatypes are returned when reading the spreadsheet.) Are you ready for this? ADO.NET scans the first 8 rows of data, and based on that guesses the datatype for each column. Then it attempts to coerce all data from that column to that datatype, returning NULL whenever the coercion fails!

    Thank you,
    Keith


    Here is a reduced version of my code:

    using (OleDbConnection connection = new OleDbConnection(BuildConnectionString(dataMapper).ToString()))
    {
        connection.Open();
        using (OleDbCommand cmd = new OleDbCommand())
        {
            cmd.Connection = connection;
            cmd.CommandText = SELECT * from [Sheet1$];
            using (OleDbDataReader reader = cmd.ExecuteReader())
            {
                using (DataTable dataTable = new DataTable("TestTable"))
                {
                    dataTable.Load(reader);
                    base.SourceDataSet.Tables.Add(dataTable);
                }
            }
        }
    }
    
  • Keith Sirmons
    Keith Sirmons about 14 years
    Yep. This is the same information as the link I provided and it doesn't work for my situation. Thank you, Keith
  • Keith Sirmons
    Keith Sirmons about 14 years
    Thank you, I am testing it out now. It does have some issues where several cells that do have text are being seen as null.
  • Keith Sirmons
    Keith Sirmons about 14 years
    I am using ExcelDataReader v.2.0.1.0 but have the same issue as stackoverflow.com/questions/2249023. I applied the pending patch from exceldatareader.codeplex.com/Project/Download/… and this appears to have fixed the error. The ExcelDataReader is a working solution for my original issue. Thank you
  • Keith Sirmons
    Keith Sirmons about 14 years
    Found more issues with the ExcelDataReader. Details and fix here: exceldatareader.codeplex.com/WorkItem/View.aspx?WorkItemId=5‌​910
  • Thomas
    Thomas about 14 years
    @Keith Sirmons - Right on. Thanks!
  • aruno
    aruno almost 11 years
    then set TypeGuessRows to 0
  • Akmal Salikhov
    Akmal Salikhov over 7 years
    How can I specify particular column type using ExcelDataReader? I have Excel sheet which one column is contain date values, but it's type is string. I need to convert that string to date when generatin DataSet...
  • Thomas
    Thomas over 7 years
    @AkmalSalikhov - You should post that as its own question.