How do I read in a single column from an Excel spreadsheet?


Solution 1

The answer depends on whether you want the bounding range of the used cells or the non-null values from a column.

Here's how you can efficiently get the non-null values from a column. Note that reading the entire tempRange.Value2 property at once is much faster than reading cell by cell; the tradeoff is that the resulting array can use a lot of memory.

private static IEnumerable<object> GetNonNullValuesInColumn(_Application application, _Worksheet worksheet, string columnName)
{
    // get the intersection of the column and the used range on the sheet (this is a superset of the non-null cells)
    var tempRange = application.Intersect(worksheet.UsedRange, (Range) worksheet.Columns[columnName]);

    // if there is no intersection, there are no values in the column
    if (tempRange == null)
        yield break;

    // get complete set of values from the temp range (potentially memory-intensive)
    var value = tempRange.Value2;

    // if value is NULL, it's a single cell with no value
    if (value == null)
        yield break;

    // if value is not an array, the temp range was a single cell with a value
    if (!(value is Array))
    {
        yield return value;
        yield break;
    }

    // otherwise, the value is a 2-D array
    var value2 = (object[,]) value;
    var rowCount = value2.GetLength(0);
    for (var row = 1; row <= rowCount; ++row)
    {
        var v = value2[row, 1];
        if (v != null)
            yield return v;
    }
}
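The loop above starts at 1 because the array that Value2 returns for a multi-cell range is 1-based in both dimensions. The following is a minimal sketch that reproduces that array shape without Excel, so the filtering logic can be tried in isolation. The class and method names (ColumnValues, MakeColumn, NonNull) are hypothetical and not part of the Excel interop API.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnValues
{
    // Hypothetical helper: builds a 1-based, single-column object[,]
    // with the same shape as Value2 for a multi-cell column range.
    public static object[,] MakeColumn(params object[] cells)
    {
        var column = (object[,]) Array.CreateInstance(
            typeof(object), new[] { cells.Length, 1 }, new[] { 1, 1 });
        for (var i = 0; i < cells.Length; ++i)
            column[i + 1, 1] = cells[i];
        return column;
    }

    // Yields every non-null cell, reading the bounds from the array
    // instead of assuming where the indices start.
    public static IEnumerable<object> NonNull(object[,] column)
    {
        var firstRow = column.GetLowerBound(0);
        var lastRow = column.GetUpperBound(0);
        var col = column.GetLowerBound(1);
        for (var row = firstRow; row <= lastRow; ++row)
        {
            var v = column[row, col];
            if (v != null)
                yield return v;
        }
    }
}
```

With a real worksheet, the array to pass in would be (object[,]) tempRange.Value2, after the null and single-cell checks shown above.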

Here's an efficient way to get the minimum range that contains the non-empty cells in a column. Note that I am still reading the entire set of tempRange values at once, and then I use the resulting array (if multi-cell range) to determine which cells contain the first and last values. Then I construct the bounding range after having figured out which rows have data.

private static Range GetNonEmptyRangeInColumn(_Application application, _Worksheet worksheet, string columnName)
{
    // get the intersection of the column and the used range on the sheet (this is a superset of the non-null cells)
    var tempRange = application.Intersect(worksheet.UsedRange, (Range) worksheet.Columns[columnName]);

    // if there is no intersection, there are no values in the column
    if (tempRange == null)
        return null;

    // get complete set of values from the temp range (potentially memory-intensive)
    var value = tempRange.Value2;

    // if value is NULL, it's a single cell with no value
    if (value == null)
        return null;

    // if value is not an array, the temp range was a single cell with a value
    if (!(value is Array))
        return tempRange;

    // otherwise, the value is a 2-D array that may have leading or trailing empty cells
    var value2 = (object[,]) value;

    // get the first and last rows that contain values
    var rowCount = value2.GetLength(0);
    int firstRowIndex;
    for (firstRowIndex = 1; firstRowIndex <= rowCount; ++firstRowIndex)
    {
        if (value2[firstRowIndex, 1] != null)
            break;
    }
    int lastRowIndex;
    for (lastRowIndex = rowCount; lastRowIndex >= firstRowIndex; --lastRowIndex)
    {
        if (value2[lastRowIndex, 1] != null)
            break;
    }

    // if no row contains a value, there is no used range in the column
    if (firstRowIndex > lastRowIndex)
        return null;

    // return the range
    return worksheet.Range[tempRange[firstRowIndex, 1], tempRange[lastRowIndex, 1]];
}
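If you want the values rather than a Range, and blank cells between values should be kept as empty strings while trailing blanks are dropped, the same backward scan for the last row can be applied to the array directly. This is a minimal sketch under the assumption that the input is the single-column object[,] read from Value2. ColumnTrimmer and ValuesThroughLastNonEmpty are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnTrimmer
{
    // Returns the cells from the first row of the array through the last
    // non-null cell. Blank cells inside that span become "".
    public static List<string> ValuesThroughLastNonEmpty(object[,] column)
    {
        var firstRow = column.GetLowerBound(0);
        var lastRow = column.GetUpperBound(0);
        var col = column.GetLowerBound(1);

        // walk backwards past the trailing empty cells
        while (lastRow >= firstRow && column[lastRow, col] == null)
            --lastRow;

        var result = new List<string>();
        for (var row = firstRow; row <= lastRow; ++row)
        {
            var cell = column[row, col];
            result.Add(cell == null ? "" : cell.ToString());
        }
        return result;
    }
}
```

Leading blanks are kept as "" as well, because the array begins at the first row of the used range rather than at the first non-empty cell in the column.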

Solution 2

If you don't mind losing the empty rows completely:

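// Note: if myRange is a whole column such as "A:A", this enumerates every
// row on the sheet, one cell at a time, so restrict the range first if you can.
var nonEmptyRanges = myRange.Cast<Excel.Range>()
    .Where(r => !string.IsNullOrEmpty(r.Text));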
foreach (var r in nonEmptyRanges)
{
    // handle the r
    MessageBox.Show(r.Text);
}
Author by

trueCamelType


Updated on June 26, 2022

Comments

  • trueCamelType almost 2 years

    I'm trying to read a single column from an Excel document. I'd like to read the entire column, but only store the cells that have data. I'd also like to handle the case where one or more cells in the column are empty, while still reading later cell values if there's something farther down in the column. For example:

    | Column1 |
    |---------|
    |bob      |
    |tom      |
    |randy    |
    |travis   |
    |joe      |
    |         |
    |jennifer |
    |sam      |
    |debby    |
    

    If I had that column, I don't mind having a value of "" for the row after joe, but I do want it to keep getting values after the blank cell. However, I do not want it to go on for 35,000 lines past debby assuming debby is the last value in the column.

    It is also safe to assume that this will always be the first column.

    So far, I have this:

    Excel.Application myApplication = new Excel.Application();
    myApplication.Visible = true;
    Excel.Workbook myWorkbook = myApplication.Workbooks.Open("C:\\aFileISelect.xlsx");
    Excel.Worksheet myWorksheet = myWorkbook.Sheets["aSheet"] as Excel.Worksheet;
    Excel.Range myRange = myWorksheet.get_Range("A:A", Type.Missing);
    
    foreach (Excel.Range r in myRange)
    {
        MessageBox.Show(r.Text);
    }
    

    I've found lots of examples from older versions of .NET that do similar things, but not exactly this, and wanted to make sure I did something that's more modern (assuming the method one would use to do this has changed some amount).

    My current code reads the entire column, but includes blank cells after the last value.


    EDIT1

    I liked Isedlacek's answer below, but I do have a problem with it, that I'm not certain is specific to his code. If I use it in this way:

    Excel.Application myApplication = new Excel.Application();
    myApplication.Visible = true;
    Excel.Workbook myWorkbook = myApplication.Workbooks.Open("C:\\aFileISelect.xlsx");
    Excel.Worksheet myWorksheet = myWorkbook.Sheets["aSheet"] as Excel.Worksheet;
    Excel.Range myRange = myWorksheet.get_Range("A:A", Type.Missing);
    
    var nonEmptyRanges = myRange.Cast<Excel.Range>()
    .Where(r => !string.IsNullOrEmpty(r.Text));
    
    foreach (var r in nonEmptyRanges)
    {
        MessageBox.Show(r.Text);
    }
    
    MessageBox.Show("Finished!");
    

    the Finished! MessageBox never shows. I'm not sure why that happens, but it appears to never actually finish searching. I tried adding a counter to the loop to see if it was just continuously searching through the column, but it doesn't appear to be ... it appears to just stop.

    Where the Finished! MessageBox is, I tried to just close the workbook and spreadsheet, but that code never ran (as expected, since the MessageBox never ran).

    If I close the Excel spreadsheet manually, I get a COMException:

    COMException was unhandled by user code
    Additional information: Exception from HRESULT: 0x803A09A2

    Any ideas?