How do you convert Excel to CSV using OpenXML SDK?


Solution 1

I don't think OpenXml is the right tool for this problem. I would recommend getting the data out of the sheet with an OleDbConnection and then into a csv file with this method.

Once you've got the data in a datatable in memory, you've got a lot more control over the situation.
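As a rough illustration of the second half of that suggestion, here is a minimal sketch of writing an in-memory DataTable out as CSV. The class and method names (`DataTableCsv`, `Write`, `Quote`) are made up for this example and are not from the linked method; the sketch assumes every field should be quoted and that embedded quotes are escaped by doubling them. Empty cells in a DataTable are `DBNull`, which `Convert.ToString` turns into an empty string, so they still show up as empty fields.

```csharp
using System;
using System.Data;
using System.IO;
using System.Linq;

static class DataTableCsv
{
    // Hypothetical helper: quote a field and double any embedded quotes.
    static string Quote(object value)
    {
        // Convert.ToString returns an empty string for null and DBNull values.
        string text = Convert.ToString(value) ?? string.Empty;
        return "\"" + text.Replace("\"", "\"\"") + "\"";
    }

    // Hypothetical helper: write every row of the table as one comma-separated line.
    public static void Write(DataTable table, TextWriter writer)
    {
        foreach (DataRow row in table.Rows)
        {
            writer.WriteLine(string.Join(",", row.ItemArray.Select(v => Quote(v))));
        }
    }
}
```

It could be called with something like `DataTableCsv.Write(ds.Tables[0], streamWriter)` once the table has been filled.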

Solution 2

You can use an OleDb connection to query the Excel file, convert the rows to CSV format, and save the results to a file.

Here is a simple example I tested. It creates a separate Unicode-encoded, tab-delimited CSV file for each sheet in the Excel file.

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.OleDb;
using System.IO;
using System.Linq;
using System.Text;

namespace XlsTests
{
    class Program
    {
        static void Main(string[] args)
        {
            string _XlsConnectionStringFormat = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml;HDR=NO;IMEX=1\"";
            string xlsFilename = @"C:\test.xlsx";
            using (OleDbConnection conn = new OleDbConnection(string.Format(_XlsConnectionStringFormat, xlsFilename)))
            {
                try
                {
                    conn.Open();

                    string outputFilenameHeader = Path.GetFileNameWithoutExtension(xlsFilename);
                    string dir = Path.GetDirectoryName(xlsFilename);
                    string[] sheetNames = conn.GetSchema("Tables")
                                              .AsEnumerable()
                                              .Select(a => a["TABLE_NAME"].ToString())
                                              .ToArray();
                    foreach (string sheetName in sheetNames)
                    {
                        string outputFilename = Path.Combine(dir, string.Format("{0}_{1}.csv", outputFilenameHeader, sheetName));
                        using (StreamWriter sw = new StreamWriter(File.Create(outputFilename), Encoding.Unicode))
                        {
                            using (DataSet ds = new DataSet())
                            {
                                using (OleDbDataAdapter adapter = new OleDbDataAdapter(string.Format("SELECT * FROM [{0}]", sheetName), conn))
                                {
                                    adapter.Fill(ds);

                                    foreach (DataRow dr in ds.Tables[0].Rows)
                                    {
                                        string[] cells = dr.ItemArray.Select(a => a.ToString()).ToArray();
                                        sw.WriteLine("\"{0}\"", string.Join("\"\t\"", cells));
                                    }
                                }
                            }
                        }
                    }
                }
                catch (Exception exp)
                {
                    // handle exception
                }
                finally
                {
                    if (conn.State == ConnectionState.Open)
                    {
                        try
                        {
                            conn.Close();
                        }
                        catch (Exception ex)
                        {
                            // handle exception
                        }
                    }
                }
            }
        }
    }
}
Author by

TheSean

Updated on June 05, 2022

Comments

  • TheSean almost 2 years

    I have a requirement to convert Excel (2010) files to CSV. Currently I'm using Excel Interop to open each file and SaveAs CSV, which works well. However, Interop has some issues in the environment where we use it, so I'm looking for another solution.

    I found that the way to work with Excel files without Interop is to use the OpenXML SDK. I put together some code that iterates through all the cells in each sheet and simply writes them to another file as CSV.

    One problem I have is handling blank rows and cells. It seems that, with this code, blank rows and cells are completely nonexistent, so I have no way to know about them. Is there a way to iterate through all rows and cells, including blanks?

    string filename = @"D:\test.xlsx";
    string outputDir = Path.GetDirectoryName(filename);
    //--------------------------------------------------------
    
    using (SpreadsheetDocument document = SpreadsheetDocument.Open(filename, false))
    {
    
        foreach (Sheet sheet in document.WorkbookPart.Workbook.Descendants<Sheet>())
        {
            WorksheetPart worksheetPart = (WorksheetPart) document.WorkbookPart.GetPartById(sheet.Id);
            Worksheet worksheet = worksheetPart.Worksheet;
    
            SharedStringTablePart shareStringPart = document.WorkbookPart.GetPartsOfType<SharedStringTablePart>().First();
            SharedStringItem[] items = shareStringPart.SharedStringTable.Elements<SharedStringItem>().ToArray();
    
            // Create a new filename and save this file out.
            if (string.IsNullOrWhiteSpace(outputDir))
                outputDir = Path.GetDirectoryName(filename);
            string newFilename = string.Format("{0}_{1}.csv", Path.GetFileNameWithoutExtension(filename), sheet.Name);
            newFilename = Path.Combine(outputDir, newFilename);
    
            using (var outputFile = File.CreateText(newFilename))
            {
                foreach (var row in worksheet.Descendants<Row>())
                {
                    StringBuilder sb = new StringBuilder();
                    foreach (Cell cell in row)
                    {
                        string value = string.Empty;
                        if (cell.CellValue != null)
                        {
                            // If the content of the cell is stored as a shared string, get the text
                            // from the SharedStringTablePart. Otherwise, use the string value of the cell.
                            if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
                                value = items[int.Parse(cell.CellValue.Text)].InnerText;
                            else
                                value = cell.CellValue.Text;
                        }
    
                        // to be safe, always use double quotes.
                        sb.Append(string.Format("\"{0}\",", value.Trim()));
                    }
                    outputFile.WriteLine(sb.ToString().TrimEnd(','));
                }
            }
        }
    }
    

    If I have the following Excel file data:

    one,two,three
    ,,
    last,,row
    

    I will get the following CSV (which is wrong):

    one,two,three
    last,row
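
The missing column in that output suggests one possible direction: each cell normally carries a reference such as "C3" (exposed by the SDK as `Cell.CellReference`), so gaps can be detected by converting the column letters to an index and padding the skipped positions. The sketch below shows only that padding logic in plain C#. The names `CellGaps`, `ColumnIndex`, and `PadRow` are hypothetical, and it assumes the cells arrive in column order and that every cell has a reference. Blank rows could be handled the same way by comparing consecutive `Row.RowIndex` values, which is not shown here.

```csharp
using System;
using System.Collections.Generic;

static class CellGaps
{
    // Converts the column letters of a reference such as "C3"
    // to a zero-based index (A = 0, B = 1, ..., Z = 25, AA = 26).
    public static int ColumnIndex(string cellReference)
    {
        int index = 0;
        foreach (char c in cellReference)
        {
            if (!char.IsLetter(c)) break; // stop at the row number
            index = index * 26 + (char.ToUpperInvariant(c) - 'A' + 1);
        }
        return index - 1;
    }

    // Places each (reference, value) pair at its column position,
    // filling any skipped columns with empty strings.
    // Trailing blank columns are not padded.
    public static string[] PadRow(IEnumerable<KeyValuePair<string, string>> cells)
    {
        var padded = new List<string>();
        foreach (var cell in cells)
        {
            int column = ColumnIndex(cell.Key);
            while (padded.Count < column) padded.Add(string.Empty);
            padded.Add(cell.Value);
        }
        return padded.ToArray();
    }
}
```

For the third row of the sample data, passing the pairs ("A3", "last") and ("C3", "row") to `PadRow` yields three entries with an empty string in the middle, so the row would be written as last,,row.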