xls to csv converter

python excel csv xls export-to-csv

167,466

Solution 1

I would use xlrd: it's faster, cross-platform, and works directly with the file.

As of version 0.8.0, xlrd reads both XLS and XLSX files.

But as of version 2.0.0, support was reduced back to only XLS.

import csv

import xlrd

def csv_from_excel():
    # xlrd 2.0.0 and later reads only .xls files
    wb = xlrd.open_workbook('your_workbook.xls')
    sh = wb.sheet_by_name('Sheet1')
    # Python 3: open in text mode; newline='' lets the csv module control line endings
    with open('your_csv_file.csv', 'w', newline='', encoding='utf-8') as your_csv_file:
        wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
        for rownum in range(sh.nrows):
            wr.writerow(sh.row_values(rownum))
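One thing to watch with this loop: xlrd returns every numeric cell as a float, so a cell showing 2 in Excel is written as 2.0. Below is a minimal sketch of one way to tidy that up before writing. The helper name normalize_cell and the sample rows are made up for illustration; the rows only mimic the lists that sh.row_values(rownum) returns. It cannot recover anything Excel shows purely through number formatting (rounded formula results, zero padding applied by a number format).

```python
import csv
import io


def normalize_cell(value):
    """Return whole-number floats as ints; leave every other value untouched."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


# Made-up rows shaped like the lists returned by sh.row_values(rownum).
rows = [["id", "qty", "ratio"], ["007", 2.0, 0.9912319]]

# An in-memory buffer stands in for the CSV file so the sketch has no side effects.
buffer = io.StringIO()
wr = csv.writer(buffer, quoting=csv.QUOTE_ALL)
for row in rows:
    wr.writerow([normalize_cell(value) for value in row])

print(buffer.getvalue())
```

In the answer's loop this would amount to writing [normalize_cell(v) for v in sh.row_values(rownum)] instead of the raw row.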

Solution 2

I would use pandas. The computationally heavy parts are written in Cython or C extensions for speed, and the syntax is very clean. For example, to turn "Sheet1" from the file "your_workbook.xls" into the file "your_csv.csv", use the top-level function read_excel and the DataFrame method to_csv:

import pandas as pd
data_xls = pd.read_excel('your_workbook.xls', 'Sheet1', index_col=None)
data_xls.to_csv('your_csv.csv', encoding='utf-8')

Setting encoding='utf-8' alleviates the UnicodeEncodeError mentioned in other answers.
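To see why the encoding matters, here is a small standard-library sketch. The sample string and the helper can_encode are made up for illustration. On Python 3, to_csv already defaults to UTF-8, so the explicit argument mainly matters on Python 2. The utf-8-sig codec is a variant that prepends a byte-order mark, which may help if the CSV is going to be opened in Excel.

```python
def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = "Analyse café"  # made-up cell value containing a non-ASCII character

print(can_encode(sample, "ascii"))   # False: 'é' has no ASCII representation
print(can_encode(sample, "utf-8"))   # True

# utf-8-sig writes the same bytes as utf-8, preceded by a 3-byte BOM.
bom = sample.encode("utf-8-sig")[:3]
print(bom)
```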

Solution 3

Maybe someone will find this ready-to-use piece of code useful. It creates a CSV file for every worksheet in an Excel workbook.

Python 2:

# -*- coding: utf-8 -*-
import xlrd
import csv
import sys
 
def csv_from_excel(excel_file):
    workbook = xlrd.open_workbook(excel_file)
    all_worksheets = workbook.sheet_names()
    for worksheet_name in all_worksheets:
        worksheet = workbook.sheet_by_name(worksheet_name)
        with open(u'{}.csv'.format(worksheet_name), 'wb') as your_csv_file:
            wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
            for rownum in xrange(worksheet.nrows):
                wr.writerow([unicode(entry).encode("utf-8") for entry in worksheet.row_values(rownum)])

if __name__ == "__main__":
    csv_from_excel(sys.argv[1])

Python 3:

import xlrd
import csv
import sys

def csv_from_excel(excel_file):
    workbook = xlrd.open_workbook(excel_file)
    all_worksheets = workbook.sheet_names()
    for worksheet_name in all_worksheets:
        worksheet = workbook.sheet_by_name(worksheet_name)
        with open(u'{}.csv'.format(worksheet_name), 'w', newline='', encoding="utf-8") as your_csv_file:
            wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
            for rownum in range(worksheet.nrows):
                wr.writerow(worksheet.row_values(rownum))

if __name__ == "__main__":
    csv_from_excel(sys.argv[1])
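Note that date cells come out of row_values as plain floats (Excel's serial date numbers), so they land in the CSV as numbers rather than dates. xlrd ships xlrd.xldate_as_datetime(value, workbook.datemode) for converting them. The sketch below only illustrates the arithmetic behind that conversion using the standard library; the function name is made up, and it assumes serial values of 60 or more in the 1900 date system (earlier values are affected by Excel's 1900 leap-year quirk).

```python
from datetime import datetime, timedelta


def excel_serial_to_datetime(serial, datemode=0):
    """Convert an Excel serial date to a datetime.

    datemode 0 is the 1900 date system, datemode 1 the 1904 date system
    (the same meaning as workbook.datemode in xlrd).
    """
    if datemode == 1:
        epoch = datetime(1904, 1, 1)
    else:
        # Valid for serial >= 60; this offset absorbs Excel's fictitious 1900-02-29.
        epoch = datetime(1899, 12, 30)
    return epoch + timedelta(days=serial)


print(excel_serial_to_datetime(43831.5))   # 2020-01-01 12:00:00
print(excel_serial_to_datetime(42369, 1))  # 2020-01-01 00:00:00
```

In real code, prefer the xlrd function, since it also handles the early-1900 edge cases.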

Solution 4

I'd use csvkit, which uses xlrd (for xls) and openpyxl (for xlsx) to convert just about any tabular data to csv.

Once installed, with its dependencies, it's a matter of:

in2csv myfile > myoutput.csv

It takes care of all the format detection issues, so you can pass it just about any tabular data source. It's cross-platform too (no win32 dependency).

Solution 5

First, read your Excel spreadsheet into pandas. The code below imports the workbook as a dict (an OrderedDict in older pandas versions) that holds every worksheet as a DataFrame, keyed by sheet name. Then use the worksheet name as the key to access a specific worksheet, and save only the one you need as a CSV file with df.to_csv(). Hope this works in your case.

import pandas as pd
df = pd.read_excel('YourExcel.xlsx', sheet_name=None)
df['worksheet_name'].to_csv('YourCsv.csv')

If your Excel file contains only one worksheet, simply use the code below:

import pandas as pd
df = pd.read_excel('YourExcel.xlsx')
df.to_csv('YourCsv.csv')

If you want to convert all the worksheets of a single Excel workbook to separate CSV files, try the code below:

import pandas as pd
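def excelTOcsv(filename):
    # sheet_name=None loads every worksheet into a dict keyed by sheet name
    df = pd.read_excel(filename, sheet_name=None)
    for key, value in df.items():
        # no return here: returning inside the loop would stop after the first sheet
        value.to_csv('%s.csv' % key)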

This function converts the sheets of one Excel workbook into multiple CSV files, where key is the sheet name and value is the content of the sheet.
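Because sheet_name=None returns a dict rather than a single DataFrame, DataFrame attributes such as .columns have to be used on each value, not on the dict itself. Here is a minimal sketch of that, which also skips empty sheets and strips whitespace from column names. The function name write_sheets_to_csv is hypothetical, and the in-memory frames are a made-up stand-in for the result of pd.read_excel('YourExcel.xlsx', sheet_name=None).

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_sheets_to_csv(sheets, out_dir):
    """Write each non-empty DataFrame in the dict to <out_dir>/<sheet name>.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, frame in sheets.items():
        if frame.empty:
            continue  # an empty worksheet would only produce an empty CSV
        # Work on each DataFrame, not on the dict: the dict has no .columns.
        frame = frame.rename(columns=lambda col: str(col).strip())
        path = out_dir / f"{name}.csv"
        frame.to_csv(path, index=False, encoding="utf-8")
        written.append(path)
    return written


# Made-up stand-in for: pd.read_excel('YourExcel.xlsx', sheet_name=None)
sheets = {
    "Agents": pd.DataFrame({" code ": ["007"], "name": ["Bond"]}),
    "Empty": pd.DataFrame(),
}

with tempfile.TemporaryDirectory() as tmp:
    written = write_sheets_to_csv(sheets, tmp)
    print([path.name for path in written])  # ['Agents.csv']
```

Passing index=False leaves out the row-index column that to_csv writes by default; drop that argument if you want to keep it.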

Author by

Lalit Chattar

I am technical consultant and working on java/j2ee technology. Apart from this i am also interested in learning mobile phone technology like android.

Updated on July 09, 2022

Comments

Lalit Chattar almost 2 years

I am using win32com.client in Python to convert my .xlsx and .xls files into .csv. When I execute this code, it gives an error. My code is:

def convertXLS2CSV(aFile):
    '''converts a MS Excel file to csv w/ the same name in the same directory'''

    print "------ beginning to convert XLS to CSV ------"

    try:
        import win32com.client, os
        from win32com.client import constants as c
        excel = win32com.client.Dispatch('Excel.Application')

        fileDir, fileName = os.path.split(aFile)
        nameOnly = os.path.splitext(fileName)
        newName = nameOnly[0] + ".csv"
        outCSV = os.path.join(fileDir, newName)
        workbook = excel.Workbooks.Open(aFile)
        workbook.SaveAs(outCSV, c.xlCSVMSDOS) # 24 represents xlCSVMSDOS
        workbook.Close(False)
        excel.Quit()
        del excel

        print "...Converted " + nameOnly + " to CSV"
    except:
        print ">>>>>>> FAILED to convert " + aFile + " to CSV!"

convertXLS2CSV("G:\\hello.xlsx")

I am not able to find the error in this code. Please help.

kuujo over 11 years

Shouldn't it be wr.writerow(sh.row_values(rownum))? See here.

sharafjaffri over 11 years

Does it support datetime conversion from the xls datemode to a normal datetime?

Javier Novoa C. almost 10 years

Just a couple of notes: some worksheets may be empty. I don't see any use in generating empty CSV files; better to check worksheet.nrows > 0 before doing anything.

Javier Novoa C. almost 10 years

Also, it would be better to use a context manager for the CSV file ;)

Stew over 9 years

I believe it's best practice to wrap the for loop in a try..finally, with your_csv_file.close() called within the finally block. Something to do with closing the file regardless of whether an error is thrown during access?

duhaime about 9 years

You can skip empty sheets with if worksheet.nrows == 0: continue

Li-aung Yip almost 9 years

If you don't know the name of the sheet (i.e. it's not Sheet1) then you can use wb.sheet_by_index(0) to get the first sheet, regardless of its name.

Stew almost 9 years

CAUTION: this approach will not preserve Excel formatting of certain numbers. Integer-formatted numeric values will be written in decimal form (e.g. 2 -> 2.0), integer-formatted formulas will also be written in decimal form (e.g. =A1/B2 shows as 1 but exports as 0.9912319), and leading zeroes of text-formatted numeric values will be stripped (e.g. "007" -> "7.0"). Good luck querying for Mr. Bond in your database of secret agents! If you are lucky, these issues will crop up in obvious failures. If you are not lucky, they could silently poison your data.

Muhammad Shauket over 7 years

It does not work if the rows contain text in other languages. It shows ??? in place of the text.

CodeFarmer over 7 years

@philE This is too slow. Use xlsx2csv

devforfu almost 7 years

I like this tool as well. Not quite relevant to this question, but I came across a mention of csvkit in this book, alongside some other data processing utilities that let you transform data right inside your shell.

Raghav almost 7 years

Any tips on handling newline characters that might be in Excel cell contents?

Orhan Yazar almost 7 years

I'm getting:

File "<ipython-input-24-5fa644cde9f8>", line 15, in <module>
    csv_from_excel("Analyse Article Lustucru PF.xlsx")
File "<ipython-input-24-5fa644cde9f8>", line 6, in csv_from_excel
    with open('{}.csv'.format(worksheet_name), 'wb') as your_csv_file:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 2: ordinal not in range(128)

Do you know how to deal with it?

andilabs over 6 years

@OrhanYazar try u'{}.csv'.format(worksheet_name); notice the u at the beginning, which stands for unicode.

Sailanarmo almost 6 years

I have done the same, and I get the same garbage as well. Do you know of a solution to this?

user1632812 almost 6 years

Sorry, I forgot what I did back then. I learned that it's not a random number: it's the internal representation that Excel uses for datetimes. So there's an algorithm to get a proper datetime back.

user1632812 almost 6 years

I can't be more precise though, sorry.

binarymason over 5 years

Any suggestions on what to do instead, @Stew?

Aviral Srivastava about 5 years

With your code, I am getting an error:

>>> dfs = pd.read_excel(file_name, sheet_name=None)
>>> dfs.columns = dfs.columns.str.strip()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'collections.OrderedDict' object has no attribute 'columns'

Tyler Hitzeman about 4 years

For Python 3: use your_csv_file = open(xls_path, 'w') (not 'wb'). The csv module writes text, not bytes. Otherwise, you'll get: TypeError: a bytes-like object is required, not 'str'