xls to csv converter

Solution 1

I would use xlrd: it's faster, cross-platform, and works directly with the file.

As of version 0.8.0, xlrd reads both XLS and XLSX files.

But as of version 2.0.0, support was reduced back to only XLS.

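import csv

import xlrd  # note: xlrd >= 2.0.0 reads only .xls files

def csv_from_excel():
    wb = xlrd.open_workbook('your_workbook.xls')
    sh = wb.sheet_by_name('Sheet1')
    # Python 3: open in text mode; newline='' lets the csv module control line endings.
    # The with block closes the file even if a row fails to write.
    with open('your_csv_file.csv', 'w', newline='', encoding='utf-8') as your_csv_file:
        wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
        for rownum in range(sh.nrows):
            wr.writerow(sh.row_values(rownum))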
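
As the comments further down point out, row_values hands back raw floats for numeric cells (2 becomes 2.0) and Excel serial numbers for dates. Here is a minimal sketch of one way to tidy a cell before writing it. The helper name normalize_cell is hypothetical, the two constants mirror the numeric values of xlrd.XL_CELL_NUMBER and xlrd.XL_CELL_DATE, and the date arithmetic assumes the default 1900 date system with serials from 61 (1 March 1900) onward.

```python
from datetime import datetime, timedelta

# Numeric values of xlrd's cell-type constants (xlrd.XL_CELL_NUMBER, xlrd.XL_CELL_DATE).
XL_CELL_NUMBER = 2
XL_CELL_DATE = 3

# Epoch for Excel's default 1900 date system (assumption: serials >= 61 only).
EXCEL_1900_EPOCH = datetime(1899, 12, 30)


def normalize_cell(value, ctype):
    """Return a CSV-friendly value for one cell (hypothetical helper)."""
    if ctype == XL_CELL_DATE:
        # Dates are stored as a day count, with the time of day as the fraction.
        return (EXCEL_1900_EPOCH + timedelta(days=value)).isoformat(sep=" ")
    if ctype == XL_CELL_NUMBER and float(value).is_integer():
        # Whole numbers come back as floats; write 2 rather than 2.0.
        return int(value)
    return value


print(normalize_cell(2.0, XL_CELL_NUMBER))
print(normalize_cell(44197.5, XL_CELL_DATE))
```

With xlrd, each row could then be written as [normalize_cell(c.value, c.ctype) for c in sh.row(rownum)]. For real workbooks, xlrd.xldate_as_datetime(value, wb.datemode) is probably the safer choice for dates, since it also covers the 1904 date system.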

Solution 2

I would use pandas. The computationally heavy parts are written in Cython or C extensions to speed things up, and the syntax is very clean. For example, to turn "Sheet1" from the file "your_workbook.xls" into the file "your_csv.csv", use the top-level function read_excel and the DataFrame method to_csv as follows:

import pandas as pd
data_xls = pd.read_excel('your_workbook.xls', sheet_name='Sheet1', index_col=None)
data_xls.to_csv('your_csv.csv', encoding='utf-8')

Setting encoding='utf-8' alleviates the UnicodeEncodeError mentioned in other answers.
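
One detail worth knowing about to_csv: by default it also writes the DataFrame index as an unnamed first column. A minimal sketch, using a small in-memory DataFrame with made-up values in place of a real workbook, shows what index=False changes:

```python
import pandas as pd

# Made-up stand-in for a sheet loaded with pd.read_excel.
data_xls = pd.DataFrame({"name": ["Zoe", "Bond"], "code": ["001", "007"]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = data_xls.to_csv()
without_index = data_xls.to_csv(index=False)

print(with_index.splitlines()[0])     # header starts with an empty index column
print(without_index.splitlines()[0])  # header holds only the real columns
```

If the row numbers are not wanted in the output file, data_xls.to_csv('your_csv.csv', encoding='utf-8', index=False) should leave them out.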

Solution 3

Maybe someone will find this ready-to-use piece of code useful. It creates a CSV file from every worksheet in an Excel workbook.

Python 2:

# -*- coding: utf-8 -*-
import xlrd
import csv
import sys

def csv_from_excel(excel_file):
    workbook = xlrd.open_workbook(excel_file)
    all_worksheets = workbook.sheet_names()
    for worksheet_name in all_worksheets:
        worksheet = workbook.sheet_by_name(worksheet_name)
        with open(u'{}.csv'.format(worksheet_name), 'wb') as your_csv_file:
            wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
            for rownum in xrange(worksheet.nrows):
                wr.writerow([unicode(entry).encode("utf-8") for entry in worksheet.row_values(rownum)])

if __name__ == "__main__":
    csv_from_excel(sys.argv[1])

Python 3:

import csv
import sys

import xlrd

def csv_from_excel(excel_file):
    workbook = xlrd.open_workbook(excel_file)
    all_worksheets = workbook.sheet_names()
    for worksheet_name in all_worksheets:
        worksheet = workbook.sheet_by_name(worksheet_name)
        # newline='' lets the csv module control line endings (avoids blank rows on Windows)
        with open('{}.csv'.format(worksheet_name), 'w', newline='', encoding="utf-8") as your_csv_file:
            wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
            for rownum in range(worksheet.nrows):
                wr.writerow(worksheet.row_values(rownum))

if __name__ == "__main__":
    csv_from_excel(sys.argv[1])
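
The comments below suggest skipping empty worksheets and ask about line breaks inside cells. A minimal sketch of the writing step on its own, separated from xlrd so it can be tried without a workbook: the helper name write_sheets_to_csv and its input shape (a dict mapping sheet names to lists of rows) are assumptions made for illustration.

```python
import csv
import os


def write_sheets_to_csv(sheets, out_dir):
    """Write one CSV per non-empty sheet and return the file names written.

    `sheets` maps a sheet name to a list of rows (hypothetical input shape).
    """
    written = []
    for name, rows in sheets.items():
        if not rows:  # skip empty worksheets instead of creating empty files
            continue
        path = os.path.join(out_dir, "{}.csv".format(name))
        # newline="" lets the csv module manage line endings itself, so a line
        # break inside a cell stays inside one quoted field.
        with open(path, "w", newline="", encoding="utf-8") as handle:
            csv.writer(handle, quoting=csv.QUOTE_ALL).writerows(rows)
        written.append(os.path.basename(path))
    return written
```

With xlrd, the input could be built as {name: [sh.row_values(i) for i in range(sh.nrows)]} for each sheet sh in the workbook. Reading the files back with csv.reader (also opened with newline="") should return multi-line cells intact.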

Solution 4

I'd use csvkit, which uses xlrd (for xls) and openpyxl (for xlsx) to convert just about any tabular data to csv.

Once installed, with its dependencies, it's a matter of:

in2csv myfile > myoutput.csv

It takes care of all the format detection issues, so you can pass it just about any tabular data source. It's cross-platform too (no win32 dependency).

Solution 5

First, read your Excel spreadsheet into pandas. The code below imports the workbook as a dictionary (an OrderedDict in older pandas versions) that maps each worksheet name to a DataFrame. Then use the worksheet name as a key to access a specific worksheet, and save only that worksheet as a CSV file with df.to_csv(). Hope this works in your case.

import pandas as pd
df = pd.read_excel('YourExcel.xlsx', sheet_name=None)
df['worksheet_name'].to_csv('YourCsv.csv')  

If your Excel file contains only one worksheet, simply use the code below:

import pandas as pd
df = pd.read_excel('YourExcel.xlsx')
df.to_csv('YourCsv.csv') 

If you want to convert every worksheet in a single Excel workbook to its own CSV file, try the code below:

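import pandas as pd
def excelTOcsv(filename):
    # sheet_name=None loads every worksheet into a dict keyed by sheet name
    df = pd.read_excel(filename, sheet_name=None)
    for key, value in df.items():
        # no return here: returning inside the loop would stop after the first sheet
        value.to_csv('%s.csv' % key)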

This function converts the multiple sheets of one Excel workbook into multiple CSV files, where key is the sheet name and value is the content of that sheet.

Author: Lalit Chattar

I am a technical consultant working on Java/J2EE technology. Apart from this, I am also interested in learning mobile phone technology such as Android.

Updated on July 09, 2022

Comments

  • Lalit Chattar
    Lalit Chattar almost 2 years

    I am using win32com.client in Python to convert my .xlsx and .xls files into .csv. When I execute this code, it gives an error. My code is:

    def convertXLS2CSV(aFile):
        '''converts a MS Excel file to csv w/ the same name in the same directory'''
    
        print "------ beginning to convert XLS to CSV ------"
    
        try:
            import win32com.client, os
            from win32com.client import constants as c
            excel = win32com.client.Dispatch('Excel.Application')
    
            fileDir, fileName = os.path.split(aFile)
            nameOnly = os.path.splitext(fileName)
            newName = nameOnly[0] + ".csv"
            outCSV = os.path.join(fileDir, newName)
            workbook = excel.Workbooks.Open(aFile)
            workbook.SaveAs(outCSV, c.xlCSVMSDOS) # 24 represents xlCSVMSDOS
            workbook.Close(False)
            excel.Quit()
            del excel
    
            print "...Converted " + nameOnly + " to CSV"
        except:
            print ">>>>>>> FAILED to convert " + aFile + " to CSV!"
    
    convertXLS2CSV("G:\\hello.xlsx")
    

    I am not able to find the error in this code. Please help.

  • kuujo
    kuujo over 11 years
    Shouldn't it be wr.writerow(sh.row_values(rownum))? See here.
  • sharafjaffri
    sharafjaffri over 11 years
    Does it support datetime conversion from the XLS datemode to a normal datetime?
  • Javier Novoa C.
    Javier Novoa C. almost 10 years
    Just a couple of notes: some worksheets may be empty. I don't see any use in generating empty CSV files, so it's better to check that worksheet.nrows > 0 before doing anything.
  • Javier Novoa C.
    Javier Novoa C. almost 10 years
    Also, it would be better to use a context manager (a with block) for the CSV file ;)
  • Stew
    Stew over 9 years
    I believe it's best practice to wrap the for loop in a try..finally, with your_csv_file.close() called within the finally block. Something to do with closing the file regardless of whether an error is thrown during access?
  • duhaime
    duhaime about 9 years
    You can skip empty sheets with if worksheet.nrows == 0: continue
  • Li-aung Yip
    Li-aung Yip almost 9 years
    If you don't know the name of the sheet (i.e. it's not Sheet1) then you can use wb.sheet_by_index(0) to get the first sheet, regardless of its name.
  • Stew
    Stew almost 9 years
    CAUTION: this approach will not preserve Excel formatting of certain numbers. Integer-formatted numeric values will be written in decimal form (e.g. 2 -> 2.0), integer-formatted formulas will also be written in decimal form (e.g. =A1/B2 shows as 1 but exports as 0.9912319), and leading zeroes of text-formatted numeric values will be stripped (e.g. "007" -> "7.0"). Good luck querying for Mr. Bond in your database of secret agents! If you are lucky, these issues will crop up in obvious failures. If you are not lucky, they could silently poison your data.
  • Muhammad Shauket
    Muhammad Shauket over 7 years
    It does not work if the rows contain text in other languages; it shows ??? in place of the text.
  • CodeFarmer
    CodeFarmer over 7 years
    @philE This is too slow. Use xlsx2csv
  • devforfu
    devforfu almost 7 years
    I like this tool too. Not quite relevant to this question, but I came across csvkit in this book, alongside some other data-processing utilities that let you transform data right inside your shell.
  • Raghav
    Raghav almost 7 years
    Any tips on handling newline characters that might be in Excel cell contents?
  • Orhan Yazar
    Orhan Yazar almost 7 years
    I'm getting:

        File "<ipython-input-24-5fa644cde9f8>", line 15, in <module>
            csv_from_excel("Analyse Article Lustucru PF.xlsx")
        File "<ipython-input-24-5fa644cde9f8>", line 6, in csv_from_excel
            with open('{}.csv'.format(worksheet_name), 'wb') as your_csv_file:
        UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 2: ordinal not in range(128)

    Do you know how to deal with it?
  • andilabs
    andilabs over 6 years
    @OrhanYazar try u'{}.csv'.format(worksheet_name). Notice the u at the beginning, which stands for Unicode.
  • Sailanarmo
    Sailanarmo almost 6 years
    I have done the same, and I get the same garbage as well. Do you know of a solution to this?
  • user1632812
    user1632812 almost 6 years
    Sorry, I forgot what I did back then. I learned that it's not a random number; it's the internal representation that Excel uses for datetimes. So there's an algorithm to get a proper datetime back.
  • user1632812
    user1632812 almost 6 years
    I can't be more precise though, sorry.
  • binarymason
    binarymason over 5 years
    any suggestions on what to do instead, @Stew?
  • Aviral Srivastava
    Aviral Srivastava about 5 years
    With your code, I am getting an error:

        >>> dfs = pd.read_excel(file_name, sheet_name=None)
        >>> dfs.columns = dfs.columns.str.strip()
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        AttributeError: 'collections.OrderedDict' object has no attribute 'columns'
  • Tyler Hitzeman
    Tyler Hitzeman about 4 years
    For Python 3: use your_csv_file = open(xls_path, 'w') (not 'wb'). The csv module writes text, so the file must be opened in text mode, not bytes mode. Otherwise, you'll get: TypeError: a bytes-like object is required, not 'str'