Programmatically extract data from an Excel spreadsheet
Solution 1
Maybe xlrd will do the Job (in Python)
edit: I should really learn to read questions. But writing csv shouldn't be a huge problem so maybe you can actually use it.
Solution 2
for ruby, the spreadsheet gem is excellent to read write modify, ...excell files
https://github.com/zdavatz/spreadsheet
Solution 3
You can use pyexcelerator in python.
This code (included in the examples
folder of pyexcelerator as xls2csv.py
) extracts all sheets from the spreadsheets and outputs them to stdout
as CSV.
You can easily change the code to do what you want.
The cool thing about pyexcelerator is that you can also use it to write/create excel xls files, without having excel installed.
#!/usr/bin/env python
# -*- coding: windows-1251 -*-
# Copyright (C) 2005 Kiseliov Roman
__rev_id__ = """$Id: xls2csv.py,v 1.1 2005/05/19 09:27:42 rvk Exp $"""
from pyExcelerator import *
import sys
me, args = sys.argv[0], sys.argv[1:]
if args:
for arg in args:
print >>sys.stderr, 'extracting data from', arg
for sheet_name, values in parse_xls(arg, 'cp1251'): # parse_xls(arg) -- default encoding
matrix = [[]]
print 'Sheet = "%s"' % sheet_name.encode('cp866', 'backslashreplace')
print '----------------'
for row_idx, col_idx in sorted(values.keys()):
v = values[(row_idx, col_idx)]
if isinstance(v, unicode):
v = v.encode('cp866', 'backslashreplace')
else:
v = str(v)
last_row, last_col = len(matrix), len(matrix[-1])
while last_row < row_idx:
matrix.extend([[]])
last_row = len(matrix)
while last_col < col_idx:
matrix[-1].extend([''])
last_col = len(matrix[-1])
matrix[-1].extend([v])
for row in matrix:
csv_row = ','.join(row)
print csv_row
else:
print 'usage: %s (inputfile)+' % me
Solution 4
This is quite late to the game, but I thought I'd add another option via Ruby using the gem "roo":
require 'rubygems' require 'roo' my_excel_file = Excelx.new("path/to/my_excel_file.xlsx") my_excel_file.default_sheet = my_excel_file.sheets[2] my_excel_file.to_csv("path/to/my_excel_file.csv")
Solution 5
In Ruby, here is the code I use: (requires the excellent parseexcel gem) require 'parseexcel'
def excelGetSheet(worksheet)
sheet=Array.new
worksheet.each { |row|
if row != nil # empty row?
cells=Array.new
j=0
row.each { |cell|
cells << cell.to_s('latin1') unless cell == nil
j=j+1
}
sheet << cells
end
}
return sheet
end
workbook = Spreadsheet::ParseExcel.parse("MyExcelFile.xls")
sheet1 = excelGetSheet(workbook.worksheet(0))
puts sheet1.inspect
dreeves
Startup: Beeminder.com Blog: MessyMatters.com Homepage: Dreev.es Twitter.com/dreev Favorite programming language: Mathematica Random fact: Dreeves is an ultra-marathon inline skater
Updated on June 17, 2022Comments
-
dreeves almost 2 years
Is there a simple way, using some common Unix scripting language (Perl/Python/Ruby) or command line utility, to convert an Excel Spreadsheet file to CSV? Specifically, this one:
http://www.econ.yale.edu/~shiller/data/ie_data.xls
And specifically the third sheet of that spreadsheet (the first two being charts).
-
John Machin over 14 yearspyXLreader appears not to be have been maintained since 2005, has an astonishing outstanding bug in the sourceforge tracker (omits fractions, all numbers presented as integers) plus a few more problems I found in a few minutes of playing with it (crashes on large files, sometimes prints "FCUKDATE" (no kidding!) when there are dates in the data, time taken is at least 6 times that of xlrd and is possibly O(N**2), memory used is at least 7 times that of xlrd). Do you use it very much yourself?
-
John Machin over 14 yearsWhat are those 'cp1251' and 'cp866' doing in there? What will happen if there are dates in the data? Why does it use alist.extend([something]) instead of alist.append(something)? Why does it use ','.join(row) instead of using the csv module?? Are you serious???
-
nosklo over 14 years@John: I just copied the example of pyexcelerator. I think the point of the example is reading the xls file, not the csv generation part. It works. You can modify and improve it as needed.
-
John Machin over 14 yearsA review? I could't find such a thing using google. Please post the link.
-
dreeves over 14 yearsRight, outputting as CSV is no big deal. Thanks!
-
dreeves over 14 yearsOh, I didn't mean it as a Language Wars question or anything. Just needed to get that task done and was happy to use whatever language made it easy.
-
poseid about 14 yearsmore explanations is also found here: codeweblog.com/parseexcel-ruby-used-to-read-excel
-
Alfabravo over 11 yearsAnd how to choose third sheet?
-
vikjam over 11 yearsAh fair point. Before the "to_csv" line, set the default sheet to the third sheet, i.e. my_excel_file.default_sheet = my_excel_file.sheets[2]