Convert a .xlsx (MS Excel) file to .csv on command line with semicolon separated fields


Solution 1

The unoconv program drives OpenOffice to perform format conversions on the command line. It may need to be installed separately, and by default it writes comma-separated output.

unoconv -f csv filename.xlsx

For more complex requirements, you can parse XLSX files with Spreadsheet::XLSX in Perl or openpyxl in Python. For example, here's a quickie script to print out a worksheet as a semicolon-separated CSV file (warning: untested, typed directly in the browser):

perl -MSpreadsheet::XLSX -e '
    $\ = "\n"; $, = ";";
    my $workbook = Spreadsheet::XLSX->new($ARGV[0]);  # the constructor takes the file name
    my $worksheet = ($workbook->worksheets())[0];
    my ($row_min, $row_max) = $worksheet->row_range();
    my ($col_min, $col_max) = $worksheet->col_range();
    for my $row ($row_min..$row_max) {
        print map { my $cell = $worksheet->get_cell($row, $_); defined $cell ? $cell->value() : "" } ($col_min..$col_max);
    }
' filename.xlsx >filename.csv
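The openpyxl route mentioned above could look roughly like the following. This is a minimal sketch, not a tested converter: it assumes openpyxl is installed, reads only the first worksheet, and the function names `write_rows` and `xlsx_to_csv` are made up for illustration. Unlike the Perl one-liner, the `csv` module quotes any field that itself contains a semicolon.

```python
import csv
import sys


def write_rows(rows, out, delimiter=";"):
    """Write an iterable of rows to the file-like object `out` as delimited text."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        # Empty cells come back as None; emit them as empty fields.
        writer.writerow(["" if value is None else value for value in row])


def xlsx_to_csv(xlsx_path, csv_path, delimiter=";"):
    """Convert the first worksheet of an .xlsx file (hypothetical helper)."""
    # Third-party dependency, imported here so write_rows works without it.
    from openpyxl import load_workbook

    workbook = load_workbook(xlsx_path, read_only=True, data_only=True)
    worksheet = workbook.worksheets[0]
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        write_rows(worksheet.iter_rows(values_only=True), out, delimiter)
    workbook.close()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (script name is an assumption): python xlsx_to_semicolon.py in.xlsx out.csv
    xlsx_to_csv(sys.argv[1], sys.argv[2])
```

Opening the workbook with `read_only=True` streams the rows instead of loading the whole sheet, which should help with large files.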

Solution 2

https://github.com/dilshod/xlsx2csv

Worked well for me. An XLSX file of about 85 MB converted in about 3 minutes on a MacBook Pro with an SSD.

Solution 3

I'm using Perl's xls2csv to convert xls files to csv.

I'm not sure whether it works with xlsx too, though.

Regarding:

"It can't be comma separated unfortunately since some columns have commas in them"

That's why quoting was introduced:

1,2,"data,data, more data"
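To illustrate the quoting rule, here is a small sketch using Python's standard `csv` module. It parses the quoted line above into three fields and, since the asker wants semicolons for awk, re-emits it with `;` as the delimiter. The helper names `parse_line` and `recode` are invented for this example.

```python
import csv
import io


def parse_line(line):
    """Split one CSV line into fields, honouring double-quoted fields."""
    return next(csv.reader([line]))


def recode(lines, out, delimiter=";"):
    """Re-emit comma-separated CSV lines to `out` using another delimiter."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for fields in csv.reader(lines):
        writer.writerow(fields)


fields = parse_line('1,2,"data,data, more data"')
print(fields)  # ['1', '2', 'data,data, more data']

buffer = io.StringIO()
recode(['1,2,"data,data, more data"'], buffer)
print(buffer.getvalue(), end="")  # 1;2;data,data, more data
```

A field that itself contains a semicolon would be quoted in the output, so a plain `awk -F';'` split is only safe when the data has no semicolons.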

Solution 4

I use PHP. Just install the PHPExcel library from http://phpexcel.codeplex.com/; you will probably need PHP's XML functions too.

This is my code:

<?php

error_reporting(E_ALL);
date_default_timezone_set('Europe/London');

/** PHPExcel_IOFactory */

require_once '/home/markov/Downloads/1.7.6/Classes/PHPExcel/IOFactory.php';

$file = "RIF394305.xlsx"; // path to the XLSX file to convert

// Check prerequisites

if (!file_exists($file)) {
    exit("Input file not found: $file\n");
}

// 'Excel2007' is the PHPExcel reader for .xlsx files
$objReader = PHPExcel_IOFactory::createReader('Excel2007');

$objPHPExcel = $objReader->load($file);

$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'CSV');
$objWriter->setDelimiter(';'); // semicolon-separated output, as the question asks

$objWriter->save(str_replace('.xlsx', '.csv',$file));
?>

You can reverse the process or use other Excel/CSV formats. Look at the example PHP files in the PHPExcel directory.


Author: allrite

Updated on September 18, 2022

Comments

  • allrite
    allrite over 1 year

    I realize that this is not an entirely unix/linux related question. But since this is something I'll do on linux, I hope someone has an answer.

    I have an online Excel file (.xlsx) which gets updated periodically (by someone else). I want to write a script and run it as a cron job in order to process that Excel sheet. But to do that, I need to convert it into a text file (so a .csv) with semicolon-separated columns. It can't be comma separated unfortunately since some columns have commas in them. Is it at all possible to do this conversion from the shell? I have OpenOffice installed and I can do this using its GUI, but I want to know if it is possible to do this from the command line. Thanks!

    PS: I have a Mac machine as well, so if some solution can work there, that's good as well. :)

  • allrite
    allrite over 12 years
    Thanks for the tip, I will try that. I still prefer semicolon separated, since after the csv conversion, the file goes through awk scripts. And it's just easier to pass semicolon as the field separator in awk. I could look for commas inside quotes to replace them with something else... now that's another question :)
  • allrite
    allrite over 12 years
    unoconv did not come with my OO, but I installed it and it works great (converts to comma separated file, not semicolon though)! Thanks! I will still need to figure out how I will get my fields that contain commas. But thanks anyways.
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' over 12 years
    @allrite Oh, I'd missed the requirement of semicolons as separators. My suggestion to do the processing in Python or Perl still stands. But I've also added a script (untested) to convert to CSV with ; as the separator.
  • allrite
    allrite over 12 years
    Thanks! I used Spreadsheet::XLSX, but used the code in the CPAN link you provided. It works :)
  • allrite
    allrite over 12 years
    Thanks @neurino. I used Gilles' method instead, but thanks for the reply anyways.
  • Madhan
    Madhan over 7 years
    Worked well on OS X: $ python xlsx2csv.py -d ";" my.xlsx my.csv. Being able to define the delimiter is a plus. Thank you, +1!
  • Matt Smeets
    Matt Smeets about 3 years
    This is not a viable option if you have a memory constraint. But still nice work :)