How can I convert a XLSB file to csv using python?

34,317

Solution 1

Most popular Excel python packages openpyxl and xlrd have no support for xlsb format (bug tracker entries: openpyxl, xlrd).

So I'm afraid there is no native python way =/. However, since you are using windows, it should be easy to script the task with external tools.

I would suggest taking look at Convert XLS to XLSB Programatically?. You mention python in title but the matter of the question does not imply you are strongly coupled to it, so you could go pure c# way.

If you feel really comfortable only with python one of the answers there suggests a command line tool under a fancy name of Convert-XLSB. You could script it as an external tool from python with subprocess.

I know this is not a good answer, but I don't think there is better/easier way as of now.

Solution 2

I've encountered this same problem and using pyxlsb does it for me:

from pyxlsb import open_workbook

with open_workbook('HugeDataFile.xlsb') as wb:
    for sheetname in wb.sheets:
        with wb.get_sheet(sheetname) as sheet:
            for row in sheet.rows():
                values = [r.v for r in row]  # retrieving content
                csv_line = ','.join(values)  # or do your thing

Solution 3

In my previous experience, i was handling converting xlsb using libreoffice command line utility,

In ruby i just execute system command to call libreoffice for converting xlsb format to csv:

`libreoffice --headless --convert-to csv your_xlsb_file.xlsb --outdir /path/csv`

and to change the encoding i use command line to using iconv, using ruby :

`iconv -f ISO-8859-1 -t UTF-8 your_csv_file.csv > new_file_csv.csv`

Solution 4

I also looked at the problem and the following worked for me. First opening the file in excel via python and than saving it to different file. Bit of a workaround but I like it more than other solutions. In example I use file format 6 which is CSV but you can also use other ones.

import win32com.client
excel = win32com.client.Dispatch("Excel.Application")
excel.DisplayAlerts = False
excel.Visible=False
doc = excel.Workbooks.Open("C:/users/A295998/Python/@TA1PROG3.xlsb")
doc.SaveAs(Filename="C:\\users\\A295998\\Python\\test5.csv",FileFormat=6)
doc.Close()
excel.Quit()

Solution 5

XLSB is a binary format and I don't think you'll be able to parse it with current python tools and packages. If you still want to somehow automate the process with python you can do what the others have told you and script that windows CLI tool. Calling the .exe from the command line with subprocess, and passing an array of the files you want to convert.

I.e: with a script similar to this one you could convert all the .xlsb files that you place in the "xlsb" folder to .csv format...

├── xlsb
│   ├── file1.xlsb
│   ├── file2.xlsb
│   └── file3.xlsb
└── xlsb_to_csv.py


xlsb_to_csv.py

#!/usr/bin/env python

import os

files = [f for f in os.listdir('./xlsb')]
for f in files:
    subprocess.call("ConvertXLS.EXE " + str(f) + " --arguments", shell=True)

Note: the Windows command is pseudocode... I use a similar approach to batch-convert stuff in headless windows servers for testing purpouses. You just have to figure out the exe location and the windows command...

Hope it helps... good luck!

Share:
34,317
IordanouGiannis
Author by

IordanouGiannis

Updated on August 05, 2021

Comments

  • IordanouGiannis
    IordanouGiannis almost 3 years

    I have been provided with a xlsb file full of data. I want to process the data using python. I can convert it to csv using excel or open office, but I would like the whole process to be more automated. Any ideas?

    Update: I took a look at this question and used the first answer:

    import subprocess
    subprocess.call("cscript XlsToCsv.vbs data.xlsb data.csv", shell=False)
    

    The issue is the file contains greek letters so the encoding is not preserved. Opening the csv with Notepad++ it looks as it should, but when I try to insert into a database comes like this ���. Opening the file as csv, just to read text is displayed like this: \xc2\xc5\xcb instead of ΒΕΛ.

    I realize it's an issue in encoding, but it's possible to retain the original encoding converting the xlsb file to csv ?

  • mpettis
    mpettis about 8 years
    Awesome, thank you. One correction -- change your_csv_file.csv to your_xlsb_file.xlsb.
  • FluxIX
    FluxIX over 7 years
    I might almost advocate for having the accepted answer changed to this, although this solution does require Excel to be installed on the machine running the script and I prefer platform-agnostic solutions (although to my knowledge there is not a platform-agnostic solution currently).
  • Hatt
    Hatt about 6 years
    Hey I know this is over a year old but this worked perfectly for me as for some reason a bunch of files I need to aggregate are all in xlsb. One question I have - I ran this code and it brought in the correct data from the third tab - which is what I was looking for coincidentally - how would I tell it which tab to use from each file?
  • Juraj Bezručka
    Juraj Bezručka about 6 years
    Works like a charm, IMO this should be the accepted answer.
  • PL200
    PL200 over 5 years
    Should be marked as answer. Best solution I've seen, and really the only decent way I've seen to process .XLSB files at all.
  • Sergio Lucero
    Sergio Lucero over 5 years
    Glad to help! Now I feel like I've contributed.
  • Joni Hoppen
    Joni Hoppen over 5 years
    This link seems to be no longer available.
  • Hofbr
    Hofbr almost 4 years
    I'm running into an issue when there are float values. Join doesn't like it.
  • walter
    walter about 3 years
    excelent, the only change I needed to do is: values = [str(r.v) for r in row]