Convert .xls/.xlsx spreadsheets to multiple .csv's based on a list

7,790

Solution 1

You can just put a loop inside another loop.

To avoid errors, don't use for with find results.

while IFS= read -r file; do
    while IFS= read -r sheet; do
        in2csv --sheet "$sheet" "$file" > "${file%.*}-${sheet}.csv"
    done < <(in2csv -n "$file")
done < <(find . -name '*.xls' -o -name '*.xlsx')

Solution 2

Skipping find and using bash:

shopt -s globstar  # enable recursive globbing
for f in **/*.xls{,x}  # for files ending in .xls or .xlsx
do
    in2csv -n "$f" |   # get the sheetnames
      xargs -I {} bash -c 'in2csv --sheet "$2" "$1" > "${1%.*}"-"$2".csv' _ "$f" {} # {} will be replaced with the sheetname
done

Solution 3

csvkit version > 1.0.2 has a builtin function to write all sheets:

--write-sheets: WRITE_SHEETS
                      The names of the Excel sheets to write to files, or
                      "-" to write all sheets.

So you could try the following:

find . -name '*.xls' -o -name '*.xlsx' -exec in2csv --write-sheets "-" {} \;

Note:

This seems not to work 100% as expected. But worth a try and as this is the first version with that option maybe in future versions the implementation is better/easier.

Share:
7,790

Related videos on Youtube

csheth
Author by

csheth

Updated on September 18, 2022

Comments

  • csheth
    csheth almost 2 years

    I need to convert all sheets of a single .xls/.xlsx file to a .csv. This will be done on all .xls files in all directories and sub-directories (recursively).

    Step 1: Get the sheetnames of all .xls into a .csv using:

    for file in $(find . -name '*.xls' -o -name '*.xlsx');do in2csv -n "$file" > ${file%.xls}-sheetnames-list.csv; done
    

    filename-sheetnames-list.csv can act as a list:

    sheetname1
    sheetname2
    sheetname3
    

    Step 2 : The code for converting a specific sheet into a .csv using in2csv is:

    in2csv --sheet "SHEETNAME" filename.xls > filename-SHEETNAME.csv
    

    How can I get every sheetname in a .xls/x and write every sheet separately for all directories containing a .xls/x ?

    in2csv --write-sheets "-" filename.xls > filename-sheet1.csv filename-sheet2.csv .... gives output only on sheet1.csv, not sure how to get all sheets from this.

    • dessert
      dessert over 6 years
      Why not just find every .xls{,x} and loop over every sheet using -exec?
    • terdon
      terdon over 6 years
      @glennjackman this is perfectly on topic here, just as it would be on Unix & Linux.
  • terdon
    terdon over 6 years
    @muru ah crap. You're absolutely right. I'd tested in an environment where the IFS had already been changed so of course it propagated downwards. Idiot. Thanks, edit reverted.
  • csheth
    csheth over 6 years
    this script looks elegant but its output is filename-{}.csv containing no data. I'm a novice and can't seem to find the error by editing the script and reading up. Some help?
  • csheth
    csheth over 6 years
    @RoVo the first option works fine. The second one however gives me no output or error. I am unsure why; for a single .xls in2csv --write-sheets "-" filename.xls > sheetname.csv gives only the first sheet. I don't know what additional info to add to write all sheets. That shall give us clues to correct your code.
  • muru
    muru over 6 years
    @ChintanSheth my bad, I'd forgotten the redirection would be outside xargs. Corrected, not as elegant now.
  • pLumo
    pLumo over 6 years
    xargs and > is evil :-P. That's why I prefer another loop, it's less error prone.
  • pLumo
    pLumo over 6 years
    did you update to that version 1.0.2 ? pip install csvkit -U. I think the way it works is not what you like, with the simple skript from 1st option you have more ways to control the output and the filenames etc.
  • muru
    muru over 6 years
    @RoVo I'd have usually gone for another loop too, just wanted to show another method here.
  • csheth
    csheth over 6 years
    still doesn't work with the update, and yes I'd prefer using a list than --write-sheets Maybe you can set this alternative option as another answer... I will accept the first option as the answer then. Thanks @RoVo
  • csheth
    csheth over 6 years
    This works now, however slightly slower than @RoVo answer.
  • pLumo
    pLumo over 6 years
    Maybe generally a good idea to have alternative options in another answer. Thanks, glad that I could help.