Appending Pandas DataFrame to existing Excel document

27,344

You can use with:

with pd.ExcelWriter('test.xlsx', engine='openpyxl', mode='a') as writer:
    d1.to_excel(writer,sheet_name='d1')
    d2.to_excel(writer,sheet_name='d2')
    writer.save()

writer.close()

update

This should work just note that the a blank file needs to be created before hand. You can just create a blank file using python if you want. I created a simple loop to, in some ways, mimic the essence of what you are trying to accomplish:

import pandas as pd
from openpyxl import load_workbook

d1 = pd.DataFrame({"A":['Bob','Joe', 'Mark'], 
               "B":['5', '10', '20']})
d2 = pd.DataFrame({"A":['Jeffrey','Ann', 'Sue'], 
                "B":['1', '2', '3']})

dfs = [d1,d2]

for i in range(len(dfs)):
    sheet = 'd'+str(i+1)
    data = dfs[i]
    writer = pd.ExcelWriter('atest.xlsx',engine='openpyxl', mode='a')
    writer.book = load_workbook('atest.xlsx') # here is the difference
    data.to_excel(writer,sheet_name=sheet)
    writer.save()
    writer.close()

or here is the modified first example:

d1 = pd.DataFrame({"A":['Bob','Joe', 'Mark'], 
               "B":['5', '10', '20']})
d2 = pd.DataFrame({"A":['Jeffrey','Ann', 'Sue'], 
                "B":['1', '2', '3']})

writer = pd.ExcelWriter('atest.xlsx', engine='openpyxl', mode='w')
d1.to_excel(writer,sheet_name='d1')
writer.save()
writer.close()

writer = pd.ExcelWriter('atest.xlsx', engine='openpyxl', mode='a')
writer.book = load_workbook('atest.xlsx')
d2.to_excel(writer,sheet_name='d2')
writer.save()
writer.close()
Share:
27,344
enter_display_name_here
Author by

enter_display_name_here

Updated on February 28, 2020

Comments

  • enter_display_name_here
    enter_display_name_here about 4 years

    Per https://github.com/pandas-dev/pandas/pull/21251/files/09e5b456e1af5cde55f18f903ab90c761643b05a, we should be able to append DataFrames to new XLSX sheets.

    Based on the documentation, I tried the following:

    >>> import pandas as pd
    >>>                
    ... d1 = pd.DataFrame({"A":['Bob','Joe', 'Mark'], 
    ...                "B":['5', '10', '20']})
    >>> d2 = pd.DataFrame({"A":['Jeffrey','Ann', 'Sue'], 
    ...                "B":['1', '2', '3']})
    >>> 
    >>> # Create XLSX document for ticker
    ... writer = pd.ExcelWriter('test.xlsx',engine='openpyxl')
    >>> d1.to_excel(writer,sheet_name='d1')
    >>> writer.save()
    >>> 
    >>> writer = pd.ExcelWriter('test.xlsx',engine='openpyxl', mode='a')
    >>> d2.to_excel(writer,sheet_name='d2')
    >>> writer.save()
    >>> 
    >>> pd.__version__
    '0.23.4'     # Just updated this per a comment
    >>> 
    >>> 
    

    The result is a single workbook named 'test.xlsx' with a single tab 'd2'.

    How can I prevent the workbook/sheet form being overwritten?

    • Poojan
      Poojan over 5 years
    • user3471881
      user3471881 over 5 years
      Sorry, a bit too quick to mark as duplicate. This was introduced in pandas version 0.23.1 so you need to upgrade.
    • enter_display_name_here
      enter_display_name_here over 5 years
      Thanks, I upgraded per your suggestion. I was using "pip3 install pandas" instead of adding "--upgrade" so I thought I had the latest version. Per your suggestion, I updated but am still getting the same result.
    • enter_display_name_here
      enter_display_name_here over 5 years
      Also, the other referenced <stackoverflow.com/questions/20219254/…> is for the function 'df.to_excel()', not 'pd.ExcelWriter'.
    • r.ook
      r.ook over 5 years
      Are you running this on a REPL? Make sure you restart your shell since the pandas module is loaded in memory with the older version.
    • enter_display_name_here
      enter_display_name_here over 5 years
      Based on my search, I am guessing that "REPL" is typing directly into the terminal. So since I am on a Mac, I am logged into the Terminal and entered commands one by one. I did two things per your recommendation (1) Closed all Terminal windows, then restarted Terminal; (2) Ran the script via the Terminal, i.e. "python3 test.py". The results from both cases are the same, unfortunately.
  • enter_display_name_here
    enter_display_name_here over 5 years
    I tried this but the result is comparable to xlsxwriter.readthedocs.io/example_pandas_multiple.html. It's a different engine but the workbook is opened only once then both tabs are written. I need to open/write/close then open/write/close. I tried your suggestion by putting d1.to_excel() and d2.to_excel() in separate with loops but the result was a single XLSX with a single tab. I also tried your suggestion but with setting mode='w'. It worked the same as mode='a'.
  • It_is_Chris
    It_is_Chris over 5 years
    @enter_display_name_here why do you need to open/write/close multiple times?
  • enter_display_name_here
    enter_display_name_here over 5 years
    I am building an extremely large dictionary of DataFrames and would like to write out each DataFrame as a tab as they get completed. I would hate to get 100+ DataFrames in and then the program experiences an issue and nothing is written out. Basically, I am scraping 4 websites, compiling different tables of data into a single DF, then doing that process over and over again. I'd like to generate an individual tab per DF as it goes through the program.
  • enter_display_name_here
    enter_display_name_here over 5 years
    Thanks for taking the time to put that together. Your update is actually the workaround to what was done prior to the addition of mode='a'. If you refer to line 1009 (here: github.com/pandas-dev/pandas/pull/21251/files/…), you'll notice that they added book = load_workbook(self.path) to their code. Your code works the same as with mode='w'. If I am unable to get mode='a' to work, then I will use this. I am hoping that their new code will check for an existing tab and then overwrite that.
  • It_is_Chris
    It_is_Chris over 5 years
    did you create the blank test.xlsx file first
  • enter_display_name_here
    enter_display_name_here over 5 years
    Yep, I tried that and it worked. I also just double-checked their code. At first, I thought a new workbook would be created if one doesn't exist, but I think that's incorrect. I think the code creates a new workbook if mode= is not 'a', not if a workbook does not exist.
  • It_is_Chris
    It_is_Chris over 5 years
    After looking at the code that is correct, a file is not created when mode=='a' they are just using load_workbook, which means the file needs to exist prior to running ExcelWriter with the param mode='a' if mode != a then they are creating a new workbook: self.book = Workbook()
  • Ishkatan
    Ishkatan about 5 years
    Just re-read and see that I answered how to append to the same tab but the question is about multiple tabs. I have a program similar to yours that writes multiple tabs successfully. I don't specify engine='openpyxl' and I don't do a writer.save() until the very end of the program - last step. Works though I could see limits if my data was too large.