Appending Pandas DataFrame to existing Excel document
27,344
You can use with
:
with pd.ExcelWriter('test.xlsx', engine='openpyxl', mode='a') as writer:
d1.to_excel(writer,sheet_name='d1')
d2.to_excel(writer,sheet_name='d2')
writer.save()
writer.close()
update
This should work just note that the a blank file needs to be created before hand. You can just create a blank file using python if you want. I created a simple loop to, in some ways, mimic the essence of what you are trying to accomplish:
import pandas as pd
from openpyxl import load_workbook
d1 = pd.DataFrame({"A":['Bob','Joe', 'Mark'],
"B":['5', '10', '20']})
d2 = pd.DataFrame({"A":['Jeffrey','Ann', 'Sue'],
"B":['1', '2', '3']})
dfs = [d1,d2]
for i in range(len(dfs)):
sheet = 'd'+str(i+1)
data = dfs[i]
writer = pd.ExcelWriter('atest.xlsx',engine='openpyxl', mode='a')
writer.book = load_workbook('atest.xlsx') # here is the difference
data.to_excel(writer,sheet_name=sheet)
writer.save()
writer.close()
or here is the modified first example:
d1 = pd.DataFrame({"A":['Bob','Joe', 'Mark'],
"B":['5', '10', '20']})
d2 = pd.DataFrame({"A":['Jeffrey','Ann', 'Sue'],
"B":['1', '2', '3']})
writer = pd.ExcelWriter('atest.xlsx', engine='openpyxl', mode='w')
d1.to_excel(writer,sheet_name='d1')
writer.save()
writer.close()
writer = pd.ExcelWriter('atest.xlsx', engine='openpyxl', mode='a')
writer.book = load_workbook('atest.xlsx')
d2.to_excel(writer,sheet_name='d2')
writer.save()
writer.close()
Author by
enter_display_name_here
Updated on February 28, 2020Comments
-
enter_display_name_here about 4 years
Per https://github.com/pandas-dev/pandas/pull/21251/files/09e5b456e1af5cde55f18f903ab90c761643b05a, we should be able to append DataFrames to new XLSX sheets.
Based on the documentation, I tried the following:
>>> import pandas as pd >>> ... d1 = pd.DataFrame({"A":['Bob','Joe', 'Mark'], ... "B":['5', '10', '20']}) >>> d2 = pd.DataFrame({"A":['Jeffrey','Ann', 'Sue'], ... "B":['1', '2', '3']}) >>> >>> # Create XLSX document for ticker ... writer = pd.ExcelWriter('test.xlsx',engine='openpyxl') >>> d1.to_excel(writer,sheet_name='d1') >>> writer.save() >>> >>> writer = pd.ExcelWriter('test.xlsx',engine='openpyxl', mode='a') >>> d2.to_excel(writer,sheet_name='d2') >>> writer.save() >>> >>> pd.__version__ '0.23.4' # Just updated this per a comment >>> >>>
The result is a single workbook named 'test.xlsx' with a single tab 'd2'.
How can I prevent the workbook/sheet form being overwritten?
-
Poojan over 5 yearsPossible duplicate of How to write to an existing excel file without overwriting data (using pandas)?
-
user3471881 over 5 yearsSorry, a bit too quick to mark as duplicate. This was introduced in
pandas
version0.23.1
so you need to upgrade. -
enter_display_name_here over 5 yearsThanks, I upgraded per your suggestion. I was using "pip3 install pandas" instead of adding "--upgrade" so I thought I had the latest version. Per your suggestion, I updated but am still getting the same result.
-
enter_display_name_here over 5 yearsAlso, the other referenced <stackoverflow.com/questions/20219254/…> is for the function 'df.to_excel()', not 'pd.ExcelWriter'.
-
r.ook over 5 yearsAre you running this on a REPL? Make sure you restart your shell since the
pandas
module is loaded in memory with the older version. -
enter_display_name_here over 5 yearsBased on my search, I am guessing that "REPL" is typing directly into the terminal. So since I am on a Mac, I am logged into the Terminal and entered commands one by one. I did two things per your recommendation (1) Closed all Terminal windows, then restarted Terminal; (2) Ran the script via the Terminal, i.e. "python3 test.py". The results from both cases are the same, unfortunately.
-
-
enter_display_name_here over 5 yearsI tried this but the result is comparable to xlsxwriter.readthedocs.io/example_pandas_multiple.html. It's a different engine but the workbook is opened only once then both tabs are written. I need to open/write/close then open/write/close. I tried your suggestion by putting
d1.to_excel()
andd2.to_excel()
in separatewith
loops but the result was a single XLSX with a single tab. I also tried your suggestion but with settingmode='w'
. It worked the same asmode='a'
. -
It_is_Chris over 5 years@enter_display_name_here why do you need to open/write/close multiple times?
-
enter_display_name_here over 5 yearsI am building an extremely large dictionary of DataFrames and would like to write out each DataFrame as a tab as they get completed. I would hate to get 100+ DataFrames in and then the program experiences an issue and nothing is written out. Basically, I am scraping 4 websites, compiling different tables of data into a single DF, then doing that process over and over again. I'd like to generate an individual tab per DF as it goes through the program.
-
enter_display_name_here over 5 yearsThanks for taking the time to put that together. Your update is actually the workaround to what was done prior to the addition of
mode='a'
. If you refer to line 1009 (here: github.com/pandas-dev/pandas/pull/21251/files/…), you'll notice that they addedbook = load_workbook(self.path)
to their code. Your code works the same as withmode='w'
. If I am unable to getmode='a'
to work, then I will use this. I am hoping that their new code will check for an existing tab and then overwrite that. -
It_is_Chris over 5 yearsdid you create the blank
test.xlsx
file first -
enter_display_name_here over 5 yearsYep, I tried that and it worked. I also just double-checked their code. At first, I thought a new workbook would be created if one doesn't exist, but I think that's incorrect. I think the code creates a new workbook if
mode=
is not'a'
, not if a workbook does not exist. -
It_is_Chris over 5 yearsAfter looking at the code that is correct, a file is not created when
mode=='a'
they are just usingload_workbook
, which means the file needs to exist prior to runningExcelWriter
with the parammode='a'
ifmode != a
then they are creating a new workbook:self.book = Workbook()
-
Ishkatan about 5 yearsJust re-read and see that I answered how to append to the same tab but the question is about multiple tabs. I have a program similar to yours that writes multiple tabs successfully. I don't specify engine='openpyxl' and I don't do a writer.save() until the very end of the program - last step. Works though I could see limits if my data was too large.