Is there a way to auto-adjust Excel column widths with pandas.ExcelWriter?

131,698

Solution 1

Inspired by user6178746's answer, I have the following:

# Given a dict of dataframes, for example:
# dfs = {'gadgets': df_gadgets, 'widgets': df_widgets}

writer = pd.ExcelWriter(filename, engine='xlsxwriter')
for sheetname, df in dfs.items():  # loop through `dict` of dataframes
    df.to_excel(writer, sheet_name=sheetname)  # send df to writer
    worksheet = writer.sheets[sheetname]  # pull worksheet object
    for idx, col in enumerate(df):  # loop through all columns
        series = df[col]
        max_len = max((
            series.astype(str).map(len).max(),  # len of largest item
            len(str(series.name))  # len of column name/header
            )) + 1  # adding a little extra space
        worksheet.set_column(idx, idx, max_len)  # set column width
writer.save()

Solution 2

I'm posting this because I just ran into the same issue and found that the official documentation for Xlsxwriter and pandas still have this functionality listed as unsupported. I hacked together a solution that solved the issue i was having. I basically just iterate through each column and use worksheet.set_column to set the column width == the max length of the contents of that column.

One important note, however. This solution does not fit the column headers, simply the column values. That should be an easy change though if you need to fit the headers instead. Hope this helps someone :)

import pandas as pd
import sqlalchemy as sa
import urllib


read_server = 'serverName'
read_database = 'databaseName'

read_params = urllib.quote_plus("DRIVER={SQL Server};SERVER="+read_server+";DATABASE="+read_database+";TRUSTED_CONNECTION=Yes")
read_engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % read_params)

#Output some SQL Server data into a dataframe
my_sql_query = """ SELECT * FROM dbo.my_table """
my_dataframe = pd.read_sql_query(my_sql_query,con=read_engine)

#Set destination directory to save excel.
xlsFilepath = r'H:\my_project' + "\\" + 'my_file_name.xlsx'
writer = pd.ExcelWriter(xlsFilepath, engine='xlsxwriter')

#Write excel to file using pandas to_excel
my_dataframe.to_excel(writer, startrow = 1, sheet_name='Sheet1', index=False)

#Indicate workbook and worksheet for formatting
workbook = writer.book
worksheet = writer.sheets['Sheet1']

#Iterate through each column and set the width == the max length in that column. A padding length of 2 is also added.
for i, col in enumerate(my_dataframe.columns):
    # find length of column i
    column_len = my_dataframe[col].astype(str).str.len().max()
    # Setting the length if the column header is larger
    # than the max column value length
    column_len = max(column_len, len(col)) + 2
    # set the column length
    worksheet.set_column(i, i, column_len)
writer.save()

Solution 3

Dynamically adjust all the column lengths

writer = pd.ExcelWriter('/path/to/output/file.xlsx') 
df.to_excel(writer, sheet_name='sheetName', index=False, na_rep='NaN')

for column in df:
    column_length = max(df[column].astype(str).map(len).max(), len(column))
    col_idx = df.columns.get_loc(column)
    writer.sheets['sheetName'].set_column(col_idx, col_idx, column_length)

writer.save()

Manually adjust a column using Column Name

col_idx = df.columns.get_loc('columnName')
writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)

Manually adjust a column using Column Index

writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)

In case any of the above is failing with

AttributeError: 'Worksheet' object has no attribute 'set_column'

make sure to install xlsxwriter:

pip install xlsxwriter

For a more comprehensive explanation you can read the article How to Auto-Adjust the Width of Excel Columns with Pandas ExcelWriter on TDS.

Solution 4

There is a nice package that I started to use recently called StyleFrame.

it gets DataFrame and lets you to style it very easily...

by default the columns width is auto-adjusting.

for example:

from StyleFrame import StyleFrame
import pandas as pd

df = pd.DataFrame({'aaaaaaaaaaa': [1, 2, 3], 
                   'bbbbbbbbb': [1, 1, 1],
                   'ccccccccccc': [2, 3, 4]})
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(excel_writer=excel_writer, row_to_add_filters=0,
            columns_and_rows_to_freeze='B2')
excel_writer.save()

you can also change the columns width:

sf.set_column_width(columns=['aaaaaaaaaaa', 'bbbbbbbbb'],
                    width=35.3)

UPDATE 1

In version 1.4 best_fit argument was added to StyleFrame.to_excel. See the documentation.

UPDATE 2

Here's a sample of code that works for StyleFrame 3.x.x

from styleframe import StyleFrame
import pandas as pd

columns = ['aaaaaaaaaaa', 'bbbbbbbbb', 'ccccccccccc', ]
df = pd.DataFrame(data={
        'aaaaaaaaaaa': [1, 2, 3, ],
        'bbbbbbbbb': [1, 1, 1, ],
        'ccccccccccc': [2, 3, 4, ],
    }, columns=columns,
)
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(
    excel_writer=excel_writer, 
    best_fit=columns,
    columns_and_rows_to_freeze='B2', 
    row_to_add_filters=0,
)
excel_writer.save()

Solution 5

There is probably no automatic way to do it right now, but as you use openpyxl, the following line (adapted from another answer by user Bufke on how to do in manually) allows you to specify a sane value (in character widths):

writer.sheets['Summary'].column_dimensions['A'].width = 15
Share:
131,698

Related videos on Youtube

badideas
Author by

badideas

Updated on November 28, 2021

Comments

  • badideas
    badideas over 2 years

    I am being asked to generate some Excel reports. I am currently using pandas quite heavily for my data, so naturally I would like to use the pandas.ExcelWriter method to generate these reports. However the fixed column widths are a problem.

    The code I have so far is simple enough. Say I have a dataframe called 'df':

    writer = pd.ExcelWriter(excel_file_path, engine='openpyxl')
    df.to_excel(writer, sheet_name="Summary")
    

    I was looking over the pandas code, and I don't really see any options to set column widths. Is there a trick out there in the universe to make it such that the columns auto-adjust to the data? Or is there something I can do after the fact to the xlsx file to adjust the column widths?

    (I am using the OpenPyXL library, and generating .xlsx files - if that makes any difference.)

    Thank you.

    • Jeff
      Jeff almost 11 years
      doesn't look possible at the moment, please open an issue for this enhancement on github (and maybe a PR?). doesn't look that hard to do.
    • badideas
      badideas almost 11 years
      thanks Jeff, i have submitted the issue. i'm not sure if i will have time to actually dive into the pandas codebase to solve it, but you never know :)
    • Jeff
      Jeff almost 11 years
      yep....saw your issue.....comment on the issue if you need some help! (essentially need to pass an optional argument to to_excel, maybe col_style=dict which contains col header style elements (rather than the default header_style which seems to be hard coded now
    • dmvianna
      dmvianna over 10 years
  • Admin
    Admin about 8 years
    Good solution. I like how you used pandas instead of another package.
  • ᴊᴇsᴘᴇʀᴋ.ᴇᴛʜ
    ᴊᴇsᴘᴇʀᴋ.ᴇᴛʜ over 7 years
    FYI: In my case I needed to use "index=False" in the "df.to_excel(...)" call, or else the columns were off by 1
  • Heikki Pulkkinen
    Heikki Pulkkinen almost 6 years
    yep, I also had to add df.to_excel(writer, sheet_name=sheetname, index=False)
  • ac24
    ac24 almost 6 years
    If you can't use index=False (because you have a multiindex on rows), then you can get the index level depth with df.index.nlevels and then use this to add on to your set column call: worksheet.set_column(idx+nlevels, idx+nlevels, max_len). Otherwise the length is calculated for the first column of the frame, and then applied to the first column in the excel, which is probably the index.
  • Dascienz
    Dascienz over 5 years
    For anyone still looking for this answer, enumerate(df) should be enumerate(df.columns) since you're iterating over each column in df.
  • alichaudry
    alichaudry over 5 years
    @Dascienz the same way iterating over a dict actually iterates over the keys in the dict (you don't have to manually say dict.keys()), iterating over a pd.DataFrame iterates over the columns. You don't have to manually iterate over df.columns.
  • Dascienz
    Dascienz over 5 years
    @alichaudry Ah, I see! Thanks for clarification!
  • John Y
    John Y over 5 years
    The StyleFrame package may be easy to use, but I don't see how "by default the columns width is auto-adjusting". When I run the code sample you gave, all the columns are the same width, and all three headers are wrapped. Your sample data is also poorly chosen, because they are all almost the same width naturally. To really illustrate automatic adjustment, you should choose some really wide data and some narrow data. When I do this for myself, the column widths are still exactly the same as before. There was no adjustment whatsoever.
  • John Y
    John Y over 5 years
    Maybe at one point in StyleFrame's history, the column widths were automatically adjusted by default, but at least today, you have to specify the column or columns you want adjusted in the best_fit parameter. Also, when I tried this, I got very poor results.
  • Admin
    Admin over 5 years
    the width seems to be off 1 column. I tried enabling and disabling the index parameter but no dice.
  • Chaoste
    Chaoste about 5 years
    Pandas supports a nice notation for calculation string length and other stuff: series.astype(str).map(len).max() can be rewritten as: series.astype(str).str.len().max().
  • Serdia
    Serdia over 4 years
    I think you need () inside max function: ` max(column_len(), len(col)) + 2`
  • ojdo
    ojdo over 4 years
    The default ExcelWriter engine pandas is using has changed since 2013 to Xlsxwriter, which does not contain a column_dimensions attribute. If you want to keep using openpyxl, simply specify it when creating the writer using pd.ExcelWriter(excel_filename, engine='openpyxl')
  • ojdo
    ojdo over 4 years
    @Sunil: check the other answers using Xlsxwriter as the engine to see how to specify the column width with today's default engine.
  • Nikhil VJ
    Nikhil VJ about 4 years
    thanks! for those looking: How you add more styling to header for example: sf.apply_headers_style(Styler(bold=False)) it took me a long time to figure that out. And in the import statement, from StyleFrame import StyleFrame, Styler . here's all the options apart from bold: styleframe.readthedocs.io/en/2.0.5/…
  • Hagbard
    Hagbard almost 4 years
    Unfortunately, this answer is outdated and I only get import errors if I try to apply it as the API seems to have changed significantly.
  • user3019973
    user3019973 almost 4 years
    I got following error when i replicated this code: AttributeError: 'str' object has no attribute 'to_excel'. It think it has something to do with the way "dataframe_list" is created. Mine is a list with 6 dataframe names
  • Umutambyi Gad
    Umutambyi Gad almost 4 years
    codes only doesn't answer the question you have to add some explanations or take time and read documentation about How do I write a good answer?
  • Brian
    Brian almost 4 years
    Hello! While this code may solve the question, including an explanation of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply.
  • rafat.ch
    rafat.ch over 3 years
    Yes, the "dataframe_list" should have dataframes and not dataframe names.
  • DeepSpace
    DeepSpace over 3 years
    @Hagbard as of version 3 the import should be from styleframe import StyleFrame in order to comply with PEP8 name conventions
  • Guido
    Guido over 3 years
    Works only on Windows and MacOS, not on Linux though
  • Guido
    Guido over 3 years
    I see an issue dealing with multi_index dataframes - just fail once passing index=True in the method to_excel
  • jmd
    jmd about 3 years
    it seems this works well for string but a lot less well for date format, where the actual width can be much larger than what is calculated here
  • jmb
    jmb almost 3 years
    Good, simple solution here. Keep in mind if you're using an index, df.columns will have a different shape than what df.to_excel(writer,sheet_name=...) will output in the excel file. That can misalign the enumerate's i with what worksheet.set_column is expecting. I resolved this with df.reset_index().to_excel(...), but there's probably a better solution.
  • parsecer
    parsecer almost 3 years
    What is df here? Could you please show code, including df initialization?
  • Giorgos Myrianthous
    Giorgos Myrianthous almost 3 years
    @parsecer You can refer to the article I've shared at the bottom of the post.
  • MattG
    MattG over 2 years
    Worked perfectly, including the auto widths, explicit widths by column name and exception resolved by installing xlswriter. Thanks :)
  • Philippe Hebert
    Philippe Hebert over 2 years
    If you get AttributeError: 'Worksheet' object has no attribute 'set_column', you may be missing XlsxWriter and pandas is falling back on openpyxl. pip install XlsxWriter should solve it :)