How to increase process speed using read_excel in pandas?

17,697

Read all worksheets without guessing

Use sheetname = None argument to pd.read_excel. This will read all worksheets into a dictionary of dataframes. For example:

dfs = pd.read_excel('file.xlsx', sheetname=None)

# access 'Sheet1' worksheet
res = dfs['Sheet1']

Limit number of rows or columns

You can use parse_cols and skip_footer arguments to limit the number of columns and/or rows. This will reduce read time, and also works with sheetname = None.

For example, the following will read the first 3 columns and, if your worksheet has 100 rows, it will read only the first 20.

df = pd.read_excel('file.xlsx', sheetname=None, parse_cols='A:C', skip_footer=80)

If you wish to apply worksheet-specific logic, you can do so by extracting sheetnames:

sheet_names = pd.ExcelFile('file.xlsx', on_demand=True).sheet_names

dfs = {}
for sheet in sheet_names:
    dfs[sheet] = pd.read_excel('file.xlsx', sheet)

Improving performance

Reading Excel files into Pandas is naturally slower than other options (CSV, Pickle, HDF5). If you wish to improve performance, I strongly suggest you consider these other formats.

One option, for example, is to use a VBA script to convert your Excel worksheets to CSV files; then use pd.read_csv.

Share:
17,697
james.peng
Author by

james.peng

before 2017:a trditional oracle/mysql dba in 2017:python learner/vertica learner in future:hadoop learner

Updated on June 05, 2022

Comments

  • james.peng
    james.peng almost 2 years

    I need use pd.read_excel to process every sheet in one excel file.
    But in most cases,I did not know the sheet name.
    So I use this to judge how many sheet in excel:

    i_sheet_count=0
    i=0
    try:
      df.read_excel('/tmp/1.xlsx',sheetname=i)
      i_sheet_count+=1
      i+=1
    else:
      i+=1
    print(i_sheet_count)
    

    During the process,I found that the process is quite slow,
    So,can read_excel only read limited rows to improve the speed?
    I tried nrows but did not work..still slow..

  • james.peng
    james.peng almost 6 years
    Thats perfect! Thanks a lot