Pandas: Looking up the list of sheets in an Excel file

python excel pandas openpyxl xlrd

249,485

Solution 1

You can still use the ExcelFile class (and the sheet_names attribute):

```python
import pandas as pd

xl = pd.ExcelFile('foo.xls')

xl.sheet_names  # see all sheet names

xl.parse(sheet_name)  # read a specific sheet (one of the names above) into a DataFrame
```

See the docs for parse for more options.
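A minimal sketch of the question's scenario (sheets "Data 1" ... "Data N" plus others, with N unknown): open the file once, list the sheet names, then parse only the sheets matching a prefix. The file name demo.xlsx and the "Data " prefix are illustrative assumptions, and writing the demo workbook assumes an Excel writer engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Build a small demo workbook (hypothetical file, for illustration only).
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
with pd.ExcelWriter(path) as writer:
    for name in ["Data 1", "Data 2", "Data 3", "foo", "bar"]:
        pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name=name, index=False)

# ExcelFile can be used as a context manager, so the file is closed on exit.
with pd.ExcelFile(path) as xl:
    names = xl.sheet_names
    # Parse only the "Data N" sheets, without knowing N in advance.
    data_sheets = {name: xl.parse(name) for name in names if name.startswith("Data ")}

print(names)
print(list(data_sheets))
```

Filtering on the names before calling parse means the "foo" and "bar" sheets are never turned into DataFrames.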

Solution 2

Explicitly pass None as the second parameter, sheet_name (called sheetname in older pandas versions), like this:

```python
import pandas

df = pandas.read_excel("/yourPath/FileName.xlsx", sheet_name=None)
```

df is then a dictionary of DataFrames, one per sheet, keyed by sheet name. You can verify this by running:

```python
df.keys()
```

The result looks like this (this output is from Python 2, hence the u'' prefixes):

[u'201610', u'201601', u'201701', u'201702', u'201703', u'201704', u'201705', u'201706', u'201612', u'fund', u'201603', u'201602', u'201605', u'201607', u'201606', u'201608', u'201512', u'201611', u'201604']

Please refer to the pandas docs for more details: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html
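A short sketch of the same idea with the keyword spelled out, assuming a recent pandas where the parameter is sheet_name. Because the result is a dict keyed by sheet name, list(dfs) gives the sheet names. The two-sheet workbook below is hypothetical and is written only so the example runs on its own.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-sheet workbook, created only for illustration.
path = os.path.join(tempfile.mkdtemp(), "FileName.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"a": [1, 2, 3]}).to_excel(writer, sheet_name="201610", index=False)
    pd.DataFrame({"b": [4]}).to_excel(writer, sheet_name="fund", index=False)

# sheet_name=None reads every sheet and returns {sheet name: DataFrame}.
dfs = pd.read_excel(path, sheet_name=None)

sheet_names = list(dfs)  # the dict keys are the sheet names
for name, frame in dfs.items():
    print(name, frame.shape)
```

Note that this parses every sheet, so it is only cheap when you actually want the data from all of them.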

Solution 3

This is the fastest way I have found, inspired by @divingTobi's answer. All the answers based on xlrd, openpyxl or pandas are slow for me, as they all load the whole file first.

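```python
from zipfile import ZipFile

from bs4 import BeautifulSoup  # you also need to install "lxml" for the XML parser

file = "foo.xlsx"  # path to your file; zip-based .xlsx only, not legacy .xls

with ZipFile(file) as zipped_file:
    # xl/workbook.xml lists the sheets without any of their cell data
    summary = zipped_file.open(r'xl/workbook.xml').read()
soup = BeautifulSoup(summary, "xml")
sheets = [sheet.get("name") for sheet in soup.find_all("sheet")]
```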
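If you would rather avoid the BeautifulSoup/lxml dependency, the same trick can be sketched with only the standard library's zipfile and xml.etree.ElementTree. This is not the answerer's code: the function name xlsx_sheet_names is hypothetical, and like the original it only handles zip-based .xlsx files, not legacy .xls.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def xlsx_sheet_names(file):
    """Return the sheet names of an .xlsx file (a path or a binary file object).

    Only xl/workbook.xml is read; no worksheet data is loaded.
    """
    with zipfile.ZipFile(file) as zipped_file:
        root = ET.fromstring(zipped_file.read("xl/workbook.xml"))
    # Compare local tag names so the namespace URI does not matter
    # (transitional and strict OOXML use different namespaces).
    return [
        elem.get("name")
        for elem in root.iter()
        if elem.tag.rsplit("}", 1)[-1] == "sheet"
    ]


# Usage: a tiny in-memory archive containing only xl/workbook.xml.
WORKBOOK_XML = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" '
    'xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">'
    '<sheets><sheet name="Data 1" sheetId="1" r:id="rId1"/>'
    '<sheet name="foo" sheetId="2" r:id="rId2"/></sheets></workbook>'
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/workbook.xml", WORKBOOK_XML)

print(xlsx_sheet_names(buf))  # ['Data 1', 'foo']
```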

Solution 4

The easiest way to retrieve the sheet names from an Excel file (.xls, .xlsx) is:

```python
import pandas as pd

tabs = pd.ExcelFile("path").sheet_names
print(tabs)
```

Then, to read a particular sheet into a DataFrame (say the sheet names are "Sheet1", "Sheet2", etc., and you want "Sheet2"):

```python
data = pd.read_excel("path", "Sheet2")
print(data)
```
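Besides a name, read_excel's sheet_name parameter also accepts a zero-based integer position or a list of names/positions; with a list, a dict keyed by those same values is returned. A minimal sketch, assuming a hypothetical workbook with sheets "Sheet1" and "Sheet2" that is created only for illustration:

```python
import os
import tempfile

import pandas as pd

# Hypothetical workbook with sheets "Sheet1" and "Sheet2".
path = os.path.join(tempfile.mkdtemp(), "path.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"v": [1]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"v": [2, 3]}).to_excel(writer, sheet_name="Sheet2", index=False)

with pd.ExcelFile(path) as xl:
    tabs = xl.sheet_names

# A zero-based position selects the same sheet as the name "Sheet2".
second = pd.read_excel(path, sheet_name=1)

# A list returns a dict keyed by the given names/positions.
several = pd.read_excel(path, sheet_name=["Sheet1", 1])
```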

Solution 5

```python
import pandas as pd
from openpyxl import load_workbook

excelFilePath = "foo.xlsx"  # path to your Excel file

# Works for both .xls and .xlsx files, using pandas
excel_Sheet_names = pd.ExcelFile(excelFilePath).sheet_names

# For .xlsx files only, using openpyxl (the attribute is sheetnames, without an underscore)
excel_Sheet_names = load_workbook(excelFilePath, read_only=True).sheetnames
```
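A minimal sketch of the openpyxl route with explicit cleanup, assuming openpyxl is installed. In read-only mode openpyxl keeps the file handle open until the workbook is closed, so calling close() is worthwhile. The demo file and sheet names are hypothetical.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Hypothetical workbook, created only so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
wb = Workbook()
wb.active.title = "Data 1"  # rename the default sheet
wb.create_sheet("foo")
wb.save(path)

# read_only=True avoids loading all the cell data up front.
wb = load_workbook(path, read_only=True)
try:
    names = wb.sheetnames  # note: sheetnames, not sheet_names
finally:
    wb.close()  # release the file handle held by the read-only workbook
```

Per the timing comment in the thread, read_only=True can make a large difference on big files.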



Amelio Vazquez-Reina


Updated on December 08, 2021

Comments

Amelio Vazquez-Reina over 2 years
The new version of Pandas uses the following interface to load Excel files:
```
read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])
```
but what if I don't know the sheets that are available?

For example, I am working with Excel files that have the following sheets:

Data 1, Data 2 ..., Data N, foo, bar

but I don't know N a priori.

Is there any way to get the list of sheets from an excel document in Pandas?
Amelio Vazquez-Reina almost 11 years

Thanks @Andy. May I ask, does Pandas load the Excel sheet in ExcelFile? Also, say I look up the list of sheets and decide to load N of them: should I at that point call read_excel (the new interface) for each sheet, or stick to xl.parse?
Andy Hayden almost 11 years

I think ExcelFile keeps the file open (and doesn't read it all), so using parse (and opening the file only once) makes the most sense here. Tbh I missed the arrival of read_excel!
Andy Hayden almost 11 years

Mentioned before here, but I like to keep a dictionary of DataFrames using {sheet_name: xl.parse(sheet_name) for sheet_name in xl.sheet_names}
Ezekiel Kruglick almost 9 years

Wish I could give you more upvotes, this works across multiple versions of pandas too! (don't know why they like changing the API so often) Thanks for pointing me at the parse function, here is the current link though: pandas.pydata.org/pandas-docs/stable/generated/…
semore_1267 about 7 years

Agreed, imo this is the best way to load Excel files w/ pandas.
Nicholas Lu almost 7 years

read_excel provides built-in support for iterating over sheets, so I think it's not necessary to use the old ExcelFile interface. Please see my answer.
Andy Hayden almost 7 years

This unnecessarily parses every sheet as a DataFrame, which is not required. "How to read an xls/xlsx file" is a different question.
Andy Hayden almost 7 years

@NicholasLu the downvote was unnecessary, this answer is from 2013! That said, whilst ExcelFile is the original way to parse excel files it is not deprecated and remains a perfectly valid way to do this.
CodeMonkey over 6 years

@AndyHayden it might not be efficient, but it might be the best if you care about all the sheets, or you don't care about the additional overhead.
divingTobi about 4 years

No, .xls is a completely different file format, so I would not expect this code to work.
Daniel almost 4 years

What are the modules you are using?
Dhwanil shah almost 4 years

@Daniel I used only zipfile, which is a built-in module, and xmltodict, which I used to convert the XML into an easily iterable dictionary. You can also look at @divingTobi's answer below, which reads the same file without actually extracting the files within.
flutefreak7 almost 4 years

When I tried openpyxl with the read_only flag it is significantly faster (200X faster for my 5 MB file). load_workbook(excel_file).sheetnames averaged 8.24s where load_workbook(excel_file, read_only=True).sheetnames averaged 39.6ms.
vjangus about 3 years

When opening .xlsx files, this fails in pandas 1.1.5, but it can be fixed by passing engine='openpyxl': xl = pd.ExcelFile('foo.xlsx', engine='openpyxl'). For more on my issue, see this thread.
Corey Levinson almost 3 years

The named argument is called sheet_name. I.e., df = pandas.read_excel("/yourPath/FileName.xlsx", sheet_name=None, engine='openpyxl')
StudentAtLU about 2 years

I believe the method is called ".sheetnames" (without underscore).