Pandas: Looking up the list of sheets in an Excel file
Solution 1
You can still use the ExcelFile class (and its sheet_names attribute):
import pandas as pd
xl = pd.ExcelFile('foo.xls')
xl.sheet_names  # see all sheet names
xl.parse(sheet_name)  # read a specific sheet to DataFrame
See the docs for parse for more options.
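For the situation raised in the question, where a workbook holds sheets named Data 1, Data 2, ..., Data N plus a few others and N is not known in advance, one possible approach is to filter sheet_names before parsing anything. This is only a sketch: the helper names select_data_sheets and load_data_sheets are made up for illustration, and the "Data <number>" pattern is assumed from the question's example.

```python
import re


def select_data_sheets(sheet_names):
    """Return the names that look like 'Data <number>', ordered by that number."""
    pattern = re.compile(r"Data (\d+)")  # assumed naming scheme, taken from the question
    numbered = []
    for name in sheet_names:
        match = pattern.fullmatch(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def load_data_sheets(path):
    """Open the workbook once and parse only the 'Data <number>' sheets."""
    import pandas as pd  # imported here so the name filter above has no dependencies

    with pd.ExcelFile(path) as xl:
        wanted = select_data_sheets(xl.sheet_names)
        return {name: xl.parse(name) for name in wanted}
```

Sorting on the numeric part keeps "Data 10" after "Data 2", which a plain string sort would not do.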
Solution 2
Explicitly set the sheet_name parameter (the second parameter, called sheetname in older pandas versions) to None, like this:
import pandas
df = pandas.read_excel("/yourPath/FileName.xlsx", sheet_name=None)
"df" then holds all sheets as a dictionary of DataFrames keyed by sheet name. You can verify this by running:
df.keys()
which gives a result like this:
[u'201610', u'201601', u'201701', u'201702', u'201703', u'201704', u'201705', u'201706', u'201612', u'fund', u'201603', u'201602', u'201605', u'201607', u'201606', u'201608', u'201512', u'201611', u'201604']
Please refer to the pandas documentation for more details: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html
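Once every sheet has been loaded into a dictionary this way, a common next step is to stack the sheets into one DataFrame while remembering where each row came from. The sketch below is an illustration and not part of the original answer: stack_sheets and the "sheet" column name are hypothetical, and it assumes the sheets share compatible columns.

```python
import pandas as pd


def stack_sheets(frames):
    """Combine {sheet_name: DataFrame} into one frame with a 'sheet' column."""
    tagged = [df.assign(sheet=name) for name, df in frames.items()]
    return pd.concat(tagged, ignore_index=True)


# Typical use (path taken from the answer above):
# frames = pd.read_excel("/yourPath/FileName.xlsx", sheet_name=None)
# sheet_names = list(frames)  # dictionary keys are the sheet names
# combined = stack_sheets(frames)
```

Note that this still parses every sheet, so it only makes sense when the data itself is needed and not just the names.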
Solution 3
This is the fastest way I have found, inspired by @divingTobi's answer. All the answers based on xlrd, openpyxl or pandas are slow for me, as they all load the whole file first.
from zipfile import ZipFile
from bs4 import BeautifulSoup  # you also need to install "lxml" for the XML parser
# "file" is the path to the .xlsx workbook (an .xlsx file is a zip archive)
with ZipFile(file) as zipped_file:
    summary = zipped_file.open(r'xl/workbook.xml').read()
soup = BeautifulSoup(summary, "xml")
sheets = [sheet.get("name") for sheet in soup.find_all("sheet")]
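If installing BeautifulSoup and lxml is not an option, the same idea can be sketched with the standard library alone. This is a variant, not the answer's own code: the function name xlsx_sheet_names is hypothetical, and it assumes a zip-based workbook (.xlsx or .xlsm) that contains xl/workbook.xml. Legacy .xls files are a different format and are rejected up front.

```python
import zipfile
import xml.etree.ElementTree as ET


def xlsx_sheet_names(source):
    """Return sheet names from a zip-based workbook without reading any cell data.

    `source` may be a file path or a seekable binary file object.
    """
    if not zipfile.is_zipfile(source):
        # Legacy .xls files are not zip archives, so this approach cannot read them.
        raise ValueError("not a zip-based workbook (.xlsx/.xlsm)")
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    # Compare local tag names so the code does not depend on the namespace URI.
    return [
        element.get("name")
        for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == "sheet"
    ]
```

Only the small workbook.xml entry is decompressed, which is why this stays fast even for large files.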
Solution 4
The easiest way to retrieve the sheet names from an Excel file (.xls, .xlsx) is:
tabs = pd.ExcelFile("path").sheet_names
print(tabs)
Then, to read and store the data of a particular sheet (say the sheet names are "Sheet1", "Sheet2", etc.), for example "Sheet2":
data = pd.read_excel("path", "Sheet2")
print(data)
Solution 5
# Works for both '.xls' and '.xlsx' using pandas
import pandas as pd
excel_Sheet_names = pd.ExcelFile(excelFilePath).sheet_names
# For '.xlsx' only, using openpyxl directly (the attribute is sheetnames, without an underscore)
from openpyxl import load_workbook
excel_Sheet_names = load_workbook(excelFilePath, read_only=True).sheetnames
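When openpyxl is used in read-only mode, the workbook keeps the underlying file open until it is closed explicitly. A small wrapper along the following lines can take care of that; the name openpyxl_sheet_names is hypothetical and this is a sketch, not the answer's code.

```python
from openpyxl import load_workbook


def openpyxl_sheet_names(path_or_file):
    """List sheet names using openpyxl's read-only mode, then release the file."""
    workbook = load_workbook(path_or_file, read_only=True)
    try:
        return list(workbook.sheetnames)
    finally:
        workbook.close()  # read-only workbooks hold the file open until closed
```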
Amelio Vazquez-Reina
Updated on December 08, 2021
Comments
-
Amelio Vazquez-Reina over 2 years
The new version of Pandas uses the following interface to load Excel files:
read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])
but what if I don't know the sheets that are available?
For example, I am working with Excel files that have the following sheets
Data 1, Data 2 ..., Data N, foo, bar
but I don't know N a priori. Is there any way to get the list of sheets from an Excel document in Pandas?
-
Amelio Vazquez-Reina almost 11 years
Thanks @Andy. May I ask, does Pandas load the Excel sheet in ExcelFile? Also, say I look up the list of sheets and decide to load N of them, should I at that point call read_excel (the new interface) for each sheet, or stick to xl.parse?
-
Andy Hayden almost 11 years
I think ExcelFile keeps the file open (and doesn't read it all), so I think using parse (and opening the file only once) makes most sense here. tbh I missed the arrival of read_excel!
-
Andy Hayden almost 11 years
Mentioned before here, but I like to keep a dictionary of DataFrames using
{sheet_name: xl.parse(sheet_name) for sheet_name in xl.sheet_names}
-
Ezekiel Kruglick almost 9 years
Wish I could give you more upvotes, this works across multiple versions of pandas too! (Don't know why they like changing the API so often.) Thanks for pointing me at the parse function; here is the current link though: pandas.pydata.org/pandas-docs/stable/generated/…
-
semore_1267 about 7 years
Agreed, imo this is the best way to load Excel files w/ pandas.
-
Nicholas Lu almost 7 years
read_excel provides built-in support for iterating over sheets, so I think it's not necessary to use the old ExcelFile interface. Please see my answer.
-
Andy Hayden almost 7 years
This unnecessarily parses every sheet as a DataFrame, which is not required. "How to read an xls/xlsx file" is a different question.
-
Andy Hayden almost 7 years
@NicholasLu the downvote was unnecessary, this answer is from 2013! That said, whilst ExcelFile is the original way to parse Excel files, it is not deprecated and remains a perfectly valid way to do this.
-
CodeMonkey over 6 years
@AndyHayden it might not be efficient, but it might be the best if you care about all the sheets, or you don't care about the additional overhead.
-
divingTobi about 4 years
No, .xls is a completely different file format, so I would not expect this code to work.
-
Daniel almost 4 years
What are the modules you are using?
-
Dhwanil shah almost 4 years
@Daniel I have used only zipfile, which is a built-in module, and xmltodict, which I used to convert the XML into an easily iterable dictionary. Although you can look at @divingTobi's answer below, where you can read the same file without actually extracting the files within.
-
flutefreak7 almost 4 years
When I tried openpyxl with the read_only flag it was significantly faster (200X faster for my 5 MB file). load_workbook(excel_file).sheetnames averaged 8.24s, where load_workbook(excel_file, read_only=True).sheetnames averaged 39.6ms.
-
vjangus about 3 years
When opening xlsx files, this will fail in pandas 1.1.5. But it can be fixed by using xl = pd.ExcelFile('foo.xls', engine='openpyxl'). Related to my issue, see this thread
-
Corey Levinson almost 3 years
The named argument is called sheet_name. I.e., df = pandas.read_excel("/yourPath/FileName.xlsx", sheet_name=None, engine='openpyxl')
-
StudentAtLU about 2 years
I believe the method is called ".sheetnames" (without underscore).