Pandas: Looking up the list of sheets in an excel file

Solution 1

You can still use the ExcelFile class (and its sheet_names attribute):

import pandas as pd

xl = pd.ExcelFile('foo.xls')

xl.sheet_names  # see all sheet names

xl.parse(sheet_name)  # read a specific sheet into a DataFrame

See the docs for parse for more options.

Solution 2

Explicitly pass None as the second parameter (sheet_name, called sheetname in older pandas versions), like this:

import pandas

df = pandas.read_excel("/yourPath/FileName.xlsx", sheet_name=None)

df then holds all sheets as a dictionary of DataFrames keyed by sheet name. You can verify this by running:

df.keys()

The result looks like this:

[u'201610', u'201601', u'201701', u'201702', u'201703', u'201704', u'201705', u'201706', u'201612', u'fund', u'201603', u'201602', u'201605', u'201607', u'201606', u'201608', u'201512', u'201611', u'201604']

Please refer to the pandas docs for more details: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html

Solution 3

This is the fastest way I have found, inspired by @divingTobi's answer. All the answers based on xlrd, openpyxl or pandas are slow for me, as they all load the whole file first. Note that this only works for .xlsx files, which are zip archives; .xls is a completely different format.

from zipfile import ZipFile
from bs4 import BeautifulSoup  # you also need to install "lxml" for the XML parser

# `file` is the path to an .xlsx workbook
with ZipFile(file) as zipped_file:
    summary = zipped_file.open(r'xl/workbook.xml').read()
soup = BeautifulSoup(summary, "xml")
sheets = [sheet.get("name") for sheet in soup.find_all("sheet")]
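
The same idea can be sketched with the standard library only, which avoids installing bs4 and lxml. This is a minimal sketch, not a drop-in replacement: xlsx_sheet_names is a hypothetical helper name, and the code assumes the workbook stores its sheet list as sheet elements in xl/workbook.xml, as the snippet above does.

```python
import xml.etree.ElementTree as ET
from zipfile import ZipFile


def xlsx_sheet_names(file):
    """Return the sheet names of an .xlsx file (path or binary file object).

    Hypothetical helper: only xl/workbook.xml is read, not the sheet data.
    """
    with ZipFile(file) as zipped_file:
        with zipped_file.open("xl/workbook.xml") as workbook_xml:
            root = ET.parse(workbook_xml).getroot()
    # Tags look like "{namespace}sheet"; compare only the local part so the
    # code does not depend on one specific namespace URI.
    return [
        element.get("name")
        for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == "sheet"
    ]
```

Calling xlsx_sheet_names("foo.xlsx") should return the names in the order they appear in workbook.xml. As with the snippet above, this does not apply to .xls files.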

Solution 4

The easiest way to retrieve the sheet names from an Excel file (.xls, .xlsx) is:

import pandas as pd

tabs = pd.ExcelFile("path").sheet_names
print(tabs)

Then, to read the data of a particular sheet (say the sheet names are "Sheet1", "Sheet2", etc., and you want "Sheet2"):

data = pd.read_excel("path", "Sheet2")
print(data)
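
When the sheet names follow a pattern, as in the question's "Data 1, Data 2 ..., Data N, foo, bar", the list from sheet_names can be filtered before any sheet is read. A minimal sketch, assuming exactly that naming scheme; pick_data_sheets is a hypothetical helper name, and the hard-coded list stands in for pd.ExcelFile("path").sheet_names:

```python
import re


def pick_data_sheets(sheet_names):
    """Return the names of the form 'Data <number>', ordered by that number."""
    pattern = re.compile(r"Data (\d+)")
    numbered = []
    for name in sheet_names:
        match = pattern.fullmatch(name)
        if match:
            numbered.append((int(match.group(1)), name))
    # Sort numerically so that "Data 10" comes after "Data 2".
    return [name for _, name in sorted(numbered)]


# Stand-in for the list returned by pd.ExcelFile("path").sheet_names
tabs = ["Data 2", "foo", "Data 10", "Data 1", "bar"]
print(pick_data_sheets(tabs))  # ['Data 1', 'Data 2', 'Data 10']
```

Each selected name can then be passed to pd.read_excel("path", name) or to the parse method of an ExcelFile object.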

Solution 5

# Works for both '.xls' and '.xlsx', using pandas

import pandas as pd
excel_Sheet_names = pd.ExcelFile(excelFilePath).sheet_names

# For '.xlsx' only, using openpyxl (the attribute is sheetnames, without an underscore)

from openpyxl import load_workbook
excel_Sheet_names = load_workbook(excelFilePath, read_only=True).sheetnames
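
Because .xls and .xlsx are read by different libraries, some pandas versions need the engine named explicitly, as one of the comments reports for pandas 1.1.5. A minimal sketch of picking an engine from the file extension; excel_engine_for and the mapping are illustrative assumptions, not part of pandas, and recent pandas versions usually choose the engine on their own:

```python
from pathlib import Path

# Assumed mapping from file extension to a pandas `engine` name.
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
}


def excel_engine_for(path):
    """Return the engine name to try for the given Excel file path."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported Excel extension: {suffix!r}") from None
```

The result can be passed along as pd.ExcelFile(excelFilePath, engine=excel_engine_for(excelFilePath)).sheet_names.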
                                      
Author by

Amelio Vazquez-Reina

I'm passionate about people, technology and research. Some of my favorite quotes: "Far better an approximate answer to the right question than an exact answer to the wrong question" -- J. Tukey, 1962. "Your title makes you a manager, your people make you a leader" -- Donna Dubinsky, quoted in "Trillion Dollar Coach", 2019.

Updated on December 08, 2021

Comments

  • Amelio Vazquez-Reina
    Amelio Vazquez-Reina over 2 years

    The new version of Pandas uses the following interface to load Excel files:

    read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])
    

    but what if I don't know the sheets that are available?

    For example, I am working with Excel files that have the following sheets

    Data 1, Data 2 ..., Data N, foo, bar

    but I don't know N a priori.

    Is there any way to get the list of sheets from an excel document in Pandas?

  • Amelio Vazquez-Reina
    Amelio Vazquez-Reina almost 11 years
    Thanks @Andy. May I ask, does Pandas load the excel sheet in ExcelFile? Also, say I look up the list of sheets and decide to load N of them, should I at that point call read_excel (the new interface) for each sheet, or stick to xl.parse?
  • Andy Hayden
    Andy Hayden almost 11 years
    I think ExcelFile keeps the file open (and doesn't read it all), I think using parse (and opening the file only once) makes most sense here. tbh I missed the arrival of read_excel!
  • Andy Hayden
    Andy Hayden almost 11 years
    Mentioned before here, but I like to keep a dictionary of DataFrames using {sheet_name: xl.parse(sheet_name) for sheet_name in xl.sheet_names}
  • Ezekiel Kruglick
    Ezekiel Kruglick almost 9 years
    Wish I could give you more upvotes, this works across multiple versions of pandas too! (don't know why they like changing the API so often) Thanks for pointing me at the parse function, here is the current link though: pandas.pydata.org/pandas-docs/stable/generated/…
  • semore_1267
    semore_1267 about 7 years
    Agreed, imo this is the best way to load Excel files w/ pandas.
  • Nicholas Lu
    Nicholas Lu almost 7 years
    read_excel provides built-in support for iterating over sheets, so I think it's not necessary to use the old ExcelFile interface. Please see my answer.
  • Andy Hayden
    Andy Hayden almost 7 years
    This unnecessarily parses every sheet as a DataFrame, which is not required. "How to read an xls/xlsx file" is a different question.
  • Andy Hayden
    Andy Hayden almost 7 years
    @NicholasLu the downvote was unnecessary, this answer is from 2013! That said, whilst ExcelFile is the original way to parse excel files it is not deprecated and remains a perfectly valid way to do this.
  • CodeMonkey
    CodeMonkey over 6 years
    @AndyHayden it might not be efficient, but it might be the best if you care about all the sheets, or you don't care about the additional overhead.
  • divingTobi
    divingTobi about 4 years
    No, .xls is a completely different file format, so I would not expect this code to work.
  • Daniel
    Daniel almost 4 years
    What are the modules you are using?
  • Dhwanil shah
    Dhwanil shah almost 4 years
    @Daniel I have used only zipfile which is an in-built module and xmltodict which I used to convert the XML into an easily iterable dictionary. Although you can look at @divingTobi's answer below where you can read the same file without actually extracting the files within.
  • flutefreak7
    flutefreak7 almost 4 years
    When I tried openpyxl with the read_only flag it is significantly faster (200X faster for my 5 MB file). load_workbook(excel_file).sheetnames averaged 8.24s where load_workbook(excel_file, read_only=True).sheetnames averaged 39.6ms.
  • vjangus
    vjangus about 3 years
    When opening xlsx files, this will fail in pandas 1.1.5, but it can be fixed by using xl = pd.ExcelFile('foo.xls', engine='openpyxl'). For more on this issue, see this thread
  • Corey Levinson
    Corey Levinson almost 3 years
    The named argument is called sheet_name. I.e., df = pandas.read_excel("/yourPath/FileName.xlsx", sheet_name=None, engine='openpyxl')
  • StudentAtLU
    StudentAtLU about 2 years
    I believe the method is called ".sheetnames" (without underscore).