Openpyxl : need the max number of rows in a column that has data in Excel

Solution 1

Question: I want the last row containing data in column 'C'; it should return 10:

Simply count the cells whose cell.value is not empty.
Documentation: Accessing many cells

PSEUDOCODE

for cell in Column('C'):
    if not cell.value is empty:
        count += 1
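The pseudocode above could be turned into working Python roughly as follows. This is only a sketch: it treats "empty" as `cell.value is None`, and the in-memory workbook with ten values in column C is a made-up stand-in for the real file. The helper name `count_non_empty` is hypothetical, not part of openpyxl.

```python
from openpyxl import Workbook


def count_non_empty(ws, column_letter):
    """Count the cells in one column whose value is not None (hypothetical helper)."""
    return sum(1 for cell in ws[column_letter] if cell.value is not None)


# Assumed sample data: ten values in column C of an in-memory workbook.
wb = Workbook()
ws = wb.active
for row in range(1, 11):
    ws.cell(row=row, column=3, value=row * 10)

count = count_non_empty(ws, "C")
print(count)
```

As with the pseudocode, a plain count only equals the last row number when the column has no gaps.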

Comment: What if we have an empty cell in between?

Count the rows in sync with the column range and keep a maxRowWithData variable. This also works when there are no empty cells in between.

PSEUDOCODE

for row index, cell in enumerate Column('C'):
    if not cell.value is empty:
        maxRowWithData = row index

Note: The cell index of openpyxl is 1-based!

Documentation: enumerate(iterable, start=0)
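A possible runnable version of the second pseudocode, using `enumerate(..., start=1)` to match the 1-based rows. It is a sketch under two assumptions: "empty" means `cell.value is None`, and the tuple returned by `ws['C']` starts at row 1. The function name `max_row_with_data` and the sample data are invented for illustration.

```python
from openpyxl import Workbook


def max_row_with_data(ws, column_letter):
    """Return the 1-based row of the last non-empty cell, or 0 if there is none."""
    last = 0
    # start=1 because openpyxl rows are 1-based; assumes the tuple begins at row 1.
    for row_index, cell in enumerate(ws[column_letter], start=1):
        if cell.value is not None:
            last = row_index
    return last


# Assumed sample data: column C filled in rows 1-10 except row 5,
# and column A filled further down, to row 15.
wb = Workbook()
ws = wb.active
for row in range(1, 11):
    if row != 5:
        ws.cell(row=row, column=3, value=row)
for row in range(1, 16):
    ws.cell(row=row, column=1, value=row)

result = max_row_with_data(ws, "C")
print(result)
```

Because the last matching index overwrites earlier ones, the gap at row 5 and the longer column A do not affect the result.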

Solution 2

"Empty" is a relative concept so your code should be clear about this. The methods in openpyxl are guaranteed to return orthogonal result sets: the length of rows and columns will always be the same.

Using this, we can deduce the highest row in a column that holds a cell whose value is not None.

max_row_for_c = max((c.row for c in ws['C'] if c.value is not None))
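The one-liner above raises `ValueError` if the column holds no data at all, since `max()` gets an empty sequence. A hedged variation passes `default=0` and then uses the result to append values to the end of the column. The helper name `last_row_with_data`, the sample data and `new_values` are all hypothetical.

```python
from openpyxl import Workbook


def last_row_with_data(ws, column_letter):
    """Highest row whose cell in the column is not None; 0 for an empty column."""
    return max(
        (c.row for c in ws[column_letter] if c.value is not None),
        default=0,
    )


# Assumed sample data: column C has 10 values, column A has 20.
wb = Workbook()
ws = wb.active
for row in range(1, 11):
    ws.cell(row=row, column=3, value=f"c{row}")
for row in range(1, 21):
    ws.cell(row=row, column=1, value=row)

# Append made-up values directly below the last filled cell of column C.
new_values = ["x", "y", "z"]
start = last_row_with_data(ws, "C") + 1
for offset, value in enumerate(new_values):
    ws.cell(row=start + offset, column=3, value=value)
```

Computing `start` once before the loop keeps the target rows stable while the new cells are written.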

Solution 3

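Why not just take the length of column 'C'? Indexing the sheet with 'C' returns the column's cells as a tuple, so the length of that tuple gives the same result, 10. Note that the tuple covers every row up to the sheet's max_row, so this equals the last row with data in 'C' only when no other column extends further down.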

import openpyxl

file = openpyxl.load_workbook('example.xlsx')
current_sheet = file['sheet1']  # get_sheet_by_name() is deprecated
column_C = current_sheet['C']   # tuple of the cells in column C
print(len(column_C))
file.close()
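A small check of what the length of a column tuple actually measures may be useful here. This sketch uses an in-memory workbook with invented data instead of example.xlsx. It compares `len(ws['C'])` before and after another column is filled further down.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Assumed sample data: ten values in column C only.
for row in range(1, 11):
    ws.cell(row=row, column=3, value=row)
length_c_only = len(ws["C"])

# Now fill column A down to row 20; column C itself is unchanged.
for row in range(1, 21):
    ws.cell(row=row, column=1, value=row)
length_with_longer_a = len(ws["C"])

print(length_c_only, length_with_longer_a, ws.max_row)
```

The second length follows the sheet's `max_row` rather than the data in column C, so the length-based approach fits sheets where C is the longest column.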
Author by

ASHISH M.G

Updated on June 28, 2022

Comments

  • ASHISH M.G
    ASHISH M.G almost 2 years

    I need the last row in a particular column that contains data in Excel. In openpyxl sheet.max_row or max_column gets us the maximum row or column in the whole sheet. But what I want is for a particular column.

    My scenario is where I have to get some values from database and append it to the end of a particular column in Excel sheet.

    In this screenshot, if I want the last row containing data in column 'C', it should return 10:

    image

    ------------- Solution 1 --------------------

    import pandas as pd

    # lt is the dataframe containing the data to be loaded to excel file
    # sheet (an openpyxl worksheet) and col (1-based column number) are defined elsewhere

    panda_xl_rd = pd.read_excel('file.xlsx', sheet_name="sheet_Name")  # pandas DataFrame
    # Row number of the last record in the column. dropna() removes the NaN
    # values; otherwise we would get the max row count of the entire sheet.
    # +2 gives the row right after the last record, where new data is entered
    # (presumably +1 for the header row pandas consumes, +1 for the next free row).
    next_row = len(panda_xl_rd.iloc[:, (col - 1)].dropna()) + 2
    for index, i in enumerate(lt):
        cellref = sheet.cell(row=next_row + index, column=col)
        cellref.value = i
    del panda_xl_rd
    

    ------------------------Solution 2 ----------------------

    https://stackoverflow.com/a/52816289/10003981

    ------------------------Solution 3 ----------------------

    https://stackoverflow.com/a/52817637/10003981

    Maybe solution 3 is a more concise one !!

  • ASHISH M.G
    ASHISH M.G over 5 years
    What if we have an empty cell in between? Normally we wouldn't have one, but I'm just curious.
  • ASHISH M.G
    ASHISH M.G over 5 years
    Thanks for the prompt reply! But this is just returning the column name: 'C'
  • ASHISH M.G
    ASHISH M.G over 5 years
    This is working well for me! Thanks for the prompt reply. I will update both solutions in the question section itself!
  • Zeitounator
    Zeitounator over 4 years
    Please separate your explanations from code and provide a full code block formatted correctly with a working solution. Thanks in advance.
  • Chandra Shekhar
    Chandra Shekhar over 4 years
    It is throwing an error: NameError: name 'empty' is not defined
  • stovfl
    stovfl over 4 years
    @ChandraShekhar "NameError: name 'empty' is not defined": Have you noticed the word PSEUDOCODE? It means this is not working code. You have to extend it to valid Python code for your needs.
  • Chandra Shekhar
    Chandra Shekhar over 4 years
    @stovfl Thanks, I had missed that word.