Openpyxl: need the max number of rows in a column that has data in Excel

python excel openpyxl

11,615

Solution 1

Question: I want the last row containing data in column 'C'; it should return 10.

Simply count the cells whose cell.value is not empty.
Documentation: Accessing many cells

PSEUDOCODE

count = 0
for cell in column('C'):
    if cell.value is not empty:
        count += 1
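A runnable sketch of the counting idea, assuming "empty" means cell.value is None. The count_non_empty helper and the in-memory sample data are hypothetical, added only to make the example self-contained. Note that the count equals the last data row only when column C starts at row 1 and has no gaps.

```python
from openpyxl import Workbook


def count_non_empty(ws, column_letter):
    """Count the cells in a column whose value is not None (assumed meaning of 'empty')."""
    count = 0
    # ws['C'] is a tuple of the column's cells, from row 1 down to ws.max_row
    for cell in ws[column_letter]:
        if cell.value is not None:
            count += 1
    return count


# Hypothetical sample data: values in C1..C10
wb = Workbook()
ws = wb.active
for row in range(1, 11):
    ws.cell(row=row, column=3, value=row * 100)

print(count_non_empty(ws, 'C'))  # 10
```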

Comment: What if we have an empty cell in between?

Count the rows in sync with the column range, and store the index of each non-empty cell in a maxRowWithData variable. This works whether or not there are empty cells in between.

PSEUDOCODE
for row_index, cell in enumerate(column('C'), start=1):
    if cell.value is not empty:
        maxRowWithData = row_index
Note: The cell index of openpyxl is 1-based!

Documentation: enumerate(iterable, start=0)
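A runnable sketch of the enumerate variant, again assuming "empty" means None; last_row_with_data and the sample data (with C5 deliberately left blank) are hypothetical. Because ws['C'] starts at row 1, start=1 makes row_index equal to each cell's 1-based row number.

```python
from openpyxl import Workbook


def last_row_with_data(ws, column_letter):
    """Return the 1-based row of the last non-empty cell in a column, or 0 if there is none."""
    max_row_with_data = 0
    # start=1 because openpyxl rows are 1-based and ws['C'] begins at row 1
    for row_index, cell in enumerate(ws[column_letter], start=1):
        if cell.value is not None:
            max_row_with_data = row_index
    return max_row_with_data


# Hypothetical sample data: C1..C10 filled, but C5 left empty
wb = Workbook()
ws = wb.active
for row in range(1, 11):
    if row != 5:
        ws.cell(row=row, column=3, value=f"item {row}")

print(last_row_with_data(ws, 'C'))  # 10, even though C5 is empty
```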

Solution 2

"Empty" is a relative concept, so your code should be explicit about what it treats as empty. The methods in openpyxl are guaranteed to return orthogonal result sets: the length of rows and columns will always be the same.

Using this, we can deduce the highest row in a column that holds a cell whose value is not None.

max_row_for_c = max(c.row for c in ws['C'] if c.value is not None)
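One way to make "empty" explicit, in the spirit of the answer above: the is_empty predicate is a hypothetical, assumed definition that treats None and whitespace-only strings as empty, and max_row_in_column plus the sample data are likewise illustrative. Passing default=0 keeps max() from raising ValueError when the column holds no data at all.

```python
from openpyxl import Workbook


def is_empty(value):
    """Assumed definition of 'empty': None, or a string containing only whitespace."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def max_row_in_column(ws, column_letter):
    """Highest row number in the column holding a non-empty value; 0 if there is none."""
    return max((c.row for c in ws[column_letter] if not is_empty(c.value)), default=0)


# Hypothetical sample data
wb = Workbook()
ws = wb.active
for row in range(1, 11):
    ws.cell(row=row, column=3, value=row)
ws["C12"] = "   "           # whitespace only: treated as empty by is_empty()
ws["A20"] = "other column"  # another column makes ws.max_row == 20

print(ws.max_row)                  # 20
print(max_row_in_column(ws, "C"))  # 10
```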

Solution 3

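Why not just take the length of column 'C'? The result is the same, 10: ws['C'] returns the column's cells as a tuple, so the length of that tuple is 10. Note that the tuple runs from row 1 to the sheet's max_row, so its length only matches the last data row of column 'C' when no other column extends further down.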

import openpyxl

wb = openpyxl.load_workbook('example.xlsx')
current_sheet = wb['sheet1']    # replaces the deprecated get_sheet_by_name('sheet1')
column_c = current_sheet['C']   # tuple of the cells in column C
print(len(column_c))
wb.close()
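A quick in-memory check of how len(ws['C']) behaves; column_length and the sample data are hypothetical. The tuple length follows the sheet's max_row, so it matches column C's last data row only while C is the longest column.

```python
from openpyxl import Workbook


def column_length(ws, column_letter):
    """Length of the tuple returned by ws[column_letter] (the approach used above)."""
    return len(ws[column_letter])


wb = Workbook()
ws = wb.active
# Hypothetical data: column C filled in rows 1-10
for row in range(1, 11):
    ws.cell(row=row, column=3, value=row)
print(column_length(ws, 'C'))  # 10: column C is the longest column so far

# Another column extending further down changes the result for column C
ws.cell(row=15, column=1, value="extra")
print(column_length(ws, 'C'), ws.max_row)  # 15 15
```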


Author: ASHISH M.G


Updated on June 28, 2022

Comments

ASHISH M.G almost 2 years

I need the last row in a particular column that contains data in Excel. In openpyxl, sheet.max_row and sheet.max_column give the maximum row or column for the whole sheet, but I want it for a particular column.

My scenario: I have to fetch some values from a database and append them to the end of a particular column in an Excel sheet.

In the screenshot, the last cell containing data in column 'C' is in row 10, so it should return 10.

------------- Solution 1 --------------------

import pandas as pd

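# lt holds the values to be loaded into the Excel file (iterating over it yields the values)
# sheet (the openpyxl worksheet) and col (the 1-based column number) are defined elsewhere

# Read the sheet once: the file is not saved inside the loop, so re-reading it each time gives the same result
panda_xl_rd = pd.read_excel('file.xlsx', "sheet_Name")  # pandas DataFrame

# Row number right after the last record in the column:
# dropna() removes the NaN (empty) values, otherwise we would get the length of the whole sheet;
# +1 accounts for the header row that pandas consumes, +1 more gives the next empty row
next_row = len(panda_xl_rd.iloc[:, col - 1].dropna()) + 2
del panda_xl_rd

for index, i in enumerate(lt):
    cellref = sheet.cell(row=next_row + index, column=col)
    cellref.value = i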
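For the append scenario in the question, a minimal openpyxl-only sketch, assuming the target column is given as a 1-based number like col above. append_to_column and the sample data are hypothetical, and in real use the workbook would still need wb.save(...) afterwards. It finds the last non-empty row directly, so it does not depend on the pandas header-row offset (+2) and still works when the column has gaps.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter


def append_to_column(ws, col, values):
    """Write values into the rows right after the last non-empty cell of column col (1-based).

    Returns the row number of the last non-empty cell found before appending.
    """
    letter = get_column_letter(col)
    last_row = max((c.row for c in ws[letter] if c.value is not None), default=0)
    for offset, value in enumerate(values, start=1):
        ws.cell(row=last_row + offset, column=col, value=value)
    return last_row


# Hypothetical data standing in for the Excel sheet and the database values
wb = Workbook()
ws = wb.active
ws["C1"] = "header"
for row in range(2, 11):
    ws.cell(row=row, column=3, value=row)

append_to_column(ws, 3, ["db value 1", "db value 2"])
print(ws["C11"].value, ws["C12"].value)  # db value 1 db value 2
```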

------------------------Solution 2 ----------------------

https://stackoverflow.com/a/52816289/10003981

------------------------Solution 3 ----------------------

https://stackoverflow.com/a/52817637/10003981

Maybe solution 3 is the most concise one!

ASHISH M.G over 5 years

What if we have an empty cell in between? Normally we wouldn't have one, but I'm just curious.
ASHISH M.G over 5 years

Thanks for the prompt reply! But this just returns the column name: 'C'
ASHISH M.G over 5 years

This is working well for me! Thanks for the prompt reply. I will add both solutions to the question itself!
Zeitounator over 4 years

Please separate your explanations from code and provide a full code block formatted correctly with a working solution. Thanks in advance.
Chandra Shekhar over 4 years

It is throwing an error: NameError: name 'empty' is not defined
stovfl over 4 years

@ChandraShekhar "NameError: name 'empty' is not defined": Have you noticed the word PSEUDOCODE? It means this is not working code. You have to turn it into valid Python code for your needs.
Chandra Shekhar over 4 years

@stovfl Thanks, I had missed that word.