Openpyxl: need the max number of rows in a column that has data in Excel
Solution 1
Question: I want the last row containing data in column 'C'; it should return 10.
Simply count the cells whose cell.value is not empty.
Documentation: Accessing many cells
PSEUDOCODE
for cell in Column('C'):
    if not cell.value is empty:
        count += 1
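One way to turn the counting pseudocode into running Python, as a sketch. `is_empty` and `count_non_empty` are hypothetical helpers, and treating whitespace-only strings as empty is an assumption (openpyxl itself reports a blank cell's value as `None`). The function accepts any iterable of objects with a `.value` attribute, so plain stand-in objects are used here instead of a workbook file.

```python
from types import SimpleNamespace


def is_empty(value):
    # Assumption: treat None and whitespace-only strings as "empty".
    return value is None or (isinstance(value, str) and not value.strip())


def count_non_empty(cells):
    """Count the cells whose .value is not empty.

    `cells` is any iterable of objects with a `.value` attribute,
    for example the tuple returned by openpyxl's ws['C'].
    """
    count = 0
    for cell in cells:
        if not is_empty(cell.value):
            count += 1
    return count


# Stand-in cells so the sketch runs without a workbook file.
column_c = [SimpleNamespace(value=v) for v in ["Name", "a", "b", None, "  "]]
print(count_non_empty(column_c))  # 3
```

With a real worksheet the call would be `count_non_empty(ws['C'])`. The count equals the last data row only when the column has no gaps and its data starts in row 1.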
Comment: What if we have an empty cell in between?
Count the rows in sync with the column range and keep a maxRowWithData variable, updated whenever a non-empty cell is found. This works whether or not there are empty cells in between.
PSEUDOCODE
for row_index, cell in enumerate(Column('C'), start=1):
    if not cell.value is empty:
        maxRowWithData = row_index
Note: the cell index of openpyxl is 1-based!
Documentation: enumerate(iterable, start=0)
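A runnable sketch of the maxRowWithData idea, assuming the cells start at worksheet row 1 (as the tuple from `ws['C']` does) and that "empty" means `None` or `""`. `last_row_with_data` is a hypothetical name; passing `start=1` to `enumerate` lines its counter up with openpyxl's 1-based rows.

```python
from types import SimpleNamespace


def last_row_with_data(cells):
    """Return the 1-based row number of the last non-empty cell, or 0.

    Assumes `cells` starts at worksheet row 1 (as openpyxl's ws['C'] does)
    and that "empty" means None or "".
    """
    max_row_with_data = 0  # 0 signals "no data in this column"
    # start=1 because openpyxl rows are 1-based, while enumerate defaults to 0
    for row_index, cell in enumerate(cells, start=1):
        if cell.value is not None and cell.value != "":
            max_row_with_data = row_index
    return max_row_with_data


# Column with a gap at row 4: data continues down to row 6.
column_c = [SimpleNamespace(value=v) for v in ["Name", "a", "b", None, "c", "d", None]]
print(last_row_with_data(column_c))  # 6
```

Because openpyxl cells also expose a `.row` attribute, reading `cell.row` instead of counting is an equally valid design, and it keeps working if the cells don't start at row 1.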
Solution 2
"Empty" is a relative concept, so your code should be explicit about it. The methods in openpyxl are guaranteed to return orthogonal result sets: every row has the same length, and so does every column.
Using this, we can deduce the highest row in a column whose cell value is not None.
# default=0 avoids a ValueError when column C holds no data
max_row_for_c = max((c.row for c in ws['C'] if c.value is not None), default=0)
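For the asker's scenario (appending database values below the existing data), the `max_row_for_c` expression can feed a small write loop. This is a sketch: `append_below_last` and `column_index` are hypothetical helpers, and `ws` is assumed to behave like an openpyxl worksheet (`ws['C']` returns cells with `.row` and `.value`, and `ws.cell(row=..., column=..., value=...)` sets a value).

```python
def column_index(letter):
    """Convert a column letter such as 'C' or 'AA' to its 1-based index."""
    index = 0
    for ch in letter.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def append_below_last(ws, column_letter, values):
    """Write `values` into the rows directly below the last non-None cell
    of `column_letter` and return the last row written.

    Assumes an openpyxl-style worksheet: ws['C'] yields the column's cells
    (each with .row and .value) and ws.cell(row=..., column=..., value=...)
    sets a cell.
    """
    last_row = max(
        (c.row for c in ws[column_letter] if c.value is not None),
        default=0,  # no data in this column: start writing at row 1
    )
    col = column_index(column_letter)
    row = last_row
    for value in values:
        row += 1
        ws.cell(row=row, column=col, value=value)
    return row
```

With a real workbook this would look like `append_below_last(ws, 'C', values_from_db)` followed by `wb.save('example.xlsx')`, where `values_from_db` is a placeholder for whatever the database query returns. openpyxl also provides `column_index_from_string` in `openpyxl.utils`, which could replace the hand-written `column_index`.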
Solution 3
Why not just take the length of column 'C'? The output is the same, 10: ws['C'] returns the column's cells as a tuple, so the length of that tuple is 10. Note that ws['C'] spans every row up to the sheet's max_row, so this matches the last row of column C only when no other column extends further down.
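import openpyxl
# Load the workbook and pick the sheet by name
workbook = openpyxl.load_workbook('example.xlsx')
current_sheet = workbook['sheet1']  # get_sheet_by_name() is deprecated
# ws['C'] returns a tuple with one cell per row, from row 1 to max_row
column_c = current_sheet['C']
print(len(column_c))
workbook.close()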
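A quick in-memory check of what `len(ws['C'])` measures, assuming openpyxl is installed (no `example.xlsx` is needed): column C holds data in rows 1 to 10, while column A continues to row 15.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Rows 1-15: column A always has data, column C only in rows 1-10.
for r in range(1, 16):
    ws.append([f"A{r}", None, r if r <= 10 else None])

print(len(ws["C"]))  # 15: ws['C'] spans every row up to ws.max_row
print(ws.max_row)    # 15
print(max(c.row for c in ws["C"] if c.value is not None))  # 10
```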
ASHISH M.G
Updated on June 28, 2022
Comments
-
ASHISH M.G almost 2 years
I need the last row in a particular column that contains data in Excel. In openpyxl, sheet.max_row and sheet.max_column give the maximum row and column of the whole sheet, but I want this for a particular column.
In my scenario, I have to fetch some values from a database and append them to the end of a particular column in an Excel sheet.
In the screenshot, the last cell containing data in column 'C' is in row 10, so the result should be 10.
------------- Solution 1 --------------------
import pandas as pd

# lt holds the values (e.g. a list) to be loaded into the Excel file;
# sheet (an openpyxl worksheet) and col (1-based column index) are defined earlier.
panda_xl_rd = pd.read_excel('file.xlsx', sheet_name="sheet_Name")  # pandas DataFrame
# Row number of the first free row in the column:
# dropna() removes the NaN values, otherwise we would get the whole sheet's row count.
# +2: one for the header row consumed by pandas, one to move past the last value.
next_row = len(panda_xl_rd.iloc[:, col - 1].dropna()) + 2
del panda_xl_rd
for index, i in enumerate(lt):
    cellref = sheet.cell(row=next_row + index, column=col)
    cellref.value = i
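A variant of the pandas approach, sketched under the same assumptions as the code above (default `header=0`, so the header sits in Excel row 1, and a default integer index). `next_free_excel_row` is a hypothetical name; it uses `Series.last_valid_index()` instead of counting with `dropna()`, so a gap inside the column does not shift the result onto a row that still holds data.

```python
import pandas as pd


def next_free_excel_row(df, col):
    """Return the Excel row just below the last non-NaN value in the
    col-th column (1-based) of `df`.

    Assumes `df` came from pd.read_excel with the default header=0, so
    Excel row 1 holds the header and index label 0 is Excel row 2.
    """
    last_label = df.iloc[:, col - 1].last_valid_index()
    if last_label is None:
        return 2  # no values in this column: write right below the header
    # +2 maps the 0-based label to its Excel row, +1 moves to the next row
    return last_label + 3


# Column B has a gap: a dropna()-based count would land on row 4,
# which still holds data.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [1, None, 3, None]})
print(next_free_excel_row(df, 1))  # 6
print(next_free_excel_row(df, 2))  # 5
```

Its return value can replace the `dropna()`-based count in the loop above, and the file still only needs to be read once.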
------------------------Solution 2 ----------------------
------------------------Solution 3 ----------------------
Maybe Solution 3 is the most concise one!!
-
ASHISH M.G over 5 years: What if we have an empty cell in between? Normally we wouldn't have one, but just being curious.
-
ASHISH M.G over 5 years: Thanks for the prompt reply! But this is just returning the column name: 'C'
-
ASHISH M.G over 5 years: This is working well for me!! Thanks for the prompt reply. Will update both solutions in the question section itself!!
-
Zeitounator over 4 years: Please separate your explanations from code and provide a full, correctly formatted code block with a working solution. Thanks in advance.
-
Chandra Shekhar over 4 years: It is throwing an error: NameError: name 'empty' is not defined
-
stovfl over 4 years: @ChandraShekhar "NameError: name 'empty' is not defined": Have you noticed the word PSEUDOCODE? It means this is not working code; you have to extend it to valid Python code for your needs.
-
Chandra Shekhar over 4 years: @stovfl Thanks, I had missed that word