iterate through all rows in specific column openpyxl

Solution 1

You can specify a range to iterate over with ws.iter_rows():

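import openpyxl

wb = openpyxl.load_workbook('C:/workbook.xlsx')
ws = wb['Sheet3']
# Column C is column 3. Recent openpyxl versions take numeric bounds
# here; the older range-string form ('C1:C10') is no longer accepted.
for row in ws.iter_rows(min_row=ws.min_row, max_row=ws.max_row,
                        min_col=3, max_col=3):
    for cell in row:
        print(cell.value)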

Edit: per your comment you want the cell values in a list:

import openpyxl

wb = openpyxl.load_workbook('c:/_twd/2016-06-23_xlrd_xlwt/input.xlsx')
ws = wb['Sheet1']  # get_sheet_by_name() is deprecated
mylist = []
# Column A is column 1
for row in ws.iter_rows(min_row=ws.min_row, max_row=ws.max_row,
                        min_col=1, max_col=1):
    for cell in row:
        mylist.append(cell.value)
print(mylist)
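
On recent openpyxl versions the same list can be built in one step. This is only a minimal sketch: it assumes openpyxl 2.6 or later (where iter_rows accepts values_only=True) and uses a small in-memory workbook with made-up data in place of the file above.

```python
from openpyxl import Workbook

# Build a small in-memory workbook (illustrative data, not the asker's file).
wb = Workbook()
ws = wb.active
for name in ["String1", "String2", "String3"]:
    ws.append([name])  # each call adds one row; the value lands in column A

# values_only=True yields plain values instead of Cell objects;
# min_col=max_col=1 restricts every row to column A.
mylist = [row[0] for row in ws.iter_rows(min_col=1, max_col=1, values_only=True)]
print(mylist)
```

Each row comes back as a one-element tuple, which is why row[0] is taken.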

Solution 2

Why can't you just iterate over column 'C' (version 2.4.7):

for cell in ws['C']:
    print(cell.value)
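
Because ws['C'] returns a tuple of cells, it can be sliced like any other tuple, for instance to skip a header row. A small sketch, assuming a sheet whose first row holds headers (the data here is invented for illustration):

```python
from openpyxl import Workbook

# In-memory workbook with a header row (illustrative data only).
wb = Workbook()
ws = wb.active
ws.append(["id", "name", "score"])
ws.append([1, "ann", 90])
ws.append([2, "bob", 85])

column_c = ws["C"]  # tuple of Cell objects, one per row up to ws.max_row
scores = [cell.value for cell in column_c[1:]]  # [1:] drops the header cell
print(scores)
```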

Solution 3

You can also do this.

for row in ws.iter_rows():
    print(row[2].value)

This still iterates through the rows, but not through every cell: each row is a tuple, and row[2] (zero-based index 2) picks out only the column C cell to print.
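
If hard-coding the index feels fragile, the column letter can be converted instead. A minimal sketch using column_index_from_string from openpyxl.utils, with an invented in-memory sheet standing in for the real workbook:

```python
from openpyxl import Workbook
from openpyxl.utils import column_index_from_string

# In-memory workbook with illustrative data.
wb = Workbook()
ws = wb.active
ws.append(["a1", "b1", "c1"])
ws.append(["a2", "b2", "c2"])

# column_index_from_string("C") is 3 (1-based); row tuples are 0-based,
# so subtract 1 to get the position inside each row.
idx = column_index_from_string("C") - 1
values = [row[idx].value for row in ws.iter_rows()]
print(values)
```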

Solution 4

Some of the solutions above did not work for me, possibly because of changes in newer versions of openpyxl. After trying a few approaches, I settled on this:

Printing all rows with all columns:

import openpyxl

sheet = openpyxl.load_workbook('myworkbook.xlsx')['Sheet1']
# Iterating through All rows with all columns...
for i in range(1, sheet.max_row+1):
    row = [cell.value for cell in sheet[i]]  # sheet[n] gives the nth row (a tuple of cells)
    print(row) # list of cell values of this row

Printing all rows with specific columns (e.g. 'E' to 'L'):

# For example we need column 'E' to column 'L'
start_col = 4   # zero-based index of column 'E'
end_col = 11    # zero-based index of column 'L'
for i in range(1, sheet.max_row+1):
    row = [cell.value for cell in sheet[i][start_col:end_col+1]]
    print(row) # list of cell values of this row

Please keep these points in mind:

  • sheet[N] gives a tuple of 'Cell' objects for the Nth row. (N is a number starting from 1.)
  • To get the first-column cell of a row, use sheet[N][0], because sheet[N] is a tuple and tuples are indexed from 0.
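
As an alternative to slicing each row by hand, the column bounds can be passed to iter_rows directly. This is a sketch rather than a drop-in replacement: it assumes openpyxl 2.6 or later for values_only=True and uses an invented in-memory sheet.

```python
from openpyxl import Workbook
from openpyxl.utils import column_index_from_string

# In-memory sheet with two rows of 12 values each, filling columns A..L.
wb = Workbook()
sheet = wb.active
sheet.append(list(range(1, 13)))
sheet.append(list(range(101, 113)))

first = column_index_from_string("E")  # 1-based column number of 'E'
last = column_index_from_string("L")   # 1-based column number of 'L'

# min_col/max_col are 1-based and inclusive, unlike the tuple slice above.
rows = [list(r) for r in sheet.iter_rows(min_col=first, max_col=last,
                                         values_only=True)]
print(rows)
```

Note that iter_rows counts columns from 1, whereas indexing into sheet[N] counts from 0.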

Solution 5

I do it like this. It stops at the last used row of the sheet, although empty cells inside that range still come back with a value of None.

from openpyxl import load_workbook

wb = load_workbook(filename='exelfile.xlsx')
ws = wb['sheet1']

# ws['A'] yields the cells of column A, one per row
for cell in ws['A']:
    print(cell.value)
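
To actually leave out the empty cells, the None values have to be filtered explicitly. A minimal sketch with an invented in-memory sheet in which A2 is left blank:

```python
from openpyxl import Workbook

# In-memory workbook (illustrative data only).
wb = Workbook()
ws = wb.active
ws["A1"] = "first"
ws["A3"] = "third"  # A2 is deliberately left empty

all_values = [cell.value for cell in ws["A"]]          # includes the blank cell
non_empty = [v for v in all_values if v is not None]   # blank cell dropped
print(all_values)
print(non_empty)
```

Testing against None rather than truthiness keeps legitimate values such as 0 or an empty string.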
Author: Daniel Dahms

Updated on June 20, 2021

Comments

  • Daniel Dahms
    Daniel Dahms almost 3 years

    I cannot figure out how to iterate through all rows in a specified column with openpyxl.

    I want to print all of the cell values for all rows in column "C"

    Right now I have:

    from openpyxl import workbook
    path = 'C:/workbook.xlsx'
    wb = load_workbook(filename = path)
    ws=wb.get_sheet_by_name('Sheet3')
    
    for row in ws.iter_rows():
        for cell in row:
            if column == 'C':
                print cell.value
    
    • danielhadar
      danielhadar almost 8 years
      What's ws? How did you use openpyxl? Please give some more details about the goal you're trying to achieve, or else every answer will be based on assumptions.
    • Smiles
      Smiles about 5 years
      @danielhadar I think ws is short for work_sheet.
    • WhyWhat
      WhyWhat over 3 years
      The fact that this question had to be posted says something about the docs of openpyxl.
  • Charlie Clark
    Charlie Clark almost 8 years
    ws.get_squared_range() will let you use numeric boundaries. In read-only mode ws.max_row might not be available. openpyxl 2.4 has a better API here.
  • mechanical_meat
    mechanical_meat almost 8 years
    Thank you @CharlieClark. Added example of that.
  • Daniel Dahms
    Daniel Dahms almost 8 years
    The above script prints rows of strings such as: String1 String2 String3 etc. How would I combine all of these strings into a single list: [String1, String2, String3]? Using list(cell.value) returns a separate list for each string, split into individual characters.
  • Ares9323
    Ares9323 almost 4 years
    If you want to update your comment get_sheet_by_name is deprecated: Call to deprecated function get_sheet_by_name (Use wb[sheetname])
  • Ali Sajjad
    Ali Sajjad almost 4 years
    It is showing me: 'Worksheet' object has no attribute 'get_squared_range'
  • xtian
    xtian over 3 years
    Is string argument for ws.iter_rows() still valid? I needed to use ws.iter_rows(min_row=ws.min_row, max_row=ws.max_row)
  • Larry Guo
    Larry Guo almost 3 years
    With python3, I need to use for row in ws.iter_rows(ws.min_row,ws.max_row) or for row in ws to make it work.
  • Chop Labalagun
    Chop Labalagun over 2 years
    I do like this answer, it can focus in a column
  • Stefan
    Stefan almost 2 years
    ws.get_squared_range() is deprecated use ws.iter_rows, see stackoverflow.com/a/42532310/4442591
  • mechanical_meat
    mechanical_meat almost 2 years
    i removed the reference to the deprecated function. i already had a .iter_rows() solution in this answer.