XLRD/Python: Reading Excel file into dict with for-loops

Solution 1

The idea is to first read the header row into a list. Then iterate over the remaining sheet rows (starting from the row after the header), build a new dictionary from the header keys and the corresponding cell values, and append it to the list of dictionaries:

from xlrd import open_workbook

# note: xlrd 2.0 and later read only .xls files; for .xlsx use xlrd < 2.0
# or a reader such as openpyxl
book = open_workbook('forum.xlsx')
sheet = book.sheet_by_index(3)

# read header values into the list
keys = [sheet.cell(0, col_index).value for col_index in range(sheet.ncols)]

dict_list = []
for row_index in range(1, sheet.nrows):
    # one dict per row: header text -> cell value in the same column
    d = {keys[col_index]: sheet.cell(row_index, col_index).value
         for col_index in range(sheet.ncols)}
    dict_list.append(d)

print(dict_list)

For a sheet containing:

A   B   C   D
1   2   3   4
5   6   7   8

it prints:

[{'A': 1.0, 'B': 2.0, 'C': 3.0, 'D': 4.0},
 {'A': 5.0, 'B': 6.0, 'C': 7.0, 'D': 8.0}]

Update (expanding the dictionary comprehension into an explicit loop):

d = {}
for col_index in range(sheet.ncols):
    d[keys[col_index]] = sheet.cell(row_index, col_index).value
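
The sample output shows 1.0 rather than 1 because xlrd returns numeric cells as floats. If whole numbers such as id should come back as ints, each cell value can be passed through a small helper while the dictionary is built. This is only a sketch: normalize_cell is a hypothetical name, not part of xlrd, and the rows are hard-coded stand-ins for cell values so it runs without a workbook.

```python
def normalize_cell(value):
    """Turn whole-number floats (as xlrd returns them) into ints."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


# Hard-coded stand-ins for the header row and the data rows of a sheet
keys = ['A', 'B', 'C', 'D']
rows = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.5, 'text', 8.0]]

dict_list = []
for row in rows:
    d = {keys[col_index]: normalize_cell(row[col_index])
         for col_index in range(len(keys))}
    dict_list.append(d)

print(dict_list)
# [{'A': 1, 'B': 2, 'C': 3, 'D': 4}, {'A': 5, 'B': 6.5, 'C': 'text', 'D': 8}]
```

Non-integral floats and text cells pass through unchanged, so the helper is safe to apply to every column.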

Solution 2

from xlrd import open_workbook

dict_list = []
book = open_workbook('forum.xlsx')
sheet = book.sheet_by_index(3)

# read first row for keys  
keys = sheet.row_values(0)

# read the remaining rows for values
values = [sheet.row_values(i) for i in range(1, sheet.nrows)]

for value in values:
    dict_list.append(dict(zip(keys, value)))

print(dict_list)
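
What the dict(zip(keys, value)) step does can be seen without a workbook. A minimal sketch, with hard-coded lists standing in for what sheet.row_values() would return (the column names are borrowed from the question):

```python
# Stand-ins for sheet.row_values(0) and one data row
keys = ['id', 'thread_id', 'votes']
value = [4.0, 100.0, 1.0]

# zip pairs the items position by position
pairs = list(zip(keys, value))
print(pairs)     # [('id', 4.0), ('thread_id', 100.0), ('votes', 1.0)]

# dict turns the (key, value) pairs into a dictionary
row_dict = dict(pairs)
print(row_dict)  # {'id': 4.0, 'thread_id': 100.0, 'votes': 1.0}

# zip stops at the shorter input, so a short row silently loses keys
short = dict(zip(keys, [5.0, 100.0]))
print(short)     # {'id': 5.0, 'thread_id': 100.0}
```

The last case is worth keeping in mind if rows might ever be passed in with fewer values than there are headers.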

Solution 3

Try this one. The function below returns a generator that yields one dict per row, keyed by the lowercased column headers.

from xlrd import open_workbook

def parse_xlsx():
    workbook = open_workbook('excelsheet.xlsx')
    sheets = workbook.sheet_names()
    active_sheet = workbook.sheet_by_name(sheets[0])
    num_rows = active_sheet.nrows
    num_cols = active_sheet.ncols
    # lowercase the header cells so they can serve as dict keys
    header = [active_sheet.cell_value(0, cell).lower() for cell in range(num_cols)]
    for row_idx in range(1, num_rows):
        row_cell = [active_sheet.cell_value(row_idx, col_idx) for col_idx in range(num_cols)]
        yield dict(zip(header, row_cell))

# the function has to be defined before it is called
for row in parse_xlsx():
    print(row)  # e.g. {'id': 4.0, 'thread_id': 100.0, 'forum_id': 3.0, 'post_time': 1377000566.0, 'votes': 1.0, 'post_text': 'here is some text'}
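
Because this is a generator, each dict is built only when the caller asks for it. The sketch below shows that behaviour on an in-memory table instead of a workbook; rows_as_dicts and table are hypothetical names used for illustration only.

```python
def rows_as_dicts(rows):
    """Yield one dict per data row; the first row supplies the keys."""
    rows = iter(rows)
    header_row = next(rows, None)
    if header_row is None:  # empty input: nothing to yield
        return
    header = [str(name).lower() for name in header_row]
    for row in rows:
        yield dict(zip(header, row))


# Stand-in for the contents of a sheet (header row first)
table = [
    ['ID', 'Thread_ID', 'Votes'],
    [4.0, 100.0, 1.0],
    [5.0, 100.0, 0.0],
]

gen = rows_as_dicts(table)
first = next(gen)  # only the first data row has been processed so far
rest = list(gen)   # consume whatever is left

print(first)  # {'id': 4.0, 'thread_id': 100.0, 'votes': 1.0}
print(rest)   # [{'id': 5.0, 'thread_id': 100.0, 'votes': 0.0}]
```

Wrapping the generator in list() gives the same list of dictionaries the other solutions build up front.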

Solution 4

This script transforms Excel data into a list of dictionaries:

import xlrd

# on_demand=True loads a sheet only when it is requested
workbook = xlrd.open_workbook('forum.xls', on_demand=True)
worksheet = workbook.sheet_by_index(0)
first_row = []  # the row that holds the column names
for col in range(worksheet.ncols):
    first_row.append(worksheet.cell_value(0, col))
# transform the worksheet into a list of dictionaries
data = []
for row in range(1, worksheet.nrows):
    elm = {}
    for col in range(worksheet.ncols):
        elm[first_row[col]] = worksheet.cell_value(row, col)
    data.append(elm)
print(data)

Solution 5

First set up your keys by parsing just the header row across all columns, then use a second function to parse the data rows, and call the two in order.

all_fields_list = []
header_dict = {}

def parse_data_headers(sheet):
    for c in range(sheet.ncols):
        # 0 is the index of the header row (xlrd rows are 0-based); adjust if yours is elsewhere
        key = sheet.cell(0, c).value
        header_dict[c] = key  # store the header text by column index

def parse_data(sheet):
    for r in range(1, sheet.nrows):  # data starts on the row after the header
        row_dict = {}
        for c in range(sheet.ncols):
            value = sheet.cell(r, c).value
            row_dict[header_dict[c]] = value  # key each value by its column header
        all_fields_list.append(row_dict)

# call them in order, with sheet taken from an opened workbook:
# parse_data_headers(sheet)
# parse_data(sheet)
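
The same two-step approach can also be written without module-level globals by returning the header mapping and passing it to the second function. The sketch below is one possible arrangement, not the answerer's code: FakeSheet and Cell are hypothetical stand-ins that mimic only the nrows, ncols and cell(r, c).value parts of an xlrd sheet so the example runs without a file, and the header_row and first_data_row parameters are assumptions added for illustration.

```python
from collections import namedtuple

# Hypothetical stand-ins for xlrd's cell and sheet objects
Cell = namedtuple('Cell', 'value')


class FakeSheet:
    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)
        self.ncols = len(rows[0]) if rows else 0

    def cell(self, r, c):
        return Cell(self._rows[r][c])


def parse_data_headers(sheet, header_row=0):
    """Map column index -> header text."""
    return {c: sheet.cell(header_row, c).value for c in range(sheet.ncols)}


def parse_data(sheet, headers, first_data_row=1):
    """Build one dict per data row, keyed by header text."""
    return [
        {headers[c]: sheet.cell(r, c).value for c in range(sheet.ncols)}
        for r in range(first_data_row, sheet.nrows)
    ]


sheet = FakeSheet([['id', 'votes'], [4.0, 1.0], [5.0, 0.0]])
headers = parse_data_headers(sheet)           # step 1: keys
all_fields_list = parse_data(sheet, headers)  # step 2: rows

print(headers)          # {0: 'id', 1: 'votes'}
print(all_fields_list)  # [{'id': 4.0, 'votes': 1.0}, {'id': 5.0, 'votes': 0.0}]
```

With a real workbook, the sheet object from book.sheet_by_index(...) would take the place of FakeSheet.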

Author: kylerthecreator

Updated on July 23, 2022

Comments

  • kylerthecreator almost 2 years

    I'm looking to read in an Excel workbook with 15 fields and about 2000 rows, and convert each row to a dictionary in Python. I then want to append each dictionary to a list. I'd like each field in the top row of the workbook to be a key within each dictionary, and have the corresponding cell value be the value within the dictionary. I've already looked at examples here and here, but I'd like to do something a bit different. The second example will work, but I feel like it would be more efficient looping over the top row to populate the dictionary keys and then iterate through each row to get the values. My Excel file contains data from discussion forums and looks something like this (obviously with more columns):

    id    thread_id    forum_id    post_time    votes    post_text
    4     100          3           1377000566   1        'here is some text'
    5     100          4           1289003444   0        'even more text here'
    

    So, I'd like the fields id, thread_id and so on, to be the dictionary keys. I'd like my dictionaries to look like:

    {id: 4, 
    thread_id: 100,
    forum_id: 3,
    post_time: 1377000566,
    votes: 1,
    post_text: 'here is some text'}
    

    Initially, I had some code like this iterating through the file, but my scope is wrong for some of the for-loops and I'm generating way too many dictionaries. Here's my initial code:

    import xlrd
    from xlrd import open_workbook, cellname
    
    book = open_workbook('forum.xlsx', 'r')
    sheet = book.sheet_by_index(3)
    
    dict_list = []
    
    for row_index in range(sheet.nrows):
        for col_index in range(sheet.ncols):
            d = {}
    
            # My intuition for the below for-loop is to take each cell in the top row of the 
            # Excel sheet and add it as a key to the dictionary, and then pass the value of 
            # current index in the above loops as the value to the dictionary. This isn't
            # working.
    
            for i in sheet.row(0):
               d[str(i)] = sheet.cell(row_index, col_index).value
               dict_list.append(d)
    

    Any help would be greatly appreciated. Thanks in advance for reading.

    • DSM almost 10 years
      As an aside, if you're going to be manipulating tabular data of the sort that gets stored in Excel files, you might be interested in pandas; it'll make things you haven't even thought of yet easier. (For example, your entire code could be one line, if you liked.)
    • kylerthecreator almost 10 years
      @DSM: thanks for the rec. I do know of pandas, but wanted to do some heavy lifting on my own as I thought it would make for a good learning experience. Looking into that in the future, though.
  • kylerthecreator almost 10 years
    This is really excellent. One more thing: would you be willing to conceptually break down the lines containing the variable assignment of d? I'm not able to follow the nested for-loops there for some reason. Thanks again. This is really helpful.
  • alecxe almost 10 years
    @kylerthecreator yup, see the update. FYI, there was an unneeded for key in keys loop that should not have been there; I've removed it as well. It works the same as before.
  • Pieter over 3 years
    Nice use of zip to pair the headings and row values together into a dict.