pandas :Read xlsx file to dict with column1 as key and column2 as values

python python-2.7 pandas xlsx

10,058

Solution 1

You can use a collections.OrderedDict to keep the keys in order. You'll note that pd.read_excel loads the first sheet by default. Edit: then you say you want to encode the items in the dictionary, and evaluate 'None' as None...

import collections as co
import pandas as pd

df = pd.read_excel('file.xlsx')
df = df.where(pd.notnull(df), None)
od = co.OrderedDict((k.strip().encode('utf8'),v.strip().encode('utf8')) 
                    for (k,v) in df.values)

Result:

>>> od
OrderedDict([(u'key1', u'str_value1'), (u'key2', u'str_value2'), (u'key3', u'None'), (u'key4', u'int_value3')])

General note: you should keep strings as Unicode within your Python program.

Solution 2

You can use pandas read_excel method to read the excel file more conveniently. You can pass a index_col argument where you can define which column of your xlsx is the index.

How to change NaN to None is explained in this question.

Given an xlsx file called example.xlsx which is build like you wrote above, the following code should give your expected results:

import pandas as pd

df = pd.read_excel("example.xlsx", index_col=0)
df = df.where(pd.notnull(df), None)

print df.to_dict()["dict_value"]

10,058

Anil_M

I love to solve problems, especially challenging one's , Help others and learn something new along the way. I believe SO is a great platform to learn and share knowledge.

Updated on June 13, 2022

Comments

Anil_M almost 2 years

I am new to pandas. I need to read a xlsx file and convert first column to key of a dict and second column to values of a dict using pandas. I also need to skip / exclude first row which are headers.

The answer here is for pymysql and here is for csv. I need to user pandas.

Here is a sample excel data

dict_key    dict_value  
key1        str_value1  
key2        str_value2  
key3         None  
key4         int_value3

My code so far is as below.

import pandas as pd

excel_file = "file.xlsx"
xls = pd.ExcelFile(excel_file)
df = xls.parse(xls.sheet_names[0], skiprows=1, index_col=None, na_values=['None'])
data_dict = df.to_dict()

However, it gives me dict where keys are column numbers and values are both column1 data as well as column2 data.

>>> data_dict
{u'Chg_Parms': {0: u'  key1 ', 1: u'   key2 ', 2: u'   key3 ', 3: u'   key4 ', 4: u'   str_value1 ', 
                5: u'   str_value2 ', 6: u'   Nan ', 6: u'   int_value3 '}}

what I would like to have is column1 data as key and column two data as values and also NaN replaced with None

data_dict = {'key1': 'str_value1', 'key2': 'str_value2', 'key3': None, 'key4': int_value3}

Thanks for your help.

Anil_M about 7 years

@ bernie Thanks for the ans. This is definitely towards what I need. However, how do I convert each key value to non unicode representation, strip white space and also maintain its type. for eg. str(u' 1') results in '1' and str(u'None') results in 'None'. I need int and boolean values as it is.
mechanical_meat about 7 years

@Anil_M: you're very welcome. Please see edited answer.
mechanical_meat about 7 years

df = df.where(pd.notnull(df), None) nice one, +1
Anil_M about 7 years

I added .strip() next to encode('utf8') to take care of white space. I belive that answers my questions. thanks.
mechanical_meat about 7 years

@Anil_M: anytime! Happy coding to you.