pandas: Read xlsx file to dict with column1 as key and column2 as values
Solution 1
You can use a collections.OrderedDict to keep the keys in order. You'll note that pd.read_excel loads the first sheet by default. Edit: then you say you want to encode the items in the dictionary, and evaluate 'None' as None...
import collections as co
import pandas as pd
df = pd.read_excel('file.xlsx')
df = df.where(pd.notnull(df), None)
# Python 2 idiom: assumes every cell is a string; strips whitespace, then encodes to UTF-8
od = co.OrderedDict((k.strip().encode('utf8'), v.strip().encode('utf8'))
                    for (k, v) in df.values)
Result:
>>> od
OrderedDict([(u'key1', u'str_value1'), (u'key2', u'str_value2'), (u'key3', u'None'), (u'key4', u'int_value3')])
General note: you should keep strings as Unicode within your Python program.
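For Python 3, where str is already Unicode and no encoding step is needed, a type-preserving variant of the approach above might look like the following minimal sketch. It assumes the cells may hold strings, numbers, or blanks, and that the literal text 'None' should become None, as the question asks. The helper name clean is hypothetical, and the in-memory DataFrame only stands in for pd.read_excel('file.xlsx').

```python
from collections import OrderedDict

import pandas as pd


def clean(value):
    """Hypothetical helper: strip strings, map blanks and the text 'None' to None."""
    if isinstance(value, str):
        value = value.strip()
        return None if value == "None" else value
    # Non-string cells: missing values (NaN/None) become None, everything else is kept
    if pd.isna(value):
        return None
    return value


# Assumption: stand-in for pd.read_excel('file.xlsx'), mirroring the question's sample
df = pd.DataFrame(
    {
        "dict_key": [" key1 ", " key2 ", " key3 ", " key4 "],
        "dict_value": [" str_value1 ", " str_value2 ", "None", 3],
    }
)

# Each row of the 2-column frame unpacks into (key, value)
od = OrderedDict((clean(k), clean(v)) for k, v in df.to_numpy())
print(od)
```

Because only strings are stripped, int and boolean cells pass through unchanged. On Python 3.7+ a plain dict also keeps insertion order, so OrderedDict is optional there.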
Solution 2
You can use the pandas read_excel function to read the Excel file more conveniently. You can pass an index_col argument to specify which column of your xlsx file is used as the index.
How to change NaN to None is explained in this question.
Given an xlsx file called example.xlsx that is structured as you described above, the following code should give your expected results:
import pandas as pd
df = pd.read_excel("example.xlsx", index_col=0)
df = df.where(pd.notnull(df), None)
print(df.to_dict()["dict_value"])
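As a hedged alternative, the same index-based idea can be sketched without relying on how where treats None, by replacing missing values while the dict is built. The in-memory DataFrame below is an assumption that stands in for pd.read_excel("example.xlsx", index_col=0), and the column names are taken from the question's sample.

```python
import numpy as np
import pandas as pd

# Assumption: stand-in for pd.read_excel("example.xlsx", index_col=0)
df = pd.DataFrame(
    {"dict_value": ["str_value1", "str_value2", np.nan, 3]},
    index=pd.Index(["key1", "key2", "key3", "key4"], name="dict_key"),
)

# Selecting one column yields a Series; Series.to_dict() maps index label -> value
raw = df["dict_value"].to_dict()

# Replace missing values (NaN) with None explicitly, leaving other values untouched
data_dict = {key: (None if pd.isna(value) else value) for key, value in raw.items()}
print(data_dict)
```

Checking each value with pd.isna keeps the replacement step independent of the column's dtype.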
Anil_M
Updated on June 13, 2022
Comments
-
Anil_M almost 2 years
I am new to pandas. I need to read an xlsx file and convert the first column to the keys of a dict and the second column to its values using pandas. I also need to skip / exclude the first row, which contains the headers.
The answer here is for pymysql and here is for csv. I need to use pandas.
Here is a sample of the Excel data:
dict_key    dict_value
key1        str_value1
key2        str_value2
key3        None
key4        int_value3
My code so far is as below.
import pandas as pd
excel_file = "file.xlsx"
xls = pd.ExcelFile(excel_file)
df = xls.parse(xls.sheet_names[0], skiprows=1, index_col=None, na_values=['None'])
data_dict = df.to_dict()
However, it gives me a dict where the keys are column numbers and the values contain both the column1 data and the column2 data.
>>> data_dict
{u'Chg_Parms': {0: u' key1 ', 1: u' key2 ', 2: u' key3 ', 3: u' key4 ', 4: u' str_value1 ', 5: u' str_value2 ', 6: u' Nan ', 6: u' int_value3 '}}
What I would like to have is the column1 data as keys and the column2 data as values, with NaN replaced with None:
data_dict = {'key1': 'str_value1', 'key2': 'str_value2', 'key3': None, 'key4': int_value3}
Thanks for your help.
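The nested shape described in the question is what DataFrame.to_dict() produces by default: one inner dict per column, keyed by row label. A minimal sketch of the difference, using a small in-memory frame as a stand-in for the parsed sheet (the data and variable names are illustrative assumptions):

```python
import pandas as pd

# Assumption: tiny stand-in for the parsed Excel sheet
df = pd.DataFrame(
    {"dict_key": ["key1", "key2"], "dict_value": ["str_value1", "str_value2"]}
)

# Default orientation: {column name: {row label: value}}
nested = df.to_dict()
print(nested)

# Pairing the two columns row by row gives the flat key -> value mapping instead
flat = dict(zip(df["dict_key"], df["dict_value"]))
print(flat)
```

Pairing the columns with zip is only one of several ways to get the flat mapping; setting the key column as the index and converting the value column is another.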
-
Anil_M about 7 years
@bernie Thanks for the answer. This is definitely close to what I need. However, how do I convert each key and value to a non-Unicode representation, strip whitespace, and also maintain its type? For example, str(u' 1') results in '1' and str(u'None') results in 'None'. I need int and boolean values kept as they are.
-
mechanical_meat about 7 years
@Anil_M: you're very welcome. Please see edited answer.
-
mechanical_meat about 7 years
df = df.where(pd.notnull(df), None)
nice one, +1
-
Anil_M about 7 years
I added .strip() next to encode('utf8') to take care of whitespace. I believe that answers my question. Thanks.
-
mechanical_meat about 7 years
@Anil_M: anytime! Happy coding to you.