How to convert a dataframe to a dictionary

python dictionary pandas dataframe

229,259

Solution 1

If lakes is your DataFrame, you can do something like

area_dict = dict(zip(lakes.id, lakes.value))
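A self-contained sketch of this approach, assuming a DataFrame built from the question's sample data (the name `lakes` and the columns `id` and `value` follow the question). Bracket indexing is used here because, as noted in the comments, attribute access like `lakes.id` can fail in some cases, for example when a column name clashes with a DataFrame method or is not a valid Python identifier:

```python
import pandas as pd

# Sample data from the question: 'id' becomes the keys, 'value' the values
lakes = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# zip pairs each id with the value in the same row; dict() consumes the pairs
area_dict = dict(zip(lakes["id"], lakes["value"]))
# area_dict == {0: 10.2, 1: 5.7, 2: 7.4}
```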

Solution 2

See the docs for to_dict. You can use it like this:

df.set_index('id').to_dict()

And if you have only one value column, select it so that the column name does not also become a level in the dict (in this case you are actually using Series.to_dict()):

df.set_index('id')['value'].to_dict()
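To make the difference concrete, here is a sketch using the sample data from the question (the variable name `df` is an assumption). `DataFrame.to_dict()` defaults to `orient='dict'`, which nests the values under the column name, while selecting the column first gives the flat id -> value mapping:

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# DataFrame.to_dict(): {column -> {index label -> value}}
nested = df.set_index("id").to_dict()
# nested == {'value': {0: 10.2, 1: 5.7, 2: 7.4}}

# Series.to_dict(): {index label -> value}, i.e. id -> value
flat = df.set_index("id")["value"].to_dict()
# flat == {0: 10.2, 1: 5.7, 2: 7.4}
```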

Solution 3

mydict = dict(zip(df.id, df.value))

Solution 4

If you want a simple way to preserve duplicates, you could use groupby:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) 
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3
>>> {k: g["value"].tolist() for k,g in ptest.groupby("id")}
{'a': [1, 2], 'b': [3]}
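The same grouping can also be written by applying `list` to the grouped column and converting the resulting Series, which some may find more readable than the comprehension. A sketch reusing the `ptest` data from above:

```python
import pandas as pd

ptest = pd.DataFrame([["a", 1], ["a", 2], ["b", 3]], columns=["id", "value"])

# Group 'value' by 'id', collect each group into a list,
# then turn the resulting Series (index = id) into a dict
grouped = ptest.groupby("id")["value"].apply(list).to_dict()
# grouped == {'a': [1, 2], 'b': [3]}
```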

Solution 5

The answers by joris in this thread and by punchagan in the duplicated thread are very elegant; however, they will not give correct results if the column used for the keys contains any duplicated values.

For example:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value'])
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3

# note that in both cases the association a->1 is lost:
>>> ptest.set_index('id')['value'].to_dict()
{'a': 2, 'b': 3}
>>> dict(zip(ptest.id, ptest.value))
{'a': 2, 'b': 3}
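If silently losing entries is a concern, one option is to check that the key column is unique before converting and fail loudly otherwise. A minimal sketch; the helper name `to_dict_strict` is hypothetical and not part of pandas:

```python
import pandas as pd

def to_dict_strict(df, key_col, value_col):
    """Hypothetical helper: build {key: value}, refusing duplicate keys."""
    if not df[key_col].is_unique:
        # Report which keys repeat instead of silently keeping the last one
        dupes = df.loc[df[key_col].duplicated(), key_col].unique().tolist()
        raise ValueError(f"duplicate keys in {key_col!r}: {dupes}")
    return dict(zip(df[key_col], df[value_col]))

ptest = pd.DataFrame([["a", 1], ["a", 2], ["b", 3]], columns=["id", "value"])
try:
    to_dict_strict(ptest, "id", "value")
except ValueError as err:
    print(err)  # duplicate keys in 'id': ['a']
```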

If you have duplicated entries and do not want to lose them, you can use this ugly but working code:

>>> mydict = {}
>>> for x in range(len(ptest)):
...     currentid = ptest.iloc[x,0]
...     currentvalue = ptest.iloc[x,1]
...     mydict.setdefault(currentid, [])
...     mydict[currentid].append(currentvalue)
>>> mydict
{'a': [1, 2], 'b': [3]}
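The same result can be obtained without positional `iloc` lookups by walking the two columns together with `zip`. A sketch, assuming the `ptest` DataFrame from above:

```python
import pandas as pd

ptest = pd.DataFrame([["a", 1], ["a", 2], ["b", 3]], columns=["id", "value"])

mydict = {}
# zip walks both columns in step; setdefault creates the list the first time a key appears
for key, val in zip(ptest["id"], ptest["value"]):
    mydict.setdefault(key, []).append(val)
# mydict == {'a': [1, 2], 'b': [3]}
```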

perigee

Updated on March 22, 2022

Comments

perigee about 2 years
I have a dataframe with two columns and intend to convert it to a dictionary. The first column will be the key and the second will be the value.

Dataframe:
```
    id    value
0    0     10.2
1    1      5.7
2    2      7.4
```
How can I do this?
dalloliogm almost 10 years

Note that this command will lose data if there are redundant values in the ID column: >>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) >>> ptest.set_index('id')['value'].to_dict()
Midnighter almost 10 years

Excuse the formatting due to the lack of a block in comments:
from collections import defaultdict
mydict = defaultdict(list)
for (key, val) in ptest[["id", "value"]].itertuples(index=False):
    mydict[key].append(val)
dalloliogm almost 10 years

Nice and elegant solution, but on a 50k-row table it is about 6 times slower than my ugly solution below.
DSM almost 10 years

@dalloliogm: could you give an example table that happens for? If it's six times slower than a Python loop, there might be a performance bug in pandas.
jezrael over 8 years

In version 0.17.1 I get the error: TypeError: zip argument #2 must support iteration
jezrael over 8 years

Solution: area_dict = dict(zip(lakes['id'], lakes['value']))
Hardik Gupta over 7 years

I tried this but am getting this error: TypeError: zip argument #1 must support iteration
evanlivingston about 7 years

I have to say, there is nothing in that docs link that would have given me the answer to this question.
aLbAc about 6 years

Note: if the index is the desired dictionary key, then do: dict(zip(df.index, df.value))
Michael D almost 6 years

There is no 'records' column in the given example. Also, in that case the index will be the key, which is not what we want.
tda over 5 years

Looping with pandas isn't the most efficient in terms of memory usage. See: engineering.upside.com/…
Zheng Liu over 5 years

@MichaelD 'records' is not a column. It's an option for the argument orient.
jesseaam about 5 years

What if you wanted more than one column to be in the dictionary values? I am thinking of something like area_dict = dict(zip(lakes.area, (lakes.count, lakes.other_column))). How would you make this happen?
pnv about 5 years

If the second argument has multiple values, this won't work.
Roei Bahumi over 3 years

This will actually output a list of dictionaries in the following format: [{'area': 10, 'count': 7}, {'area': 20, 'count': 5}...] instead of a key->value dict.
Azurespot almost 3 years

I agree, it did not work for me. But how can you write df.id? The column name id is not recognized as a DataFrame variable, right? As in, a variable written into the DataFrame object library. I must be misunderstanding something.
Simas Joneliunas over 2 years

Hi, it would be great if you could help us to understand what your code does and how it solves the OP's problem!
TylerH about 2 years

What is a "fast dual-core laptop"? That line would be better removed or replaced with a specific laptop and CPU model. Let us decide for ourselves if it is "fast".
TylerH about 2 years

This just repeats an existing answer by AnandSin from 2018.
Adriaan about 2 years

Please read How to Answer and always remember that you are not merely solving the problem at hand, but also educating the OP and any future readers of this question and answer. Thus, please edit the answer to include an explanation as to why it works.