How to convert a dataframe to a dictionary
Solution 1
If lakes is your DataFrame, you can do something like:
area_dict = dict(zip(lakes.id, lakes.value))
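Here is a minimal runnable sketch of the same idea on the sample frame from the question. The data values are copied from the question. Bracket indexing is used instead of attribute access, and it behaves the same here:

```python
import pandas as pd

# Sample frame copied from the question: "id" holds the keys, "value" the values
lakes = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# zip() pairs each id with the value in the same row;
# dict() turns those (key, value) pairs into dictionary entries
area_dict = dict(zip(lakes["id"], lakes["value"]))
print(area_dict)
```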
Solution 2
See the docs for to_dict. You can use it like this:
df.set_index('id').to_dict()
And if you have only one column, select it before converting so that the column name does not become an extra level in the dict (in this case you are actually calling Series.to_dict()):
df.set_index('id')['value'].to_dict()
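The sketch below runs both calls on the question's sample data to show the difference. DataFrame.to_dict() defaults to orient='dict', which produces {column -> {index -> value}}, while Series.to_dict() produces {index -> value}:

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# DataFrame.to_dict() with the default orient="dict":
# the column name "value" becomes an extra outer level
nested = df.set_index("id").to_dict()

# Selecting the single column first gives a Series,
# whose to_dict() maps index label -> value directly
flat = df.set_index("id")["value"].to_dict()

print(nested)
print(flat)
```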
Solution 3
mydict = dict(zip(df.id, df.value))
Solution 4
If you want a simple way to preserve duplicates, you could use groupby:
>>> import pandas as pd
>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value'])
>>> ptest
id value
0 a 1
1 a 2
2 b 3
>>> {k: g["value"].tolist() for k,g in ptest.groupby("id")}
{'a': [1, 2], 'b': [3]}
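The same grouping can also be done without the comprehension. You collect each group into a list and then convert the resulting Series. This sketch uses an alternative idiom and is not the answer's original code:

```python
import pandas as pd

ptest = pd.DataFrame([["a", 1], ["a", 2], ["b", 3]], columns=["id", "value"])

# Group the "value" column by "id" and collect each group into a list.
# The result is a Series indexed by id, so to_dict() maps id -> list of values.
grouped = ptest.groupby("id")["value"].apply(list).to_dict()
print(grouped)
```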
Solution 5
The answers by joris in this thread and by punchagan in the duplicated thread are very elegant; however, they will not give correct results if the column used for the keys contains duplicated values.
For example:
>>> import pandas as pd
>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value'])
>>> ptest
id value
0 a 1
1 a 2
2 b 3
# note that in both cases the association a->1 is lost:
>>> ptest.set_index('id')['value'].to_dict()
{'a': 2, 'b': 3}
>>> dict(zip(ptest.id, ptest.value))
{'a': 2, 'b': 3}
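If silently losing rows is the main concern, you can check the key column before converting. In this sketch, to_dict_strict is a hypothetical helper name and not a pandas function:

```python
import pandas as pd

ptest = pd.DataFrame([["a", 1], ["a", 2], ["b", 3]], columns=["id", "value"])


def to_dict_strict(df, key, value):
    """Hypothetical helper: map key -> value, refusing to drop rows with repeated keys."""
    # Series.is_unique is False as soon as any key appears more than once
    if not df[key].is_unique:
        # duplicated() flags every repeat after the first occurrence
        dupes = df.loc[df[key].duplicated(), key].drop_duplicates().tolist()
        raise ValueError(f"duplicate keys in {key!r}: {dupes}")
    return dict(zip(df[key], df[value]))


try:
    to_dict_strict(ptest, "id", "value")
except ValueError as exc:
    print(exc)
```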
If you have duplicated entries and do not want to lose them, you can use this ugly but working code:
>>> mydict = {}
>>> for x in range(len(ptest)):
... currentid = ptest.iloc[x,0]
... currentvalue = ptest.iloc[x,1]
... mydict.setdefault(currentid, [])
... mydict[currentid].append(currentvalue)
>>> mydict
{'a': [1, 2], 'b': [3]}
perigee
Updated on March 22, 2022
Comments
-
perigee about 2 years
I have a dataframe with two columns and intend to convert it to a dictionary. The first column will be the key and the second will be the value.
Dataframe:
   id  value
0   0   10.2
1   1    5.7
2   2    7.4
How can I do this?
-
dalloliogm almost 10 years
Note that this command will lose data if there are redundant values in the ID column:
>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value'])
>>> ptest.set_index('id')['value'].to_dict()
-
Midnighter almost 10 years
Excuse the formatting due to the lack of a block in comments:
from collections import defaultdict
mydict = defaultdict(list)
for (key, val) in ptest[["id", "value"]].itertuples(index=False):
    mydict[key].append(val)
-
dalloliogm almost 10 years
Nice and elegant solution, but on a 50k-row table it is about 6 times slower than my ugly solution below.
-
DSM almost 10 years
@dalloliogm: could you give an example table that happens for? If it's six times slower than a Python loop, there might be a performance bug in pandas.
-
jezrael over 8 years
In version 0.17.1 I get an error:
TypeError: zip argument #2 must support iteration
-
jezrael over 8 years
Solution:
area_dict = dict(zip(lakes['id'], lakes['value']))
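Bracket indexing also avoids a related pitfall. If a column name collides with a DataFrame method, attribute access returns the method instead of the column, and zip then raises a TypeError because a method is not iterable. This is not offered as the cause of the 0.17.1 error above. The column name count and the data in this sketch are hypothetical:

```python
import pandas as pd

# Hypothetical frame whose column "count" shares its name with DataFrame.count
lakes = pd.DataFrame({"id": [0, 1, 2], "count": [7, 5, 3]})

# Attribute access finds the DataFrame.count method, not the column
print(callable(lakes.count))  # True

# Bracket access always selects the column, so zip gets two iterable Series
count_dict = dict(zip(lakes["id"], lakes["count"]))
print(count_dict)
```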
-
Hardik Gupta over 7 years
I tried this but I am getting this error:
TypeError: zip argument #1 must support iteration
-
evanlivingston about 7 years
I have to say, there is nothing in that docs link that would have given me the answer to this question.
-
aLbAc about 6 years
Note: if the index holds the desired dictionary keys, do: dict(zip(df.index, df.value))
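When the keys are already in the index, Series.to_dict() gives the same result as that zip because it uses the index labels as keys. The data in this sketch is hypothetical:

```python
import pandas as pd

# Hypothetical frame whose index already holds the desired keys
df = pd.DataFrame({"value": [10.2, 5.7, 7.4]}, index=[0, 1, 2])

# Series.to_dict() maps each index label to its value
by_index = df["value"].to_dict()

# Same mapping built explicitly with zip over the index and the column
by_zip = dict(zip(df.index, df["value"]))

print(by_index == by_zip)
```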
-
Michael D almost 6 years
There is no 'records' column in the given example. Also, in such a case the index will be the key, which is not what we want.
-
tda over 5 years
Looping with pandas isn't the most efficient in terms of memory usage. See: engineering.upside.com/…
-
Zheng Liu over 5 years
@MichaelD 'records' is not a column. It's an option for the argument orient.
-
jesseaam about 5 years
What if you wanted more than one column to be in the dictionary values? I am thinking something like area_dict = dict(zip(lakes.area, (lakes.count, lakes.other_column))). How would you make this happen?
-
pnv about 5 years
If the second argument has multiple values, this won't work.
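This sketch shows one way to carry several columns into the values, using the column names from jesseaam's comment with hypothetical data. You can zip the value columns together first so each key maps to a tuple. Alternatively, orient='index' maps each key to a {column -> value} dict. Bracket access is needed here because lakes.count would return the DataFrame.count method, not the column:

```python
import pandas as pd

# Hypothetical data using the column names from the comment
lakes = pd.DataFrame({"area": [10, 20], "count": [7, 5], "other_column": ["x", "y"]})

# Inner zip builds one (count, other_column) tuple per row;
# outer zip pairs each area with its tuple
area_tuples = dict(zip(lakes["area"], zip(lakes["count"], lakes["other_column"])))

# orient="index": {index label -> {column -> value}} after moving "area" into the index
area_dicts = lakes.set_index("area").to_dict(orient="index")

print(area_tuples)
print(area_dicts)
```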
-
Roei Bahumi over 3 years
This will actually output a list of dictionaries in the following format: [{'area': 10, 'count': 7}, {'area': 20, 'count': 5}...] instead of a key->value dict.
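For comparison, the sketch below shows how orient changes the output of DataFrame.to_dict on the question's sample data. The 'records' option returns one dict per row, which is why it does not produce a key -> value mapping:

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# orient="records": a list with one {column -> value} dict per row
records = df.to_dict(orient="records")

# orient="list": {column -> list of that column's values}
as_lists = df.to_dict(orient="list")

print(records)
print(as_lists)
```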
-
Azurespot almost 3 years
Agreed, it did not work for me. But how can you do df.id? The column name id is not recognized as a data frame variable, right? As in, a variable written into the data frame object library. I must be misunderstanding something.
-
Simas Joneliunas over 2 years
Hi, it would be great if you could help us to understand what your code does and how it solves the OP's problem!
-
TylerH about 2 years
What is a "fast dual-core laptop"? That line would be better removed or replaced with a specific laptop and CPU model. Let us decide for ourselves if it is "fast".
-
TylerH about 2 years
This just repeats an existing answer by AnandSin from 2018.
-
Adriaan about 2 years
Please read How to Answer and always remember that you are not merely solving the problem at hand, but also educating the OP and any future readers of this question and answer. Thus, please edit the answer to include an explanation as to why it works.