How to convert a dataframe to a dictionary


Solution 1

If lakes is your DataFrame, you can do something like:

area_dict = dict(zip(lakes.id, lakes.value))
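A minimal sketch of this approach, assuming the two-column layout from the question (columns named id and value). The sample frame is rebuilt by hand for illustration. Bracket access is used instead of attribute access because it also works when a column name clashes with an existing DataFrame attribute or method.

```python
import pandas as pd

# Sample data modelled on the table in the question (assumed for illustration).
lakes = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# zip pairs each id with the value in the same row; dict() consumes the pairs.
# lakes["id"] is equivalent to lakes.id here, but is safe for any column name.
area_dict = dict(zip(lakes["id"], lakes["value"]))

print(area_dict)
```

If the key column contains repeated values, later rows overwrite earlier ones, so this form is only suitable when the keys are unique.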

Solution 2

See the docs for to_dict. You can use it like this:

df.set_index('id').to_dict()

If you only need one column, select it before calling to_dict so that the column name does not become an extra level in the dict (in this case you are actually calling Series.to_dict()):

df.set_index('id')['value'].to_dict()
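To make the difference between the two calls concrete, here is a short sketch, again assuming the id/value columns from the question. The variable names nested and flat are only illustrative.

```python
import pandas as pd

# Sample data modelled on the table in the question (assumed for illustration).
df = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# DataFrame.to_dict() with the default orient returns one inner dict per column,
# keyed by the index, so the column name appears as the outer key.
nested = df.set_index("id").to_dict()

# Selecting the column first gives a Series; Series.to_dict() is flat.
flat = df.set_index("id")["value"].to_dict()

print(nested)
print(flat)
```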

Solution 3

mydict = dict(zip(df.id, df.value))

Solution 4

If you want a simple way to preserve duplicates, you could use groupby:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) 
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3
>>> {k: g["value"].tolist() for k,g in ptest.groupby("id")}
{'a': [1, 2], 'b': [3]}
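The same grouping can also be written without an explicit comprehension. This is a sketch of one alternative, not necessarily the faster one: it collects each group's values into a list and then converts the resulting Series to a dict.

```python
import pandas as pd

ptest = pd.DataFrame([["a", 1], ["a", 2], ["b", 3]], columns=["id", "value"])

# Group the value column by id, turn each group into a list,
# then convert the resulting Series (indexed by id) into a dict.
grouped = ptest.groupby("id")["value"].apply(list).to_dict()

print(grouped)
```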

Solution 5

The answers by joris in this thread and by punchagan in the duplicated thread are very elegant; however, they will not give correct results if the column used for the keys contains any duplicated values.

For example:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value'])
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3

# note that in both cases the association a->1 is lost:
>>> ptest.set_index('id')['value'].to_dict()
{'a': 2, 'b': 3}
>>> dict(zip(ptest.id, ptest.value))
{'a': 2, 'b': 3}

If you have duplicated entries and do not want to lose them, you can use this ugly but working code:

>>> mydict = {}
>>> for x in range(len(ptest)):
...     currentid = ptest.iloc[x,0]
...     currentvalue = ptest.iloc[x,1]
...     mydict.setdefault(currentid, [])
...     mydict[currentid].append(currentvalue)
>>> mydict
{'a': [1, 2], 'b': [3]}
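The loop above can be tightened with collections.defaultdict, along the lines suggested in one of the comments on this question. This is a sketch only; no claim is made here about how its speed compares with the index-based loop.

```python
from collections import defaultdict

import pandas as pd

ptest = pd.DataFrame([["a", 1], ["a", 2], ["b", 3]], columns=["id", "value"])

# defaultdict(list) creates an empty list the first time a key is seen,
# so no setdefault call is needed.
mydict = defaultdict(list)
for key, val in ptest[["id", "value"]].itertuples(index=False):
    mydict[key].append(val)

# Convert back to a plain dict for display or further use.
result = dict(mydict)
print(result)
```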

Author: perigee

Updated on March 22, 2022

Comments

  • perigee
    perigee about 2 years

    I have a dataframe with two columns and intend to convert it to a dictionary. The first column will be the key and the second will be the value.

    Dataframe:

        id    value
    0    0     10.2
    1    1      5.7
    2    2      7.4
    

    How can I do this?

  • dalloliogm
    dalloliogm almost 10 years
    Note that this command will lose data if there are redundant values in the ID column: >>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) >>> ptest.set_index('id')['value'].to_dict()
  • Midnighter
    Midnighter almost 10 years
    Excuse the formatting due to the lack of a block in comments: mydict = defaultdict(list)\n for (key, val) in ptest[["id", "value"]].itertuples(index=False):\n mydict[key].append(val)
  • dalloliogm
    dalloliogm almost 10 years
    Nice and elegant solution, but on a 50k rows table, it is about 6 times slower than my ugly solution below.
  • DSM
    DSM almost 10 years
    @dalloliogm: could you give an example table that happens for? If it's six times slower than a Python loop, there might be a performance bug in pandas.
  • jezrael
    jezrael over 8 years
    In version 0.17.1 I get the error: TypeError: zip argument #2 must support iteration
  • jezrael
    jezrael over 8 years
    Solution: area_dict = dict(zip(lakes['id'], lakes['value']))
  • Hardik Gupta
    Hardik Gupta over 7 years
    I tried this but I am getting this error: TypeError: zip argument #1 must support iteration
  • evanlivingston
    evanlivingston about 7 years
    I have to say, there is nothing in that docs link that would have given me the answer to this question.
  • aLbAc
    aLbAc about 6 years
    Note: in case the index is the desired dictionary key, then do: dict(zip(df.index,df.value))
  • Michael D
    Michael D almost 6 years
    There is no 'records' column in the given example. Also, in that case the index will be the key, which is not what we want.
  • tda
    tda over 5 years
    Looping with pandas isn't the most efficient in terms of memory usage. See: engineering.upside.com/…
  • Zheng Liu
    Zheng Liu over 5 years
    @MichaelD 'records' is not a column. It's an option for the argument orient.
  • jesseaam
    jesseaam about 5 years
    What if you wanted more than one column to be in the dictionary values? I am thinking of something like area_dict = dict(zip(lakes.area, (lakes.count, lakes.other_column))). How would you make this happen?
  • pnv
    pnv about 5 years
    If the second argument has multiple values, this won't work.
  • Roei Bahumi
    Roei Bahumi over 3 years
    This will actually output a list of dictionaries in the following format: [{'area': 10, 'count': 7}, {'area': 20, 'count': 5}...] instead of a key->value dict.
  • Azurespot
    Azurespot almost 3 years
    Agree, it did not work for me. But how can you do df.id, the column name id is not recognized as a data frame variable, right? As in, a variable written into the data frame object library. I must be misunderstanding something.
  • Simas Joneliunas
    Simas Joneliunas over 2 years
    Hi, it would be great if you could help us to understand what your code does and how it solves the OP's problem!
  • TylerH
    TylerH about 2 years
    What is a "fast dual-core laptop"? That line would be better removed or replaced with a specific laptop and CPU model. Let us decide for ourselves if it is "fast".
  • TylerH
    TylerH about 2 years
    This just repeats an existing answer by AnandSin from 2018.
  • Adriaan
    Adriaan about 2 years
    Please read How to Answer and always remember that you are not merely solving the problem at hand, but also educating the OP and any future readers of this question and answer. Thus, please edit the answer to include an explanation as to why it works.
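
One of the comments above asks how to map each key to more than one column. A sketch of two possible ways follows, assuming a hypothetical lakes frame with the column names used in that comment (area, count, other_column) and made-up sample values. Note that lakes.count would refer to the DataFrame.count method rather than the column, which is why bracket access is used.

```python
import pandas as pd

# Hypothetical data; column names are taken from the comment, values are invented.
lakes = pd.DataFrame(
    {"area": [10, 20], "count": [7, 5], "other_column": ["x", "y"]}
)

# Option 1: each key maps to a tuple of the remaining columns.
as_tuples = dict(zip(lakes["area"], zip(lakes["count"], lakes["other_column"])))

# Option 2: each key maps to a dict of column name -> value.
# orient="index" requires the index (here, area) to be unique.
as_dicts = lakes.set_index("area").to_dict(orient="index")

print(as_tuples)
print(as_dicts)
```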