Fastest way to update a dictionary in python

17,374

Solution 1

Just add items to the dictionary without checking for their existence. I added 100,000 items to a dictionary using 3 different methods and timed it with the timeit module.

  1. if k not in d: d[k] = v
  2. d.setdefault(k, v)
  3. d[k] = v

Option 3 was the fastest, but not by much.

[ Actually, I also tried if k not in d.keys(): d[k] = v, but that was slower by a factor of 300 (each iteration built a list of keys and performed a linear search). It made my tests so slow that I left it out here. ]

Here's my code:

import timeit

setup = """
import random
random.seed(0)
item_count = 100000
# divide key range by 5 to ensure lots of duplicates 
items = [(random.randint(0, item_count/5), 0) for i in xrange(item_count)]
"""
in_dict = """
d = {}
for k, v in items:
    if k not in d:
        d[k] = v
"""
set_default = """
d = {}
for k, v in items:
    d.setdefault(k, v)
"""
straight_add = """
d = {}
for k, v in items:
    d[k] = v
"""
print 'in_dict      ', timeit.Timer(in_dict, setup).timeit(1000)
print 'set_default  ', timeit.Timer(set_default, setup).timeit(1000)
print 'straight_add ', timeit.Timer(straight_add, setup).timeit(1000)

And the results:

in_dict       13.090878085
set_default   21.1309413091
straight_add  11.4781760635

Note: This is all pretty pointless. We get many questions daily about what's the fastest way to do x or y in Python. In most cases, it is clear that the question was being asked before any performance issues were encountered. My advice? Focus on writing the clearest program you can write and if it's too slow, profile it and optimize where needed. In my experience, I almost never get to to profile and optimize step. From the description of the problem, it seems as if dictionary storage will not be the major bottle-neck in your program.

Solution 2

Using the built-in update() function is even faster. I tweaked Steven Rumbalski's example above a bit and it shows how update() is the fastest. There are at least two ways to use it (with a list of tuples or with another dictionary). The former (shown below as update_method1) is the fastest. Note that I also changed a couple of other things about Steven Rumbalski's example. My dictionaries will each have exactly 100,000 keys but the new values have a 10% chance of not needing to be updated. This chance of redundancy will depend on the nature of the data that you're updating your dictionary with. In all cases on my machine, my update_method1 was the fastest.

import timeit

setup = """
import random
random.seed(0)
item_count = 100000
existing_dict = dict([(str(i), random.randint(1, 10)) for i in xrange(item_count)])
items = [(str(i), random.randint(1, 10)) for i in xrange(item_count)]
items_dict = dict(items)
"""
in_dict = """
for k, v in items:
    if k not in existing_dict:
        existing_dict[k] = v
"""
set_default = """
for k, v in items:
    existing_dict.setdefault(k, v)
"""
straight_add = """
for k, v in items:
    existing_dict[k] = v
"""
update_method1 = """
existing_dict.update(items)
"""
update_method2 = """
existing_dict.update(items_dict)
"""
print 'in_dict        ', timeit.Timer(in_dict, setup).timeit(1000)
print 'set_default    ', timeit.Timer(set_default, setup).timeit(1000)
print 'straight_add   ', timeit.Timer(straight_add, setup).timeit(1000)
print 'update_method1 ', timeit.Timer(update_method1, setup).timeit(1000)
print 'update_method2 ', timeit.Timer(update_method2, setup).timeit(1000)

This code resulted in the following results:

in_dict         10.6597309113
set_default     19.3389420509
straight_add    11.5891621113
update_method1  7.52693581581
update_method2  9.10132408142

Solution 3

if foo not in A.keys(): 
    A[foo] = x 

is very slow, because A.keys() creates a list, which has to be parsed in O(N).

if foo not in A: 
    A[foo] = x 

is faster, because it takes O(1) to check, whether foo exists in A.

A[foo] = x 

is even better, because you already have the object x and you just add (if it already does not exist) a pointer to it to A.

Solution 4

If you "know" that A[foo] "should be" equal to x, then I would just do:

assert(A[foo]==x)

which will tell you if your assumption is wrong!

Solution 5

There are certainly faster ways than your first example. But I suspect the straight update will be faster than any test.

Share:
17,374
amg
Author by

amg

I am here to learn.

Updated on June 04, 2022

Comments

  • amg
    amg almost 2 years

    I have a dictionary A, and a possible entry foo. I know that A[foo] should be equal to x, but I don't know if A[foo] has been already defined. In any case if A[foo] has been defined it means that it already has the correct value.

    It is faster to execute:

    if foo not in A.keys(): 
       A[foo]=x 
    

    or simply update

    A[foo]=x 
    

    because by the time the computer has found the foo entry, it can as well update it. While if not I would have to call the hash table two times?

    Thanks.