python dict: get vs setdefault

51,316

Solution 1

Your two examples do the same thing, but that doesn't mean get and setdefault do.

The difference between the two is basically manually setting d[key] to point to the list every time, versus setdefault automatically setting d[key] to the list only when it's unset.

Making the two methods as similar as possible, I ran

from timeit import timeit

print timeit("c = d.get(0, []); c.extend([1]); d[0] = c", "d = {1: []}", number = 1000000)
print timeit("c = d.get(1, []); c.extend([1]); d[0] = c", "d = {1: []}", number = 1000000)
print timeit("d.setdefault(0, []).extend([1])", "d = {1: []}", number = 1000000)
print timeit("d.setdefault(1, []).extend([1])", "d = {1: []}", number = 1000000)

and got

0.794723378711
0.811882272256
0.724429205999
0.722129751973

So setdefault is around 10% faster than get for this purpose.

The get method allows you to do less than you can with setdefault. You can use it to avoid getting a KeyError when the key doesn't exist (if that's something that's going to happen frequently) even if you don't want to set the key.

See Use cases for the 'setdefault' dict method and dict.get() method returns a pointer for some more info about the two methods.

The thread about setdefault concludes that most of the time, you want to use a defaultdict. The thread about get concludes that it is slow, and often you're better off (speed wise) doing a double lookup, using a defaultdict, or handling the error (depending on the size of the dictionary and your use case).

Solution 2

The accepted answer from agf isn't comparing like with like. After:

print timeit("d[0] = d.get(0, []) + [1]", "d = {1: []}", number = 10000)

d[0] contains a list with 10,000 items whereas after:

print timeit("d.setdefault(0, []) + [1]", "d = {1: []}", number = 10000)

d[0] is simply []. i.e. the d.setdefault version never modifies the list stored in d. The code should actually be:

print timeit("d.setdefault(0, []).append(1)", "d = {1: []}", number = 10000)

and in fact is faster than the faulty setdefault example.

The difference here really is because of when you append using concatenation the whole list is copied every time (and once you have 10,000 elements that is beginning to become measurable. Using append the list updates are amortised O(1), i.e. effectively constant time.

Finally, there are two other options not considered in the original question: defaultdict or simply testing the dictionary to see whether it already contains the key.

So, assuming d3, d4 = defaultdict(list), {}

# variant 1 (0.39)
d1[key] = d1.get(key, []) + [val]
# variant 2 (0.003)
d2.setdefault(key, []).append(val)
# variant 3 (0.0017)
d3[key].append(val)
# variant 4 (0.002)
if key in d4:
    d4[key].append(val)
else:
    d4[key] = [val]

variant 1 is by far the slowest because it copies the list every time, variant 2 is the second slowest, variant 3 is the fastest but won't work if you need Python older than 2.5, and variant 4 is just slightly slower than variant 3.

I would say use variant 3 if you can, with variant 4 as an option for those occasional places where defaultdict isn't an exact fit. Avoid both of your original variants.

Solution 3

For those who are still struggling in understanding these two term, let me tell you basic difference between get() and setdefault() method -

Scenario-1

root = {}
root.setdefault('A', [])
print(root)

Scenario-2

root = {}
root.get('A', [])
print(root)

In Scenario-1 output will be {'A': []} while in Scenario-2 {}

So setdefault() sets absent keys in the dict while get() only provides you default value but it does not modify the dictionary.

Now let come where this will be useful- Suppose you are searching an element in a dict whose value is a list and you want to modify that list if found otherwise create a new key with that list.

using setdefault()

def fn1(dic, key, lst):
    dic.setdefault(key, []).extend(lst)

using get()

def fn2(dic, key, lst):
    dic[key] = dic.get(key, []) + (lst) #Explicit assigning happening here

Now lets examine timings -

dic = {}
%%timeit -n 10000 -r 4
fn1(dic, 'A', [1,2,3])

Took 288 ns

dic = {}
%%timeit -n 10000 -r 4
fn2(dic, 'A', [1,2,3])

Took 128 s

So there is a very large timing difference between these two approaches.

Solution 4

You might want to look at defaultdict in the collections module. The following is equivalent to your examples.

from collections import defaultdict

data = [('a', 1), ('b', 1), ('b', 2)]

d = defaultdict(list)

for k, v in data:
    d[k].append(v)

There's more here.

Solution 5

1. Explained with a good example here:
http://code.activestate.com/recipes/66516-add-an-entry-to-a-dictionary-unless-the-entry-is-a/

dict.setdefault typical usage
somedict.setdefault(somekey,[]).append(somevalue)

dict.get typical usage
theIndex[word] = 1 + theIndex.get(word,0)


2. More explanation : http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html

dict.setdefault() is equivalent to get or set & get. Or set if necessary then get. It's especially efficient if your dictionary key is expensive to compute or long to type.

The only problem with dict.setdefault() is that the default value is always evaluated, whether needed or not. That only matters if the default value is expensive to compute. In that case, use defaultdict.


3. Finally the official docs with difference highlighted http://docs.python.org/2/library/stdtypes.html

get(key[, default])
Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.

setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.

Share:
51,316
Cerno
Author by

Cerno

Updated on July 05, 2022

Comments

  • Cerno
    Cerno almost 2 years

    The following two expressions seem equivalent to me. Which one is preferable?

    data = [('a', 1), ('b', 1), ('b', 2)]
    
    d1 = {}
    d2 = {}
    
    for key, val in data:
        # variant 1)
        d1[key] = d1.get(key, []) + [val]
        # variant 2)
        d2.setdefault(key, []).append(val)
    

    The results are the same but which version is better or rather more pythonic?

    Personally I find version 2 harder to understand, as to me setdefault is very tricky to grasp. If I understand correctly, it looks for the value of "key" in the dictionary, if not available, enters "[]" into the dict, returns a reference to either the value or "[]" and appends "val" to that reference. While certainly smooth it is not intuitive in the least (at least to me).

    To my mind, version 1 is easier to understand (if available, get the value for "key", if not, get "[]", then join with a list made up from [val] and place the result in "key"). But while more intuitive to understand, I fear this version is less performant, with all this list creating. Another disadvantage is that "d1" occurs twice in the expression which is rather error-prone. Probably there is a better implementation using get, but presently it eludes me.

    My guess is that version 2, although more difficult to grasp for the inexperienced, is faster and therefore preferable. Opinions?

  • Cerno
    Cerno over 12 years
    I am aware of that, nevertheless I would like to know which of the original versions is preferable. Or would you say that using defaultdict is always the way to go? I try to avoid additional imports as best I can, therefore I normally go with the traditional dict. But probably that is a folly?
  • agf
    agf over 12 years
    There are use cases for both get and setdefault that defaultdict doesn't cover.
  • user1066101
    user1066101 over 12 years
    "I try to avoid additional imports as best I can". Please stop doing that. It's a very bad policy and leads to needless complex programs.
  • Cerno
    Cerno over 12 years
    Thanks for the advice. I have some trouble deciding when it's fair to use an import because it's widely accepted and when to avoid it because it might lead to downward compatibility problems. Or should I stop worrying and assume everyone is using 2.7 and roll with it?
  • senderle
    senderle over 12 years
    @Cerno, defaultdict has been around since 2.5 though -- not a lot of people are stuck at 2.4 these days. Overall I think this is the best answer -- defaultdict makes for readable code and it's implemented in c, so it's sure to be fast.
  • Cerno
    Cerno over 12 years
    Thanks, that's some good advice! One question to agf though: You state that dict.get is fast but the time you measured for my example shows that using setdefault is a lot faster still. So I guess all in all defaultdict is the way to go in the end.
  • Duncan
    Duncan over 12 years
    Note that your setdefault version never actually stores the constructed list back in the dictionary, so you are comparing code that constructs a 10,000 element list by addition with code that constructs 10,000 1 element lists and throws them away. See my answer for more on this.
  • agf
    agf over 12 years
    You're right, my examples were wrong, but luckily they didn't point in the wrong direction. I've fixed them a bit.
  • agf
    agf over 12 years
    @Cerno Yeah that was a typo / brainfail. I meant defaultdict rather than dict.get. If it works for your use case, it'll be fastest.
  • Matthew Muller
    Matthew Muller over 2 years
    This answer is misleading due to the drastically different presentation of each approach. There is functionally no difference between get and setdefault in the examples you provided, you just negated the if statement.