Filter dict to contain only certain keys?

python dictionary

475,724

Solution 1

A slightly more elegant option is a dict comprehension:

foodict = {k: v for k, v in mydict.items() if k.startswith('foo')}
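The same comprehension works when the keys you want are known in advance. Here the test is membership in a set instead of a prefix check. `mydict` and `wanted` are illustrative names, not from the original answer:

```python
# Illustrative input; any dict works here.
mydict = {"foo1": 1, "foo2": 2, "bar": 3, "baz": 4}

# Keys to keep, stored in a set for O(1) average-time membership tests.
wanted = {"foo1", "bar"}

# Keep only the items whose key is in the wanted set.
filtered = {k: v for k, v in mydict.items() if k in wanted}

print(filtered)  # {'foo1': 1, 'bar': 3}
```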

Solution 2

Here's an example in Python 2.6:

>>> a = {1:1, 2:2, 3:3}
>>> dict((key,value) for key, value in a.iteritems() if key == 1)
{1: 1}
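`iteritems()` only exists in Python 2. In Python 3 the same filter can be written with `items()` and a dict comprehension (a sketch, not part of the original answer):

```python
a = {1: 1, 2: 2, 3: 3}

# Python 3: items() returns a view, and there is no iteritems().
result = {key: value for key, value in a.items() if key == 1}
print(result)  # {1: 1}
```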

The filtering part is the if statement.

This method is slower than delnan's answer if you only want to select a few keys out of very many.
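When only a handful of keys are wanted from a large dict, it is cheaper to loop over the wanted keys and look each one up than to scan every item. A minimal sketch of that idea, assuming illustrative names `big` and `wanted_keys` (delnan's answer isn't reproduced here, so this is not necessarily its exact code):

```python
# A large dict and a few keys we care about.
big = {i: i * i for i in range(100_000)}
wanted_keys = [3, 10, 99_999, -1]  # -1 is deliberately missing

# Loop over the few wanted keys instead of all items;
# the `in` check skips keys that are absent from the dict.
small = {k: big[k] for k in wanted_keys if k in big}

print(small)  # {3: 9, 10: 100, 99999: 9999800001}
```

Without the `if k in big` guard, a missing key would raise `KeyError`.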

Solution 3

You can do that with the project function from my funcy library:

from funcy import project
small_dict = project(big_dict, keys)

Also take a look at select_keys.
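If you'd rather not add a dependency, a small helper gives a similar result. This is a stdlib-only sketch, not funcy's implementation; `project_keys` is a hypothetical name, and it assumes keys missing from the dict should simply be skipped:

```python
from typing import Iterable, Mapping


def project_keys(mapping: Mapping, keys: Iterable) -> dict:
    """Return a new dict with only those of `keys` that exist in `mapping`."""
    return {k: mapping[k] for k in keys if k in mapping}


big_dict = {"a": 1, "b": 2, "c": 3}
small_dict = project_keys(big_dict, ["a", "c", "z"])  # "z" is absent, so skipped
print(small_dict)  # {'a': 1, 'c': 3}
```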

Solution 4

This one-liner lambda should work:

dictfilt = lambda x, y: dict([ (i,x[i]) for i in x if i in set(y) ])

Here's an example:

my_dict = {"a":1,"b":2,"c":3,"d":4}
wanted_keys = ("c","d")

# run it
In [10]: dictfilt(my_dict, wanted_keys)
Out[10]: {'c': 3, 'd': 4}

It's a basic list comprehension that iterates over your dict's keys (i in x) and outputs a list of (key, value) tuples for every key that is in your desired key list (y). Wrapping the whole thing in dict() turns that list into a dict object.
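The lambda above calls `set(y)` again for every key it checks. A sketch of the same filter as a regular function that builds the set only once (keeping the name `dictfilt`; switching from `dict([...])` to a dict comprehension is my change, not the original author's):

```python
def dictfilt(x, y):
    """Return a new dict with only the items of x whose key appears in y."""
    wanted = set(y)  # build the lookup set once, not once per key
    return {i: x[i] for i in x if i in wanted}


my_dict = {"a": 1, "b": 2, "c": 3, "d": 4}
wanted_keys = ("c", "d")
print(dictfilt(my_dict, wanted_keys))  # {'c': 3, 'd': 4}
```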

Solution 5

Code 1:

source = {key: key * 10 for key in range(0, 100)}  # avoid shadowing the built-in dict
d1 = {}
for key, value in source.items():
    if key % 2 == 0:
        d1[key] = value

Code 2:

source = {key: key * 10 for key in range(0, 100)}  # avoid shadowing the built-in dict
d2 = {key: value for key, value in source.items() if key % 2 == 0}

Code 3:

source = {key: key * 10 for key in range(0, 100)}  # avoid shadowing the built-in dict
d3 = {key: source[key] for key in source.keys() if key % 2 == 0}

The performance of each piece of code was measured with timeit using number=1000, collected 1000 times per snippet.

In Python 3.6 the three ways of filtering dict keys perform almost the same. In Python 2.7, code 3 is slightly faster.
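A rough way to reproduce this comparison with `timeit.repeat`. This is a sketch: the function names are illustrative, and `repeat=5` is used instead of 1000 to keep it quick. Absolute timings depend on the machine and Python version.

```python
import timeit

# Same input as the three snippets, under a name that doesn't shadow dict().
source = {key: key * 10 for key in range(0, 100)}


def code1():
    d1 = {}
    for key, value in source.items():
        if key % 2 == 0:
            d1[key] = value
    return d1


def code2():
    return {key: value for key, value in source.items() if key % 2 == 0}


def code3():
    return {key: source[key] for key in source.keys() if key % 2 == 0}


# All three must produce the same dict before timing them means anything.
assert code1() == code2() == code3()

for fn in (code1, code2, code3):
    # Each entry is the total time for number=1000 calls; report the best run.
    times = timeit.repeat(fn, number=1000, repeat=5)
    print(f"{fn.__name__}: {min(times):.4f} s per 1000 calls")
```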

Author: mpen

Updated on February 02, 2022

Comments

mpen about 2 years

I've got a dict that has a whole bunch of entries. I'm only interested in a select few of them. Is there an easy way to prune all the other ones out?
mpen over 13 years

except I'd probably use if key in ('x','y','z') I guess.
Admin over 13 years

Well, this is basically an eager version of the "tuple generator version" of my dict comprehension. Very compatible indeed, though generator expressions were introduced in 2.4, spring 2005 - seriously, is anyone still using this?
Kai over 13 years

I don't disagree; 2.3 really shouldn't exist anymore. However, for an outdated survey of 2.3 usage, see moinmo.in/PollAboutRequiringPython24. Short version: RHEL4, SLES9, and OS X 10.4 all shipped with it.
mpen over 10 years

You should allow keys to be any kind of iterable, like what set accepts.
Ryan Shea over 10 years

Ah, good call, thanks for pointing this out. I'll make that update.
mpen over 10 years

Should use a set for wanted_keys, but otherwise looks good.
Hart Simha almost 10 years

Upvoted. I was thinking about adding an answer similar to this. Just out of curiosity though, why use {k:v for k,v in dict.items() ...} rather than {k:dict[k] for k in dict ...}? Is there a performance difference?
Hart Simha almost 10 years

Answered my own question. The {k:dict[k] for k in dict ...} is about 20-25% faster, at least in Python 2.7.6, with a dictionary of 26 items (timeit(..., setup="d = {chr(x+97):x+1 for x in range(26)}")), depending on how many items are being filtered out (filtering out consonant keys is faster than filtering out vowel keys because you're looking up fewer items). The difference in performance may very well become less significant as your dictionary size grows.
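A sketch of that measurement in Python 3. The 26-item setup string is taken from the comment; the statements and the vowel filter are illustrative choices, not the commenter's exact code, and results will vary by version and machine.

```python
import timeit

setup = "d = {chr(x+97):x+1 for x in range(26)}"

# Keep only vowel keys, once via items() and once via indexing.
via_items = "{k: v for k, v in d.items() if k in 'aeiou'}"
via_index = "{k: d[k] for k in d if k in 'aeiou'}"

# Sanity check: both statements build the same dict.
ns = {}
exec(setup, ns)
assert eval(via_items, ns) == eval(via_index, ns)

for label, stmt in (("items()", via_items), ("d[k]", via_index)):
    best = min(timeit.repeat(stmt, setup=setup, number=10_000, repeat=5))
    print(f"{label}: {best:.4f} s per 10,000 runs")
```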
skatenerd about 9 years

I wonder if you are better off with two functions. If you asked 10 people "does invert imply that the keys argument is kept, or that the keys argument is rejected?", how many of them would agree?
Ryan Shea about 9 years

Updated. Let me know what you think.
jnnnnn over 8 years

if you already know which keys you want, use delnan's answer. If you need to test each key with an if statement, use ransford's answer.
FaCoffee over 8 years

This gives me a blank dictionary if my original dictionary contains lists in place of values. Any workarounds?
FaCoffee over 8 years

This appears not to be working if the input dict has lists in place of values. In this case you get an empty dict. Any workarounds?
Jim over 8 years

@Francesco, can you provide an example? If I run: dictfilt({'x':['wefwef',52],'y':['iuefiuef','efefij'],'z':['oiejf','iejf']}, ('x','z')), it returns {'x': ['wefwef', 52], 'z': ['oiejf', 'iejf']} as intended.
FaCoffee over 8 years

I tried this with: dict={'0':[1,3], '1':[0,2,4], '2':[1,4]} and the result was {}, which I assumed to be a blank dict.
Jim over 8 years

One thing: "dict" is a built-in name, so you shouldn't use it to name a dict. What were the keys you were trying to pull out? If I run: foo = {'0':[1,3], '1':[0,2,4], '2':[1,4]}; dictfilt(foo,('0','2')), I get: {'0': [1, 3], '2': [1, 4]}, which is the intended outcome.
gae123 about 8 years

This solution has one more advantage. If the dictionary is returned from an expensive function call (i.e. a/old_dict is a function call), this solution calls the function only once. In an imperative environment, storing the dictionary returned by the function in a variable is not a big deal, but in a functional environment (e.g. in a lambda) this is a key observation.
Pat almost 8 years

Would probably be the same perf if you used mydict.iteritems() instead. .items() creates another list.
user5359531 over 6 years

just curious, did you make that plot from Python?
Moberg about 6 years

You could also do {k: old_dict.get(k, default) for k in ...}
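Unlike the `if k in old_dict` variants, `.get` keeps every requested key and fills in a default for the missing ones. A sketch with illustrative values (`old_dict`, `keys`, and `default` here are assumptions):

```python
old_dict = {"a": 1, "b": 2, "c": 3}
keys = ("a", "c", "z")  # "z" is not in old_dict
default = None

# Every requested key appears in the result; absent ones get `default`.
new_dict = {k: old_dict.get(k, default) for k in keys}
print(new_dict)  # {'a': 1, 'c': 3, 'z': None}
```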
keithpjolley over 5 years

ggplot2 in R - part of tidyverse
CMCDragonkai about 5 years

Wrap filtered in dict and you get back the dictionary!
mpen almost 5 years

Neat. Only works in Python 3. Python 2 says "TypeError: unsupported operand type(s) for -: 'list' and 'set'"
this.srivastava almost 5 years

Added set(d.keys()) for Python 2. It works when I run it.
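For context: in Python 3, `dict.keys()` returns a view that supports set operators, so subtracting a set works directly. In Python 2, `keys()` returns a list, which presumably explains the `TypeError` above and the `set(d.keys())` fix. A small sketch (names are illustrative):

```python
d = {"a": 1, "b": 2, "c": 3}
unwanted = {"b"}

# Python 3: keys() is a view, and view - set returns a plain set.
keep = d.keys() - unwanted
filtered = {k: d[k] for k in keep}  # key order follows the set, so it may vary

# Works on both Python 2 and 3: convert to a set explicitly first.
keep_portable = set(d.keys()) - unwanted
print(filtered)
```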
linguist_at_large almost 3 years

This method is the easiest one to parse and understand for newbies like me.
Valentin Waeselynck over 2 years

For efficiency, I'd recommend filtering keys using a set: if k in {'a','c'} rather than if k in ['a','c'].
Underoos over 2 years

Better not to use dict as a name, since it's a built-in in Python.
Mr_and_Mrs_D over 2 years

Can you profile what is faster? Note you don't need to create the list: stackoverflow.com/a/36763172/281545
Mr_and_Mrs_D over 2 years

Can you profile what is faster in case you don't need to create a copy (so creating the dict vs for key in e_keys: del your_dict[key])?
Black Thunder over 2 years

@Mr_and_Mrs_D Not creating a copy would be definitely faster. About 15% faster
Mr_and_Mrs_D over 2 years

Turns out dict comprehension is indeed slower but greatly depends on size of filtered keys - see: stackoverflow.com/a/69973383/281545
mpen over 2 years

drop_keys isn't a fair comparison. Question is more akin to keep_keys. We know which keys we want, not which ones we don't want.
Mr_and_Mrs_D over 2 years

Thanks @mpen - indeed, having to calculate drop_keys slows down the pop/del methods a lot. Will post some timings for that.
Mr_and_Mrs_D over 2 years

There you go, @mpen - it seems del beats the dict comprehension even if I calculate drop_keys (I assumed keep_keys is a set, for O(1) k in keep_keys). Probably this means that creating a dict with 500 entries is a bit slower than creating a list with 500 elements :P
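A sketch of the two approaches being compared here, computing the keys to drop from a set of keys to keep. The names are illustrative, not taken from the linked answer, and `dict(original)` exists only so both versions can run side by side; in practice you'd delete from a dict you're allowed to mutate.

```python
original = {i: str(i) for i in range(1000)}
keep_keys = set(range(0, 1000, 2))  # a set, so `k in keep_keys` is O(1) on average

# Option A: build a new dict; `original` stays intact.
copied = {k: v for k, v in original.items() if k in keep_keys}

# Option B: delete unwanted keys in place.
in_place = dict(original)  # stand-in for a dict you may mutate
# Materialize the drop list first: deleting while iterating a dict raises RuntimeError.
drop_keys = [k for k in in_place if k not in keep_keys]
for key in drop_keys:
    del in_place[key]

assert copied == in_place
print(len(in_place))  # 500
```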