Filter dict to contain only certain keys?

475,724

Solution 1

A dict comprehension (Python 2.7+) does this neatly. This one keeps only the keys that start with 'foo':

foodict = {k: v for k, v in mydict.items() if k.startswith('foo')}
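The same comprehension also covers the question's case of keeping a fixed set of keys: swap the startswith() test for a membership test against a set. A minimal sketch; mydict and wanted are hypothetical example data:

```python
mydict = {"foo1": 1, "foo2": 2, "bar": 3, "baz": 4}

# Keep keys by prefix, as in the one-liner above.
foodict = {k: v for k, v in mydict.items() if k.startswith("foo")}

# Keep an explicit set of keys; a set makes each `k in wanted` check O(1) on average.
wanted = {"foo1", "bar"}
subset = {k: v for k, v in mydict.items() if k in wanted}

print(foodict)  # {'foo1': 1, 'foo2': 2}
print(subset)   # {'foo1': 1, 'bar': 3}
```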

Solution 2

Here's an example in Python 2.6 (dict comprehensions only arrived in 2.7):

>>> a = {1:1, 2:2, 3:3}
>>> dict((key,value) for key, value in a.iteritems() if key == 1)
{1: 1}

The filtering is done by the if clause.
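On Python 3, dict.iteritems() no longer exists, so the example above needs items() instead. A sketch of the same filter translated to Python 3, shown both with dict() around a generator and as a dict comprehension:

```python
a = {1: 1, 2: 2, 3: 3}

# Python 3 removed dict.iteritems(); items() now returns a lightweight view instead of a list.
filtered = dict((key, value) for key, value in a.items() if key == 1)

# Python 2.7+ and 3 also support dict comprehensions, which read a little more directly.
filtered_comp = {key: value for key, value in a.items() if key == 1}

print(filtered)  # {1: 1}
```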

This method is slower than delnan's answer if you only want to select a few keys out of very many, because it tests every key in the dict instead of looking up only the wanted ones.
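When you already know the handful of keys you want, looping over those keys instead of over the whole dict does work proportional to the number of wanted keys rather than to the size of the dict. A minimal sketch of that idea; big and wanted are hypothetical data, and silently skipping keys that are missing from big is an assumption about the desired behavior:

```python
big = {chr(ord("a") + i): i for i in range(26)}  # {'a': 0, 'b': 1, ..., 'z': 25}
wanted = ("l", "m", "n", "missing")

# Look up only the wanted keys; skip any that are not present in `big`.
small = {k: big[k] for k in wanted if k in big}

print(small)  # {'l': 11, 'm': 12, 'n': 13}
```

Dropping the `if k in big` guard would raise KeyError for absent keys instead, which may be preferable if a missing key indicates a bug.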

Solution 3

You can do that with the project function from my funcy library:

from funcy import project
small_dict = project(big_dict, keys)

Also take a look at select_keys, which keeps the keys that satisfy a predicate.
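If you'd rather not add a dependency, roughly the same behavior can be written in plain Python. A minimal sketch, assuming project keeps only the listed keys that are present in the dict; project_sketch and select_keys_sketch are hypothetical names, not funcy's implementation, and the predicate version handles plain callables only:

```python
from typing import Any, Callable, Hashable, Iterable, Mapping


def project_sketch(mapping: Mapping, keys: Iterable[Hashable]) -> dict:
    """Keep only the given keys that actually occur in `mapping`."""
    return {k: mapping[k] for k in keys if k in mapping}


def select_keys_sketch(pred: Callable[[Any], bool], mapping: Mapping) -> dict:
    """Keep the items whose key satisfies `pred`."""
    return {k: v for k, v in mapping.items() if pred(k)}


big_dict = {"a": 1, "b": 2, "c": 3}
print(project_sketch(big_dict, ["a", "c", "x"]))         # {'a': 1, 'c': 3}
print(select_keys_sketch(lambda k: k != "b", big_dict))  # {'a': 1, 'c': 3}
```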

Solution 4

This short function should work:

def dictfilt(x, y):
    wanted = set(y)  # build the set of wanted keys once, not once per key
    return {i: x[i] for i in x if i in wanted}

Here's an example:

my_dict = {"a":1,"b":2,"c":3,"d":4}
wanted_keys = ("c","d")

# run it
In [10]: dictfilt(my_dict, wanted_keys)
Out[10]: {'c': 3, 'd': 4}

It's a basic dict comprehension: it iterates over your dict's keys (i in x) and keeps each (key, value) pair whose key is in your desired keys (y). y can be any iterable; it is converted to a set once up front so that each membership test is fast.
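In Python 3 the same filter can also be written with a set operation on the dict's keys view: x.keys() & y yields the keys present in both. A minimal sketch; dictfilt_view is a hypothetical name. Because & returns a plain set, the result does not follow my_dict's key order the way dictfilt does:

```python
def dictfilt_view(x, y):
    """Keep the items of dict x whose keys also appear in iterable y (Python 3 only)."""
    # x.keys() is a set-like view, so & intersects it with any iterable and returns a set.
    common = x.keys() & y
    return {k: x[k] for k in common}


my_dict = {"a": 1, "b": 2, "c": 3, "d": 4}
print(dictfilt_view(my_dict, ("c", "d")) == {"c": 3, "d": 4})  # True
```

On Python 2, d.keys() returns a list, so this set operation is not available there.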

Solution 5

Code 1:

data = {key: key * 10 for key in range(0, 100)}  # "data" avoids shadowing the built-in dict
d1 = {}
for key, value in data.items():
    if key % 2 == 0:
        d1[key] = value

Code 2:

data = {key: key * 10 for key in range(0, 100)}
d2 = {key: value for key, value in data.items() if key % 2 == 0}

Code 3:

data = {key: key * 10 for key in range(0, 100)}
d3 = {key: data[key] for key in data.keys() if key % 2 == 0}

The performance of each piece of code was measured with timeit using number=1000, and 1000 measurements were collected for each piece of code.


In Python 3.6, the three ways of filtering dict keys perform almost the same. In Python 2.7, Code 3 is slightly faster.
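A measurement like this can be set up with timeit.repeat. A minimal sketch, not the exact script behind the results above: the snippets are the three pieces of code (with the dict named data), number=1000 follows the text, and REPEAT=20 is an arbitrary smaller sample count to keep the run short. Reporting the minimum follows the timeit documentation's advice that the lowest value is usually the most informative:

```python
import timeit

SETUP = "data = {key: key * 10 for key in range(0, 100)}"

CODE_1 = """
d1 = {}
for key, value in data.items():
    if key % 2 == 0:
        d1[key] = value
"""
CODE_2 = "d2 = {key: value for key, value in data.items() if key % 2 == 0}"
CODE_3 = "d3 = {key: data[key] for key in data.keys() if key % 2 == 0}"

REPEAT = 20  # assumption: far fewer samples than the 1000 used in the answer

results = {}
for name, stmt in (("Code 1", CODE_1), ("Code 2", CODE_2), ("Code 3", CODE_3)):
    # Each sample is the total time, in seconds, of `number` executions of stmt.
    samples = timeit.repeat(stmt, setup=SETUP, repeat=REPEAT, number=1000)
    results[name] = min(samples)
    print(f"{name}: best of {REPEAT} = {results[name]:.4f} s per 1000 runs")
```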

Author: mpen

Updated on February 02, 2022

Comments

  • mpen
    mpen about 2 years

    I've got a dict that has a whole bunch of entries. I'm only interested in a select few of them. Is there an easy way to prune all the other ones out?

  • mpen
    mpen over 13 years
    except I'd probably use if key in ('x','y','z') I guess.
  • Admin
    Admin over 13 years
    Well, this is basically an eager version of the "tuple generator version" of my dict comprehension. Very compatible indeed, though generator expressions were introduced in 2.4, spring 2005 - seriously, is anyone still using this?
  • Kai
    Kai over 13 years
    I don't disagree; 2.3 really shouldn't exist anymore. However, as an outdated survey of 2.3 usage: moinmo.in/PollAboutRequiringPython24 Short version: RHEL4, SLES9, shipped with OS X 10.4
  • mpen
    mpen over 10 years
    You should allow keys to be any kind of iterable, like what set accepts.
  • Ryan Shea
    Ryan Shea over 10 years
    Ah, good call, thanks for pointing this out. I'll make that update.
  • mpen
    mpen over 10 years
    Should use a set for wanted_keys, but otherwise looks good.
  • Hart Simha
    Hart Simha almost 10 years
    Upvoted. I was thinking about adding an answer similar to this. Just out of curiosity though, why do {k:v for k,v in dict.items() ...} rather than {k:dict[k] for k in dict ...}? Is there a performance difference?
  • Hart Simha
    Hart Simha almost 10 years
    Answered my own question. The {k:dict[k] for k in dict ...} is about 20-25% faster, at least in Python 2.7.6, with a dictionary of 26 items (timeit(..., setup="d = {chr(x+97):x+1 for x in range(26)}")), depending on how many items are being filtered out (filtering out consonant keys is faster than filtering out vowel keys because you're looking up fewer items). The difference in performance may very well become less significant as your dictionary size grows.
  • skatenerd
    skatenerd about 9 years
    I wonder if you are better off with two functions. If you asked 10 people "does invert imply that the keys argument is kept, or that the keys argument is rejected?", how many of them would agree?
  • Ryan Shea
    Ryan Shea about 9 years
    Updated. Let me know what you think.
  • jnnnnn
    jnnnnn over 8 years
    if you already know which keys you want, use delnan's answer. If you need to test each key with an if statement, use ransford's answer.
  • FaCoffee
    FaCoffee over 8 years
    This gives me a blank dictionary if my original dictionary contains lists in place of values. Any workarounds?
  • FaCoffee
    FaCoffee over 8 years
    This appears not to work if the input dict has lists in place of values. In this case you get an empty dict. Any workarounds?
  • Jim
    Jim over 8 years
    @Francesco, can you provide an example? If I run: dictfilt({'x':['wefwef',52],'y':['iuefiuef','efefij'],'z':['oiejf','iejf']}, ('x','z')), it returns {'x': ['wefwef', 52], 'z': ['oiejf', 'iejf']} as intended.
  • FaCoffee
    FaCoffee over 8 years
    I tried this with: dict={'0':[1,3], '1':[0,2,4], '2':[1,4]} and the result was {}, which I assumed to be a blank dict.
  • Jim
    Jim over 8 years
    One thing, "dict" is the name of a built-in type, so you shouldn't use it to name a dict. What were the keys you were trying to pull out? If I run: foo = {'0':[1,3], '1':[0,2,4], '2':[1,4]}; dictfilt(foo,('0','2')), I get: {'0': [1, 3], '2': [1, 4]} which is the intended outcome
  • gae123
    gae123 about 8 years
    This solution has one more advantage. If the dictionary is returned from an expensive function call (e.g. a/old_dict is a function call), this solution calls the function only once. In an imperative environment storing the dictionary returned by the function in a variable is not a big deal, but in a functional environment (e.g. in a lambda) this is a key observation.
  • Pat
    Pat almost 8 years
    Would probably be the same perf if you used mydict.iteritems() instead. .items() creates another list.
  • user5359531
    user5359531 over 6 years
    just curious, did you make that plot from Python?
  • Moberg
    Moberg about 6 years
    You could also do {k: old_dict.get(k, default) for k in ...}
  • keithpjolley
    keithpjolley over 5 years
    ggplot2 in R - part of tidyverse
  • CMCDragonkai
    CMCDragonkai about 5 years
    Wrap filtered in dict and you get back the dictionary!
  • mpen
    mpen almost 5 years
    Neat. Only works in Python 3. Python 2 says "TypeError: unsupported operand type(s) for -: 'list' and 'set'"
  • this.srivastava
    this.srivastava almost 5 years
    Added set(d.keys()) for Python 2. This is working when I run.
  • linguist_at_large
    linguist_at_large almost 3 years
    This method is the easiest one to parse and understand for newbies like me.
  • Valentin Waeselynck
    Valentin Waeselynck over 2 years
    For efficiency, I'd recommend filtering keys using a set: if k in {'a','c'} rather than if k in ['a','c'].
  • Underoos
    Underoos over 2 years
    better not use dict as it's builtin in python.
  • Mr_and_Mrs_D
    Mr_and_Mrs_D over 2 years
    Can you profile what is faster? Note you don't need to create the list: stackoverflow.com/a/36763172/281545
  • Mr_and_Mrs_D
    Mr_and_Mrs_D over 2 years
    Can you profile what is faster in case you don't need to create a copy (so creating the dict vs for key in e_keys: del your_dict[key])?
  • Black Thunder
    Black Thunder over 2 years
    @Mr_and_Mrs_D Not creating a copy would definitely be faster. About 15% faster
  • Mr_and_Mrs_D
    Mr_and_Mrs_D over 2 years
    Turns out dict comprehension is indeed slower but greatly depends on size of filtered keys - see: stackoverflow.com/a/69973383/281545
  • mpen
    mpen over 2 years
    drop_keys isn't a fair comparison. Question is more akin to keep_keys. We know which keys we want, not which ones we don't want.
  • Mr_and_Mrs_D
    Mr_and_Mrs_D over 2 years
    Thanks @mpen - indeed if we try to calculate drop_keys this slows down a lot pop/del methods. Will post some timings for that
  • Mr_and_Mrs_D
    Mr_and_Mrs_D over 2 years
    There @mpen - seems del beats the dict comprehension even if I calculate drop_keys (I assumed keep keys are sets for O(1) k in keep_keys). Probably this means that creating a dict with 500 entries is a bit slower than creating a list with 500 elements :P
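The last few comments compare building a new dict with deleting unwanted keys from the existing one. A minimal sketch of the in-place variant, assuming you no longer need the original dict; keep_keys_inplace is a hypothetical helper, and as mpen notes, the keys to drop must first be derived from the keys you want to keep:

```python
def keep_keys_inplace(d, keep):
    """Remove from dict d, in place, every key that is not in `keep` (any iterable)."""
    keep = set(keep)  # O(1) average-case membership tests
    # Collect the keys to drop first: deleting from a dict while iterating over it
    # raises RuntimeError.
    drop = [k for k in d if k not in keep]
    for k in drop:
        del d[k]


d = {"a": 1, "b": 2, "c": 3}
keep_keys_inplace(d, ("a", "c"))
print(d)  # {'a': 1, 'c': 3}
```

Like list.sort(), the helper mutates its argument and returns None, which makes the in-place behavior explicit at the call site.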