Python: find closest key in a dictionary from the given input key

24,469

Solution 1

here's your function on one line:

data.get(num, data[min(data.keys(), key=lambda k: abs(k-num))])

edit: to not evaluate the min when the key is in the dict use:

data[num] if num in data else data[min(data.keys(), key=lambda k: abs(k-num))]

or if all values in data evaluate to True you can use:

data.get(num) or data[min(data.keys(), key=lambda k: abs(k-num))]

Solution 2

This issue is made a lot harder by dict keys being in no particular order. If you can play with how you make the dict so they are in order (like your example) and use python >= 2.7 you can use OrderedDict and bisect to make this lightning fast.

import collections
a = collections.OrderedDict()
for i in range(100):
    a[i] = i
import bisect
ind = bisect.bisect_left(a.keys(), 45.3)

Then you only have to check element ind and ind-1 to see which is closer, thus making a lot fewer calculations.


As pointed out below by Steven G, in Python3 the .keys() is not just a list and must be changed into one.

bisect.bisect_left(list(a.keys()), 45.3)

Solution 3

Rather than using OrderedDict and bisect, consider the SortedDict type in the sortedcontainers module. It's a pure-Python and fast-as-C implementation of sorted list, sorted dict, and sorted set types with 100% testing coverage and hours of stress.

With a SortedDict you can bisect for the desired key. For example:

from itertools import islice
from sortedcontainers import SortedDict
def closest(sorted_dict, key):
    "Return closest key in `sorted_dict` to given `key`."
    assert len(sorted_dict) > 0
    keys = list(islice(sorted_dict.irange(minimum=key), 1))
    keys.extend(islice(sorted_dict.irange(maximum=key, reverse=True), 1))
    return min(keys, key=lambda k: abs(key - k))

The closest function uses SortedDict.irange to create an iterator of keys nearest the given key. The keys are bisected with log(N) runtime complexity.

>>> sd = SortedDict({-3: 'a', 0: 'b', 2: 'c'})
>>> for num in range(-5, 5):
...     key = closest(sd, num)
...     print('Given', num, ', closest:', key)
Given -5 , closest: -3
Given -4 , closest: -3
Given -3 , closest: -3
Given -2 , closest: -3
Given -1 , closest: 0
Given 0 , closest: 0
Given 1 , closest: 2
Given 2 , closest: 2
Given 3 , closest: 2
Given 4 , closest: 2

It's Pythonic to use PyPI!

Solution 4

If all you have is a Python dictionary, you can't do better than checking all the entries in the dictionary (as in Will's answer). However, if you want to find the closest key more efficiently than that (i.e., in O(log N) instead of O(N)), you want a balanced tree of some sort.

Unfortunately, I don't believe Python has such a datastructure in its standard library -- as the Pythonic way is to use a dict instead. So, if you expect to make a many such queries on a large map, your best choice may be to find an extension library, or even roll your own...

Share:
24,469

Related videos on Youtube

frazman
Author by

frazman

Updated on May 14, 2020

Comments

  • frazman
    frazman about 3 years

    I have a data in form a dictionary.. NOw I take the input from the user and it can be anything.. And I am trying to do the following. If the key exists then cool.. fetch the value from the dictionary. if not, then fetch the nearest (in numeric sense). For example..if the input key is 200 and the keys are like :....

    197,202,208...
    

    Then probably 202 is the closest key to 200.. Now, from algorithm point of view. its straight forward.. but is there a pythonic way to do this? Thanks

    • Daniel Pryden
      Daniel Pryden over 11 years
      Does it need to be a dict, or would a "dictionary-like" object suffice? If instead you use a binary tree or sorted list, then you can use binary search to find the closest key in O(log n) time.
    • Laurence Gonsalves
      Laurence Gonsalves over 11 years
      "from algorithm point of view. its straight forward"... I assume this means you're okay with O(n) solutions, as O(log n) solutions are less straightforward.
  • PaulMcG
    PaulMcG over 11 years
    Unfortunately, this evaluates min(data.keys()...) for every lookup, even if the key exists in data. Maybe break up get's logic into a ternary: data[num] if num in data else data[min(data.keys(), key=lambda k: abs(k-num))]
  • PaulMcG
    PaulMcG over 11 years
    Check out bisect for what you describe. Create a class with a bisect for keys and a dict for key-value mapping. Use the bisect to find the proper insertion point of a new key in the list of keys, and then check the neighboring values to see which one is closer.
  • PaulMcG
    PaulMcG over 11 years
    Glad to help, but if d.has_key(k) has been deprecated in favor of if k in d.
  • cosmictypist
    cosmictypist over 7 years
    How does SortedDict() handle negative key values?
  • cosmictypist
    cosmictypist over 7 years
    I've been using SortedDict(), but it incorrectly sorts the keys for negative values.
  • GrantJ
    GrantJ over 7 years
    @christylynn002 please open an issue at github.com/grantjenks/sorted_containers/issues
  • Paebbels
    Paebbels about 7 years
    Is this issue fixed?
  • GrantJ
    GrantJ about 7 years
    @Paebbels No issue was opened. I could never reproduce myself.
  • Steven G
    Steven G over 5 years
    I get TypeError: 'odict_keys' object does not support indexing when trying your solution on python 3.6
  • Steven G
    Steven G over 5 years
    this can be corrected by using bisect.bisect_left(list(a.keys()), 45.3)
  • ogurets
    ogurets almost 5 years
    I think I know what's wrong. The reason is not within SortedDict, but this code itself - sd.iloc[-1] returns last element of sd, when it should (within common sense) return first element or None. Should I edit?
  • GrantJ
    GrantJ almost 5 years
    @ogurets Good observation! I see now how the previous code could include the largest key when index equalled 0. I updated the code for version 2 of sortedcontainers. Better here to use irange than the deprecated iloc.
  • nurettin
    nurettin about 1 year
    as of Python 3.7, dict preserves the insertion order (mail.python.org/pipermail/python-dev/2017-December/151283.h‌​tml), so you don't even need to use OrderedDict for this anymore.