Difference between scipy.spatial.KDTree and scipy.spatial.cKDTree


Solution 1

cKDTree offers a subset of KDTree's functionality. It is implemented in C++ and wrapped in Cython, and is therefore faster.

Each of them is

"a binary trie, each of whose nodes represents an axis-aligned hyperrectangle. Each node specifies an axis and splits the set of points based on whether their coordinate along that axis is greater than or less than a particular value."

but KDTree

"also supports all-neighbors queries, both with arrays of points and with other kd-trees. These do use a reasonably efficient algorithm, but the kd-tree is not necessarily the best data structure for this sort of calculation."
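Since several comments ask what an "all-neighbors query" looks like, here is a minimal sketch under my own assumptions. The random point sets, the radius of 0.1, and the variable names are made up for illustration. It shows the two forms the quote mentions: querying a tree with an array of points, and querying one tree against another with query_ball_tree.

```python
import numpy as np
from scipy.spatial import KDTree

# Two hypothetical 3D point sets (illustrative data only).
rng = np.random.default_rng(0)
points_a = rng.random((200, 3))
points_b = rng.random((150, 3))

tree_a = KDTree(points_a)
tree_b = KDTree(points_b)

# Form 1: query with an array of points.
# For every point in points_b, find its single nearest neighbor in points_a.
distances, indices = tree_a.query(points_b, k=1)

# Form 2: query with another kd-tree.
# neighbors[i] lists the indices into points_b that lie within `radius`
# of points_a[i].
radius = 0.1
neighbors = tree_a.query_ball_tree(tree_b, r=radius)

print(distances.shape, indices.shape, len(neighbors))
```

In both cases one call answers the question for many points at once, instead of looping over single-point queries in Python.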

Solution 2

In one use case (5D nearest-neighbor lookups in a tree with approximately 100K points), cKDTree was around 12x faster than KDTree.
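To check this kind of figure on your own machine, a rough timing sketch along these lines may help. The helper name time_nearest_neighbor and the scaled-down sizes (20,000 points, 1,000 queries) are my own assumptions, not the setup behind the 12x number. The ratio will depend on the SciPy version, and if the two classes share an implementation in your release the timings should come out similar.

```python
import time

import numpy as np
from scipy.spatial import KDTree, cKDTree


def time_nearest_neighbor(tree_cls, data, queries):
    """Build a tree with tree_cls, run a 1-NN query, return (seconds, indices)."""
    start = time.perf_counter()
    tree = tree_cls(data)
    _, idx = tree.query(queries, k=1)
    return time.perf_counter() - start, idx


# Hypothetical 5D data, smaller than the 100K points mentioned above.
rng = np.random.default_rng(0)
data = rng.random((20_000, 5))
queries = rng.random((1_000, 5))

t_kd, idx_kd = time_nearest_neighbor(KDTree, data, queries)
t_ckd, idx_ckd = time_nearest_neighbor(cKDTree, data, queries)

print(f"KDTree:  {t_kd:.3f} s")
print(f"cKDTree: {t_ckd:.3f} s")
```

Both classes should return the same neighbor indices, so only the timings differ.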

Solution 3

Currently, both have almost the same API, and cKDTree is faster than KDTree. In the near future, the SciPy developers plan to remove KDTree and rename cKDTree to KDTree in a backwards-compatible way.

Ref: Detailed SciPy Roadmap — SciPy v1.6.0.dev Reference Guide https://docs.scipy.org/doc/scipy/reference/roadmap-detailed.html#spatial
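If you want code that keeps working whichever name survives, one possible pattern is an import fallback. This is only a sketch. The alias Tree and the three sample points are hypothetical, and it relies only on the query method that both classes provide.

```python
import numpy as np

# Prefer the faster class when it exists, otherwise fall back to KDTree.
try:
    from scipy.spatial import cKDTree as Tree
except ImportError:
    from scipy.spatial import KDTree as Tree

# Tiny illustrative data set.
points = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
tree = Tree(points)

# Nearest stored point to (0.9, 1.2).
distance, index = tree.query([0.9, 1.2], k=1)
print(index, distance)
```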

Author by

Benjamin

Work in Python, Numpy/SciPy, R, and others...

Updated on June 06, 2022

Comments

  • Benjamin
    Benjamin almost 2 years

    What is the difference between these two algorithms?

  • pythonjsgeo
    pythonjsgeo over 8 years
    I am surprised this is not advertised more prominently in the KDTree docs and articles. For my simple (and presumably common) use case of finding neighbours in 3D for around 20,000 points, cKDTree was 40x faster.
  • gansub
    gansub over 5 years
    @agf - cKDTree is actually implemented in C++. will you accept an edit ?
  • agf
    agf over 5 years
    @gansub Sure, if you include a link to the source or something else that shows that it's C++.
  • agf
    agf over 5 years
    @gansub That appears to be Cython, not C++?
  • gansub
    gansub over 5 years
    @agf - you will need to download scipy module and check the code under scipy/spatial/src. There is a .cxx file there.
  • Matti Wens
    Matti Wens over 5 years
    Another data point: Finding the two nearest neighbours among 1,640 points in 24-dimensions for around 50,000 test vectors: KDTree - 2m 32s / cKDTree - 360ms.
  • Nathan
    Nathan about 5 years
    What do "all-neighbors queries" look like? I'm guessing it's kind of like a parallelized version asking for the nearest points to many points at once. Can anyone confirm?
  • Nathan
    Nathan about 5 years
    This answer would be 10x more helpful with speed tests to confirm, and/or the "all-neighbors queries" mentioned by @agf
  • Trevor Boyd Smith
    Trevor Boyd Smith over 4 years
    i'm with frank. i do not know what is "all neighbors queries". can you please explain what is "all neighbors queries"?