Difference between scipy.spatial.KDTree and scipy.spatial.cKDTree
Solution 1
cKDTree implements a subset of KDTree's functionality. It is written in C++ and wrapped in Cython, so it is faster.
Each of them is a binary trie, each of whose nodes represents an axis-aligned hyperrectangle. Each node specifies an axis and splits the set of points based on whether their coordinate along that axis is greater than or less than a particular value.
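To make the "binary trie of hyperrectangles" description concrete, here is a toy pure-Python kd-tree. It is a minimal sketch under stated assumptions, not SciPy's implementation: it splits at the median and cycles through the axes, whereas SciPy uses the sliding-midpoint rule in compiled code. The names `Node`, `build` and `nearest` are hypothetical. The search shows why the splits pay off: a subtree on the far side of a splitting plane is skipped whenever that plane is farther away than the best match found so far.

```python
import numpy as np
from dataclasses import dataclass
from typing import Optional


# Toy illustration only (hypothetical names); SciPy's own tree is built differently.
@dataclass
class Node:
    axis: int = -1                    # splitting axis (internal nodes)
    value: float = 0.0                # splitting coordinate (internal nodes)
    left: Optional["Node"] = None     # points with coordinate < value
    right: Optional["Node"] = None    # points with coordinate >= value
    idx: Optional[np.ndarray] = None  # point indices (set only on leaves)


def build(points: np.ndarray, idx: np.ndarray, depth: int = 0, leafsize: int = 4) -> Node:
    """Recursively split the index set; each node covers an axis-aligned box."""
    if len(idx) <= leafsize:
        return Node(idx=idx)
    axis = depth % points.shape[1]    # cycle through the axes
    coords = points[idx, axis]
    value = float(np.median(coords))  # split at the median coordinate
    mask = coords < value
    if mask.all() or not mask.any():  # degenerate split (many equal coordinates)
        return Node(idx=idx)
    return Node(axis=axis, value=value,
                left=build(points, idx[mask], depth + 1, leafsize),
                right=build(points, idx[~mask], depth + 1, leafsize))


def nearest(node: Node, points: np.ndarray, x: np.ndarray, best=(np.inf, -1)):
    """Return (distance, index) of the point closest to x."""
    if node.idx is not None:          # leaf: brute-force its few points
        d = np.linalg.norm(points[node.idx] - x, axis=1)
        j = int(np.argmin(d))
        return min(best, (float(d[j]), int(node.idx[j])))
    diff = x[node.axis] - node.value
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, points, x, best)
    # Visit the far box only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, points, x, best)
    return best


rng = np.random.default_rng(0)
pts = rng.random((500, 3))
root = build(pts, np.arange(len(pts)))
q = rng.random(3)
dist, i = nearest(root, pts, q)
```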
KDTree, however, also supports all-neighbors queries, both with arrays of points and with other kd-trees. These do use a reasonably efficient algorithm, but the kd-tree is not necessarily the best data structure for this sort of calculation.
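Several commenters below ask what "all-neighbors queries" are. As I read the SciPy docs, the phrase covers the radius-based methods that answer the question for many points at once: `query_ball_point` (an array of query points), `query_ball_tree` (every point of one tree against another tree) and `query_pairs` (all close pairs within one set). The sketch assumes a SciPy version where `KDTree` provides these methods (current releases do); the data and radius are arbitrary.

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(42)
a = rng.random((200, 2))   # reference points (arbitrary example data)
b = rng.random((150, 2))   # query points (arbitrary example data)
r = 0.05                   # search radius (arbitrary)

tree_a = KDTree(a)
tree_b = KDTree(b)

# 1. Array of points: for each row of b, indices of points in a within distance r.
ball = tree_a.query_ball_point(b, r)           # one list of indices per row of b

# 2. Tree against tree: the same question, answered by walking both trees together.
tree_ball = tree_b.query_ball_tree(tree_a, r)  # list of lists, indexed like b

# 3. All pairs within r inside a single point set.
pairs = tree_a.query_pairs(r)                  # set of (i, j) tuples with i < j
```

All three return neighbor lists for every point in one call, instead of looping over points in Python.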
Solution 2
In one use case (5-D nearest-neighbor lookups in a tree of approximately 100K points), cKDTree was around 12x faster than KDTree.
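A rough way to reproduce this kind of comparison on your own machine is sketched below. The workload (100,000 random points in 5-D, 1,000 query points, best of three runs) is an assumption chosen to resemble the use case above, not the original benchmark. Results depend heavily on the data and on the SciPy version: in recent SciPy releases `KDTree` is built on the same compiled code as `cKDTree`, so on a current install the ratio may be close to 1.

```python
import timeit

import numpy as np
from scipy.spatial import KDTree, cKDTree

rng = np.random.default_rng(0)
data = rng.random((100_000, 5))    # ~100K points in 5-D (assumed workload)
queries = rng.random((1_000, 5))   # hypothetical batch of query points


def bench(cls, repeat=3):
    """Best-of-`repeat` wall time for one batch of nearest-neighbor lookups."""
    tree = cls(data)               # construction is not included in the timing
    return min(timeit.repeat(lambda: tree.query(queries, k=1), number=1, repeat=repeat))


t_py = bench(KDTree)
t_c = bench(cKDTree)
print(f"KDTree:  {t_py:.4f} s")
print(f"cKDTree: {t_c:.4f} s  (ratio {t_py / t_c:.1f}x)")
```

Timing only `query` separates lookup cost from tree construction, which may matter if you build once and query many times.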
Solution 3
Currently, both have almost the same API, and cKDTree is faster than KDTree. For this reason, the SciPy developers plan, in the near future, to remove KDTree and rename cKDTree to KDTree in a backwards-compatible way.
Ref: Detailed SciPy Roadmap — SciPy v1.6.0.dev Reference Guide https://docs.scipy.org/doc/scipy/reference/roadmap-detailed.html#spatial
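Because the two APIs are nearly identical, code written against the shared methods can switch classes by changing only the constructor. The sketch below runs the same calls through both classes and compares the answers. The data, `leafsize=16` and the 0.1 radius are arbitrary choices, and it assumes a SciPy version in which both classes offer `query`, `query_ball_point` and `count_neighbors` with these parameters.

```python
import numpy as np
from scipy.spatial import KDTree, cKDTree

rng = np.random.default_rng(1)
data = rng.random((2_000, 3))      # arbitrary example data
queries = rng.random((10, 3))

results = {}
for cls in (KDTree, cKDTree):
    # Identical calls against either class; only the constructor name changes.
    tree = cls(data, leafsize=16)
    dist, idx = tree.query(queries, k=3)                 # 3 nearest points per query
    ball = tree.query_ball_point(queries, r=0.1)         # neighbours within 0.1
    n_pairs = tree.count_neighbors(cls(queries), r=0.1)  # cross pairs within 0.1
    results[cls.__name__] = (dist, idx, ball, n_pairs)
```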
Comments
-
Benjamin (almost 2 years ago): What is the difference between these two algorithms?
-
pythonjsgeo (over 8 years ago): I am surprised this is not advertised more prominently in the KDTree docs and articles. For my simple (and presumably common) use case of finding neighbours in 3D for around 20,000 points, cKDTree was 40x faster.
-
gansub (over 5 years ago): @agf cKDTree is actually implemented in C++. Will you accept an edit?
-
agf (over 5 years ago): @gansub Sure, if you include a link to the source or something else that shows that it's C++.
-
gansub (over 5 years ago): @agf github.com/scipy/scipy/blob/master/scipy/spatial/ckdtree.pyx sets the distutils language to c++.
-
agf (over 5 years ago): @gansub That appears to be Cython, not C++?
-
gansub (over 5 years ago): @agf You will need to download the scipy module and check the code under scipy/spatial/src. There is a .cxx file there.
-
agf (over 5 years ago): @gansub I tracked it down: github.com/scipy/scipy/tree/master/scipy/spatial/ckdtree/src
-
Matti Wens (over 5 years ago): Another data point: finding the two nearest neighbours among 1,640 points in 24 dimensions for around 50,000 test vectors took 2 min 32 s with KDTree and 360 ms with cKDTree.
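This workload maps onto a single vectorised call: pass every test vector to `query` at once and ask for `k=2`. The sketch uses random data with the same reference-set shape as the comment above, but only 2,000 test vectors instead of about 50,000 to keep it quick; all values are illustrative. The SciPy docs caution that kd-trees lose much of their advantage over brute force in high dimensions (they call 20 already large), so 24-D timings can vary a lot with the data.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(7)
train = rng.random((1_640, 24))    # 1,640 reference points in 24-D (random stand-in)
test = rng.random((2_000, 24))     # fewer test vectors than the comment's ~50,000

tree = cKDTree(train)
# One call answers every test vector; k=2 returns the two nearest, closest first.
dist, idx = tree.query(test, k=2)  # both arrays have shape (2_000, 2)
nearest, second = idx[:, 0], idx[:, 1]
```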
-
Nathan (about 5 years ago): What do "all-neighbors queries" look like? I'm guessing it's something like a parallelized version that asks for the nearest points to many points at once. Can anyone confirm?
-
Nathan (about 5 years ago): This answer would be 10x more helpful with speed tests to confirm it, and/or examples of the "all-neighbors queries" mentioned by @agf.
-
Trevor Boyd Smith (over 4 years ago): I'm with Frank. I do not know what "all-neighbors queries" are. Can you please explain them?