Algorithm for finding nearby points?

10,705

Solution 1

How about using quadtree?

You divide area to rectangles, if area has low density of points, rectangles are large, and if area has high density of points, rectangles will be small. You recursively subdivide each rectangle to four sub rectangles until rectangles are small enough or contain few enough points.

You can then start looking at points in rectangles near the location, and move outwards until you have found your 1000 points.

Code for this could get somewhat complex, so maybe you should try first with the simple grid and see if it is fast enough.

Solution 2

Quadtrees are nice, but BSP trees are guaranteed to run in O(log n) time. I think quadtrees require a finite bounding volume, and and there are some degenerate cases where quadtrees fail miserably, such as when a large number of points occupy the same relatively small space.

That being said, Quadtrees are arguably easier to implement and quite effective in most common situations. It's what UPS uses in their routing algorithms, because it's drawbacks don't pose significant problems in practice, probably because cities tend to be spread out over the region of interest.

Solution 3

You want to use a structure like a Quad tree, or an RTree. These are multidimensional index structures.

The key is using a good "space filling curve", which is what helps define the nearness of points. A simple space filling curve is a Zorder, but you would be more interested in something like a hilbert curve.

http://en.wikipedia.org/wiki/Space_filling_curve

I don't know of any prepackaged implementations of this stuff. I recently implemented my own RTree in 2 dimensions that only supports bulk loading and searches (via a provided bounding box).

One drawback here is that your points have to be contained in a finite region. There know there are space filling curves that work for spaces that are not finite, but I do not know anything about them.

Solution 4

In addition to the QuadTree and BSP tree suggestions, you should look up nearest neighbour searching. The choice of algorithm is based on how often you are adding to your base dataset. If you are adding and removing often, tree solutions are superior. If the data is more static, nearest neighbour searching and voronoi diagrams can be much faster and scale better.

Solution 5

If the set of points rarely changes, you could also consider using a voronoi diagram. I'm not sure if that helps finding the first point faster, but it should make it a lot easier to find the next 999 points.

Share:
10,705

Related videos on Youtube

Bemmu
Author by

Bemmu

Updated on January 21, 2020

Comments

  • Bemmu
    Bemmu almost 4 years

    Given a set of several million points with x,y coordinates, what is the algorithm of choice for quickly finding the top 1000 nearest points from a location? "Quickly" here means about 100ms on a home computer.

    Brute force would mean doing millions of multiplications and then sorting them. While even a simple Python app could do that in less than a minute, it is still too long for an interactive application.

    The bounding box for the points will be known, so partitioning the space into a simple grid would be possible. However the points are distributed somewhat unevenly, so I suspect most grid squares would be empty and then suddenly some of them would contain a large portion of the points.

    Edit: Does not have to be exact, actually can be quite inaccurate. It wouldn't be a huge deal if the top 1000 are actually just some random points from the top 2000 for example.

    Edit: Set of points rarely changes.

    • TonJ
      TonJ over 14 years
      Does it have to be exact, or is it also OK if e.g. 900 out of 1000 selected are among the closest 1000?
    • Juha Syrjälä
      Juha Syrjälä over 14 years
      Is the set of points fixed? Will you fetch the closest 1000 points for several differents locations, before set of points changes?
  • Bemmu
    Bemmu over 14 years
    They are not in a relational database, and also I remember reading that a relational database like MySQL can only use one index at a time in a situation like this.
  • Bemmu
    Bemmu over 14 years
    These space-filling curves are an amazingly fresh point of view for me to think about the problem, thanks a lot!
  • Nick Johnson
    Nick Johnson over 14 years
    Doing range queries on two different properties can not be efficiently satisfied using only 1D indexes. Relational databases aren't magic.