Find closest point of every point (Nearest Neighbor)

22,949

Solution 1

There are some fairly standard ways of improving this kind of search, and how complicated you want to get depends on how many points you are searching.

A fairly common easy one is to sort the points by X or Y. For each point you then look for near points, going both forwards and backwards in the array. Remember how far away the nearest one you have found is, and when the difference in X (or Y) is greater than that you know there can't be any nearer point left to find.

You can also partition your space using a tree. Wikipedia has a page that gives some possible algorithms. Sometimes the cost to set them up is larger than what you save. That's the sort of thing you have to decide based on how many points you are searching.

Solution 2

I would definitely sort by x first. Then I would use the x distance between points as a quick reject test: once you have the distance to one neighbor, any closer neighbor has to be closer in x. This avoids all the distSquared computations for points outside the x range. Every time you find a closer neighbor, you also tighten up the range of x that you need to search.

Also, if P2 is the closest neighbor to P1, then I would use P1 as the initial guess for the closest neighbor to P2.

EDIT: On second thought, I'd sort by whichever dimension has the largest range.

Solution 3

Brute force for nearest neighbour search is only feasible for a small number of points.

You might want to look into kd-Trees or spatial data structures generally.

Here is a demo for kd-Tree. This is what wikipedia says.

Solution 4

Another possibility, simpler than creating a kd-tree, is using a neighborhood matrix.

First place all your points into a 2D square matrix. Then you can run a full or partial spatial sort, so points will became ordered inside the matrix.

Points with small Y could move to the top rows of the matrix, and likewise, points with large Y would go to the bottom rows. The same will happen with points with small X coordinates, that should move to the columns on the left. And symmetrically, points with large X value will go to the right columns.

After you did the spatial sort (there are many ways to achieve this, both by serial or parallel algorithms) you can lookup the nearest points of a given point P by just visiting the adjacent cells where point P is actually stored in the neighborhood matrix.

You can read more details for this idea in the following paper (you will find PDF copies of it online): Supermassive Crowd Simulation on GPU based on Emergent Behavior.

The sorting step gives you interesting choices. You can use just the even-odd transposition sort described in the paper, which is very simple to implement (maybe even in CUDA). If you run just one pass of this, it will give you a partial sort, which can be already useful if your matrix is near-sorted. That is, if your points move slowly, it will save you a lot of computation.

If you need a full sort, you can run such even-odd transposition pass several times (as described in the following Wikipedia page):

http://en.wikipedia.org/wiki/Odd%E2%80%93even_sort

If the changes are small, one or two even-odd passes will suffice to get the array sorted again.

Solution 5

Either use a kd-tree, or use a good library for nearest neighbor search. Weka includes one.

Share:
22,949
Admin
Author by

Admin

Updated on February 19, 2020

Comments

  • Admin
    Admin about 4 years

    I am writing a method that takes as input an array of points and finds, for each point in the array, the closest point to it other than itself. I am currently doing this in a brute force way (cheking every point with every other point). My current implimentation doesn't have the array sorted but i can sort it by p.x values with the CompareByX method. I am chekcking the running time of the algorithm, and it gets very time consuming with large values of n. I am not very knowledgable on this subject and know very littel about different types of data structures, any simple help would be great!

    My current code is:

    import java.util.*;
    import java.lang.*;
    import java.io.*;
    
    class My2dPoint {
      double x;
      double y;
    
      public My2dPoint(double x1, double y1) {
        x=x1;
        y=y1;
      }
    
    }
    
    
    class CompareByX implements Comparator<My2dPoint> {
        public int compare(My2dPoint p1, My2dPoint p2) {
        if (p1.x < p2.x) return -1;
            if (p1.x == p2.x) return 0;
            return 1;
        }
    }
    
        /* An object of the above comparator class is used by java.util.Arrays.sort() in main to sort an array of points by x-coordinates */
    
    class Auxiliaries {
    
        public static double distSquared(My2dPoint p1, My2dPoint p2) {
            double result;
            result = (p1.x-p2.x)*(p1.x-p2.x) + (p1.y-p2.y)*(p1.y-p2.y);
            return result;
        }
    
    }
    
    public class HW3 {
        public static void main (String argv []) throws IOException {
            int range = 1000000; // Range of x and y coordinates in points
    
            System.out.println("Enter the number of points");
    
            InputStreamReader reader1 = new InputStreamReader(System.in);
            BufferedReader buffer1 = new BufferedReader(reader1);
            String npoints = buffer1.readLine();
            int numpoints = Integer.parseInt(npoints);
    
            // numpoints is now the number of points we wish to generate
    
            My2dPoint inputpoints [] = new My2dPoint [numpoints];
    
            // array to hold points
    
            int closest [] = new int [numpoints];
    
            // array to record soln; closest[i] is index of point closest to i'th
    
            int px, py;
            double dx, dy, dist;
            int i,j;
            double currbest;
            int closestPointIndex;
            long tStart, tEnd;
    
            for (i = 0; i < numpoints; i++) {
    
              px = (int) ( range * Math.random());
              dx = (double) px;
              py = (int) (range * Math.random());
              dy = (double) py;
              inputpoints[i] = new My2dPoint(dx, dy);
    
            }
    
            // array inputpoints has now been filled
    
    
    
            tStart = System.currentTimeMillis();
    
            // find closest [0]
    
    
            closest[0] = 1;
            currbest = Auxiliaries.distSquared(inputpoints[0],inputpoints[1]);
            for (j = 2; j < numpoints; j++) {
               dist = Auxiliaries.distSquared(inputpoints[0],inputpoints[j]);
               if (dist < currbest) {
                   closest[0] = j;
                   currbest = dist;
               }
            }
    
            // now find closest[i] for every other i 
    
            for (i = 1; i < numpoints; i++) {
                closest[i] = 0;
                currbest = Auxiliaries.distSquared(inputpoints[i],inputpoints[0]);
                for (j = 1; j < i; j++) {
                  dist = Auxiliaries.distSquared(inputpoints[i],inputpoints[j]);
                  if (dist < currbest) {
                   closest[i] = j;
                   currbest = dist;
              }
                }
    
                for (j = i+1; j < numpoints; j++) {
                  dist = Auxiliaries.distSquared(inputpoints[i],inputpoints[j]);
                  if (dist < currbest) {
              closest[i] = j;
                      currbest = dist;
              }
                }
            }
    
            tEnd = System.currentTimeMillis();
            System.out.println("Time taken in Milliseconds: " + (tEnd - tStart));
        }
    }
    
  • Waldheinz
    Waldheinz about 13 years
    Nearest Neighbour -> kd-Tree. Right.
  • Bob Coder
    Bob Coder about 10 years
    In a degenerate case, say all points are arranged along the x axis, would this method still work?
  • mgmalheiros
    mgmalheiros about 10 years
    Of course, if you have a distribution of points that is heterogeneous, you will have not-so-good neighborhoods. In the limit case, as you said, with points all placed in just one axis, you will still get them sorted into a matrix, but the chances are that many nearby points in space will be far away inside the matrix. So, the more uniform you have the spatial distribution, the well behaved will be your matrix, and more consistent the neighborhoods inside the matrix. I've put same sample code for the sorting on PasteBin here.
  • Bob Coder
    Bob Coder about 10 years
    I ran a test on a degenerate case. It sorts it but some nearby points find themselves far removed from each-other. I wonder if it is possible to come up with an algorithm that is more robust? Any opinions? p.s. thank you for replying, I appreciate it.
  • Micromega
    Micromega over 9 years
    @BobCoder:This is even worst when you add x and y coordinates together. For example translate x- and y coordinate to a binary and concatenate both values. Then sort the points. This orders the points along a z-curve, a.k.a. monster curve. It has much better spatial properties and it's relative easy and fast.