Given list of 2d points, find the point closest to all other points

12,995

Solution 1

Note that in 1-D the point that minimizes the sum of distances to all the points is the median.

In 2-D the problem can be solved in O(n log n) as follows:

Create a sorted array of x-coordinates and for each element in the array compute the "horizontal" cost of choosing that coordinate. The horizontal cost of an element is the sum of distances to all the points projected onto the X-axis. This can be computed in linear time by scanning the array twice (once from left to right and once in the reverse direction). Similarly create a sorted array of y-coordinates and for each element in the array compute the "vertical" cost of choosing that coordinate.

Now for each point in the original array, we can compute the total cost to all other points in O(1) time by adding the horizontal and vertical costs. So we can compute the optimal point in O(n). Thus the total running time is O(n log n).

Solution 2

What you are looking for is center of mass.

You basically add all xs' and ys' together and divide be the mass of the whole system. Now, You have uniform mass less particles let their mass be 1.

then all you have to do is is sum the location of the particles and divide by the number of particles.

say we have p1(1,2) p2(1,1) p3 (1,0)

// we sum the x's 

    bestXcord = (1+1+1)/3 = 1

//we sum the y's 

    bestYcord = (2+1)/3 = 1 

so p2 is the closest.

solved in O(n)

Solution 3

Starting from your initial algortihm, there is an optimization possible:

minDist=inf
bestPoint = null
for p1 in points:
    dist = 0
    for p2 in points:
        dist+=distance(p1,p2)
        //This will weed out bad points rather fast
        if dist>=minDist then continue(p1)
    /*
    //Unnecessary because of new line above
    minDist = min(dist,minDist)
    bestPoint = argmin(p1, bestPoint)
    */
    bestPoint = p1

The idea is, to throw away outliers as fast as possible. This can be improved more:

  • start p1 loop with a heuristic "inner" point (This creates a good minDist first, so worse points get thrown away faster)
  • start p2 loop with heuristic "outer" points (This makes dist rise fast, possibly triggering the exit condition faster

If you trade size for speed, you can go another route:

//assumes points are numbered 0..n
dist[]=int[n+1]; //initialized to 0
for (i=0;i<n;i++)
  for (j=i+1;j<=n;j++) {
    dist=dist(p[i], p[j]);
    dist[i]+=dist;
    dist[j]+=dist;
  }
minDist=min(dist);
bestPoint=p[argmin(dist)];

which needs more space for the dist array, but calculates each distance only once. This has the advantage to be better suited to a "get the N best points" sort of problem and for calculation-intensive metrices. I suspect it would bring nothing for the manhattan metric, on an x86 or x64 architecture though: The memory access would heavily dominate the calculation.

Share:
12,995
Razor Storm
Author by

Razor Storm

(your about me is currently blank)

Updated on June 04, 2022

Comments

  • Razor Storm
    Razor Storm almost 2 years
    Input: list of 2d points (x,y) where x and y are integers.
    
    Distance: distance is defined as the Manhattan distance. 
        ie:
        def dist(p1,p2) 
             return abs(p1.x-p2.x) + abs(p1.y - p2.y)
    

    What is an efficient algorithm to find the point that is closest to all other points.

    I can only think of a brute force O(n^2) solution:

    minDist=inf
    bestPoint = null
    for p1 in points:
        dist = 0
        for p2 in points:
            dist+=distance(p1,p2)
        minDist = min(dist,minDist)
        bestPoint = argmin(p1, bestPoint)
    

    basically look at every pair of points.