Algorithm to find points that are furthest apart -- better than O(n^2)?

17,334

Solution 1

If you just need the scale and not the exact points you can do this in O(n) time with some error margin. Think about the simple case of making a bounding box. Calculate the minimum x value from all the points, the maximum x, the minimum y and the maximum y. These four numbers give you a maximum bounding box around your points with max error of 1 - (1/sqrt(2)) about 30%. You can reduce this by adding more sides to your square. Think about the case of an octagon. To calculate the min and max values for the other sides you have to rotate your coordinate system.

Error vs run time breaks down like this.

shape - run time - max error

  • square - 2N - 30%
  • octagon - 4N - 16%
  • 16 sides - 8N - 4%
  • 32 sides - 16N - 1%

Here's the equation for the max error I came up with.

angle = 180 / sides
max_error = (1 / cos angle) - cos angle

Let me know if I should add a diagram explaining this.

Solution 2

The following may help to put the average case linear-time algorithms (for the diameter of a finite set) in a clearer light, as well as contrast multi-dimensional and plane geometry problems.

In 1983 Megiddo gave a deterministic linear-time algorithm for the smallest enclosing circle (or sphere in higher dimensions).

In general position the enclosing circle will have two or three points on its boundary, and thus finding the two farthest apart can be done "on average" in constant time once the bounding circle is known. In higher dimensions the number of points in general position needed on the boundary of the sphere increases (D+1 points for dimension D), and indeed the cost of computing distance between a single pair of points rises linearly with dimension.

The subset of points lying on the bounding circle or sphere is also found in linear time. In the theoretical worst-case all points would lie on the bounding circle or sphere, but this at least is stricter than just having all points on the convex hull. If the points on the sphere are independently perturbed, say along radial lines, then general position is assured with probability 1, and an approximate diameter can be found from just D+1 points on the revised enclosing sphere. This randomized approximation has quadratic dependence on dimension but only linear complexity in number of points.

If the points lying on a bounding circle are "sorted" (cyclically, of course), finding the pair farthest apart can be done in linear time, relying upon the "unimodality" of the circle (meaning that distances from a fixed point rise monotonically until the antipode and then fall) to amortize the cost of computing distances. Unfortunately that sorting would introduce a step with O(n log n) time complexity, and this proves to be worst-case optimal for exact deterministic methods in the planar case.

In 2001 Ramos succeeded in showing an O(n log n) deterministic algorithm for three-dimensional sets, but the technique is so involved that an implementation may be impractical or slower than brute force all-pairs search up to very large datasets.

For higher dimensions many authors have considered randomized or approximate algorithms. See Piotr Indyk's thesis (2000) for approximate methods with only polynomial dependence on dimension for various proximity problems.

Solution 3

As mentioned in this answer, you are seeking the "diameter" of the set of N points, a well known problem in computational geometry. There are basically two steps:

  1. Find the convex hull of the points. Algorithms exist that are O(N ln N), worst case. In practice, QuickHull is usually a fast choice, although potentially O(N^2) worst case. The QHull implementation is convenient to call from the command line. The CGAL library provides a C++ implementation

  2. Antipodal pairs on the convex hull are candidates for furthest points. One can search over the antipodal points using an algorithm like Rotating calipers in O(N) time.

The problem can be generalized to an "all-farthest pairs" problem: for each point i, find the most distant point j---we're now seeking N pairs of points. The solution again uses the convex hull, but now the second part can be done with a matrix searching algorithm.

Solution 4

Not really - a common approach is to group points into clusters and then store the distances between clusters.

That way you don't need to check if a certain house in New York is furthest from Paris if you have already determiend that Australia is further away

Solution 5

The distance from A to B is the same as the distance from B to A. You can easily modify the algorithm to eliminate half of the computations this way. It'll still be O(n^2) but will be twice as fast.

That is, instead of computing all off-diagonal elements of the distance matrix P x P:

P = {A, B, C, D, ...}

  + A + B + C + D + ...
A |   | * | * | * | ...
B | * |   | * | * | ...
C | * | * |   | * | ...
D | * | * | * |   | ...
  |   |   |   |   |

you can compute either the upper triangle:

  + A + B + C + D + ...
A |   | * | * | * | ...
B |   |   | * | * | ...
C |   |   |   | * | ...
D |   |   |   |   | ...
  |   |   |   |   |

or the lower triangle:

  + A + B + C + D + ...
A |   |   |   |   | ...
B | * |   |   |   | ...
C | * | * |   |   | ...
D | * | * | * |   | ...
  |   |   |   |   |
Share:
17,334
houbysoft
Author by

houbysoft

Updated on June 05, 2022

Comments

  • houbysoft
    houbysoft almost 2 years

    In my program, I have a set of points. For purposes of rescaling, I am searching for the two nodes that are furthest apart, and then computing a factor by which to multiply all coordinates so that the maximum distance is equal to some predefined one I define.

    The algorithm I am using to find the two points furthest apart is, however, problematic for large sets of points, as it is O(n^2); pseudocode (distances that were already calculated are skipped):

     for each point in points:
         for each other point in points:
             if distance between point and other point > max
                 max = distance between point and other point
    

    Is there something faster?