How to find the nearest neighbors for latitude and longitude point on python?

10,490

Solution 1

I honestly don't know if using a kd-tree would work correctly, but my hunch says it would be inaccurate.

I think you need to use something like greater circle distance to get accurate distances.


from math import radians, cos, sin, asin, sqrt, degrees, atan2

def validate_point(p):
    lat, lon = p
    assert -90 <= lat <= 90, "bad latitude"
    assert -180 <= lon <= 180, "bad longitude"

# original formula from  http://www.movable-type.co.uk/scripts/latlong.html
def distance_haversine(p1, p2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    Haversine
    formula: 
        a = sin²(Δφ/2) + cos φ1 ⋅ cos φ2 ⋅ sin²(Δλ/2)
                        _   ____
        c = 2 ⋅ atan2( √a, √(1−a) )
        d = R ⋅ c

    where   φ is latitude, λ is longitude, R is earth’s radius (mean radius = 6,371km);
            note that angles need to be in radians to pass to trig functions!
    """
    lat1, lon1 = p1
    lat2, lon2 = p2
    for p in [p1, p2]:
        validate_point(p)

    R = 6371 # km - earths's radius

    # convert decimal degrees to radians 
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])

    # haversine formula 
    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) # 2 * atan2(sqrt(a), sqrt(1-a))
    d = R * c
    return d

Solution 2

scikit-learn provides a BallTree class that supports the Haversine metric. See also this SO question.

Solution 3

I think you are trying to solve the k Nearest Neighbor problem.

Since your data set lies in 2D, then a kd-tree would work nicely, in general, I do not know about spicy.

However, if your points start living in a higher dimension, then kd-tree will not be a smart choice.

Share:
10,490
Andrei
Author by

Andrei

Updated on June 05, 2022

Comments

  • Andrei
    Andrei almost 2 years

    Input:

    point = (lat, long)
    places = [(lat1, long1), (lat2, long2), ..., (latN, longN)]
    count = L
    

    Output: neighbors = subset of places close to the point. (len(neighbors)=L)

    Question: Can I use kd-tree for quick nearest-neighbors lookup for points with latitude and longitude? (For example, implementation in scipy)

    Is it necessary to transform the geographical coordinates (latitude and longitude) of the point in the coordinates x,y?

    Is it the best way to solve this?

  • fmarm
    fmarm almost 5 years
    where to find the function validate_point used in this code ? I assume it checks that latitude and longitude are between -90 and 90 ?
  • lumbric
    lumbric almost 5 years
    This answer is kind of incomplete and does not answer the OP's question. The function distance_haversine() calculates the distance in km between two points given in lat/lon, but it does not answer the question how to find the nearest neighbors using this metric.
  • lumbric
    lumbric almost 5 years
    Data is not given in 2D, points are in lat/lon which cannot be converted accurately to 2D coordinates. One can calculate the distance between points using the Haversine formula but it is not identical to points in 2D (I don't know if kd-tree is applicable or not).
  • Marcel Wilson
    Marcel Wilson almost 5 years
    @lumbric technically you are correct. I provided the means on how to calculate the distances, as part of the question was asking if geography points need to be converted. Which ultimately the answer is you don't need to convert them if using distance_haversine. You simply find the distance to each set of points and pick the smallest.
  • lumbric
    lumbric almost 5 years
    @MarcelWilson ah yes, you are right, your answer can be easily used to calculate all pairwise distances and then take the minimum. This is possible but not optimal. For large values (let's say >10 000 points) this will use a lot of memory and take a lot of time. It requires O(n^2) time and memory, while the optimal solution should be O(n*log(n)) in time e.g. using some kind of index like k-trees. Since the OP asked about k-trees, I assumed he was interested in an optimal solution.
  • Marcel Wilson
    Marcel Wilson almost 5 years
    @lumbric While it would be more efficient, the answers would not always be correct. cs.stackexchange.com/a/57619 So when it comes to finding nearest neighbor it's better to limit the number of points to a bounding region around your origin point for exactly the reason you state.
  • lumbric
    lumbric almost 5 years
    @MarcelWilson Ah yes, of course. Trusting on the Euclidean metric is risky if distances are large. I think it should be possible to find algorithms which solve the question in O(n*log(n)) using the Haversine metric. I'm not sure about KDTree, but BallTree in sklearn supports the Haversine metric (I'm not sure if there are any pitfalls).
  • Marcel Wilson
    Marcel Wilson almost 5 years
    If I recall correctly the risk is more pronounced on the top and bottom of the globe rather than at far distances. I'll have to check out BallTree; sounds magical.
  • Shashwat
    Shashwat over 3 years
    This is the best answer.