How to find the nearest neighbors for latitude and longitude point on python?
Solution 1
I honestly don't know if using a kd-tree would work correctly, but my hunch says it would be inaccurate.
I think you need to use something like greater circle distance to get accurate distances.
from math import radians, cos, sin, asin, sqrt, degrees, atan2
def validate_point(p):
lat, lon = p
assert -90 <= lat <= 90, "bad latitude"
assert -180 <= lon <= 180, "bad longitude"
# original formula from http://www.movable-type.co.uk/scripts/latlong.html
def distance_haversine(p1, p2):
"""
Calculate the great circle distance between two points
on the earth (specified in decimal degrees)
Haversine
formula:
a = sin²(Δφ/2) + cos φ1 ⋅ cos φ2 ⋅ sin²(Δλ/2)
_ ____
c = 2 ⋅ atan2( √a, √(1−a) )
d = R ⋅ c
where φ is latitude, λ is longitude, R is earth’s radius (mean radius = 6,371km);
note that angles need to be in radians to pass to trig functions!
"""
lat1, lon1 = p1
lat2, lon2 = p2
for p in [p1, p2]:
validate_point(p)
R = 6371 # km - earths's radius
# convert decimal degrees to radians
lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
# haversine formula
dlon = lon2 - lon1
dlat = lat2 - lat1
a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
c = 2 * asin(sqrt(a)) # 2 * atan2(sqrt(a), sqrt(1-a))
d = R * c
return d
Solution 2
scikit-learn
provides a BallTree
class that supports the Haversine metric. See also this SO question.
Solution 3
I think you are trying to solve the k Nearest Neighbor problem.
Since your data set lies in 2D, then a kd-tree would work nicely, in general, I do not know about spicy.
However, if your points start living in a higher dimension, then kd-tree will not be a smart choice.
Andrei
Updated on June 05, 2022Comments
-
Andrei almost 2 years
Input:
point = (lat, long) places = [(lat1, long1), (lat2, long2), ..., (latN, longN)] count = L
Output:
neighbors
= subset ofplaces
close to thepoint
. (len(neighbors)=L
)Question: Can I use kd-tree for quick nearest-neighbors lookup for points with latitude and longitude? (For example, implementation in scipy)
Is it necessary to transform the geographical coordinates (latitude and longitude) of the point in the coordinates x,y?
Is it the best way to solve this?
-
fmarm almost 5 yearswhere to find the function
validate_point
used in this code ? I assume it checks that latitude and longitude are between -90 and 90 ? -
lumbric almost 5 yearsThis answer is kind of incomplete and does not answer the OP's question. The function
distance_haversine()
calculates the distance in km between two points given in lat/lon, but it does not answer the question how to find the nearest neighbors using this metric. -
lumbric almost 5 yearsData is not given in 2D, points are in lat/lon which cannot be converted accurately to 2D coordinates. One can calculate the distance between points using the Haversine formula but it is not identical to points in 2D (I don't know if kd-tree is applicable or not).
-
Marcel Wilson almost 5 years@lumbric technically you are correct. I provided the means on how to calculate the distances, as part of the question was asking if geography points need to be converted. Which ultimately the answer is you don't need to convert them if using distance_haversine. You simply find the distance to each set of points and pick the smallest.
-
lumbric almost 5 years@MarcelWilson ah yes, you are right, your answer can be easily used to calculate all pairwise distances and then take the minimum. This is possible but not optimal. For large values (let's say >10 000 points) this will use a lot of memory and take a lot of time. It requires O(n^2) time and memory, while the optimal solution should be O(n*log(n)) in time e.g. using some kind of index like k-trees. Since the OP asked about k-trees, I assumed he was interested in an optimal solution.
-
Marcel Wilson almost 5 years@lumbric While it would be more efficient, the answers would not always be correct. cs.stackexchange.com/a/57619 So when it comes to finding nearest neighbor it's better to limit the number of points to a bounding region around your origin point for exactly the reason you state.
-
lumbric almost 5 years@MarcelWilson Ah yes, of course. Trusting on the Euclidean metric is risky if distances are large. I think it should be possible to find algorithms which solve the question in O(n*log(n)) using the Haversine metric. I'm not sure about KDTree, but BallTree in sklearn supports the Haversine metric (I'm not sure if there are any pitfalls).
-
Marcel Wilson almost 5 yearsIf I recall correctly the risk is more pronounced on the top and bottom of the globe rather than at far distances. I'll have to check out BallTree; sounds magical.
-
Shashwat over 3 yearsThis is the best answer.