how to find k-th nearest neighbor of a point in a set of point

matlab nearest-neighbor

10,649

Solution 1

if you will do this check for many points you might want to construct a inter-point distance table first

squareform(pdist([x y]))

Solution 2

If you have the statistics toolbox, you can use the function knnsearch.

Solution 3

The brute force approach looks something like this:

%Create some points
n = 10;
x = randn(n,1);
y = randn(n,1);

%Choose x0
ix0 = randi(n);

%Get distances
d = sqrt(...
    (x - x(ix0) ).^2 + ...
    (y - y(ix0) ).^2 );

%Sort distances
[sorted_Dstances, ixSort] = sort(d);

%Get kth point
k = 3;
kth = [x(ixSort(k+1)); y(ixSort(k+1))]; %+1 since the first element will always be the x0 element.

Solution 4

The free and opensource VLFeat toolbox contains a kd-tree implementation, amongst other useful things.

Solution 5

A brute force algorithm would be something like this:

array x[n] = ()
array y[n] = () 
array d[n] = ()

... populate x and y arrays with your n points ...

/* step over each point and calculate its distance from (x0, y0) */
for i = 1 to n
do
  d[i] = distance((x0, y0), (x[i], y[i])
end 

/* sort the distances in increasing order */
sort(d)

/* the k'th element of d, is the k'th nearest point to (x0, y0) */
return d[k]

View more solutions

10,649

Author by

opmfan

Updated on June 16, 2022

Comments

opmfan almost 2 years

I have a set of point (x,y) on a 2d plane. Given a point (x0,y0), and the number k, how to find the k-th nearest neighbor of (x0,x0) in the point set. In detail, the point set are represented by two array: x and y. The point (x0,y0) is given by the index i0. It means x0=x(i0) and y0=y(i0).

Is there any function or something in Matlab helps me this problem. If Matlab doesn't have such kind of function, can you suggest any other effective ways.

EDIT: I have to calculate this kind of distance for every point (x0,y0) in the set. The size of the set is about 1000. The value of k should be about sqrt(1500). The worst thing is that I do this many times. At each iteration, the set changes, and I calculate the distances again. So, the running time is a critical problem.
Andrey Rubshtein about 12 years

Shouldn't you remove the element itself? Think about the case k=1
Pursuit about 12 years

Good point. I generally don't like changing the sizes of match vectors like this. Maybe add a '+1' to the final indexing. EDIT: although that leaves a gap if there is a point identical to the initial point. If you want to guarantee the answer is a different point, even when some point could be equal, then more work is needed.
ardnew about 12 years

@Pursuit it doesn't make sense to also remove the equal points. if you have 5 equal points to the point you're searching, the 3rd furthest distance should be 0
opmfan about 12 years

thank you very much! but I have to calculate this kind of distance with every point in the set. So, it seems that this approach is a little simpler than I expected. I'm sorry I didn't clarify earlier.
opmfan about 12 years

knnsearch seems to be a solution but I'm not sure about how to apply knnsearch to my problem exactly. I'll find it. Anyway, can you give me more detail about the way using knnsearch. Thank you very much.
opmfan about 12 years

Yes. I will do this for every points in the set. So, some kind of table of distance would help saving the running time. I will find out how squareform function works in my problem. Thank you very much.
zamazalotta about 12 years

the function is pdist actually, squareform just makes the a square matrix out of pdist's vector output
Andrey Rubshtein about 12 years

But only if you have the statistics toolbox.
3lectrologos about 12 years

Have you looked at the Matlab help (added link to my answer above)?
opmfan about 12 years

it works for me. Maybe it's not the best approach, but it's simple to implement and use. To be honest, I don't have much time to finish the code. Thank you very much.
opmfan about 12 years

I read the online document about knnsearch but it's a little bit complicate to me and I really don't have much time to understand and use it. I tried the simpler approach. It takes more time of running but I will try that approach first. Thank you for your help.
Evgeni Sergeev almost 10 years

This is O(N^2), where N is the number of points, with each query then being O(N lg N). A kd-tree, mentioned below, and also used under some conditions by knnsearch, is typically much faster, taking O(N lg^2 N) to construct, O(lg N) for 1 nearest neighbour, and, I'm guessing, around O(k(lg k)(lg N)) for k nearest neighbours when k is small and with a favourable dataset (thinking: via binary search). (Which means, for example, that if you have 10000 points, it should be somewhere around 100 to 1000 times faster.) Therefore knnsearch is much preferred.