MATLAB: K-means clustering

22,942

Solution 1

I can't think of a better way to do it than what you described. A built-in function would save one line, but I couldn't find one. Here's the code I would use:

[ids, ctrs] = kmeans(A, 19);
% dist (Neural Network Toolbox) treats columns as points, hence the transpose
D = dist([testpoint; ctrs]');   % testpoint is 1x10 and D will be 20x20
[distance, testpointID] = min(D(1,2:end));
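
To classify every row of a new matrix at once instead of a single test point, one option is pdist2 from the Statistics Toolbox, which computes only the point-to-centroid distances. A minimal sketch, with small made-up values standing in for B and ctrs (they are assumptions, not the data from the question):

```matlab
% Toy stand-ins (assumed values): 3 centroids and 4 new points in 2-D
ctrs = [0 0; 10 0; 0 10];
B    = [1 1; 9 -1; 0.5 8; 6 1];     % one point per row

% D(i,j) is the Euclidean distance from B(i,:) to ctrs(j,:)
D = pdist2(B, ctrs);                % size(B,1)-by-size(ctrs,1)

% Nearest centroid (and its distance) for each row of B
[distance, clusterID] = min(D, [], 2);
```

With the real data, B would be 49x10 and ctrs 19x10, so D would be 49x19.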

Solution 2

The following is a complete example of clustering:

%% generate sample data
K = 3;
numObservarations = 100;
dimensions = 3;
data = rand([numObservarations dimensions]);

%% cluster
opts = statset('MaxIter', 500, 'Display', 'iter');
[clustIDX, clusters, interClustSum, Dist] = kmeans(data, K, 'options',opts, ...
    'distance','sqEuclidean', 'EmptyAction','singleton', 'replicates',3);

%% plot data+clusters
figure, hold on
scatter3(data(:,1),data(:,2),data(:,3), 50, clustIDX, 'filled')
scatter3(clusters(:,1),clusters(:,2),clusters(:,3), 200, (1:K)', 'filled')
hold off, xlabel('x'), ylabel('y'), zlabel('z')

%% plot clusters quality
figure
[silh,h] = silhouette(data, clustIDX);
avrgScore = mean(silh);


%% Assign data to clusters
% calculate distance (squared) of all instances to each cluster centroid
D = zeros(numObservarations, K);     % init distances
for k=1:K
    % squared Euclidean distance (no square root): d = sum((x-y).^2, 2)
    D(:,k) = sum( ((data - repmat(clusters(k,:),numObservarations,1)).^2), 2);
end

% find, for each instance, the cluster closest to it
[minDists, clusterIndices] = min(D, [], 2);

% compare it with what you expect it to be
sum(clusterIndices == clustIDX)

Solution 3

I'm not sure I understood the question correctly, but if you want to know which cluster each of your points belongs to, you can use the knnsearch function. Called as knnsearch(X, Y), it finds, for each row of Y, the row of X that is closest to it, so passing the centroids as X and your new points as Y returns the index of the nearest centroid for each point.
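
A minimal sketch of that call, assuming the Statistics Toolbox is available and using small made-up values in place of the real ctrs and B:

```matlab
% Toy stand-ins (assumed values): 3 centroids and 4 new points in 2-D
ctrs = [0 0; 10 0; 0 10];
B    = [1 1; 9 -1; 0.5 8; 6 1];

% For each row of B, find the nearest row of ctrs (Euclidean by default).
% predicted(i) is the centroid index, distances(i) the distance to it.
[predicted, distances] = knnsearch(ctrs, B);
```

Note the argument order: the centroids are the set being searched, and the new points are the queries.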

Solution 4

Assuming you're using the squared Euclidean distance metric, try this:

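d = zeros(size(B,1), size(ctrs,1));   % one column per centroid
for i = 1:size(ctrs,1)                % loop over centroids (rows of ctrs), not dimensions
    d(:,i) = sum((B - ctrs(repmat(i,size(B,1),1),:)).^2, 2);
end
[distances, predicted] = min(d, [], 2);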

predicted should then contain the index of the closest centroid for each row of B, and distances should contain the squared distances to that centroid.

Take a look inside the kmeans function, at the subfunction 'distfun'. This shows you how to do the above, and also contains the equivalents for other distance metrics.
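
If you would rather avoid both the loop and any toolbox dependency, the same squared distances can be computed in base MATLAB by expanding ||b - c||^2 = ||b||^2 - 2*b*c' + ||c||^2. This is only a sketch with made-up stand-ins for B and ctrs, not the code that kmeans uses internally:

```matlab
% Toy stand-ins (assumed values): 3 centroids and 4 new points in 2-D
ctrs = [0 0; 10 0; 0 10];
B    = [1 1; 9 -1; 0.5 8; 6 1];

bb = sum(B.^2, 2);        % n-by-1 squared norms of the points
cc = sum(ctrs.^2, 2)';    % 1-by-k squared norms of the centroids

% d(i,j) = squared Euclidean distance from B(i,:) to ctrs(j,:)
d = bsxfun(@plus, bb, cc) - 2 * (B * ctrs');
d = max(d, 0);            % clip tiny negatives caused by round-off

[distances, predicted] = min(d, [], 2);
```

The expanded form trades a little numerical precision for speed, which is why the result is clipped at zero.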

Author: tguclu

Updated on July 09, 2022

Comments

  • tguclu
    tguclu almost 2 years

    I have a matrix A (369x10) which I want to cluster into 19 clusters. I use this method:

    [idx ctrs]=kmeans(A,19)
    

    which yields idx(369x1) and ctrs(19x10)

    I understand everything up to here. Every row of A is assigned to one of the 19 clusters.

    Now I have an array B (49x10). I want to know which of these 19 clusters each row of B belongs to.

    How can this be done in MATLAB?

    Thank you in advance