K-means Plotting for 3-Dimensional Data

Solution 1

Your code is very messy and unnecessarily long.

Here is a smaller example that does the same thing. You'll need the Statistics Toolbox to run it (for the kmeans function and the Iris dataset):

%# load the Fisher Iris dataset (150 instances) and keep 3 of its 4 features
load fisheriris
X = meas(:,1:3);
[numInst,numDims] = size(X);

%# K-means clustering
%# (K: number of clusters, G: assigned groups, C: cluster centers)
K = 3;
[G,C] = kmeans(X, K, 'distance','sqEuclidean', 'start','sample');

%# show points and clusters (color-coded)
clr = lines(K);
figure, hold on
scatter3(X(:,1), X(:,2), X(:,3), 36, clr(G,:), 'Marker','.')
scatter3(C(:,1), C(:,2), C(:,3), 100, clr, 'Marker','o', 'LineWidth',3)
hold off
view(3), axis vis3d, box on, rotate3d on
xlabel('x'), ylabel('y'), zlabel('z')

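The code above relies on kmeans from the Statistics Toolbox. If that toolbox is not available, the same assign-then-update loop that the question implements by hand (Lloyd's algorithm) can produce G and C directly. This is a minimal sketch: the stand-in data, the evenly spaced initialization and the iteration cap maxIter are assumptions for illustration, not what kmeans does internally.

```matlab
% Minimal Lloyd's k-means without the Statistics Toolbox (sketch).
% Hypothetical stand-in data: three well-separated 3-D blobs of 3 points each.
X = [0 0 0; 0.1 0 0; 0 0.1 0;
     5 5 5; 5.1 5 5; 5 5.1 5;
     10 0 10; 10.1 0 10; 10 0.1 10];
K = 3;
maxIter = 100;              % assumed iteration cap
n = size(X, 1);

% Assumed initialization: K rows spread evenly through the data.
C = X(round(linspace(1, n, K)), :);
G = zeros(n, 1);

for iter = 1:maxIter
    % Squared Euclidean distance from each point to each centroid (n-by-K).
    D = zeros(n, K);
    for k = 1:K
        D(:, k) = sum(bsxfun(@minus, X, C(k, :)).^2, 2);
    end
    % Assignment step: label every point with its nearest centroid.
    [~, Gnew] = min(D, [], 2);
    if isequal(Gnew, G)
        break                   % no label changed: converged
    end
    G = Gnew;
    % Update step: move each centroid to the mean of its points.
    for k = 1:K
        if any(G == k)          % leave an empty cluster's centroid where it is
            C(k, :) = mean(X(G == k, :), 1);
        end
    end
end
```

G is an n-by-1 label vector and C a K-by-3 centroid matrix, the same shapes kmeans returns, so the scatter3 plotting lines above should work with them unchanged.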

Solution 2

You could simply go for scatter():

With scatter() you can set a different color and marker size for each cluster. For more details, check out the examples in the documentation.
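A minimal sketch of that idea in 2-D, using made-up points and labels (hypothetical data, for example the output of kmeans): each point gets the RGB row of its cluster as its color and a per-cluster marker area as its size. The sizes vector is an assumption chosen for illustration.

```matlab
% Hypothetical 2-D points and their cluster labels.
X = [1 2; 1.2 1.9; 0.8 2.1; 5 5; 5.2 4.8; 4.9 5.1];
G = [1; 1; 1; 2; 2; 2];
K = max(G);

clr   = lines(K);           % one RGB row per cluster
sizes = [30 90];            % assumed marker area (points^2) for each cluster

figure
% sizes(G) and clr(G,:) expand the per-cluster settings to one entry per point
scatter(X(:,1), X(:,2), sizes(G), clr(G,:), 'filled')
xlabel('x'), ylabel('y')
```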

Author: Alvi Syahrin

Updated on July 05, 2022

Comments

  • Alvi Syahrin, almost 2 years ago

    I'm working with k-means in MATLAB. I am trying to plot the clusters, but my data is three-dimensional. Here is my k-means code:

    clc
    clear all
    close all
    load cobat.txt;  % read the file
    
    k=input('Enter a number: ');        % determine the number of cluster
    isRand=0;   % 0 -> sequential initialization
                % 1 -> random initialization
    
    [maxRow, maxCol]=size(cobat);
    if maxRow<=k, 
        y=[m, 1:maxRow];
    elseif k>7
        h=msgbox('cant more than 7');
    else
        % initial value of centroid
        if isRand,
            p = randperm(size(cobat,1));      % random initialization
            for i=1:k
                c(i,:)=cobat(p(i),:);  
            end
        else
            for i=1:k
               c(i,:)=cobat(i,:);        % sequential initialization
            end
        end
    
        temp=zeros(maxRow,1);   % initialize as zero vector
        u=0;
        while 1,
            d=DistMatrix3(cobat,c);   % calculate the distance 
            [z,g]=min(d,[],2);      % set the matrix g group
    
            if g==temp,             % if the iteration doesn't change anymore
                break;              % stop the iteration
            else
                temp=g;             % copy the matrix to the temporary variable
            end
            for i=1:k
                f=find(g==i);
                if f                % calculate the new centroid 
                    c(i,:)=mean(cobat(find(g==i),:),1);
                end
            end
            c
            [B,index] = sortrows( c );  % sort the centroids
            g = index(g); % arrange the labels based on centroids
        end
        y=[cobat,g]
    
        hold off;    
    
        %This plotting code originally sat at the end of the plot 3D code, but I moved it here because I think this is where the plotting happens
        f = PlotClusters(cobat,g,y,Colors) %Here is the error
        if Dimensions==2
            for i=1:NumOfDataPoints %plot data points
                plot(cobat(i,1),cobat(i,2),'.','Color',Colors(g(i),:))
                hold on
            end
            for i=1:NumOfCenters %plot the centers
                plot(y(i,1),y(i,2),'s','Color',Colors(i,:))
            end
        else
            for i=1:NumOfDataPoints %plot data points
                plot3(cobat(i,1),cobat(i,2),cobat(i,3),'.','Color',Colors(g(i),:))
                hold on
            end
            for i=1:NumOfCenters %plot the centers
                plot3(y(i,1),y(i,2),y(i,3),'s','Color',Colors(i,:))
            end
        end
    end
    

    And here is the plot 3D code:

    %This function plots clustering data, for example the one provided by
    %kmeans. To be able to plot, the number of dimensions has to be either 2 or
    %3. 
    %Inputs:
    %       Data - an m-by-d matrix, where m is the number of data points to
    %              cluster and d is the number of dimensions. In my code, it is cobat
    %       IDX - an m-by-1 indices vector, where each element gives the
    %             cluster to which the corresponding data point in Data belongs. In my file, it is 'g'
    %       Centers y - an optional c-by-d matrix, where c is the number of
    %             clusters and d is the dimensions of the problem. The matrix
    %             gives the location of the cluster centers. If this is not
    %             given, the centers will be calculated. In my file, I think, it is 'y'
    %       Colors - an optional color scheme generated by hsv. If this is not
    %             given, a color scheme will be generated.
    %
    function f = PlotClusters(cobat,g,y,Colors)
    %Checking inputs
    switch nargin
        case 1 %Not enough inputs
            error('Clustering data is required to plot clusters. Usage: PlotClusters(Data,IDX,Centers,Colors)')
        case 2 %Need to calculate cluster centers and color scheme
            [NumOfDataPoints,Dimensions]=size(cobat);
            if Dimensions~=2 && Dimensions~=3 %Check ability to plot
                error('It is only possible to plot in 2 or 3 dimensions.')
            end
            if length(g)~=NumOfDataPoints %Check that each data point is assigned to a cluster
                error('The number of data points in Data must be equal to the number of indices in IDX.')
            end
            NumOfClusters=max(g);
            y=zeros(NumOfClusters,Dimensions); %y accumulates the cluster sums, then holds the centers
            NumOfCenters=NumOfClusters;
            NumOfPointsInCluster=zeros(NumOfClusters,1);
            for i=1:NumOfDataPoints
                y(g(i),:)=y(g(i),:)+cobat(i,:);
                NumOfPointsInCluster(g(i))=NumOfPointsInCluster(g(i))+1;
            end
            for i=1:NumOfClusters
                y(i,:)=y(i,:)/NumOfPointsInCluster(i);
            end
            Colors=hsv(NumOfClusters);        
        case 3 %Need to calculate color scheme        
            [NumOfDataPoints,Dimensions]=size(cobat);
            if Dimensions~=2 && Dimensions~=3 %Check ability to plot
                error('It is only possible to plot in 2 or 3 dimensions.')
            end
            if length(g)~=NumOfDataPoints %Check that each data point is assigned to a cluster
                error('The number of data points in Data must be equal to the number of indices in IDX.')
            end
            NumOfClusters=max(g);
            [NumOfCenters,Dims]=size(y);
            if Dims~=Dimensions
                error('The number of dimensions in Data should be equal to the number of dimensions in Centers')
            end
            if NumOfCenters<NumOfClusters %Check that each cluster has a center
                error('The number of cluster centers is smaller than the number of clusters.')
            elseif NumOfCenters>NumOfClusters %Check that each cluster has a center
                disp('There are more centers than clusters, all will be plotted')
            end
            Colors=hsv(NumOfCenters);
        case 4 %All data is given just need to check consistency        
            [NumOfDataPoints,Dimensions]=size(cobat);
            if Dimensions~=2 && Dimensions~=3 %Check ability to plot
                error('It is only possible to plot in 2 or 3 dimensions.')
            end
            if length(g)~=NumOfDataPoints %Check that each data point is assigned to a cluster
                error('The number of data points in Data must be equal to the number of indices in IDX.')
            end
            NumOfClusters=max(g);
            [NumOfCenters,Dims]=size(y);
            if Dims~=Dimensions
                error('The number of dimensions in Data should be equal to the number of dimensions in Centers')
            end
            if NumOfCenters<NumOfClusters %Check that each cluster has a center
                error('The number of cluster centers is smaller than the number of clusters.')
            elseif NumOfCenters>NumOfClusters %Check that each cluster has a center
                disp('There are more centers than clusters, all will be plotted')
            end
            [NumOfColors,RGB]=size(Colors);
            if RGB~=3 || NumOfColors<NumOfCenters
                error('Colors should have at least the same number of rows as number of clusters and 3 columns')
            end            
    end
    %Data is ready. Now plotting
    
    end
    

    Here is the error:

    ??? Undefined function or variable 'Colors'.
    
    Error in ==> clustere at 69
        f = PlotClusters(cobat,g,y,Colors)
    

    Am I calling the function wrong? What should I do? Any help would be greatly appreciated.

  • Alvi Syahrin, about 11 years ago
    Thank you for the answer, Oleg!
  • Alvi Syahrin, about 11 years ago
    Amro, thank you so much! I had also written another script using MATLAB's kmeans function, just like yours, but your scatter3() code is really useful and very efficient. However, my graph looks messy. Is that just because of the file I used, or is my clustering not working properly?
  • Alvi Syahrin, about 11 years ago
    Sorry, it's fixed. I just needed to rotate the view to see it better. Thank you!
  • Amro, about 11 years ago
    use the mouse and the rotate tool for that :)
  • Alvi Syahrin, almost 11 years ago
    Hi, Amro. I'd like to ask: what is 'Marker' for? Is it the marker type, like '.' and 'o'? Also, what do 36 and 100 mean in that script? Lastly, can you tell me what axis vis3d means?
  • Amro, almost 11 years ago
    @AlviSyahrin: All of this can be found in the MATLAB docs: 1) marker type, 2) marker area, 3) axis vis3d freezes aspect ratio properties to enable rotation of 3-D objects and overrides stretch-to-fill.
  • Tak, almost 11 years ago
    @Amro I'm trying to use the above code on 3-D data found at this link, where K is 3, and it always gives the error "Empty cluster created at iteration 1." Could you please assist me? dropbox.com/s/rgatmmg2cx2z1cv/matlab_X.mat
  • Amro, almost 11 years ago
    @user1460166: I posted an answer on your question at stackoverflow.com/questions/18009664/…
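Going back to the error in the question: Colors is never assigned in the calling script, so MATLAB stops as soon as it evaluates the arguments of PlotClusters(cobat,g,y,Colors). Inside PlotClusters, a color table is only generated when the function is called with two or three arguments. Passing y would not help either: y = [cobat,g] has one more column than cobat, so the dimension checks in PlotClusters would reject it; the k-by-d centroid matrix is c. In addition, NumOfDataPoints, Dimensions and NumOfCenters exist only inside PlotClusters, and the function as posted stops at its "Data is ready" comment without drawing anything or assigning its output f. A minimal sketch that defines the colors first and plots straight from the script's variables might look like this; the small stand-in values at the top are hypothetical and only make the snippet run on its own.

```matlab
% Stand-in values (hypothetical) so the snippet runs on its own; in the
% question's script, cobat, g, c and k already exist once the while-loop stops.
cobat = [0 0 0; 0.2 0.1 0; 4 4 4; 4.1 3.9 4];
g = [1; 1; 2; 2];
k = 2;
c = [mean(cobat(g == 1, :), 1); mean(cobat(g == 2, :), 1)];

Colors = hsv(k);                         % define the color table before using it
assert(size(c, 2) == size(cobat, 2))     % centers need as many columns as the data

figure, hold on
for i = 1:k
    idx = (g == i);                      % points assigned to cluster i
    plot3(cobat(idx,1), cobat(idx,2), cobat(idx,3), '.', 'Color', Colors(i,:))
    plot3(c(i,1), c(i,2), c(i,3), 's', 'Color', Colors(i,:), 'MarkerSize', 10, 'LineWidth', 2)
end
hold off
view(3), grid on
xlabel('x'), ylabel('y'), zlabel('z')
```

Plotting each cluster with one plot3 call, instead of one call per point, keeps the number of graphics objects equal to the number of clusters.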