How can I get the cluster number corresponding to each data point using k-means clustering in R?

r cluster-analysis k-means

29,065

Solution 1

It sounds like you are trying to access the cluster vector that is returned by kmeans(). From the help page (?kmeans), the cluster component is:

A vector of integers (from 1:k) indicating the cluster to which each 
point is allocated.

Using the example on the help page:

x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
(cl <- kmeans(x, 2))

#Access the cluster vector
cl$cluster

> cl$cluster
  [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [45] 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [89] 1 1 1 1 1 1 1 1 1 1 1 1

To address the question in the comments

You can "map" the cluster number to the original data by doing something like this:

out <- cbind(x, clusterNum = cl$cluster)
head(out)

               x          y clusterNum
[1,] -0.42480483 -0.2168085          2
[2,] -0.06272004  0.3641157          2
[3,]  0.08207316  0.2215622          2
[4,] -0.19539844  0.1306106          2
[5,] -0.26429056 -0.3249288          2
[6,]  0.09096253 -0.2158603          2

cbind is the column-bind function; there is also an rbind function for rows. See their help pages, ?cbind and ?rbind, for more details.
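If the data are held in a data frame rather than a matrix, a similar mapping can be had by adding a column. The following is only a minimal sketch under assumptions: the seed value and the names df and clusterNum are chosen here for illustration and are not part of the help-page example.

```r
# Assumption: a fixed seed, used only so the run is repeatable
set.seed(42)

# Same simulated data as the help-page example above
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
cl <- kmeans(x, centers = 2)

# Convert to a data frame and store the cluster label as a factor column;
# row i of df gets the label kmeans() assigned to row i of x
df <- as.data.frame(x)
df$clusterNum <- factor(cl$cluster)
head(df)
```

Storing the label as a factor signals that cluster numbers are labels rather than quantities. Which group ends up called 1 or 2 can change from run to run.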

Solution 2

@Java questioner

You can access the cluster data as follows:

> data_clustered <- kmeans(data, centers = k)  # k: the number of clusters you want
> data_clustered$cluster

data_clustered$cluster is a vector with one entry per record in data. Each entry is the cluster number assigned to the corresponding row.

To get all the records belonging to cluster 1 (assuming data is a data frame):

> data$cluster <- data_clustered$cluster 
> data_clus_1 <- data[data$cluster == 1,]
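To pull out every cluster at once rather than one at a time, split() can group the records by their cluster label. Below is a small sketch using the three values from the asker's comment (12, 32, 13); the names values, fit and groups, and the seed, are assumptions for illustration.

```r
# The three values from the asker's example
values <- c(12, 32, 13)

# Assumption: a fixed seed, used only so the run is repeatable
set.seed(1)
fit <- kmeans(values, centers = 2)

# split() returns one list element per cluster label.
# The label numbers are arbitrary: 12 and 13 land in the same group,
# but that group may be called "1" or "2".
groups <- split(values, fit$cluster)
print(groups)
```

For a data frame, the same call works on the whole object, for example split(data, data_clustered$cluster).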

Number of clusters:

> max(data$cluster)
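Beyond the number of clusters, it is often useful to see how many records fall into each one. A minimal sketch, assuming the built-in iris data set and three clusters purely for illustration (neither comes from the question):

```r
# Assumption: built-in iris measurements and k = 3, chosen only for illustration
set.seed(123)
fit <- kmeans(iris[, 1:4], centers = 3)

# Count of records per cluster label
counts <- table(fit$cluster)
print(counts)

# The kmeans object already stores the same counts, plus one row of centers per cluster
print(fit$size)
print(nrow(fit$centers))
```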

Good luck with your clustering

Author by

Java questioner

Updated on November 28, 2020

Comments

Java questioner over 3 years

I clustered my data with the k-means method. How can I get the cluster number corresponding to each data point in R, so that I can see which cluster each record belongs to?

Example: 12, 32, 13 => cluster 1: 12, 13; cluster 2: 32