How to fix the kmeans error in R: 'more cluster centers than distinct data points'

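The fix is to read the files with header = FALSE, so that the single line in each file is kept as data rather than being consumed as column names: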

cells = c(read.csv("c:\\data-files\\kmeans\\cells.csv", header = FALSE))
rnames = c(read.csv("c:\\data-files\\kmeans\\rnames.csv", header = FALSE))
cnames = c(read.csv("c:\\data-files\\kmeans\\cnames.csv", header = FALSE))

instead of

cells = c(read.csv("c:\\data-files\\kmeans\\cells.csv", header = TRUE))
rnames = c(read.csv("c:\\data-files\\kmeans\\rnames.csv", header = TRUE))
cnames = c(read.csv("c:\\data-files\\kmeans\\cnames.csv", header = TRUE))
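
Why the header flag matters can be checked with a small sketch. It assumes the file contents shown in the question and reads them from inline strings (the names `cells_txt`, `rnames_txt`, `cnames_txt`, and `dup` are hypothetical, introduced only for illustration) instead of the original paths. It also swaps `c()` for `unlist()` plus `as.numeric()`, which is my substitution rather than part of the original fix, so that `matrix()` receives a plain numeric vector. The last part reproduces the error message on data whose rows are all identical, which is one situation where kmeans has fewer distinct points than requested centers.

```r
# Inline copies of the three files from the question (assumed contents),
# so the sketch does not depend on any file paths.
cells_txt  <- "0,1,2,1,4,3,5,3,4"
rnames_txt <- '"a1","a2","a3"'
cnames_txt <- '"google","so","test"'

# header = TRUE uses the only line as column names, leaving zero data rows.
with_header <- read.csv(text = cells_txt, header = TRUE)
nrow(with_header)

# header = FALSE keeps that line as data: one row, nine columns.
# unlist() flattens the one-row data frame into a plain vector.
cells  <- unlist(read.csv(text = cells_txt, header = FALSE), use.names = FALSE)
rnames <- unlist(read.csv(text = rnames_txt, header = FALSE,
                          stringsAsFactors = FALSE), use.names = FALSE)
cnames <- unlist(read.csv(text = cnames_txt, header = FALSE,
                          stringsAsFactors = FALSE), use.names = FALSE)

# Build a 3 x 3 numeric matrix, filled row by row.
x <- matrix(as.numeric(cells), nrow = 3, ncol = 3, byrow = TRUE,
            dimnames = list(rnames, cnames))

# Run k-means with 2 centers and at most 15 iterations.
set.seed(1)
km <- kmeans(x, centers = 2, iter.max = 15)
km$cluster

# Reproduce the error: three identical rows contain only one distinct point,
# so asking for 2 centers fails. tryCatch() captures the message as a string.
dup <- matrix(1, nrow = 3, ncol = 3)
msg <- tryCatch(kmeans(dup, centers = 2),
                error = function(e) conditionMessage(e))
msg
```

Printing `nrow(with_header)` and `str(x)` before calling `kmeans()` is a quick way to confirm that the data actually reached the matrix.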
Author by

blue-sky

Updated on June 15, 2022

Comments

  • blue-sky
    blue-sky almost 2 years

    When I run the kmeans algorithm, I receive this error:

    Error in kmeans(x, 2, 15) : 
      more cluster centers than distinct data points.
    

    How can this error be fixed, and what does it mean? As far as I can tell, my data points are distinct.

    Here are my files and the R code I am using to run kmeans:

    rnames.csv : 
    "a1","a2","a3"
    
    cells.csv : 
    0,1,2,1,4,3,5,3,4
    
    cnames.csv : 
    "google","so","test"
    
    cells = c(read.csv("c:\\data-files\\kmeans\\cells.csv", header = TRUE))
    rnames = c(read.csv("c:\\data-files\\kmeans\\rnames.csv", header = TRUE))
    cnames = c(read.csv("c:\\data-files\\kmeans\\cnames.csv", header = TRUE))
    
    x <- matrix(cells, nrow=3, ncol=3, byrow=TRUE, dimnames=list(rnames, cnames))
    
    # run K-Means
    km <- kmeans(x, 2, 15)