In storage.mode(x) <- "double" : NAs introduced by coercion in kmeans

19,016

kmeans can only be used on numerical columns, because it needs to compute the mean.
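
The coercion behind the warning can be reproduced on its own. This is a minimal sketch using a hypothetical two-column data frame `df` with made-up values, not the asker's data:

```r
# Hypothetical mixed-type data frame (illustrative values only).
df <- data.frame(id = c("a1", "b2", "c3"),
                 price = c(10, 20, 30),
                 stringsAsFactors = FALSE)

m <- as.matrix(df)   # a single text column turns the whole matrix into character
is.character(m)      # TRUE
anyNA(m)             # FALSE: an is.na() check at this stage finds nothing

# kmeans() converts its input to double; text that cannot be parsed as a
# number becomes NA at that point, with the warning seen in the question.
storage.mode(m) <- "double"
anyNA(m)             # TRUE
```

This would also explain why the `colSums(sapply(train1, is.na))` check in the question reports zeros: the NA values do not exist until the conversion happens.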

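Don't run it on ID columns, text columns, or any other column for which a mean is meaningless. The warning indicates that you are passing such columns to kmeans: when the input is coerced to double, any value that cannot be parsed as a number becomes NA, and those NA values then cause the "NA/NaN/Inf in foreign function call" error.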
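
One possible way to apply this is sketched below. The data frame is a hypothetical stand-in for `train1` that reuses the column names from the question with made-up values. Which columns are meaningful features, and whether to scale them, are assumptions here, not something the original post specifies.

```r
# Hypothetical stand-in for train1: column names from the question, made-up values.
train1 <- data.frame(
  train_id          = 1:10,
  name              = paste("item", 1:10),
  item_condition_id = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1),
  category_name     = rep(c("a", "b"), 5),
  brand_name        = rep(c("x", "y"), 5),
  price             = c(10, 12, 50, 52, 100, 98, 11, 51, 99, 13),
  shipping          = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
  item_description  = paste("desc", 1:10),
  stringsAsFactors  = FALSE
)

# Keep only the numeric columns, then drop the identifier, since a mean of
# IDs carries no information.
numeric_cols <- vapply(train1, is.numeric, logical(1))
features <- train1[, numeric_cols, drop = FALSE]
features$train_id <- NULL

# Scaling is an assumption: it keeps price from dominating the distances.
# Only 3 clusters are used because this toy data has just 10 rows.
set.seed(88)
KMC <- kmeans(scale(features), centers = 3, iter.max = 1000)
table(KMC$cluster)
```

Passing the data frame (or a numeric matrix) directly keeps one row per observation. Flattening it with `as.vector()`, as in the question, would instead cluster every cell as a separate one-dimensional point.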


Author by

Srihari Mohan

Updated on June 04, 2022

Comments

  • Srihari Mohan
    Srihari Mohan almost 2 years

    I am trying to split my data into 5 clusters. But I am getting the following error

    > colSums(sapply(train1,is.na))
         train_id              name item_condition_id     category_name 
                0                 0                 0                 0 
       brand_name             price          shipping  item_description 
                0                 0                 0                 0 
    > train1matrix=as.matrix(train1)
    > train1vector=as.vector(train1matrix)
    > k=5
    > set.seed(88)
    > KMC=kmeans(train1vector,centers=k,iter.max=1000)
    Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
    In addition: Warning message:
    In storage.mode(x) <- "double" : NAs introduced by coercion
    

    Can someone please help me? Thank you in advance

    • MKR
      MKR over 6 years
      Please use dput to share the data in train1.
    • Srihari Mohan
      Srihari Mohan over 6 years
      Thank you, but I have yet to try it. Do you mean that I should use dput to copy the data from the old data frame to a new data frame?
    • Marco Sandri
      Marco Sandri over 6 years
      You should share the output of dput(train1) or, at least dput(train1[1:20,])
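
For reference, the `dput` suggestion above might look like the following sketch. The small `train1` here is a hypothetical placeholder with made-up values; on the real data only the last two lines would be needed.

```r
# Hypothetical placeholder for train1 (made-up values).
train1 <- data.frame(train_id = 1:3,
                     name     = c("a", "b", "c"),
                     price    = c(10, 12.5, 8),
                     stringsAsFactors = FALSE)

sample_rows <- head(train1, 20)  # at most the first 20 rows
dput(sample_rows)                # prints R code that recreates sample_rows
```

The printed `structure(...)` call can be pasted into a question so that others can rebuild the same data frame, column types included.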