R - convert from categorical to numeric for KNN

16,227

Solution 1

When the data are read in via read.table with stringsAsFactors = TRUE (the default before R 4.0.0), the first column is a factor. Then

data$iGender = as.integer(data$Gender) 

would work. If the values are character strings, a detour via factor is easiest:

data$iGender= as.integer(as.factor(data$Gender))
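One detail worth checking with this approach: factor levels are sorted alphabetically by default, so the integer codes come out as F = 1, I = 2, M = 3 rather than the M = 1, F = 2, I = 3 asked for in the question. A minimal sketch, assuming a toy vector `sex` that stands in for the real column (the values are illustrative, not taken from the abalone data):

```r
# Toy stand-in for the abalone sex column (values are illustrative).
sex <- c("M", "F", "I", "M")

# Default: levels are sorted alphabetically, so F = 1, I = 2, M = 3.
default_codes <- as.integer(factor(sex))

# Explicit levels pin the mapping to M = 1, F = 2, I = 3.
ordered_codes <- as.integer(factor(sex, levels = c("M", "F", "I")))

print(default_codes)  # 3 1 2 3
print(ordered_codes)  # 1 2 3 1
```

Passing `levels` explicitly makes the mapping independent of how the labels happen to sort.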

Solution 2

The first answer seems like a really bad idea. Coding {"M","F","I"} to {1, 2, 3} implies that Infant = 3 * Male, Male = Female/2 and so on.

KNN via caret does allow categorical values as predictors if you use the formula methods. Otherwise you need to encode them as binary dummy variables.
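As a rough illustration of the dummy-variable route, the sketch below uses base R's `model.matrix` rather than caret itself, so it runs without extra packages. The data frame `abalone` and its columns `Sex` and `Length` are made-up stand-ins, not the real UCI data.

```r
# Hypothetical miniature of the abalone data (values are illustrative).
abalone <- data.frame(
  Sex = factor(c("M", "F", "I", "M")),
  Length = c(0.455, 0.530, 0.330, 0.350)
)

# "- 1" drops the intercept so every level gets its own 0/1 column.
sex_dummies <- model.matrix(~ Sex - 1, data = abalone)

# Replace the factor with its indicator columns; numeric columns stay as-is.
encoded <- cbind(as.data.frame(sex_dummies), Length = abalone$Length)

print(encoded)
```

Each row has exactly one of `SexF`, `SexI`, `SexM` set to 1, so no artificial ordering is imposed on the categories when distances are computed.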

Also, showing your code and having a reproducible example would help a lot.

Max

Solution 3

One of the easiest ways to use the kNN algorithm on a dataset in which one feature is categorical ("M", "F" and "I", as you mentioned) is as follows: in the CSV or Excel file that holds your dataset, go to the relevant column and change M to 1, F to 2 and I to 3. The column then holds discrete numeric values, and you can use the kNN algorithm on it in R.

Author by

Minoru

PhD. candidate of Federal University of Pernambuco.

Updated on July 14, 2022

Comments

  • Minoru
    Minoru almost 2 years

    I'm trying to use the caret package in R to apply KNN to the "abalone" database from UCI Machine Learning (link to the data), but it doesn't allow KNN when there are categorical values. How do I convert the categorical values (in this database: "M","F","I") to numeric values, such as 1,2,3, respectively?

  • Minoru
    Minoru almost 9 years
    Thank you! I was trying to use data[1] instead of $V1.
  • B_Miner
    B_Miner almost 8 years
    Actually, this type of encoding is quite useful for tree based algorithm (e.g. xgboost) and is used in the LabelEncoder pre-processing library of scikit learn. I was just out looking to see if R had this type of functionality built in.
  • TYZ
    TYZ about 7 years
    This is a bad idea for unordered categorical variables.
  • Rachel Zhang
    Rachel Zhang about 5 years
    May I ask how to use the formula methods in caret to allow categorical variables? Can I directly throw a dataset k = train(y~., method='knn', trControl=train.control, preProcess=c('scale','center','pca'), data=data.frame(train)) with categorical variables?