knn imputation of categorical variables in python

python machine-learning knn imputation

10,945

I was able to impute the categorical variables using the steps listed below. I will gladly welcome any omissions or program that can perform such tasks automatically

Step1: Subsets the object's data types(all) into another container

Step2: Change np.NaN into an object data type, say None. Now, the container is made up of only objects data types

Step3: Change the entire container into categorical datasets

Step4: Encode the data set(i am using .cat.codes)

Step5: Change back the value of encoded None into np.NaN

Step5: Use KNN (from fancyimpute) to impute the missing values

Step6: Re-map the encoded dataset to its initial names

10,945

Author by

KINNI

Updated on June 04, 2022

Comments

KINNI almost 2 years
I am trying to implement kNN from the fancyimpute module on a dataset. I was able to implement the code for continuous variables of the datasets using the code below:
```
knn_impute2=KNN(k=3).complete(train[['LotArea','LotFrontage']]) 
```
It yields the desirable answer as follows: This show how the original dataset looks like and how it has changed using knn imputation

I tried to implement the same code for categorical datasets and I get error :
```
could not convert string to float: 'female'
```
Here is the code I used(I am trying to use Imputer):
```
from sklearn.preprocessing import Imputer
imp = Imputer(missing_values='NaN', strategy='most_frequent', axis=0)
imp.fit(df['sex'])
print(imp.transform(df['sex']))
```
What am I doing wrong?

Recap, I want to use knn imputation on this dataset to impute the sex columns. Below is the dataset.

The dataset i want to impute using knn imputation with k value 2

How can i do that with knnimpute or i need to write my own functions. If yes, can anyone help me. Thnks