kNN imputation of categorical variables in Python

I was able to impute the categorical variables using the steps listed below. I would welcome corrections for anything I have missed, or pointers to a program that performs this task automatically.

Step 1: Subset all columns of object data type into a separate container.

Step 2: Replace np.NaN with an object value, say None, so that the container holds only object data.

Step 3: Convert the entire container to the categorical data type.

Step 4: Encode the data set (I am using .cat.codes).

Step 5: Change the encoded value of None back into np.NaN.

Step 6: Use KNN (from fancyimpute) to impute the missing values.

Step 7: Re-map the encoded data set to its original category names.
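
The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: it uses scikit-learn's KNNImputer in place of fancyimpute's KNN (a swapped-in imputer, not the one named in the steps), the helper name knn_impute_categoricals is hypothetical, and the small DataFrame is made-up data. Note that kNN averages the neighbours' integer codes, so the result is rounded to the nearest code; this treats the categories as if they were ordered, which is a known limitation of the approach.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype
from sklearn.impute import KNNImputer  # assumption: stands in for fancyimpute's KNN


def knn_impute_categoricals(df, n_neighbors=2):
    """Hypothetical helper: impute non-numeric columns via integer codes + kNN."""
    out = df.copy()
    encoded = df.copy()
    # Pick out the non-numeric (object/string) columns.
    cat_cols = [c for c in df.columns if not is_numeric_dtype(df[c])]

    categories = {}
    for col in cat_cols:
        # Convert to categorical and encode with .cat.codes.
        cat = df[col].astype("category")
        categories[col] = cat.cat.categories.to_numpy()
        codes = cat.cat.codes.astype(float)
        # pandas encodes missing values as -1; turn them back into NaN
        # so the imputer recognises them as missing.
        encoded[col] = codes.mask(codes < 0)

    # Impute every missing entry from the n_neighbors closest rows.
    imputed = pd.DataFrame(
        KNNImputer(n_neighbors=n_neighbors).fit_transform(encoded),
        columns=encoded.columns,
        index=encoded.index,
    )

    for col in cat_cols:
        # Round the averaged codes to valid integer codes, then map them
        # back to the original category labels.
        last = len(categories[col]) - 1
        codes = imputed[col].round().clip(0, last).astype(int)
        out[col] = categories[col][codes.to_numpy()]
    return out


# Made-up example data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0, 24.0, 40.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 8.05, 60.00],
    "sex": ["male", "female", None, "female", "male", None],
})
result = knn_impute_categoricals(df, n_neighbors=2)
print(result)
```

In this sketch only the categorical columns are overwritten in the returned frame; numeric columns are used for the distance computation but are returned unchanged. The features are not scaled here, so columns with large ranges dominate the distance; scaling them first is usually worth considering.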

Author by

KINNI

Updated on June 04, 2022

Comments

  • KINNI
    KINNI almost 2 years

    I am trying to apply kNN imputation from the fancyimpute module to a dataset. I was able to do this for the continuous variables of the dataset using the code below:

    from fancyimpute import KNN
    knn_impute2 = KNN(k=3).complete(train[['LotArea', 'LotFrontage']])
    

    It yields the desired result. The following shows what the original dataset looks like and how it changed after kNN imputation.

    I tried to use the same code for categorical columns and I get this error:

    could not convert string to float: 'female'
    

    Here is the code I used (I am trying to use Imputer):

    from sklearn.preprocessing import Imputer
    imp = Imputer(missing_values='NaN', strategy='most_frequent', axis=0)
    imp.fit(df['sex'])
    print(imp.transform(df['sex']))
    

    What am I doing wrong?

    To recap, I want to use kNN imputation on this dataset to impute the sex column. The dataset is below.

    The dataset I want to impute using kNN imputation with k = 2

    How can I do that with knnimpute, or do I need to write my own function? If so, can anyone help me? Thanks.