LabelEncoder specify classes in DataFrame

Solution 1

You could fit the label encoder and later transform the labels to their normalized encoding as follows:

In [4]: from sklearn import preprocessing
   ...: import numpy as np

In [5]: le = preprocessing.LabelEncoder()

In [6]: le.fit(np.unique(df.values))
Out[6]: LabelEncoder()

In [7]: list(le.classes_)
Out[7]: ['A', 'B', 'C', 'D', 'E']

In [8]: df.apply(le.transform)
Out[8]: 
   Feat1  Feat2  Feat3  Feat4  Feat5
0      0      0      0      0      4
1      1      1      2      2      4
2      2      3      2      2      4
3      3      0      2      3      4

To specify the full set of labels up front, fit the encoder on an explicit list:

In [9]: labels = ['A', 'B', 'C', 'D', 'E']

In [10]: enc = le.fit(labels)

In [11]: enc.classes_                       # sorts the labels in alphabetical order
Out[11]: 
array(['A', 'B', 'C', 'D', 'E'], 
      dtype='<U1')

In [12]: enc.transform(['E'])               # transform expects a 1-D array-like, not a bare string
Out[12]: array([4])
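
For reference, here is a self-contained sketch of the shared-encoder approach above. The sample frame is reconstructed from the question, and the `mapping` dict is only an illustrative helper for inspecting the result, not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample frame reconstructed from the question.
df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat2": ["A", "B", "D", "A"],
    "Feat3": ["A", "C", "C", "C"],
    "Feat4": ["A", "C", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

# Fit a single encoder on every distinct value in the frame,
# so that all columns share one label -> code mapping.
le = LabelEncoder()
le.fit(np.unique(df.values))

# Illustrative helper: the mapping the encoder learned.
mapping = {str(label): code for code, label in enumerate(le.classes_)}

# Apply the already-fitted encoder to each column.
encoded = df.apply(le.transform)
```

Because the encoder is fitted once on all values, `E` keeps the same code in every column instead of being renumbered per column.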

Solution 2

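You can fit and transform in a single statement. The code below encodes a single column and assigns it back to the data frame. Note that the encoder is fitted on that column alone, so the codes depend only on the values present in it.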

df[columnName] = LabelEncoder().fit_transform(df[columnName])
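
A short sketch of what this does on a two-column subset of the question's data may help. The frame and the names `per_column` and `feat5_codes` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Two columns taken from the question's sample data.
df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

# Each iteration builds a fresh encoder that only sees one column's values.
per_column = df.copy()
for column_name in per_column.columns:
    per_column[column_name] = LabelEncoder().fit_transform(per_column[column_name])

# Feat5 contains only "E", so "E" is that encoder's only class and gets code 0.
feat5_codes = per_column["Feat5"].tolist()
```

This is convenient when each column has its own independent label set, but it reproduces the behaviour described in the question when the labels are meant to be shared across columns.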
Author: gbhrea

Student, experienced in Java, Python. Beginner - Javascript, D3.js

Updated on June 15, 2022

Comments

  • gbhrea
    gbhrea almost 2 years

    I’m applying a LabelEncoder to a pandas DataFrame, df

    Feat1  Feat2  Feat3  Feat4  Feat5
      A      A      A      A      E
      B      B      C      C      E
      C      D      C      C      E
      D      A      C      D      E
    

    I'm applying it column by column like this:

    from sklearn import preprocessing
    le = preprocessing.LabelEncoder()
    intIndexed = df.apply(le.fit_transform)
    

    This is how the labels are mapped

    A = 0
    B = 1
    C = 2
    D = 3
    E = 0
    

    I'm guessing that E isn't given the value 4 because it doesn't appear in any column other than Feat5.

    I want E to be given the value of 4 - but don't know how to do this in a DataFrame.

    • Zero
      Zero over 6 years
      You could use df.replace({'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4})?
  • gbhrea
    gbhrea over 7 years
    Thanks for your answer Nickil, but this has changed the mapping to A = 1, B = 2, C = 3, D = 4, E = 0. Can I specify which values I want?
  • Nickil Maveli
    Nickil Maveli over 7 years
    Yes, you can specify the labels that need to be encoded [see edited answer]. But the LabelEncoder sorts them internally and returns the sorted list.
  • Willie D
    Willie D almost 7 years
    Is there a way to put this into a pipeline?
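
On the pipeline question: the thread does not give an answer, but one possible approach is to swap in scikit-learn's `OrdinalEncoder`, which works on feature columns and accepts an explicit `categories` list, and wrap it in a `Pipeline`. The sketch below is an assumption about how that could look, not something proposed in the original answers; the step name `"encode"` is arbitrary.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Sample frame reconstructed from the question.
df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat2": ["A", "B", "D", "A"],
    "Feat3": ["A", "C", "C", "C"],
    "Feat4": ["A", "C", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

labels = ["A", "B", "C", "D", "E"]

# One category list per column; the position in the list becomes the code.
encoder = OrdinalEncoder(categories=[labels] * df.shape[1], dtype=int)

# Hypothetical single-step pipeline; further steps could follow the encoder.
pipeline = Pipeline([("encode", encoder)])

encoded = pd.DataFrame(
    pipeline.fit_transform(df),
    columns=df.columns,
    index=df.index,
)
```

Since the categories are passed explicitly, `E` is encoded as 4 in `Feat5` even though no other label appears in that column.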