LabelEncoder specify classes in DataFrame
Solution 1
You could fit the label encoder and later transform the labels to their normalized encoding as follows:
In [4]: from sklearn import preprocessing
...: import numpy as np
In [5]: le = preprocessing.LabelEncoder()
In [6]: le.fit(np.unique(df.values))
Out[6]: LabelEncoder()
In [7]: list(le.classes_)
Out[7]: ['A', 'B', 'C', 'D', 'E']
In [8]: df.apply(le.transform)
Out[8]:
   Feat1  Feat2  Feat3  Feat4  Feat5
0      0      0      0      0      4
1      1      1      2      2      4
2      2      3      2      2      4
3      3      0      2      3      4
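For readers who want to run the steps above outside IPython, here is a self-contained sketch. The DataFrame is rebuilt by hand from the table in the question (an assumption about how it was constructed). The key point is that a single encoder is fitted once on every distinct value in the frame, and then only `transform` is applied to each column.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Rebuild the example DataFrame from the question (row 0 is A A A A E, and so on).
df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat2": ["A", "B", "D", "A"],
    "Feat3": ["A", "C", "C", "C"],
    "Feat4": ["A", "C", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

# Fit ONE encoder on the distinct values of the whole frame,
# so every letter gets the same code in every column.
le = preprocessing.LabelEncoder()
le.fit(np.unique(df.values))

# Reuse the fitted encoder column by column: transform only, no refitting.
encoded = df.apply(le.transform)
print(encoded)
```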
One way to specify the labels up front is to fit the encoder on an explicit list of them:
In [9]: labels = ['A', 'B', 'C', 'D', 'E']
In [10]: enc = le.fit(labels)
In [11]: enc.classes_ # sorts the labels in alphabetical order
Out[11]: array(['A', 'B', 'C', 'D', 'E'], dtype='<U1')
In [12]: enc.transform(['E'])  # transform expects a 1-D array-like
Out[12]: array([4])
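Two details of the explicit-labels variant are easy to trip over. First, `transform` expects a 1-D array-like, so a single label is wrapped in a list (current scikit-learn versions raise an error for a bare string). Second, the encoder can be fitted on labels that never occur in the data, and those labels keep their codes. The sketch below assumes the label set A to E from the question; the two-column DataFrame is a hypothetical cut-down example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Fit on the full, known label set instead of on the data itself.
labels = ["A", "B", "C", "D", "E"]
le = LabelEncoder().fit(labels)

# transform / inverse_transform expect 1-D array-likes, so wrap single values in a list.
code_for_e = le.transform(["E"])
label_for_4 = le.inverse_transform([4])

# Hypothetical frame in which "C" and "D" never appear:
# E still maps to 4 because the encoder was fitted on the full label list.
df = pd.DataFrame({"Feat1": ["A", "B"], "Feat5": ["E", "E"]})
encoded = df.apply(le.transform)
print(encoded)
```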
Solution 2
You can fit and transform in a single statement. Here is the code for encoding a single column and assigning it back to the DataFrame:
from sklearn.preprocessing import LabelEncoder
df[columnName] = LabelEncoder().fit_transform(df[columnName])
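A self-contained version of Solution 2 is sketched below; `column_name` and the small DataFrame are hypothetical. Keeping a reference to the encoder lets you inspect or reverse the mapping later. Because the encoder is fitted on one column only, the codes depend only on the values present in that column: here D gets 2, not 3, because C never appears in Feat2.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data; Feat2 matches the question's second column.
df = pd.DataFrame({"Feat1": ["A", "B", "C", "D"], "Feat2": ["A", "B", "D", "A"]})
column_name = "Feat2"  # hypothetical column to encode

# Keep the fitted encoder so the mapping can be inspected or reversed later.
le = LabelEncoder()
df[column_name] = le.fit_transform(df[column_name])

# Codes come from this column's values only: A -> 0, B -> 1, D -> 2 (no C here).
mapping = dict(zip(le.classes_, range(len(le.classes_))))
restored = le.inverse_transform(df[column_name])
```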
Author: gbhrea
Student, experienced in Java and Python; beginner in JavaScript and D3.js.
Updated on June 15, 2022
Comments
-
gbhrea almost 2 years
I have a pandas DataFrame, df:
  Feat1 Feat2 Feat3 Feat4 Feat5
0     A     A     A     A     E
1     B     B     C     C     E
2     C     D     C     C     E
3     D     A     C     D     E
I'm applying a LabelEncoder to it like this:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
intIndexed = df.apply(le.fit_transform)
This is how the labels are mapped:
A = 0, B = 1, C = 2, D = 3, E = 0
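A minimal sketch reproducing this behaviour, with the DataFrame rebuilt by hand from the question's table. Because `df.apply` calls `fit_transform` once per column, the encoder is refitted on each column separately, so every column gets its own sorted mapping: E is the only value in Feat5 and becomes 0, and in Feat2 D becomes 2 because C is absent there.

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat2": ["A", "B", "D", "A"],
    "Feat3": ["A", "C", "C", "C"],
    "Feat4": ["A", "C", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

le = preprocessing.LabelEncoder()
# fit_transform refits the encoder on every column it is applied to,
# so each column is encoded against its own sorted set of values.
int_indexed = df.apply(le.fit_transform)
print(int_indexed)

# After apply finishes, le only remembers the last column it was fitted on.
print(list(le.classes_))
```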
I'm guessing that E isn't given the value 4 because it doesn't appear in any column other than Feat5. I want E to be given the value 4, but I don't know how to do this in a DataFrame.
-
Zero over 6 years: You could use df.replace({'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4})?
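A sketch of the explicit-mapping idea from this comment, on a hypothetical two-column frame. It uses `Series.map` per column rather than `DataFrame.replace` (a swapped-in variant): both apply the same dict, but `map` turns any label missing from the dict into NaN instead of leaving it unchanged, which makes gaps in the mapping easy to spot.

```python
import pandas as pd

df = pd.DataFrame({"Feat1": ["A", "B", "C", "D"], "Feat5": ["E", "E", "E", "E"]})

# Every code is chosen by hand, so E can be 4 even though it only appears in Feat5.
mapping = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4}

# Map each column through the same dict; unmapped labels would become NaN.
encoded = df.apply(lambda col: col.map(mapping))

# Sanity check that every label was covered by the mapping.
assert not encoded.isna().any().any()
print(encoded)
```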
-
gbhrea over 7 years: Thanks for your answer, Nickil, but this has changed the mapping to A = 1, B = 2, C = 3, D = 4, E = 0. Can I specify the values I want?
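LabelEncoder always stores `classes_` in sorted order, so the codes follow that order no matter how the labels are listed when fitting. If the goal is a custom order, one alternative (not from this thread) is pandas' `Categorical`, whose codes follow the order of the `categories` list you pass. The order used below is purely hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Labels passed in reverse order...
le = LabelEncoder().fit(["E", "D", "C", "B", "A"])
# ...still end up sorted, so A -> 0 and E -> 4.
sorted_classes = list(le.classes_)

df = pd.DataFrame({"Feat1": ["A", "B", "C", "D"], "Feat5": ["E", "E", "E", "E"]})
order = ["E", "A", "B", "C", "D"]  # hypothetical custom order: E -> 0, A -> 1, ...

# Categorical codes follow the given categories list; values not in it become -1.
encoded = df.apply(lambda col: pd.Categorical(col, categories=order).codes)
print(encoded)
```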
-
Nickil Maveli over 7 years: Yes, you can specify the labels that need to be encoded [see edited answer]. But the LabelEncoder sorts them internally and returns the sorted list.
-
Willie D almost 7 years: Is there a way to put this into a pipeline?
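One possible route for a pipeline, which is not from this thread: LabelEncoder is documented for encoding target values (y), while scikit-learn's OrdinalEncoder is the transformer intended for feature columns and accepts a `categories` argument with one list per column. A minimal sketch, assuming the same A to E labels for every column; the step name "encode" is arbitrary.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat2": ["A", "B", "D", "A"],
    "Feat3": ["A", "C", "C", "C"],
    "Feat4": ["A", "C", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

labels = ["A", "B", "C", "D", "E"]

# One category list per column; reusing the same list gives every column the same mapping.
encoder = OrdinalEncoder(categories=[labels] * df.shape[1])

# A single-step pipeline; further steps (e.g. a model) could be appended after "encode".
pipe = Pipeline([("encode", encoder)])
encoded = pipe.fit_transform(df)  # NumPy array of floats by default
print(encoded)
```

Because the categories are fixed up front, E maps to 4 in Feat5 even though it is the only value there.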