Sklearn StratifiedKFold: ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead
Solution 1
keras.utils.to_categorical produces a one-hot encoded class vector, i.e. the multilabel-indicator mentioned in the error message. StratifiedKFold is not designed to work with such input; from the split method docs:
split(X, y, groups=None)
[...]
y : array-like, shape (n_samples,)
The target variable for supervised learning problems. Stratification is done based on the y labels.
I.e. your y must be a 1-D array of your class labels.
Essentially, what you have to do is simply invert the order of the operations: split first (using your initial y_train), and convert with to_categorical afterwards.
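A minimal sketch of that reordering, assuming toy data in place of the asker's x_train and y_train. The one_hot helper is a hypothetical NumPy stand-in for keras.utils.to_categorical, used only so the snippet runs without Keras; it assumes integer labels 0..num_classes-1.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy stand-ins for x_train / y_train (assumed data, not from the question).
rng = np.random.default_rng(0)
x_train = rng.normal(size=(30, 4))
y_train = np.repeat([0, 1, 2], 10)  # 1-D integer class labels
num_classes = len(np.unique(y_train))


def one_hot(labels, n_classes):
    """Hypothetical stand-in for keras.utils.to_categorical."""
    return np.eye(n_classes, dtype="float32")[labels]


kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=999)
fold_shapes = []
for i, (train_index, val_index) in enumerate(kf.split(x_train, y_train)):
    x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
    # Stratify on the 1-D labels above; one-hot encode only after the split.
    y_train_kf = one_hot(y_train[train_index], num_classes)
    y_val_kf = one_hot(y_train[val_index], num_classes)
    fold_shapes.append((y_train_kf.shape, y_val_kf.shape))
```

Passing num_classes explicitly keeps the one-hot width the same in every fold, regardless of which classes a given fold happens to contain.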
Solution 2
Call split() like this:
for i, (train_index, val_index) in enumerate(kf.split(x_train, y_train_categorical.argmax(1))):
    x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
    y_train_kf, y_val_kf = y_train[train_index], y_train[val_index]
Solution 3
I bumped into the same problem and found out that you can check the type of the target with this util function:
from sklearn.utils.multiclass import type_of_target
type_of_target(y)
'multilabel-indicator'
From its docstring:
- 'binary': y contains <= 2 discrete values and is 1d or a column vector.
- 'multiclass': y contains more than two discrete values, is not a sequence of sequences, and is 1d or a column vector.
- 'multiclass-multioutput': y is a 2d array that contains more than two discrete values, is not a sequence of sequences, and both dimensions are of size > 1.
- 'multilabel-indicator': y is a label indicator matrix, an array of two dimensions with at least two columns, and at most 2 unique values.
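To see how these categories play out, here is a small sketch that runs type_of_target on a few toy arrays. The arrays are made up for illustration, and np.eye is used here only as a quick way to build a one-hot matrix.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

y_labels = np.array([0, 1, 2, 1, 0, 2])  # 1-D class labels
y_one_hot = np.eye(3)[y_labels]          # one-hot matrix, shape (6, 3)

kind_labels = type_of_target(y_labels)
kind_one_hot = type_of_target(y_one_hot)
kind_binary = type_of_target(np.array([0, 1, 1, 0]))
kind_continuous = type_of_target(np.array([0.5, 1.7, 2.3]))

print(kind_labels)      # multiclass
print(kind_one_hot)     # multilabel-indicator
print(kind_binary)      # binary
print(kind_continuous)  # continuous
```

Only the 'binary' and 'multiclass' results are accepted by StratifiedKFold, which is why the one-hot matrix triggers the error in the title.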
With LabelEncoder you can transform your classes into a 1-D array of integers (given that your target labels are in a 1-D array of categoricals/objects):
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(target_labels)
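As a rough end-to-end sketch, the encoded labels can then be passed to StratifiedKFold. The string labels and the feature matrix X below are hypothetical toy data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels and a toy feature matrix.
target_labels = np.array(["cat", "dog", "bird"] * 4)
X = np.arange(len(target_labels) * 2).reshape(-1, 2)

label_encoder = LabelEncoder()
y = label_encoder.fit_transform(target_labels)  # 1-D integer codes

skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
val_names_per_fold = []
for train_index, val_index in skf.split(X, y):
    # inverse_transform maps the integer codes back to the original names.
    val_names = label_encoder.inverse_transform(y[val_index])
    val_names_per_fold.append(sorted(set(val_names)))
```

LabelEncoder assigns codes in sorted order of the class names, so label_encoder.classes_ gives the mapping from code to name.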
Solution 4
If your target variable is continuous, use plain KFold cross-validation instead of StratifiedKFold:
from sklearn.model_selection import KFold
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
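A short sketch of the difference, assuming a randomly generated continuous target: StratifiedKFold rejects it with a ValueError, while KFold splits on row positions alone and never looks at y.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Toy regression-style data (assumed for illustration).
rng = np.random.default_rng(42)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)  # continuous target

# StratifiedKFold cannot stratify on a continuous target.
try:
    next(iter(StratifiedKFold(n_splits=5).split(X, y)))
    stratified_error = None
except ValueError as exc:
    stratified_error = str(exc)

# KFold ignores y entirely, so it works for any target type.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
val_sizes = [len(val_index) for _, val_index in kfold.split(X)]
```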
Solution 5
Complementing what @desertnaut said, to convert your one-hot encoding back to a 1-D array, all you need to do is:
import numpy as np
class_labels = np.argmax(y_train, axis=1)
This recovers the integer class indices that were one-hot encoded.
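A tiny round-trip check of this, using made-up labels and np.eye as a stand-in for the one-hot encoding step:

```python
import numpy as np

labels = np.array([2, 0, 1, 2, 1])          # original integer class labels
one_hot = np.eye(3, dtype=int)[labels]      # one-hot matrix, shape (5, 3)
class_labels = np.argmax(one_hot, axis=1)   # back to a 1-D array

print(class_labels)  # [2 0 1 2 1]
```

Note that argmax returns column indices, so it matches the original labels only when those were integers 0..n_classes-1 to begin with.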
jKraut
Updated on October 26, 2021
Comments
-
jKraut over 2 years
I am working with sklearn's StratifiedKFold split. When I attempt to split using multi-class targets, I receive an error (see below). When I split using binary targets, it works with no problem.
num_classes = len(np.unique(y_train))
y_train_categorical = keras.utils.to_categorical(y_train, num_classes)
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=999)
# splitting data into different folds
for i, (train_index, val_index) in enumerate(kf.split(x_train, y_train_categorical)):
    x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
    y_train_kf, y_val_kf = y_train[train_index], y_train[val_index]
ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead.
-
Mehraban almost 6 years
What is the point of using StratifiedKFold if you do not pass the labels to it? Simply use KFold instead.
-
Shadi almost 6 years
StratifiedKFold would normally use the target, but in my particular shortcut, I'm passing 0's for the target, so you're right
-
Minions over 5 years
I don't think this is a good idea, because in an unbalanced dataset with a multi-class classification problem, the validation part whose labels you want to convert may not contain all the classes. So, when you call to_categorical(val, n_class), it will raise an error.
-
desertnaut over 5 years
@Minion this is not correct; StratifiedKFold takes care that "The folds are made by preserving the percentage of samples for each class" (docs). In very special cases where some of the classes are very under-represented, some extra caution (and manual checks) is obviously recommended, but the answer here is about the general case only and not for other, hypothetical ones...
-
Minions over 5 years
Good, thanks for the clarification .. just wanted to be sure
-
Elvin Aghammadzada about 3 years
Don't know the reason but it actually didn't work for me