Sklearn StratifiedKFold: ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead


Solution 1

keras.utils.to_categorical produces a one-hot encoded class vector, i.e. the multilabel-indicator mentioned in the error message. StratifiedKFold is not designed to work with such input; from the split method docs:

split(X, y, groups=None)

[...]

y : array-like, shape (n_samples,)

The target variable for supervised learning problems. Stratification is done based on the y labels.

i.e. your y must be a 1-D array of your class labels.

Essentially, all you have to do is invert the order of the operations: split first (using your initial y_train), and apply to_categorical afterwards.
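A minimal sketch of the "split first, encode afterwards" order described above. The toy data and the one_hot helper are assumptions made for illustration: one_hot is a NumPy stand-in for keras.utils.to_categorical (so the snippet runs without Keras) and is only valid for integer labels 0..n-1.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data: 12 samples, 3 classes encoded as integers 0..2.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(12, 4))
y_train = np.array([0, 1, 2] * 4)
num_classes = len(np.unique(y_train))


def one_hot(labels, n):
    # Stand-in for keras.utils.to_categorical (assumes integer labels 0..n-1).
    return np.eye(n)[labels]


kf = StratifiedKFold(n_splits=4, shuffle=True, random_state=999)
fold_shapes = []

# Stratify on the 1-D integer labels, not on the one-hot matrix.
for train_index, val_index in kf.split(x_train, y_train):
    x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
    # One-hot encode only after the indices have been computed.
    y_train_kf = one_hot(y_train[train_index], num_classes)
    y_val_kf = one_hot(y_train[val_index], num_classes)
    fold_shapes.append((y_train_kf.shape, y_val_kf.shape))

print(fold_shapes)
```

Passing num_classes computed from the full y_train keeps the one-hot width the same in every fold, even if a fold happened to miss a class.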

Solution 2

Call split() with the one-hot matrix converted back to 1-D class indices via argmax(1), like this:

for i, (train_index, val_index) in enumerate(kf.split(x_train, y_train_categorical.argmax(1))):
    x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
    y_train_kf, y_val_kf = y_train[train_index], y_train[val_index]

Solution 3

I bumped into the same problem and found out that you can check the type of the target with this util function:

from sklearn.utils.multiclass import type_of_target
type_of_target(y)

'multilabel-indicator'

From its docstring:

  • 'binary': y contains <= 2 discrete values and is 1d or a column vector.
  • 'multiclass': y contains more than two discrete values, is not a sequence of sequences, and is 1d or a column vector.
  • 'multiclass-multioutput': y is a 2d array that contains more than two discrete values, is not a sequence of sequences, and both dimensions are of size > 1.
  • 'multilabel-indicator': y is a label indicator matrix, an array of two dimensions with at least two columns, and at most 2 unique values.
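To make the docstring categories concrete, here is a short sketch that calls type_of_target on small hand-made arrays (the arrays are illustrative, not taken from the question):

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# 1-D labels with two distinct values
binary = type_of_target([0, 1, 1, 0])

# 1-D labels with more than two distinct values
multiclass = type_of_target([0, 1, 2, 1])

# One-hot matrix, the shape of output to_categorical produces
one_hot = type_of_target(np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))

# 2-D array with more than two distinct values
multioutput = type_of_target(np.array([[1, 2], [3, 1]]))

print(binary)       # binary
print(multiclass)   # multiclass
print(one_hot)      # multilabel-indicator
print(multioutput)  # multiclass-multioutput
```

Only the first two types are accepted by StratifiedKFold, which is why the one-hot matrix triggers the error.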

With LabelEncoder you can transform your classes into a 1-D array of integers (given that your target labels are in a 1-D array of categoricals/objects):

from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
y = label_encoder.fit_transform(target_labels)
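As a quick check of the snippet above, the following sketch uses a hypothetical array of string labels (target_labels here is made up for illustration) and confirms that the encoded result is a 1-D 'multiclass' target:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.multiclass import type_of_target

# Hypothetical string labels standing in for your own target column.
target_labels = np.array(["cat", "dog", "bird", "cat", "bird", "dog"])

label_encoder = LabelEncoder()
y = label_encoder.fit_transform(target_labels)

# Classes are sorted alphabetically: bird -> 0, cat -> 1, dog -> 2
print(y)                   # [1 2 0 1 0 2]
print(type_of_target(y))   # multiclass

# The original labels can be recovered when needed.
restored = label_encoder.inverse_transform(y)
```

The resulting y can be passed directly to StratifiedKFold.split.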

Solution 4

If your target variable is continuous, use plain KFold cross-validation instead of StratifiedKFold, since there are no classes to stratify on.

from sklearn.model_selection import KFold
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
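A small sketch of the continuous case, using randomly generated data as an assumption: StratifiedKFold rejects the target, while KFold splits it without looking at the values.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold
from sklearn.utils.multiclass import type_of_target

# Hypothetical regression data: 20 samples with a real-valued target.
rng = np.random.default_rng(42)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

target_type = type_of_target(y)
print(target_type)  # continuous

# StratifiedKFold refuses a continuous target with a ValueError.
try:
    next(iter(StratifiedKFold(n_splits=5).split(X, y)))
    stratified_failed = False
except ValueError:
    stratified_failed = True

# KFold ignores the target values and simply partitions the indices.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
val_sizes = [len(val_index) for _, val_index in kfold.split(X, y)]
print(val_sizes)  # [4, 4, 4, 4, 4]
```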

Solution 5

Complementing what @desertnaut said, to convert your one-hot encoding back to a 1-D array, all you need to do is:

class_labels = np.argmax(y_train, axis=1)

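This converts back to the initial representation of your classes, provided the original labels were the integers 0 to num_classes - 1 (argmax returns the column index of each row's 1).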
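A short round-trip sketch of that conversion. The one-hot matrix is built with np.eye as a stand-in for to_categorical, which is an assumption made so the example runs with NumPy alone:

```python
import numpy as np

# Hypothetical integer labels for 3 classes.
y_int = np.array([2, 0, 1, 2, 0])

# Stand-in for keras.utils.to_categorical(y_int, 3).
y_one_hot = np.eye(3)[y_int]

# argmax along axis 1 returns the column index of the 1 in each row.
class_labels = np.argmax(y_one_hot, axis=1)

print(class_labels)  # [2 0 1 2 0]
```

The result has shape (n_samples,), which is what StratifiedKFold.split expects for y.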

Author: jKraut

Updated on October 26, 2021

Comments

  • jKraut
    jKraut over 2 years

    I'm working with sklearn's StratifiedKFold split, and when I attempt to split using multi-class targets, I receive an error (see below). When I split using binary targets, it works with no problem.

    num_classes = len(np.unique(y_train))
    y_train_categorical = keras.utils.to_categorical(y_train, num_classes)
    kf=StratifiedKFold(n_splits=5, shuffle=True, random_state=999)
    
    # splitting data into different folds
    for i, (train_index, val_index) in enumerate(kf.split(x_train, y_train_categorical)):
        x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
        y_train_kf, y_val_kf = y_train[train_index], y_train[val_index]
    
    ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead.
    
  • Mehraban
    Mehraban almost 6 years
    What is the point of using StratifiedKFold if you do not pass the labels to it? Simply use KFold instead.
  • Shadi
    Shadi almost 6 years
    StratifiedKFold would normally use the target, but in my particular shortcut, I'm passing 0's for the target, so you're right
  • Minions
    Minions over 5 years
    I don't think this is a good idea, because with an unbalanced dataset in a multi-class classification problem, the validation part whose labels you want to convert may not contain all the classes. So when you call to_categorical(val, n_class) it will raise an error.
  • desertnaut
    desertnaut over 5 years
    @Minion this is not correct; StratifiedKFold takes care that "The folds are made by preserving the percentage of samples for each class" (docs). In very special cases where some of the classes are very under-represented some extra caution (and manual checks) is obviously recommended, but the answer here is about the general case only and not for other, hypothetical ones...
  • Minions
    Minions over 5 years
    Good, thanks for the clarification; I just wanted to make sure.
  • Elvin Aghammadzada
    Elvin Aghammadzada about 3 years
    Don't know the reason but it actually didn't work for me