ValueError: Can't handle mix of multilabel-indicator and binary
Following Vivek's comment, I used the original (not one-hot-encoded) target array and set the loss in my Keras model to sparse_categorical_crossentropy (see code), as per the comments to this issue.
    arch.compile(
        optimizer='sgd',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
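For reference, sparse_categorical_crossentropy expects integer class indices rather than one-hot rows. A minimal sketch of that label preparation, assuming string class names like those in the question; the array `labels` and the variable names are made up for illustration and are not from the original code:

```python
import numpy as np

# Hypothetical target column with string class names (illustrative only).
labels = np.array(["Class1", "Class2", "Class1", "Class3"])

# np.unique returns the sorted class names and, with return_inverse=True,
# the integer index of each sample's class within that sorted array.
classes, y_int = np.unique(labels, return_inverse=True)

# y_int is a 1-D integer array, the shape this loss expects for targets.
# Indexing classes with y_int maps the integers back to the names.
decoded = classes[y_int]
```

With 1-D integer targets like `y_int`, both the true labels and the predicted labels have the same format when the scorer compares them.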
gc5
Updated on June 13, 2022
I am using Keras with the scikit-learn wrapper. In particular, I want to use GridSearchCV for hyperparameter optimisation.
This is a multi-class problem, i.e. the target variable takes exactly one label chosen from a set of n classes. For instance, the target variable can be 'Class1', 'Class2' ... 'Classn'.
    # self._arch creates my model
    nn = KerasClassifier(build_fn=self._arch, verbose=0)
    clf = GridSearchCV(
        nn,
        param_grid={ ... },
        # I use f1 score macro averaged
        scoring='f1_macro',
        n_jobs=-1)
    # self.fX is the data matrix
    # self.fy_enc is the target variable encoded with one-hot format
    clf.fit(self.fX.values, self.fy_enc.values)
The problem is that, when the score is computed during cross-validation, the true labels for the validation samples are one-hot encoded, while the predictions for some reason collapse to binary labels (when the target variable has only two classes). For instance, this is the last part of the stack trace:
    ...........................................................................
    /Users/fbrundu/.pyenv/versions/3.6.0/lib/python3.6/site-packages/sklearn/metrics/classification.py in _check_targets(y_true=array([[ 0., 1.], [ 0., 1.], [ 0... 0., 1.], [ 0., 1.], [ 0., 1.]]), y_pred=array([1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1,...0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]))
         77     if y_type == set(["binary", "multiclass"]):
         78         y_type = set(["multiclass"])
         79
         80     if len(y_type) > 1:
         81         raise ValueError("Can't handle mix of {0} and {1}"
    ---> 82                          "".format(type_true, type_pred))
            type_true = 'multilabel-indicator'
            type_pred = 'binary'
         83
         84     # We can't have more than one value on y_type => The set is no more needed
         85     y_type = y_type.pop()
         86

    ValueError: Can't handle mix of multilabel-indicator and binary
How can I instruct Keras/sklearn to give back predictions in one-hot encoding?
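The mismatch can be reproduced outside Keras with scikit-learn's f1_score alone. The sketch below uses small made-up arrays (not the data from the question): it passes a one-hot y_true together with 1-D predictions, catches the resulting ValueError, and then shows that the score is computed once the one-hot rows are collapsed to class indices with argmax. The exact error wording may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.metrics import f1_score

# Illustrative data: one-hot true labels (two classes) and 1-D predictions.
y_true_onehot = np.array([[0., 1.], [1., 0.], [0., 1.], [0., 1.]])
y_pred = np.array([1, 0, 0, 1])

# One-hot rows are treated as 'multilabel-indicator', the 1-D predictions
# as 'binary', so the metric refuses to compare them.
try:
    f1_score(y_true_onehot, y_pred, average="macro")
    mixed_error = None
except ValueError as exc:
    mixed_error = str(exc)

# Collapsing the one-hot rows to class indices makes both arrays 1-D,
# and the macro-averaged F1 score can be computed.
y_true_labels = y_true_onehot.argmax(axis=1)
score = f1_score(y_true_labels, y_pred, average="macro")
```

This only illustrates why the scorer complains; in the GridSearchCV setup above the simpler route is to pass the non-one-hot target to fit, so that the scorer receives 1-D labels on both sides.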