How does TensorFlow SparseCategoricalCrossentropy work?


Solution 1

SparseCategoricalCrossentropy and CategoricalCrossentropy both compute categorical cross-entropy. The only difference is in how the targets/labels should be encoded.

When using SparseCategoricalCrossentropy, each target is represented by the integer index of its category (starting from 0). Your outputs have shape 4x2, which means you have two categories. The targets should therefore be a vector of 4 entries, each either 0 or 1. For example:

scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)  # the outputs below are raw logits
loss = scce(
    tf.constant([0, 0, 0, 1], tf.float32),
    tf.constant([[1, 2], [3, 4], [5, 6], [7, 8]], tf.float32))  # loss ≈ 1.0633

This is in contrast to CategoricalCrossentropy, where the labels must be one-hot encoded:

cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
loss = cce(
    tf.constant([[1, 0], [1, 0], [1, 0], [0, 1]], tf.float32),
    tf.constant([[1, 2], [3, 4], [5, 6], [7, 8]], tf.float32))  # same targets, same loss ≈ 1.0633

SparseCategoricalCrossentropy is more efficient when you have many categories, because the integer labels avoid building and storing large one-hot vectors.
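
To see that the two losses agree on the same targets, here is a minimal sketch (the class count and logits are made up for illustration) that feeds identical labels to both, using tf.one_hot to build the dense encoding:

import tensorflow as tf

num_classes = 1000                               # illustrative: many categories
logits = tf.random.normal((4, num_classes))      # stand-in model outputs (raw logits)
labels = tf.constant([3, 141, 59, 265])          # integer class indices

scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

# Same targets, two encodings: 4 integers vs. a 4x1000 one-hot matrix.
print(scce(labels, logits).numpy())
print(cce(tf.one_hot(labels, num_classes), logits).numpy())  # matches the sparse version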

Solution 2

I wanted to add a few more things that may be confusing. SparseCategoricalCrossentropy has two arguments which are very important to specify. The first is from_logits; recall that logits are the raw outputs of a network that haven't been normalized via a softmax (or sigmoid). The second is reduction. It is normally set to 'auto', which averages the per-example losses -log(pred[label]) over the batch. Setting it to 'none' instead returns each per-example loss -log(pred[label]) as a tensor of shape (batch_size,). Computing a reduce_mean over that tensor gives the same result as reduction='auto'.

# Assuming TF2.x
import tensorflow as tf

model_predictions = tf.constant([[1,2], [3,4], [5,6], [7,8]], tf.float32)
labels_sparse = tf.constant([1, 0, 0, 1], tf.float32)
labels_dense = tf.constant([[0,1], [1,0], [1,0], [0,1]], tf.float32)  # one-hot encoding of labels_sparse

loss_obj_scc = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True,
    reduction='auto'
)
loss_from_scc = loss_obj_scc(
    labels_sparse,
    model_predictions,
  )


loss_obj_cc = tf.keras.losses.CategoricalCrossentropy(
    from_logits=True,
    reduction='auto'
)
loss_from_cc = loss_obj_cc(
    labels_dense,
    model_predictions,
  )


print(loss_from_scc, loss_from_cc)
>> (<tf.Tensor: shape=(), dtype=float32, numpy=0.8132617>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.8132617>)
The two values agree because labels_sparse and labels_dense encode the same targets.

# With `reduction='none'`
loss_obj_scc_red = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True,
    reduction='none')

loss_from_scc_red = loss_obj_scc_red(
    labels_sparse,
    model_predictions,
  )

print(loss_from_scc_red, tf.math.reduce_mean(loss_from_scc_red))

>> (<tf.Tensor: shape=(4,), dtype=float32, numpy=array([0.31326166, 1.3132616 , 
1.3132616 , 0.31326166], dtype=float32)>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.8132617>)
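
As a sanity check, the per-example values above can be reproduced by hand. This is a minimal sketch, reusing model_predictions and labels_sparse from above:

probs = tf.nn.softmax(model_predictions, axis=-1)        # normalize the logits
picked = tf.gather(probs, tf.cast(labels_sparse, tf.int32),
                   batch_dims=1)                         # p[i, labels[i]]
per_example = -tf.math.log(picked)                       # -log(p[label])
print(per_example, tf.math.reduce_mean(per_example))     # matches the values above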

Comments

  • Dee
    Dee almost 2 years

    I'm trying to understand this loss function in TensorFlow, SparseCategoricalCrossentropy, but I don't get it. All other loss functions need outputs and labels of the same shape; this specific loss function doesn't.

    Source code:

    import tensorflow as tf;
    
    scce = tf.keras.losses.SparseCategoricalCrossentropy();
    Loss = scce(
      tf.constant([ 1,    1,    1,    2   ], tf.float32),
      tf.constant([[1,2],[3,4],[5,6],[7,8]], tf.float32)
    );
    print("Loss:", Loss.numpy());
    

    The error is:

    InvalidArgumentError: Received a label value of 2 which is outside the valid range of [0, 2).  
    Label values: 1 1 1 2 [Op:SparseSoftmaxCrossEntropyWithLogits]
    

    How do I provide proper parameters to the loss function SparseCategoricalCrossentropy?

  • mauriii
    mauriii over 3 years
    That third sentence was the reason my NN wasn't working! I've looked everywhere, and this was the only place that explained it clearly! +1
  • Aral Roca
    Aral Roca about 3 years
    And should I use the softmax activation function in the last layer, the same way as with CategoricalCrossentropy?
  • Bersan
    Bersan about 3 years
    @AralRoca Based on the example on the tensorflow page, if you set from_logits=True then you don't need to specify the activation of the last layer (tensorflow.org/tutorials/images/…). Mathematically it shouldn't matter, but the from_logits route is more numerically stable (stackoverflow.com/a/57304538/2076973).
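
    For illustration, a minimal sketch of the two equivalent setups (the model here is made up; only the last layer and from_logits matter):

    import tensorflow as tf

    # Option A: no activation on the last layer; the loss applies the softmax itself.
    model_a = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(2),  # raw logits out
    ])
    model_a.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

    # Option B: softmax on the last layer; the loss receives probabilities.
    model_b = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])
    model_b.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False))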