Tags: multiple outputs, sigmoid, lecture-16, softmax
A neural network classifies images into 5 categories.
Suppose the categories are mutually exclusive (each image belongs to exactly one category). Which activation function should be used at the output layer: sigmoid or softmax?
Softmax.
Since the categories are mutually exclusive, we want the output probabilities to represent a single probability distribution over the 5 classes. Softmax enforces this by ensuring the outputs sum to 1.
True or False: with the activation from part (a), the 5 outputs must sum to 1.
True.
The softmax function produces outputs that sum to 1 by definition: softmax(z)_i = e^{z_i} / Σ_j e^{z_j}, so summing over all i gives Σ_i e^{z_i} / Σ_j e^{z_j} = 1.
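This property is easy to check numerically. The sketch below (plain Python, no libraries; the logit values are made up for illustration) applies softmax to 5 raw scores and confirms the outputs form a probability distribution:

```python
import math

def softmax(z):
    # Subtract the max logit for numerical stability; this does not
    # change the result because it cancels in the ratio.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1, -1.0, 0.5]  # hypothetical raw scores for 5 classes
probs = softmax(logits)

# Every output lies in (0, 1), and the outputs sum to 1 by construction.
print(probs)
print(sum(probs))
```

Whatever logits go in, the division by the shared normalizer forces the outputs to sum to 1, which is exactly why softmax fits the mutually exclusive setting.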
Now suppose an image can belong to multiple categories simultaneously. Which activation function should be used at the output layer: sigmoid or softmax?
Sigmoid.
Since the categories are not mutually exclusive, each output node independently predicts the probability that the image belongs to that category. Sigmoid is applied independently to each output.
True or False: with the activation from part (c), the 5 outputs must sum to 1.
False.
Sigmoid is applied independently to each output node, so there is no constraint that the outputs sum to 1. For example, if the image contains both a cat and a dog, the network might output high probabilities for both.
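A small sketch makes the contrast concrete. Here sigmoid is applied element-wise to 5 hypothetical logits (the values and the cat/dog interpretation are illustrative, not from the original); two outputs can be high at once, and the sum is unconstrained:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical logits: e.g., the first two could be "cat" and "dog",
# and the image contains both.
logits = [3.0, 2.5, -1.0, -2.0, 0.2]
probs = [sigmoid(z) for z in logits]

# Each entry is an independent probability for its own category;
# nothing forces the entries to sum to 1.
print(probs)
print(sum(probs))
```

Here both of the first two outputs exceed 0.9, and the total is well above 1, which would be impossible under softmax.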