DSC 140B
Quiz 08

Practice problems for topics on Quiz 08.

Problem #161

Tags: lecture-15, convolutional neural networks

A grayscale image of size \(32 \times 32 \times 1\) is convolved with a filter of size \(5 \times 5\). No padding is applied, and the stride is 1. What is the shape of the output response map?

Solution

\(28 \times 28 \times 1\).

With no padding and stride 1, the output height and width are each \(32 - 5 + 1 = 28\). The filter slides over each \(5 \times 5\) block of the image from left to right and top to bottom, producing one output value per position.
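This "valid" output-size rule is easy to sketch in a few lines of Python (the helper name here is just for illustration):

```python
def conv_output_size(n, f, stride=1):
    """Spatial output size of a convolution with no padding."""
    return (n - f) // stride + 1

print(conv_output_size(32, 5))  # 28
```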

Problem #162

Tags: lecture-15, convolutional neural networks

An input \(5 \times 5\) grayscale image \(I\) is represented by the matrix below.

\[ I = \begin{pmatrix} 0.2 & 0.1 & 0.4 & 0 & 0.3 \\ 0 & 0.5 & 0.2 & 0.7 & 0 \\ 0.3 & 0 & 0.6 & 0.1 & 0.5 \\ 0.1 & 0.4 & 0 & 0.3 & 0.2 \\ 0 & 0.2 & 0.5 & 0 & 0.4 \end{pmatrix}\]

Suppose you convolve \(I\) with the \(3 \times 3\) filter

\[ F = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}\]

to get the response map \(I'\) (with stride 1 and no padding). What is the value of \(I'_{11}\), the entry in the first row and first column of the output?

Solution

\(I'_{11} = 0.6\).

The \(3 \times 3\) patch at the top-left corner of \(I\) is:

\[\begin{pmatrix} 0.2 & 0.1 & 0.4 \\ 0 & 0.5 & 0.2 \\ 0.3 & 0 & 0.6 \end{pmatrix}\]

Applying the filter element-wise and summing:

$$\begin{align*} I'_{11}&= 0.2 \cdot 1 + 0.1 \cdot 0 + 0.4 \cdot(-1) + 0 \cdot 0 + 0.5 \cdot 1 + 0.2 \cdot 0 \\&\quad + 0.3 \cdot(-1) + 0 \cdot 0 + 0.6 \cdot 1 \\&= 0.2 - 0.4 + 0.5 - 0.3 + 0.6 \\&= 0.6 \end{align*}$$
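As a quick check, the same computation can be reproduced with NumPy. As in the solution above, the filter is applied elementwise without flipping (i.e., cross-correlation):

```python
import numpy as np

I = np.array([[0.2, 0.1, 0.4, 0.0, 0.3],
              [0.0, 0.5, 0.2, 0.7, 0.0],
              [0.3, 0.0, 0.6, 0.1, 0.5],
              [0.1, 0.4, 0.0, 0.3, 0.2],
              [0.0, 0.2, 0.5, 0.0, 0.4]])
F = np.array([[ 1, 0, -1],
              [ 0, 1,  0],
              [-1, 0,  1]])

# top-left 3x3 patch, multiplied elementwise with the filter and summed
out_11 = float(np.sum(I[:3, :3] * F))
print(round(out_11, 1))  # 0.6
```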

Problem #163

Tags: lecture-15, convolutional neural networks

An input image has shape \(80 \times 80 \times 11\), where \(11\) is the number of channels. We wish to convolve this image with a 3D filter of shape \(5 \times 5 \times k\). What must the value of \(k\) be for the convolution to work?

Solution

\(k = 11\).

A 3D convolution filter must have the same number of channels as the input. The filter slides spatially across the height and width of the image, but at each position it computes a dot product across all channels. Therefore the filter's third dimension must match the input's channel count.
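A shape check in NumPy makes the constraint concrete: at each spatial position, the filter and the patch it covers must have identical shapes, including the channel dimension (the random data here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((80, 80, 11))  # height x width x channels
filt = rng.random((5, 5, 11))     # k must be 11 to match the channels

# one response value at the top-left position: a dot product over
# all 5 * 5 * 11 entries of the patch and the filter
value = float(np.sum(image[:5, :5, :] * filt))
```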

Problem #164

Tags: lecture-15, convolutional neural networks

Consider a convolutional neural network with the following architecture. The input is a \(10 \times 10 \times 1\) grayscale image. It passes through Conv layer 1 (3 filters of size \(3 \times 3\), stride 1, no padding), producing an output of shape \(8 \times 8 \times 3\). Then \(2 \times 2\) max pooling is applied, producing an output of shape \(4 \times 4 \times 3\). Next is Conv layer 2 (5 filters of size \(3 \times 3 \times 3\), stride 1, no padding), producing an output of shape \(2 \times 2 \times 5\). This is flattened and fed into a fully connected layer with \(n\) nodes, followed by an output layer with 1 node.

Part 1)

What is the value of \(n\)?

Solution

\(n = 20\).

The output of Conv layer 2 is \(2 \times 2 \times 5\). Flattening this gives \(2 \times 2 \times 5 = 20\) values, so the fully connected layer has \(20\) nodes.

Part 2)

What is the total number of learnable parameters in the network, excluding biases?

Solution

\(182\).

Conv layer 1 has 3 filters of shape \(3 \times 3\), each with \(9\) weights, for \(3 \times 9 = 27\) parameters. Conv layer 2 has 5 filters of shape \(3 \times 3 \times 3\), each with \(27\) weights, for \(5 \times 27 = 135\) parameters. The fully connected layer connects to the output: \(20 \times 1 = 20\) parameters. The grand total is \(27 + 135 + 20 = 182\).

Note that max pooling has no learnable parameters.
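The bookkeeping can be mirrored in a few lines (bias terms excluded, as in the problem):

```python
conv1 = 3 * (3 * 3 * 1)  # 3 filters, each 3x3x1 -> 27 weights
conv2 = 5 * (3 * 3 * 3)  # 5 filters, each 3x3x3 -> 135 weights
fc = (2 * 2 * 5) * 1     # 20 flattened values to the single output node
print(conv1 + conv2 + fc)  # 182
```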

Problem #165

Tags: lecture-15, convolutional neural networks

An input \(4 \times 4\) grayscale image \(I\) is represented by the matrix below.

\[ I = \begin{pmatrix} 0.7 & 0.2 & 0.1 & 0.8 \\ 0.3 & 0.5 & 0.4 & 0.2 \\ 0.6 & 0.1 & 0.9 & 0.3 \\ 0.2 & 0.8 & 0.5 & 0.6 \end{pmatrix}\]

\(2 \times 2\) max pooling is applied to this image. What is the resulting output?

Solution

\(\begin{pmatrix} 0.7 & 0.8 \\ 0.8 & 0.9 \end{pmatrix}\).

With \(2 \times 2\) max pooling, we divide the \(4 \times 4\) image into four non-overlapping \(2 \times 2\) blocks and take the maximum of each. Top-left: \(\max\{0.7, 0.2, 0.3, 0.5\} = 0.7\). Top-right: \(\max\{0.1, 0.8, 0.4, 0.2\} = 0.8\). Bottom-left: \(\max\{0.6, 0.1, 0.2, 0.8\} = 0.8\). Bottom-right: \(\max\{0.9, 0.3, 0.5, 0.6\} = 0.9\).
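In NumPy, non-overlapping \(2 \times 2\) pooling can be done with a reshape that separates the blocks, followed by a max over the within-block axes:

```python
import numpy as np

I = np.array([[0.7, 0.2, 0.1, 0.8],
              [0.3, 0.5, 0.4, 0.2],
              [0.6, 0.1, 0.9, 0.3],
              [0.2, 0.8, 0.5, 0.6]])

# axes after reshape: (row block, row in block, column block, column in block)
pooled = I.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[0.7 0.8]
               #  [0.8 0.9]]
```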

Problem #166

Tags: multiple outputs, lecture-16, softmax

A neural network with 3 output nodes uses the softmax activation function. The pre-activation values (logits) at the output layer are \(\vec z = (0, 2, 0)\). Compute the softmax output \(\vec h = (h_1, h_2, h_3)\).

Leave your answer in terms of \(e\).

Solution
\[\vec h = \left(\frac{1}{2 + e^2},\;\frac{e^2}{2 + e^2},\;\frac{1}{2 + e^2}\right)\]

By the softmax formula, \(h_k = \frac{e^{z_k}}{\sum_{j=1}^{3} e^{z_j}}\). The denominator is:

\[ e^{0} + e^{2} + e^{0} = 1 + e^2 + 1 = 2 + e^2 \]

Therefore:

$$\begin{align*} h_1 &= \frac{e^0}{2 + e^2} = \frac{1}{2 + e^2}\\ h_2 &= \frac{e^2}{2 + e^2}\\ h_3 &= \frac{e^0}{2 + e^2} = \frac{1}{2 + e^2}\end{align*}$$
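The same numbers fall out of a direct NumPy implementation of the softmax formula (the max-shift is a standard numerical-stability trick and does not change the result):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shifting by max(z) avoids overflow
    return e / e.sum()

h = softmax(np.array([0.0, 2.0, 0.0]))
# h[0] = h[2] = 1 / (2 + e^2), and h[1] = e^2 / (2 + e^2)
```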

Problem #167

Tags: multiple outputs, lecture-16, softmax

A neural network with 4 output nodes uses the softmax activation function. The pre-activation values (logits) at the output layer are \(\vec z = (1, 3, 1, 3)\). Compute the softmax output \(\vec h = (h_1, h_2, h_3, h_4)\).

Leave your answer in terms of \(e\).

Solution
\[\vec h = \left(\frac{1}{2(1 + e^2)},\;\frac{e^2}{2(1 + e^2)},\;\frac{1}{2(1 + e^2)},\;\frac{e^2}{2(1 + e^2)}\right)\]

By the softmax formula, \(h_k = \frac{e^{z_k}}{\sum_{j=1}^{4} e^{z_j}}\). The denominator is:

\[ e^{1} + e^{3} + e^{1} + e^{3} = 2e + 2e^3 = 2e(1 + e^2) \]

Therefore:

$$\begin{align*} h_1 = h_3 &= \frac{e}{2e(1 + e^2)} = \frac{1}{2(1 + e^2)}\\ h_2 = h_4 &= \frac{e^3}{2e(1 + e^2)} = \frac{e^2}{2(1 + e^2)}\end{align*}$$

Problem #168

Tags: multiple outputs, lecture-16, softmax

A neural network with 3 output nodes uses the softmax activation function. The pre-activation values (logits) at the output layer are \(\vec z = (1, 2, 3)\). Compute the softmax output \(\vec h = (h_1, h_2, h_3)\).

Leave your answer in terms of \(e\).

Solution
\[\vec h = \left(\frac{1}{1 + e + e^2},\;\frac{e}{1 + e + e^2},\;\frac{e^2}{1 + e + e^2}\right)\]

By the softmax formula, \(h_k = \frac{e^{z_k}}{\sum_{j=1}^{3} e^{z_j}}\). The denominator is:

\[ e^{1} + e^{2} + e^{3} = e(1 + e + e^2) \]

Therefore:

$$\begin{align*} h_1 &= \frac{e}{e(1 + e + e^2)} = \frac{1}{1 + e + e^2}\\ h_2 &= \frac{e^2}{e(1 + e + e^2)} = \frac{e}{1 + e + e^2}\\ h_3 &= \frac{e^3}{e(1 + e + e^2)} = \frac{e^2}{1 + e + e^2}\end{align*}$$

Problem #169

Tags: binary cross-entropy, lecture-16, multiple outputs

A multi-label classifier has 3 output nodes with sigmoid activations. The true labels are \(\vec y = (1, 0, 1)\) and the predicted probabilities are \(\vec h = (0.9, 0.2, 0.8)\).

Compute the binary cross-entropy loss. Leave your answer in terms of \(\log\).

Solution

\(-\log(0.9) - \log(0.8) - \log(0.8) = -\log(0.9) - 2\log(0.8)\).

By the binary cross-entropy formula:

\[\ell(\vec h, \vec y) = -\sum_{k=1}^{3}\begin{cases} \log h_k, & \text{if } y_k = 1 \\ \log(1 - h_k), & \text{if } y_k = 0 \end{cases}\]

Evaluating each term:

$$\begin{align*} k = 1&: \quad y_1 = 1, \text{ so } -\log(0.9) \\ k = 2&: \quad y_2 = 0, \text{ so } -\log(1 - 0.2) = -\log(0.8) \\ k = 3&: \quad y_3 = 1, \text{ so } -\log(0.8) \end{align*}$$

The total is \(-\log(0.9) - 2\log(0.8)\).
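The per-case formula above is equivalent to the single expression \(-\sum_k \left[y_k \log h_k + (1 - y_k)\log(1 - h_k)\right]\), which is easy to evaluate directly (valid when every \(h_k\) is strictly between 0 and 1):

```python
import numpy as np

def binary_cross_entropy(h, y):
    # assumes every entry of h is strictly between 0 and 1
    return float(-np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)))

loss = binary_cross_entropy(np.array([0.9, 0.2, 0.8]), np.array([1, 0, 1]))
# equals -log(0.9) - 2*log(0.8)
```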

Problem #170

Tags: binary cross-entropy, lecture-16, multiple outputs

A multi-label classifier has 4 output nodes with sigmoid activations. The true labels are \(\vec y = (0, 1, 0, 1)\) and the predicted probabilities are \(\vec h = (0.3, 0.7, 0.1, 0.9)\).

Compute the binary cross-entropy loss. Leave your answer in terms of \(\log\).

Solution

\(-2\log(0.7) - 2\log(0.9)\).

By the binary cross-entropy formula:

\[\ell(\vec h, \vec y) = -\sum_{k=1}^{4}\begin{cases} \log h_k, & \text{if } y_k = 1 \\ \log(1 - h_k), & \text{if } y_k = 0 \end{cases}\]

Evaluating each term:

$$\begin{align*} k = 1&: \quad y_1 = 0, \text{ so } -\log(1 - 0.3) = -\log(0.7) \\ k = 2&: \quad y_2 = 1, \text{ so } -\log(0.7) \\ k = 3&: \quad y_3 = 0, \text{ so } -\log(1 - 0.1) = -\log(0.9) \\ k = 4&: \quad y_4 = 1, \text{ so } -\log(0.9) \end{align*}$$

The total is \(-2\log(0.7) - 2\log(0.9)\).

Problem #171

Tags: binary cross-entropy, lecture-16, multiple outputs

A multi-label classifier has 3 output nodes with sigmoid activations. The true labels are \(\vec y = (1, 1, 0)\) and the predicted probabilities are \(\vec h = (0.8, 0.6, 0.4)\).

Compute the binary cross-entropy loss. Leave your answer in terms of \(\log\).

Solution

\(-\log(0.8) - 2\log(0.6)\).

By the binary cross-entropy formula:

\[\ell(\vec h, \vec y) = -\sum_{k=1}^{3}\begin{cases} \log h_k, & \text{if } y_k = 1 \\ \log(1 - h_k), & \text{if } y_k = 0 \end{cases}\]

Evaluating each term:

$$\begin{align*} k = 1&: \quad y_1 = 1, \text{ so } -\log(0.8) \\ k = 2&: \quad y_2 = 1, \text{ so } -\log(0.6) \\ k = 3&: \quad y_3 = 0, \text{ so } -\log(1 - 0.4) = -\log(0.6) \end{align*}$$

The total is \(-\log(0.8) - 2\log(0.6)\).

Problem #172

Tags: multiple outputs, lecture-16, categorical cross-entropy

A multi-class classifier has 4 output nodes with softmax activation. The true label is \(\vec y = (0, 0, 1, 0)\) and the softmax outputs are \(\vec h = (0.1, 0.2, 0.6, 0.1)\).

Compute the categorical cross-entropy loss. Leave your answer in terms of \(\log\).

Solution

\(-\log(0.6)\).

By the categorical cross-entropy formula:

\[\ell(\vec h, \vec y) = -\sum_{k=1}^{4}\begin{cases} \log h_k, & \text{if } y_k = 1 \\ 0, & \text{if } y_k = 0 \end{cases}\]

Only \(y_3 = 1\) contributes, so the loss is \(-\log(h_3) = -\log(0.6)\).
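Because \(\vec y\) is one-hot, the categorical cross-entropy reduces to \(-\sum_k y_k \log h_k\), which makes the computation a one-liner:

```python
import numpy as np

def categorical_cross_entropy(h, y):
    # y is one-hot, so only the true class's log-probability survives
    return float(-np.sum(y * np.log(h)))

loss = categorical_cross_entropy(np.array([0.1, 0.2, 0.6, 0.1]),
                                 np.array([0, 0, 1, 0]))
# equals -log(0.6)
```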

Problem #173

Tags: multiple outputs, lecture-16, categorical cross-entropy

A multi-class classifier has 3 output nodes with softmax activation. The true label is \(\vec y = (0, 1, 0)\) and the softmax outputs are \(\vec h = (0.3, 0.5, 0.2)\).

Compute the categorical cross-entropy loss. Leave your answer in terms of \(\log\).

Solution

\(-\log(0.5) = \log 2\).

By the categorical cross-entropy formula:

\[\ell(\vec h, \vec y) = -\sum_{k=1}^{3}\begin{cases} \log h_k, & \text{if } y_k = 1 \\ 0, & \text{if } y_k = 0 \end{cases}\]

Only \(y_2 = 1\) contributes, so the loss is \(-\log(h_2) = -\log(0.5) = \log 2\).

Problem #174

Tags: multiple outputs, lecture-16, categorical cross-entropy

A multi-class classifier has 4 output nodes with softmax activation. The true label is \(\vec y = (1, 0, 0, 0)\) and the softmax outputs are \(\vec h = (0.4, 0.3, 0.2, 0.1)\).

Compute the categorical cross-entropy loss. Leave your answer in terms of \(\log\).

Solution

\(-\log(0.4)\).

By the categorical cross-entropy formula:

\[\ell(\vec h, \vec y) = -\sum_{k=1}^{4}\begin{cases} \log h_k, & \text{if } y_k = 1 \\ 0, & \text{if } y_k = 0 \end{cases}\]

Only \(y_1 = 1\) contributes, so the loss is \(-\log(h_1) = -\log(0.4)\).

Problem #175

Tags: multiple outputs, sigmoid, lecture-16, softmax

A neural network classifies images into 5 categories.

Part 1)

Suppose the categories are mutually exclusive (each image belongs to exactly one category). Which activation function should be used at the output layer: sigmoid or softmax?

Solution

Softmax.

Since the categories are mutually exclusive, we want the output probabilities to represent a single probability distribution over the 5 classes. Softmax enforces this by ensuring the outputs sum to 1.

Part 2)

True or False: with the activation from Part 1, the 5 outputs must sum to 1.

Solution

True.

The softmax function produces outputs that sum to 1 by definition:

\[\sum_{k=1}^{K} h_k = \sum_{k=1}^{K}\frac{e^{z_k}}{\sum_{j} e^{z_j}} = 1 \]

Part 3)

Now suppose an image can belong to multiple categories simultaneously. Which activation function should be used at the output layer: sigmoid or softmax?

Solution

Sigmoid.

Since the categories are not mutually exclusive, each output node independently predicts the probability that the image belongs to that category. Sigmoid is applied independently to each output.

Part 4)

True or False: with the activation from Part 3, the 5 outputs must sum to 1.

Solution

False.

Sigmoid is applied independently to each output node, so there is no constraint that the outputs sum to 1. For example, if the image contains both a cat and a dog, the network might output high probabilities for both.

Problem #176

Tags: multiple outputs, lecture-16, regression

A neural network with 3 output nodes is trained to predict temperature, humidity, and wind speed simultaneously. The network uses the multi-target regression loss:

\[\ell(\vec h, \vec y) = \|\vec h - \vec y\|^2 = \sum_{k=1}^{3}(h_k - y_k)^2 \]

For a particular data point, the network's predictions are \(\vec h = (5, 3, 7)\) and the true values are \(\vec y = (3, 4, 5)\). Compute the loss.

Solution

\(9\).

$$\begin{align*}\ell(\vec h, \vec y) &= (5 - 3)^2 + (3 - 4)^2 + (7 - 5)^2 \\&= 4 + 1 + 4 \\&= 9 \end{align*}$$
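The squared-error loss is a direct translation into NumPy:

```python
import numpy as np

h = np.array([5.0, 3.0, 7.0])  # predictions
y = np.array([3.0, 4.0, 5.0])  # true values
loss = float(np.sum((h - y) ** 2))
print(loss)  # 9.0
```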

Problem #177

Tags: lecture-16, autoencoders

An autoencoder is trained on data in \(\mathbb{R}^{8}\) with a bottleneck (hidden) layer of dimension 3.

Part 1)

What are the input and output dimensions of the encoder?

Solution

The encoder maps \(\mathbb{R}^8 \to\mathbb{R}^3\). Its input dimension is 8 and its output dimension is 3.

Part 2)

What are the input and output dimensions of the decoder?

Solution

The decoder maps \(\mathbb{R}^3 \to\mathbb{R}^8\). Its input dimension is 3 and its output dimension is 8.

Part 3)

What is the output dimension of the full autoencoder, \(H(\vec x) = \operatorname{decode}(\operatorname{encode}(\vec x))\)?

Solution

\(8\), the same as the input dimension.

The autoencoder maps \(\mathbb{R}^8 \to\mathbb{R}^8\). Its goal is to reconstruct the input, so the output must have the same dimensionality.

Part 4)

True or False: training this autoencoder is a supervised learning problem.

Solution

False.

Training an autoencoder is an unsupervised learning problem: the network is trained to reconstruct its own input, so there are no separate labels. The loss function is the reconstruction error \(\sum_{i=1}^n \|\vec{x}^{(i)} - H(\vec{x}^{(i)})\|^2\).
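A minimal shape check, using random linear maps as stand-ins for the encoder and decoder (the weights here are untrained and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(3, 8))  # encoder: R^8 -> R^3 (the bottleneck)
W_dec = rng.normal(size=(8, 3))  # decoder: R^3 -> R^8

x = rng.normal(size=8)
z = W_enc @ x       # code, shape (3,)
x_hat = W_dec @ z   # reconstruction, shape (8,)

reconstruction_error = float(np.sum((x - x_hat) ** 2))
```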

Problem #179

Tags: lecture-16, autoencoders

True or False: if an autoencoder has 10 input nodes, 10 hidden nodes, and 10 output nodes, the smallest reconstruction error it can possibly achieve is zero.

Solution

True.

Since the hidden layer has the same dimensionality as the input, the autoencoder can learn the identity function, mapping each input to itself exactly. This results in zero reconstruction error.