Example CNN ConvNet

1 Introduction

Python Source: https://keras.io/examples/vision/mnist_convnet/

2 Model definition


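from tensorflow import keras
from tensorflow.keras import layers

num_classes = 10            # the 10 digits
input_shape = (28, 28, 1)   # MNIST grey-level images

model = keras.Sequential(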
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.summary()

The output of model.summary() is:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 1600)              0         
_________________________________________________________________
dropout (Dropout)            (None, 1600)              0         
_________________________________________________________________
dense (Dense)                (None, 10)                16010     
=================================================================
Total params: 34,826
Trainable params: 34,826
Non-trainable params: 0
_________________________________________________________________

3 Understanding Param #

The number of parameters Param # is explained below for each layer.

3.1 First Convolution layer

This model starts with a convolution layer that takes an image of size \((28\times 28 \times 1)\) (MNIST images are grey-level images of width 28 and height 28) and computes 32 feature maps using 32 filters of size \((3\times 3\times 1)\), creating an output tensor of size \(26 \times 26 \times 32\) (no padding is applied, so each spatial dimension shrinks by \(3-1=2\)). The 32 filters have \(3\times 3\times 1\times 32=288\) weights, and 32 biases (one per filter) also need to be estimated, leading to a total of \(288+32=320\) parameters to be estimated in this first convolution layer.
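The arithmetic above can be checked with a short sketch. The helper names `conv2d_param_count` and `valid_conv_output_size` are hypothetical (they are not part of Keras) and assume a stride of 1 and no padding, as in this model.

```python
def conv2d_param_count(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a standard 2D convolution."""
    weights = kernel_h * kernel_w * in_channels * filters
    biases = filters
    return weights + biases


def valid_conv_output_size(input_size, kernel_size):
    """Spatial size after a stride-1 convolution with no padding."""
    return input_size - kernel_size + 1


first_layer_params = conv2d_param_count(3, 3, 1, 32)
first_layer_size = valid_conv_output_size(28, 3)
print(first_layer_params, first_layer_size)
```

The two printed values should match the Param # and the spatial size reported by model.summary() for the conv2d layer.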

Convolution of the input image with 32 filters of size 3x3 on the first layer

3.2 Maxpooling

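The maxpooling operation (pool size \(2\times 2\)) down-samples by a factor of 2 in each spatial dimension, creating a tensor of size \((\frac{26}{2} \times \frac{26}{2} \times 32)=(13 \times 13 \times 32)\). It has no parameters to estimate, hence its Param # is 0.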
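As an illustration of what the pooling does to the values and to the shape, here is a minimal NumPy sketch. The function `max_pool_2x2` is a hypothetical helper written for this note, not the Keras implementation; it assumes non-overlapping \(2\times 2\) windows and drops a trailing odd row or column.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an array of shape (H, W, C).

    A trailing odd row or column is dropped, which corresponds to the
    floor division of the spatial size by 2.
    """
    h, w, c = x.shape
    h2, w2 = h // 2, w // 2
    x = x[: h2 * 2, : w2 * 2, :]          # crop to an even size
    blocks = x.reshape(h2, 2, w2, 2, c)   # split into 2x2 blocks
    return blocks.max(axis=(1, 3))        # keep the maximum of each block


# Small example: a 4x4 single-channel image holding the values 0..15.
small = np.arange(16).reshape(4, 4, 1)
pooled_small = max_pool_2x2(small)
print(pooled_small[:, :, 0])

# Same operation on a tensor shaped like the first convolution output.
feature_maps = np.random.rand(26, 26, 32)
pooled = max_pool_2x2(feature_maps)
print(pooled.shape)
```

Each channel is pooled independently, so the number of maps (32) is unchanged and only the spatial size is halved.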

3.3 Second convolution layer

The second convolution layer operates on the input tensor \((13 \times 13 \times 32)\) using 64 filters of size \((3\times 3 \times 32)\), creating an output tensor of size \((11 \times 11 \times 64)\) (no padding used). Each filter spans all 32 input maps, so the 64 filters have \(3\times 3\times 32\times 64=18432\) weights, and 64 biases (one per filter) also need to be estimated, leading to a total of \(18432+64=18496\) parameters to be estimated in this second convolution layer.
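To make the role of the filter depth concrete, the following NumPy sketch applies 64 filters of size \(3\times 3\times 32\) to a random \(13\times 13\times 32\) tensor with explicit loops. It is only an illustration under the assumptions stated in the comments (random values, stride 1, no padding); it is far slower than the Keras layer and is not how Keras implements it.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((13, 13, 32))     # input tensor (height, width, channels)
w = rng.standard_normal((3, 3, 32, 64))   # 64 filters, each of size 3x3x32
b = np.zeros(64)                          # one bias per filter

out = np.empty((11, 11, 64))
for i in range(11):
    for j in range(11):
        patch = x[i:i + 3, j:j + 3, :]    # 3x3x32 window of the input
        # Sum over the whole window for every filter, then add the biases.
        pre_activation = np.tensordot(patch, w, axes=3) + b
        out[i, j, :] = np.maximum(pre_activation, 0.0)   # ReLU activation

print(out.shape, w.size + b.size)
```

Every output value combines \(3\times 3\times 32=288\) input values, which is why the parameter count grows with the number of input channels.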

Second convolutional layer with 64 filters

3.4 Maxpooling and flattening

The maxpooling operation down-samples by a factor of 2 in each spatial dimension, creating a tensor of size \((\lfloor\frac{11}{2}\rfloor \times \lfloor\frac{11}{2}\rfloor \times 64)=(5 \times 5 \times 64)\); because 11 is odd, the last row and column are dropped. This tensor is flattened into a column vector of \(5 \times 5 \times 64=1600\) dimensions.
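The sequence of shapes from the input image to the flattened vector can be traced with a few lines of Python. The function `trace_shapes` is a hypothetical helper that hard-codes the assumptions of this model: \(3\times 3\) convolutions without padding, \(2\times 2\) pooling, and 32 then 64 filters.

```python
def trace_shapes(height, width, channels):
    """Follow the tensor shape through the layers up to Flatten."""
    shapes = []
    for filters in (32, 64):
        # 3x3 convolution, stride 1, no padding: each side shrinks by 2
        height, width, channels = height - 2, width - 2, filters
        shapes.append((height, width, channels))
        # 2x2 max pooling: floor division drops an odd trailing row/column
        height, width = height // 2, width // 2
        shapes.append((height, width, channels))
    flattened = height * width * channels
    return shapes, flattened


shapes, flattened = trace_shapes(28, 28, 1)
for shape in shapes:
    print(shape)
print(flattened)
```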

3.5 Dense layer and Softmax

The dropout layer has no parameters. The dense layer multiplies the \(1600\times 1\) vector \(\mathbf{x}\) by a weight matrix \(\mathrm{W}\) of dimension \(10\times 1600\) and adds a \(10\times 1\) vector of biases \(\mathbf{b}\), creating the \(10\times 1\) output vector \(\mathbf{z}\) that is fed into a softmax function for classification into 10 classes (the 10 digits): \[\mathrm{W}\mathbf{x}+\mathbf{b}=\mathbf{z} \rightarrow \text{softmax}(\mathbf{z})\] The dense layer therefore has \(10\times 1600+10=16010\) parameters to be estimated.
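The computation \(\mathrm{W}\mathbf{x}+\mathbf{b}\) followed by the softmax can be sketched in NumPy as follows. The values are random placeholders rather than trained weights, and the `softmax` function is written out here for illustration (with the usual shift by the maximum for numerical stability).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1600)                # flattened feature vector
W = rng.standard_normal((10, 1600)) * 0.01   # weight matrix (random placeholder)
b = np.zeros(10)                             # bias vector

z = W @ x + b                                # 10 scores, one per digit


def softmax(z):
    """Numerically stable softmax: shift by the maximum before exponentiating."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


p = softmax(z)                               # class probabilities
dense_params = W.size + b.size
total_params = 320 + 18496 + dense_params    # two conv layers + dense layer
print(p.sum(), p.argmax(), dense_params, total_params)
```

The probabilities sum to 1 and the predicted digit is the index of the largest one. The parameter total should agree with the "Total params" line of model.summary().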

4 Train the model

Setting up the batch size and the number of epochs:

batch_size = 128
epochs = 15

The model parameters are estimated with a training set made of \(0.9\times 60000=54000\) samples, the remaining \(6000\) samples being held out as the validation set (validation_split=0.1):

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)

gives output:

Epoch 1/15
422/422 [==============================] - 13s 29ms/step - loss: 0.7840 - accuracy: 0.7643 - val_loss: 0.0780 - val_accuracy: 0.9780
Epoch 2/15
422/422 [==============================] - 13s 31ms/step - loss: 0.1199 - accuracy: 0.9639 - val_loss: 0.0559 - val_accuracy: 0.9843
Epoch 3/15
422/422 [==============================] - 14s 33ms/step - loss: 0.0845 - accuracy: 0.9737 - val_loss: 0.0469 - val_accuracy: 0.9877
Epoch 4/15
422/422 [==============================] - 14s 33ms/step - loss: 0.0762 - accuracy: 0.9756 - val_loss: 0.0398 - val_accuracy: 0.9895
Epoch 5/15
422/422 [==============================] - 15s 35ms/step - loss: 0.0621 - accuracy: 0.9812 - val_loss: 0.0378 - val_accuracy: 0.9890
Epoch 6/15
422/422 [==============================] - 17s 40ms/step - loss: 0.0547 - accuracy: 0.9825 - val_loss: 0.0360 - val_accuracy: 0.9910
Epoch 7/15
422/422 [==============================] - 17s 41ms/step - loss: 0.0497 - accuracy: 0.9840 - val_loss: 0.0311 - val_accuracy: 0.9920
Epoch 8/15
422/422 [==============================] - 16s 39ms/step - loss: 0.0443 - accuracy: 0.9862 - val_loss: 0.0346 - val_accuracy: 0.9910
Epoch 9/15
422/422 [==============================] - 17s 39ms/step - loss: 0.0436 - accuracy: 0.9860 - val_loss: 0.0325 - val_accuracy: 0.9915
Epoch 10/15
422/422 [==============================] - 16s 38ms/step - loss: 0.0407 - accuracy: 0.9865 - val_loss: 0.0301 - val_accuracy: 0.9920
Epoch 11/15
422/422 [==============================] - 16s 37ms/step - loss: 0.0406 - accuracy: 0.9874 - val_loss: 0.0303 - val_accuracy: 0.9920
Epoch 12/15
237/422 [===============>..............] - ETA: 7s - loss: 0.0398 - accuracy: 0.9877

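At each epoch, the loss and accuracy are reported for both the training and validation sets. The number 422 corresponds to the number of batches per epoch, rounded up because the last batch is only partially filled: \[ \left\lceil\frac{\# \text{training samples}}{\text{batch size}}\right\rceil=\left\lceil\frac{54000}{128}\right\rceil=\lceil 421.875\rceil=422 \]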
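The sample counts and the number of batches per epoch can be reproduced with plain integer arithmetic. This is only a check of the numbers quoted in the text; the variable names are illustrative and the split is computed directly rather than through Keras.

```python
import math

n_samples = 60000      # MNIST training images
batch_size = 128

n_train = n_samples * 9 // 10    # 90% kept for training
n_val = n_samples - n_train      # 10% held out for validation

# 421 full batches plus one smaller final batch.
full_batches, remainder = divmod(n_train, batch_size)
steps_per_epoch = math.ceil(n_train / batch_size)
print(n_train, n_val, full_batches, remainder, steps_per_epoch)
```

The last batch of each epoch holds the 112 remaining samples, which is why 422 steps are shown rather than 421.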

5 Evaluate the model

Processing the test set with the trained model:

score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

gives output:

Test loss: 0.023950600996613503
Test accuracy: 0.9922000169754028