Example CNN ConvNet
1 Introduction
Python Source: https://keras.io/examples/vision/mnist_convnet/
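The code in the following sections assumes that `x_train`, `y_train`, `x_test`, `y_test`, `input_shape` and `num_classes` already exist. In the source example the data are loaded with `keras.datasets.mnist.load_data()`, scaled to [0, 1], given a trailing channel axis and one-hot encoded with `keras.utils.to_categorical`. The following is a minimal NumPy-only sketch of that preprocessing. The helper name `preprocess` is hypothetical, `np.eye` stands in for `to_categorical`, and random arrays stand in for the real MNIST images.

```python
import numpy as np

NUM_CLASSES = 10           # the ten digits 0-9
INPUT_SHAPE = (28, 28, 1)  # height, width, one grey-level channel


def preprocess(x, y, num_classes=NUM_CLASSES):
    """Hypothetical helper mirroring the source example's preprocessing.

    x: uint8 array of shape (n, 28, 28) with pixel values in [0, 255]
    y: integer labels of shape (n,)
    """
    x = x.astype("float32") / 255.0               # scale pixels to [0, 1]
    x = np.expand_dims(x, -1)                     # (n, 28, 28) -> (n, 28, 28, 1)
    y = np.eye(num_classes, dtype="float32")[y]   # one-hot rows, like to_categorical
    return x, y


# Random stand-in data; the real arrays come from keras.datasets.mnist.load_data().
rng = np.random.default_rng(0)
x_raw = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)
y_raw = np.array([0, 3, 9, 1, 7])
x, y = preprocess(x_raw, y_raw)
print(x.shape, y.shape)  # (5, 28, 28, 1) (5, 10)
```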
2 Model definition
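from tensorflow import keras
from tensorflow.keras import layers

num_classes = 10           # the 10 digits
input_shape = (28, 28, 1)  # 28 x 28 grey-level images, one channel

model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        # two blocks of 3x3 convolution (no padding) followed by 2x2 max pooling
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        # classifier head: flatten, dropout, softmax over the 10 classes
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)
model.summary()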
The output of model.summary() is:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 26, 26, 32) 320
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 11, 11, 64) 18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 1600) 0
_________________________________________________________________
dropout (Dropout) (None, 1600) 0
_________________________________________________________________
dense (Dense) (None, 10) 16010
=================================================================
Total params: 34,826
Trainable params: 34,826
Non-trainable params: 0
_________________________________________________________________
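The shapes and parameter counts in this table can be reproduced by hand. The pure-Python sketch below (all function names are hypothetical, not Keras APIs) propagates the shape through the layers, assuming stride 1 and no padding for the 3×3 convolutions and 2×2 pooling windows. It omits the batch dimension `None`.

```python
def conv2d(shape, filters, k=3):
    """3x3 convolution, stride 1, no padding ("valid")."""
    h, w, c = shape
    return (h - k + 1, w - k + 1, filters), k * k * c * filters + filters


def maxpool2d(shape, p=2):
    """2x2 max pooling; floor division drops a leftover row/column."""
    h, w, c = shape
    return (h // p, w // p, c), 0


def flatten(shape):
    h, w, c = shape
    return (h * w * c,), 0


def dropout(shape):
    return shape, 0  # dropout has no trainable parameters


def dense(shape, units):
    (n,) = shape
    return (units,), n * units + units  # weight matrix + bias vector


def trace(input_shape=(28, 28, 1), num_classes=10):
    """Return (layer name, output shape, parameter count) rows like model.summary()."""
    steps = [
        ("conv2d", lambda s: conv2d(s, 32)),
        ("max_pooling2d", maxpool2d),
        ("conv2d_1", lambda s: conv2d(s, 64)),
        ("max_pooling2d_1", maxpool2d),
        ("flatten", flatten),
        ("dropout", dropout),
        ("dense", lambda s: dense(s, num_classes)),
    ]
    rows, shape = [], input_shape
    for name, layer in steps:
        shape, params = layer(shape)
        rows.append((name, shape, params))
    return rows


rows = trace()
for name, shape, params in rows:
    print(f"{name:<16} {str(shape):<16} {params}")
print("Total params:", sum(p for _, _, p in rows))
```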
3 Understanding Param #
The number of parameters (Param #) that model.summary() reports for each layer is explained below.
3.1 First convolution layer
The first convolution layer takes an image of size \((28\times 28 \times 1)\) (MNIST images are grey-level images of width 28 and height 28) and computes 32 feature maps using 32 filters of size \((3\times 3\times 1)\). As no padding is applied, each filter fits in \(28-3+1=26\) positions along each spatial axis, so the output tensor has size \(26 \times 26 \times 32\). The 32 filters have \(3\times3\times 1\times 32=288\) weights and, in addition, 32 biases need to be estimated, leading to a total of \(288+32=320\) parameters for this first convolution layer.
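To make the sliding-window computation concrete, here is a naive NumPy sketch of a "valid" (no padding, stride 1) convolution. It assumes the kernel layout `(kh, kw, C_in, C_out)` that Keras uses for `Conv2D` weights. Like `Conv2D`, it computes a cross-correlation (the filter is not flipped). The function name `conv2d_valid` and the random inputs are illustrative only. The sketch is written for clarity, not speed.

```python
import numpy as np


def conv2d_valid(x, w, b):
    """Naive 'valid' convolution (stride 1, no padding).

    x: input of shape (H, W, C_in)
    w: filters of shape (kh, kw, C_in, C_out)
    b: biases of shape (C_out,)
    """
    kh, kw, _, c_out = w.shape
    h_out = x.shape[0] - kh + 1   # 28 - 3 + 1 = 26 for the first layer
    w_out = x.shape[1] - kw + 1
    out = np.empty((h_out, w_out, c_out), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i:i + kh, j:j + kw, :]  # (kh, kw, C_in) window
            # weighted sum over the window for every filter at once, plus bias
            out[i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2])) + b
    return out


rng = np.random.default_rng(0)
image = rng.random((28, 28, 1), dtype=np.float32)
filters = rng.standard_normal((3, 3, 1, 32)).astype(np.float32)
biases = np.zeros(32, dtype=np.float32)
maps = conv2d_valid(image, filters, biases)
print(maps.shape)                  # (26, 26, 32)
print(filters.size + biases.size)  # 320 parameters
```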
3.2 Maxpooling
The max pooling operation (\(2\times 2\) windows) down-samples by a factor of 2 in both spatial dimensions, creating a tensor of size \((\frac{26}{2} \times \frac{26}{2} \times 32)=(13 \times 13 \times 32)\). It has no parameters to estimate.
3.3 Second convolution layer
The second convolution layer operates on the input tensor of size \((13 \times 13 \times 32)\) using 64 filters of size \((3\times 3 \times 32)\), creating an output tensor of size \((11 \times 11 \times 64)\) (no padding is used, so \(13-3+1=11\)). The 64 filters have \(3\times3\times 32\times 64=18432\) weights and, in addition, 64 biases need to be estimated, leading to a total of \(18432+64=18496\) parameters for this second convolution layer.
3.4 Maxpooling and flattening
The max pooling operation down-samples by a factor of 2 in both spatial dimensions, creating a tensor of size \((\lfloor\frac{11}{2}\rfloor \times \lfloor\frac{11}{2}\rfloor \times 64)=(5 \times 5 \times 64)\). Because no padding is used, the last row and the last column of the \(11\times 11\) maps are dropped. This tensor is flattened into a vector of \(5 \times 5 \times 64=1600\) dimensions.
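Both pooling steps and the flattening can be checked with a short NumPy sketch. The sketch below assumes 2×2 windows with stride 2 and no padding (so an odd trailing row or column is dropped). It flattens in row-major order, which is how `Flatten` orders a channels-last tensor. The helper name `max_pool_2x2` is hypothetical.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 and no padding on an (H, W, C) tensor.

    An odd trailing row/column is dropped, giving floor(H/2) x floor(W/2).
    """
    h, w, c = x.shape
    h2, w2 = h // 2, w // 2
    x = x[:2 * h2, :2 * w2, :]           # crop to an even size
    blocks = x.reshape(h2, 2, w2, 2, c)  # group pixels into 2x2 blocks
    return blocks.max(axis=(1, 3))       # keep the maximum of each block


rng = np.random.default_rng(0)
first = max_pool_2x2(rng.random((26, 26, 32)))
second = max_pool_2x2(rng.random((11, 11, 64)))
flat = second.reshape(-1)                # flatten (row-major)
print(first.shape, second.shape, flat.shape)  # (13, 13, 32) (5, 5, 64) (1600,)
```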
3.5 Dense layer and Softmax
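The Dropout layer, which randomly sets a fraction 0.5 of its inputs to zero during training only, has no parameters. The dense layer multiplies the \(1600\times 1\) vector \(\mathbf{x}\) by a weight matrix \(\mathrm{W}\) of size \(10\times 1600\) and adds a \(10\times 1\) vector of biases \(\mathbf{b}\), creating the \(10\times 1\) output vector \(\mathbf{z}\), which is fed into a softmax function for classification into 10 classes (the 10 digits): \[\mathrm{W}\mathbf{x}+\mathbf{b}=\mathbf{z} \rightarrow \text{softmax}(\mathbf{z})_k=\frac{e^{z_k}}{\sum_{j=1}^{10} e^{z_j}}\] This layer therefore has \(10\times 1600+10=16010\) parameters.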
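The following NumPy sketch runs the dense layer and softmax on a random stand-in vector, using the document's orientation \(\mathrm{W}\mathbf{x}+\mathbf{b}\). (Keras itself stores the kernel as a \(1600\times 10\) matrix and computes `x @ kernel + bias`, which is the same operation with the transposed matrix.) The softmax subtracts \(\max_k z_k\) before exponentiating, which leaves the result unchanged but avoids overflow. The weights here are random, not trained values.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax: subtracting max(z) does not change the result."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


rng = np.random.default_rng(0)
x = rng.random(1600)                        # flattened feature vector
W = rng.standard_normal((10, 1600)) * 0.01  # weight matrix, 10 x 1600
b = np.zeros(10)                            # one bias per class
z = W @ x + b                               # logits, shape (10,)
p = softmax(z)                              # class probabilities, summing to 1
print(W.size + b.size)                      # 16010 parameters
print(int(np.argmax(p)))                    # index of the predicted digit
```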
4 Train the model
Set the batch size and the number of epochs:
batch_size = 128
epochs = 15
The model parameters are estimated on a training set of \(0.9\times 60000=54000\) samples. The remaining \(6000\) samples (the \(10\%\) set by validation_split=0.1, taken from the end of the training arrays) form the validation set:
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)
gives the output:
Epoch 1/15
422/422 [==============================] - 13s 29ms/step - loss: 0.7840 - accuracy: 0.7643 - val_loss: 0.0780 - val_accuracy: 0.9780
Epoch 2/15
422/422 [==============================] - 13s 31ms/step - loss: 0.1199 - accuracy: 0.9639 - val_loss: 0.0559 - val_accuracy: 0.9843
Epoch 3/15
422/422 [==============================] - 14s 33ms/step - loss: 0.0845 - accuracy: 0.9737 - val_loss: 0.0469 - val_accuracy: 0.9877
Epoch 4/15
422/422 [==============================] - 14s 33ms/step - loss: 0.0762 - accuracy: 0.9756 - val_loss: 0.0398 - val_accuracy: 0.9895
Epoch 5/15
422/422 [==============================] - 15s 35ms/step - loss: 0.0621 - accuracy: 0.9812 - val_loss: 0.0378 - val_accuracy: 0.9890
Epoch 6/15
422/422 [==============================] - 17s 40ms/step - loss: 0.0547 - accuracy: 0.9825 - val_loss: 0.0360 - val_accuracy: 0.9910
Epoch 7/15
422/422 [==============================] - 17s 41ms/step - loss: 0.0497 - accuracy: 0.9840 - val_loss: 0.0311 - val_accuracy: 0.9920
Epoch 8/15
422/422 [==============================] - 16s 39ms/step - loss: 0.0443 - accuracy: 0.9862 - val_loss: 0.0346 - val_accuracy: 0.9910
Epoch 9/15
422/422 [==============================] - 17s 39ms/step - loss: 0.0436 - accuracy: 0.9860 - val_loss: 0.0325 - val_accuracy: 0.9915
Epoch 10/15
422/422 [==============================] - 16s 38ms/step - loss: 0.0407 - accuracy: 0.9865 - val_loss: 0.0301 - val_accuracy: 0.9920
Epoch 11/15
422/422 [==============================] - 16s 37ms/step - loss: 0.0406 - accuracy: 0.9874 - val_loss: 0.0303 - val_accuracy: 0.9920
Epoch 12/15
237/422 [===============>..............] - ETA: 7s - loss: 0.0398 - accuracy: 0.9877
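At each epoch, the loss and accuracy are reported for both the training and the validation sets. The number 422 is the number of batches per epoch, the last batch holding only the remaining \(54000-421\times 128=112\) samples: \[ \left\lceil\frac{\# \text{training samples}}{\text{batch size}}\right\rceil=\left\lceil\frac{54000}{128}\right\rceil=\lceil 421.875\rceil=422 \]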
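The bookkeeping behind these numbers fits in a few lines of Python. This sketch assumes that the split takes exactly 10% of the 60000 MNIST training images. It does not reproduce Keras's internal rounding rule.

```python
import math

n_total = 60000         # MNIST training images
validation_split = 0.1  # as passed to model.fit
batch_size = 128

n_val = round(n_total * validation_split)  # 6000 samples held out for validation
n_train = n_total - n_val                  # 54000 samples used to fit the weights
steps = math.ceil(n_train / batch_size)    # a smaller last batch still counts as a step
last_batch = n_train - (steps - 1) * batch_size

print(n_train, n_val)     # 54000 6000
print(steps, last_batch)  # 422 112
```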
5 Evaluate the model
Processing the test set with the trained model:
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])
gives the output:
Test loss: 0.023950600996613503
Test accuracy: 0.9922000169754028
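The two reported numbers are averages over the 10000 test images. The loss is the categorical cross-entropy chosen in model.compile, and the accuracy is the fraction of images whose most probable class is the true digit. The NumPy sketch below computes both quantities on a toy example with 3 samples and 3 classes (not MNIST data). The clipping constant `eps` that guards against \(\log 0\) is an assumption of this sketch.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_k log p_k (y_true one-hot, y_pred probabilities)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the true class."""
    return float(np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1)))


y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
y_pred = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.5, 0.3, 0.2]])
loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(loss)
print(acc)  # 0.6666666666666666 (the third sample is misclassified)
```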