Introduction

The maintenance and control of road infrastructure are becoming more and more complex, and recent events have shown that some structures are ageing. In order to avoid disasters such as the 2018 collapse of the Morandi bridge in Genoa (Italy), detecting surface cracks is vital.

The concrete of bridges is monitored with drone cameras. After a photographic scan of a structure, a tool that detects whether the concrete is cracked would allow technicians to focus only on damaged structures.

Research question/objective

The purpose of our model is to predict whether or not there is a crack (a binary classification) in the photos taken by drones or by technicians.

Previous analysis

In a previous analysis, the authors divided the data set into a training set, a validation set, and a test set. They performed data augmentation and used convolutional neural networks with global average pooling to analyse the images. They loaded the vgg16 preprocessing function and froze the layers of the pretrained model. This transfer learning method is more commonly named "feature extraction".
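In keras for R, that kind of setup (a frozen vgg16 base followed by global average pooling) can be sketched in a few lines. The sketch below is only illustrative: the size of the dense head is our assumption, not a detail of the cited analysis.

```r
library(keras)

# Pretrained vgg16 base, without its ImageNet classifier.
conv_base <- application_vgg16(weights = "imagenet", include_top = FALSE,
                               input_shape = c(227, 227, 3))

# Feature extraction: every layer of the base stays frozen, so only
# the new classifier head below is trained.
freeze_weights(conv_base)

model <- keras_model_sequential() %>%
  conv_base %>%
  layer_global_average_pooling_2d() %>%
  layer_dense(units = 256, activation = "relu") %>%  # head size is an assumption
  layer_dense(units = 2, activation = "softmax")
```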

In another study, the authors used convolutional neural networks with max pooling and dropout layers. They did not use any pretrained model or transfer learning.

Data

We obtained our data on Kaggle. The data set contains images of various concrete surfaces, divided into a negative (without crack) and a positive (with crack) folder. Each class has 20'000 observations, for a total of 40'000 images of 227 x 227 pixels.

Methodology

The data

First, we have to prepare the data. At the beginning, we only have two folders of 20'000 images each:

  • Negative
  • Positive

We need to split these images into a training and a testing folder. Thus, we randomly select 14'000 images from the folder "Negative" and move them into a training folder; the 6'000 remaining images are moved into a testing folder. We follow the same process for the positive images. These steps are done in the script data-splitting.
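A minimal R sketch of this split (path handling and the seed are illustrative, and files are copied rather than moved for safety; the actual script may differ in detail):

```r
set.seed(1)  # illustrative seed, for a reproducible split

for (class in c("Negative", "Positive")) {
  files <- list.files(class, full.names = TRUE)
  train_idx <- sample(seq_along(files), 14000)

  dir.create(file.path("train", class), recursive = TRUE, showWarnings = FALSE)
  dir.create(file.path("test", class), recursive = TRUE, showWarnings = FALSE)

  # 14'000 random images go to the training folder,
  # the remaining 6'000 to the testing folder.
  file.copy(files[train_idx], file.path("train", class))
  file.copy(files[-train_idx], file.path("test", class))
}
```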

Our final architecture for our data is the following:

  • Test
    • Negative
    • Positive
  • Train
    • Negative
    • Positive

The models

We do not perform the data augmentation that we first mentioned in our project proposal, because we already have a substantial number of images in our training set (32'000). In addition, our high-resolution images already show a lot of variance in terms of surface finish and illumination conditions, as reported on Kaggle.

In order to classify our images, we will use the convolutional neural network approach. We will try two different models; in both, we will use max pooling and a pretrained model (vgg16 or vgg19). Also, in both models and unlike the previous studies, we will use fine-tuning as the transfer learning method: we will first freeze the convolutional base, compile and fit the model, then unfreeze some upper layers and compile and fit it again. We explain the models in more detail in the modelling part.
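In R keras, this two-step fine-tuning workflow looks roughly as follows. This is a sketch: the dense head, the generator settings, the step counts and the learning rate are illustrative assumptions, not the exact values of our scripts.

```r
library(keras)

# Pretrained convolutional base (vgg16 here; vgg19 is analogous).
conv_base <- application_vgg16(weights = "imagenet", include_top = FALSE,
                               input_shape = c(227, 227, 3))

# Classifier head on top of the base.
model <- keras_model_sequential() %>%
  conv_base %>%
  layer_flatten() %>%
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = 2, activation = "softmax")

# Generators reading from the train folder (the 80/20 validation
# split is an assumption).
datagen <- image_data_generator(rescale = 1/255, validation_split = 0.2)
train_gen <- flow_images_from_directory(
  "train", datagen, subset = "training",
  target_size = c(227, 227), batch_size = 32, class_mode = "categorical")
val_gen <- flow_images_from_directory(
  "train", datagen, subset = "validation",
  target_size = c(227, 227), batch_size = 32, class_mode = "categorical")

# Step 1: freeze the whole base and train only the new head.
freeze_weights(conv_base)
model %>% compile(optimizer = optimizer_rmsprop(lr = 1e-4),
                  loss = "categorical_crossentropy", metrics = "accuracy")
model %>% fit_generator(train_gen, steps_per_epoch = 100, epochs = 15,
                        validation_data = val_gen, validation_steps = 50)

# Step 2: unfreeze the upper blocks (from block5_conv1 here),
# recompile and fit again.
unfreeze_weights(conv_base, from = "block5_conv1")
model %>% compile(optimizer = optimizer_rmsprop(lr = 1e-4),
                  loss = "categorical_crossentropy", metrics = "accuracy")
model %>% fit_generator(train_gen, steps_per_epoch = 100, epochs = 15,
                        validation_data = val_gen, validation_steps = 50)
```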

Moreover, we will use the Google Cloud Console in order to have the resources to train the models.

Modelling part

First model

Before building the model, we split our data into 3 sets: a training set, a validation set and a testing set. The architecture of our model is the following:

Type Maps Size Receptive field Activation
Fully connected - 2 - Softmax
Fully connected - 512 - ReLU
Max pooling 256 12x12 2x2 -
Convolution 256 24x24 3x3 ReLU
Max pooling 128 26x26 2x2 -
Convolution 128 53x53 3x3 ReLU
Max pooling 64 55x55 2x2 -
Convolution 64 111x111 3x3 ReLU
Max pooling 32 113x113 2x2 -
Convolution 32 227x227 3x3 ReLU
Input 1 227x227 - -
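Read from bottom to top, the table corresponds to the following keras model. This standalone rendering is only meant to illustrate the architecture (the padding of the first convolution is chosen so the layer sizes match the tabulated ones; in the actual scripts the convolutional base comes from the pretrained vgg model, as described above).

```r
library(keras)

# Standalone rendering of the architecture table, read from the input upwards.
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), padding = "same",
                activation = "relu", input_shape = c(227, 227, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%   # 227x227 -> 113x113
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%   # 111x111 -> 55x55
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%   # 53x53 -> 26x26
  layer_conv_2d(filters = 256, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%   # 24x24 -> 12x12
  layer_flatten() %>%
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = 2, activation = "softmax")
```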

The number of filters increases from one convolutional layer to the next; we decide to start with 32 as the smallest value. To compile the model, we choose the RMSprop optimizer, for which we need to find the optimal learning rate. To fit the model, we use 15 epochs together with early stopping. Because we choose to tune the learning rate hyperparameter, we create the tuning.yml file to run a grid search using the flags mechanism. The grid search tries all possible combinations of hyperparameters; here, the learning rate takes the values (0.00001, 0.0001, 0.001, 0.01).
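Inside the training script, the learning rate is exposed as a flag so that the grid search can set it from the outside. A sketch of this part of first-model (the early-stopping patience and step counts are assumptions):

```r
library(keras)
library(tfruns)

# The learning rate is a flag; 1e-4 is only the default value,
# the grid search overrides it for each run.
FLAGS <- flags(
  flag_numeric("learning_rate", 1e-4)
)

# model, train_gen and val_gen as in the fine-tuning sketch above.
model %>% compile(
  optimizer = optimizer_rmsprop(lr = FLAGS$learning_rate),
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)

# Early stopping: interrupt training once the validation loss stops
# improving (patience is an assumption), which is why some runs
# complete fewer than the 15 scheduled epochs.
model %>% fit_generator(
  train_gen, steps_per_epoch = 100, epochs = 15,
  validation_data = val_gen, validation_steps = 50,
  callbacks = list(callback_early_stopping(monitor = "val_loss", patience = 2))
)
```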

Also, to explore different models, we use several scripts in order to vary the pretrained model (vgg16 or vgg19) and the number of unfrozen upper layers of the convolutional base (block4_conv1 or block5_conv1).
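With the tfruns package, the grid search over the four learning rates can be launched as below (the training-script file name is illustrative):

```r
library(tfruns)

# One training run per learning rate: a full grid search over the flag.
runs <- tuning_run(
  "first-model.R",   # illustrative name for the training script
  flags = list(learning_rate = c(1e-5, 1e-4, 1e-3, 1e-2))
)

# Rank the completed runs by validation accuracy, best first.
ls_runs(order = metric_val_acc, decreasing = TRUE)
```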

This model, named “first-model”, is saved in /results and will be used for vgg16_1, vgg16_2, vgg19_1, vgg19_2.

Further, we use the Google Cloud Console to run the tuning.

VGG16_1

For this model, we use the vgg16 pretrained model and unfreeze the convolution layers from block 5 (block5_conv1) while keeping the first four blocks frozen.

Below, we display the table of hyperparameter combinations and their respective accuracies.

After tuning the hyperparameters, we find that flag_learning_rate = 0.0001 is the best value because it gives the best validation accuracy (0.9968).

flag_learning_rate metric_val_acc samples epochs epochs_completed
1e-04 0.9968 2800 15 12
1e-05 0.9929 2800 15 13
1e-03 0.9789 2800 15 10
1e-02 0.5000 2800 15 7

Finally, by evaluating our model using the test set, we get an accuracy of 0.9959.

flag_learning_rate eval_acc samples epochs epochs_completed
1e-04 0.9959 2800 15 12
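This evaluation follows the usual keras pattern; a sketch (the generator settings mirror the training ones, and the batch size is an assumption):

```r
library(keras)

# Generator over the held-out test folder (no augmentation, just rescaling).
test_gen <- flow_images_from_directory(
  "test", image_data_generator(rescale = 1/255),
  target_size = c(227, 227), batch_size = 32,
  class_mode = "categorical", shuffle = FALSE
)

# Loss and accuracy of the selected model on the test images.
model %>% evaluate_generator(test_gen, steps = ceiling(test_gen$n / 32))
```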

VGG16_2

Then, as in vgg16_1, we use the vgg16 pretrained model but unfreeze the convolution layers from block 4 (block4_conv1) while keeping the first three blocks frozen.

Below, we display the table of hyperparameter combinations and their respective accuracies.

After tuning the hyperparameters, we find that flag_learning_rate = 0.00001 is the best value because it gives the best validation accuracy (0.9961).

flag_learning_rate metric_val_acc samples epochs epochs_completed
1e-05 0.9961 2800 15 14
1e-03 0.9948 2800 15 7
1e-04 0.9945 2800 15 8
1e-02 0.5000 2800 15 6

Finally, by evaluating our model using the test set, we get an accuracy of 0.9957.

flag_learning_rate eval_acc samples epochs epochs_completed
1e-05 0.9957 2800 15 14

VGG19_1

Here, we use the vgg19 pretrained model and unfreeze the convolution layers from block 5 (block5_conv1) while keeping the first four blocks frozen.

After tuning the hyperparameters, we find that flag_learning_rate = 0.00001 is the best value because it gives the best validation accuracy (0.9952).

flag_learning_rate metric_val_acc samples epochs epochs_completed
1e-05 0.9952 2800 15 6
1e-04 0.9950 2800 15 8
1e-03 0.9921 2800 15 8
1e-02 0.5000 2800 15 10

Finally, by evaluating our model using the test set, we get an accuracy of 0.9967. However, this best accuracy on the testing set is obtained with learning rate = 0.0001, which is not the hyperparameter selected by the tuning. This is mainly because the validation accuracies of the two learning rates differ by only 0.0002, and we observe the same small difference on the testing set.

flag_learning_rate eval_acc samples epochs epochs_completed
1e-04 0.9967 2800 15 8

VGG19_2

As previously in vgg19_1, we use the vgg19 pretrained model, but this time we unfreeze the convolution layers from block 4 (block4_conv1) while keeping the first three blocks frozen.

After tuning the hyperparameters, we find that flag_learning_rate = 0.0001 is the best value because it gives the best validation accuracy (0.9982).

flag_learning_rate metric_val_acc samples epochs epochs_completed
1e-04 0.9982 2800 15 7
1e-05 0.9957 2800 15 10
1e-03 0.9955 2800 15 7
1e-02 0.5000 2800 15 7

By evaluating our model using the test set, we get an accuracy of 0.9971.

flag_learning_rate eval_acc samples epochs epochs_completed
1e-04 0.9971 2800 15 7

Second model - dropout

For the second model, we follow the same method as for the first model, except that we use dropout to regularise the model, as was done in one of the previous studies. This method helps prevent overfitting. As in the first model, we use a pretrained model (vgg16 or vgg19) and fine-tuning for the transfer learning, unfreezing the weights from block 5 or block 4 (block5_conv1 or block4_conv1). However, we modify the architecture of our model as follows:

Type Maps Size Receptive field Activation
Fully connected - 2 - Softmax
Dropout - 128 - -
Fully connected - 128 - ReLU
Dropout 64 56x56 - -
Max pooling 64 56x56 2x2 -
Convolution 64 113x113 3x3 ReLU
Convolution 64 113x113 3x3 ReLU
Dropout 32 113x113 - -
Max pooling 32 113x113 2x2 -
Convolution 32 227x227 3x3 ReLU
Convolution 32 227x227 3x3 ReLU
Input 1 227x227 - -

Also, the rate of the first dropout is 0.2, the second one is 0.4 and the last one is 0.5. Because we compile the model with the RMSprop optimizer, we again need to tune the learning rate in order to find the best one. As we decide to fix the dropout rates rather than grid-searching over them, we use the tuning.yml file once more.
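A standalone rendering of this table in R keras, assuming the three dropout rates apply from input to output; "same" padding is chosen so the layer sizes match the table, and, as before, the actual scripts combine this design with the pretrained vgg base:

```r
library(keras)

# Standalone rendering of the dropout architecture, read from the input
# upwards (rates 0.2, 0.4, 0.5 assumed to be ordered input-to-output).
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), padding = "same",
                activation = "relu", input_shape = c(227, 227, 3)) %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), padding = "same",
                activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%   # 227x227 -> 113x113
  layer_dropout(rate = 0.2) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), padding = "same",
                activation = "relu") %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), padding = "same",
                activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%   # 113x113 -> 56x56
  layer_dropout(rate = 0.4) %>%
  layer_flatten() %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 2, activation = "softmax")
```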

Again, we tune the model using a grid search, with the same learning-rate values as for the first model (0.00001, 0.0001, 0.001, 0.01).

Just as for the first model, we use several scripts in order to vary the pretrained model (vgg16 or vgg19) and the number of unfrozen upper layers of the convolutional base (block4_conv1 or block5_conv1).

This model, named second-model-dropout, is saved in /results and will be used for dropout_vgg16_1, dropout_vgg16_2, dropout_vgg19_1, dropout_vgg19_2.

Then, we use the Google Cloud Console to run the tuning.

Dropout_VGG16_1

We use the vgg16 pretrained model and unfreeze the convolution layers from block 5 (block5_conv1) while keeping the first four blocks frozen.

Below, we display the runs of the tuning part. We find that flag_learning_rate = 0.001 is the best value because it gives the best validation accuracy (0.9909).

flag_learning_rate metric_val_acc samples epochs epochs_completed
1e-03 0.9909 2800 15 15
1e-05 0.9880 2800 15 6
1e-04 0.9841 2800 15 6
1e-02 0.5000 2800 15 7

We select the hyperparameter whose model has the best validation accuracy.

Ultimately, we evaluate our model and display its accuracy.

Our accuracy on the test set is 0.9932.

flag_learning_rate eval_acc samples epochs epochs_completed
0.001 0.9932 2800 15 15

Dropout_VGG16_2

We use the vgg16 pretrained model and unfreeze the convolution layers from block 4 (block4_conv1) while keeping the first three blocks frozen.

We find that flag_learning_rate = 0.001 is the best value because it gives the best validation accuracy (0.9943).

flag_learning_rate metric_val_acc samples epochs epochs_completed
1e-03 0.9943 2800 15 7
1e-04 0.9898 2800 15 6
1e-05 0.9125 2800 15 15
1e-02 0.5000 2800 15 7

We select the hyperparameter whose model has the best validation accuracy.

Our accuracy on the test set is 0.9931.

flag_learning_rate eval_acc samples epochs epochs_completed
0.001 0.9931 2800 15 7

Dropout_VGG19_1

We use the vgg19 pretrained model and unfreeze the convolution layers from block 5 (block5_conv1) while keeping the first four blocks frozen.

We find that flag_learning_rate = 0.001 is the best value because it gives the best validation accuracy (0.9895).

flag_learning_rate metric_val_acc samples epochs epochs_completed
1e-03 0.9895 2800 15 15
1e-05 0.9784 2800 15 6
1e-04 0.9625 2800 15 9
1e-02 0.5000 2800 15 8

We select the hyperparameter whose model has the best validation accuracy.

Our accuracy on the test set is 0.9931.

flag_learning_rate eval_acc samples epochs epochs_completed
0.001 0.9931 2800 15 15

Dropout_VGG19_2

We use the vgg19 pretrained model and unfreeze the convolution layers from block 4 (block4_conv1) while keeping the first three blocks frozen.

We find that flag_learning_rate = 0.001 is the best value because it gives the best validation accuracy (0.9934).

flag_learning_rate metric_val_acc samples epochs epochs_completed
1e-03 0.9934 2800 15 8
1e-05 0.9823 2800 15 6
1e-04 0.9807 2800 15 15
1e-02 0.5000 2800 15 7

We select the hyperparameter whose model has the best validation accuracy.

Our accuracy on the test set is 0.9920.

flag_learning_rate eval_acc samples epochs epochs_completed
0.001 0.9920 2800 15 8

Conclusion

In this project, we used two different model architectures and tuned the learning rate of each. For both models, we varied the pretrained model and the number of unfrozen layers.

We would like to point out that we tried and adjusted different models before reaching the two final ones. For example, at the beginning of our study, we trained the model with 20 epochs and a batch size of 2, and the results were very close, with an accuracy on the testing set of 0.9929 for the first model. However, we judged it unnecessary to show this in the report, as we preferred to try two different model architectures and to vary the pretrained models, the unfrozen weights and the learning rates.

Comparing the results of the two models, we note that both reach an excellent accuracy, but the first one does better: as the table of results shows, the evaluation accuracy is always higher for the first model. Moreover, the best model uses vgg19, unfreezes the weights from block 4 and uses a learning rate of 0.0001.

Table of results
Model Name Pretrained model Unfrozen weights Learning rate Accuracy
First model vgg16_1.R vgg16 block5_conv1 1e-04 0.9959
First model vgg16_2.R vgg16 block4_conv1 1e-05 0.9957
First model vgg19_1.R vgg19 block5_conv1 1e-04 0.9967
First model vgg19_2.R vgg19 block4_conv1 1e-04 0.9971
Second model dropout_vgg16_1.R vgg16 block5_conv1 1e-03 0.9932
Second model dropout_vgg16_2.R vgg16 block4_conv1 1e-03 0.9931
Second model dropout_vgg19_1.R vgg19 block5_conv1 1e-03 0.9931
Second model dropout_vgg19_2.R vgg19 block4_conv1 1e-03 0.9920
Best model
Model Name Pretrained model Unfrozen weights Learning rate Accuracy
First model vgg19_2.R vgg19 block4_conv1 1e-04 0.9971

We could spend many more hours tuning the models and trying different ones, but we already obtained great results in terms of accuracy, and that was our main goal.

To conclude, we are convinced that our model can be used in real life. Buildings and infrastructure are photographed by drones or by technicians in order to ascertain whether cracks exist. This process does not require immediate action, so our models fit the workflow: cracks can be recognised within a few hours by our system analysing the pictures, and construction professionals can then decide whether to act and renovate the damaged infrastructure.