Maintaining and inspecting road infrastructure is becoming increasingly complex. Recent events have shown that some structures are too old. To avoid disasters such as the collapse of the bridge in Genoa (Italy), detecting surface cracks is vital.
The concrete of bridges can be monitored with drone-mounted cameras. After a photographic survey of a structure, a tool that detects whether the concrete is cracked would let technicians focus only on damaged structures.
The purpose of our model is to predict whether a photo taken by a drone or by a technician shows a crack (a binary classifier).
A previous analysis split the data set into a training set, a validation set and a test set. The authors applied data augmentation and used convolutional neural networks with global average pooling to analyse the images. They used the vgg16 preprocessing function and froze the layers of the pretrained model. This transfer learning method is commonly called “feature extraction”.
Another study used convolutional neural networks with max pooling and dropout layers, without any pretrained model or transfer learning.
We obtained our data from Kaggle. The data set contains images of various concrete surfaces, divided into two folders: negative (without crack) and positive (with crack). Each class has 20’000 images, for a total of 40’000 images of 227 x 227 pixels.
First, we prepare the data. Initially, we only have two folders of 20’000 images each.
We need to split these images into a training folder and a testing folder. We randomly choose 14’000 images from the “Negative” folder and move them into the training folder, then move the 6’000 remaining images into the testing folder. We follow the same process for the positive images. These steps are done in the script data-spltting.
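Below is a minimal sketch of the per-class random split just described. It is written in Python for illustration and does not reproduce the project's own data-spltting script. The file names are hypothetical placeholders. The sketch only computes which files go where; actually moving them could be done with `shutil.move`, which is left out so the example runs without the data set.

```python
import random

def split_class(filenames, n_train, seed=None):
    """Randomly assign n_train files to training; the remaining files go to testing."""
    rng = random.Random(seed)
    shuffled = list(filenames)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical file names standing in for the "Negative" and "Positive" folders.
negative = [f"neg_{i:05d}.jpg" for i in range(20_000)]
positive = [f"pos_{i:05d}.jpg" for i in range(20_000)]

split = {}
for label, files in (("Negative", negative), ("Positive", positive)):
    # Each class is split separately, so both sets stay balanced.
    train, test = split_class(files, n_train=14_000, seed=42)
    split[label] = {"train": train, "test": test}

for label, parts in split.items():
    print(label, len(parts["train"]), len(parts["test"]))
```

Splitting each class on its own keeps the 50/50 class balance in both folders.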
Our final architecture for our data is the following:
We do not perform any data augmentation, contrary to what we first mentioned in our project proposal. We already have a large number of images in our training set (28’000, i.e. 2 x 14’000). In addition, as reported on Kaggle, our high-resolution images vary a lot in surface finish and illumination conditions.
To classify our images, we use convolutional neural networks. We try two different models. Both use max pooling and a pretrained model (vgg16 or vgg19). Unlike previous studies, both models also use fine-tuning for transfer learning. We first freeze the convolutional base, then compile and fit the model. Next, we unfreeze some of the upper layers, then compile and fit the model again. The models are explained in more detail in the modelling part.
Moreover, we use the Google Cloud console to obtain the resources needed to train the models.
Before building the model, we split our data into three sets: a training set, a validation set and a test set. The architecture of our model is as follows:
Type | Maps | Size | Receptive field | Activation |
---|---|---|---|---|
Fully connected | - | 2 | - | Softmax |
Fully connected | - | 512 | - | ReLU |
Max pooling | 256 | 12x12 | 2x2 | ReLU |
Convolution | 256 | 24x24 | 3x3 | ReLU |
Max pooling | 128 | 26x26 | 2x2 | ReLU |
Convolution | 128 | 53x53 | 3x3 | ReLU |
Max pooling | 64 | 55x55 | 2x2 | ReLU |
Convolution | 64 | 111x111 | 3x3 | ReLU |
Max pooling | 32 | 113x113 | 2x2 | ReLU |
Convolution | 32 | 227x227 | 3x3 | ReLU |
Input | 1 | 227x227 | - | - |
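The Size column can be checked with a little arithmetic. This sketch assumes stride-1 3x3 convolutions without padding ("valid") and non-overlapping 2x2 max pooling that rounds down. Under those assumptions, each convolution removes 2 pixels and each pooling layer halves the side length. Starting from the 113x113 output of the first pooling layer, the sketch reproduces the remaining sizes in the table. It then computes how many values would reach the 512-unit dense layer, assuming the last feature maps are simply flattened; the table does not show that step.

```python
def conv_out(size, kernel=3, padding="valid"):
    """Side length after a stride-1 convolution ('same' keeps it, 'valid' shrinks it)."""
    return size if padding == "same" else size - kernel + 1

def pool_out(size, pool=2):
    """Side length after non-overlapping max pooling, rounding down."""
    return size // pool

size = 113  # side length after the first max-pooling layer in the table
trace = [size]
for _ in range(3):  # three (convolution, max pooling) pairs follow
    size = conv_out(size)
    trace.append(size)
    size = pool_out(size)
    trace.append(size)

print(trace)  # [113, 111, 55, 53, 26, 24, 12]
flat = size * size * 256  # 256 feature maps of 12x12, flattened (assumption)
print(flat)  # 36864
```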
The number of filters increases from one convolutional layer to the next, starting with 32 filters in the first layer. To compile the model, we choose an RMSprop optimizer, for which we need to find the optimal learning rate. We fit the model for at most 15 epochs and also use early stopping. Because we tune the learning rate, we define it as a flag and create the tuning.yml file to run a grid search. The grid search trains one model for every possible combination of hyperparameter values. The candidate learning rates are 0.00001, 0.0001, 0.001 and 0.01.
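Early stopping is why many runs in the tables below complete fewer than 15 epochs. The following is a minimal sketch of a patience-based stopping rule on validation accuracy. The report does not state the patience used, so `patience=2` and the accuracy curve below are hypothetical.

```python
def early_stopping_epoch(val_acc, patience=2, max_epochs=15):
    """Return the number of epochs completed before training stops.

    Training stops once validation accuracy has not improved on its best
    value for `patience` consecutive epochs, or after `max_epochs`.
    """
    best = float("-inf")
    wait = 0
    for epoch, acc in enumerate(val_acc[:max_epochs], start=1):
        if acc > best:  # improvement: remember it and reset the counter
            best = acc
            wait = 0
        else:  # no improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch
    return min(len(val_acc), max_epochs)

history = [0.90, 0.95, 0.97, 0.96, 0.965, 0.97]  # hypothetical validation accuracies
print(early_stopping_epoch(history, patience=2))  # 5
```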
To explore different models, we use several scripts that vary the pretrained model (vgg16 or vgg19) and the number of unfrozen upper layers of the convolutional base (unfrozen from block4_conv1 or from block5_conv1).
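The two unfreezing points can be made concrete with the layer names of the VGG16 and VGG19 convolutional bases, as Keras names them (`block{n}_conv{i}`, `block{n}_pool`). This sketch marks every layer from the chosen starting layer onwards as trainable and counts the convolutional layers that get fine-tuned. It only manipulates names. In an actual model, the same rule would set each layer's trainable attribute before the model is recompiled.

```python
def vgg_conv_base_layers(depth):
    """Layer names of the VGG16 (depth=16) or VGG19 (depth=19) convolutional base, input excluded."""
    convs_per_block = {16: [2, 2, 3, 3, 3], 19: [2, 2, 4, 4, 4]}[depth]
    names = []
    for block, n_conv in enumerate(convs_per_block, start=1):
        names += [f"block{block}_conv{i}" for i in range(1, n_conv + 1)]
        names.append(f"block{block}_pool")
    return names

def trainable_flags(layers, unfreeze_from):
    """Freeze every layer before `unfreeze_from`; that layer and all later ones are trainable."""
    start = layers.index(unfreeze_from)
    return {name: i >= start for i, name in enumerate(layers)}

counts = {}
for depth in (16, 19):
    layers = vgg_conv_base_layers(depth)
    for start in ("block5_conv1", "block4_conv1"):
        flags = trainable_flags(layers, start)
        n_conv = sum(1 for name, trainable in flags.items() if trainable and "_conv" in name)
        counts[(f"vgg{depth}", start)] = n_conv
        print(f"vgg{depth}, unfrozen from {start}: {n_conv} trainable conv layers")
```

Unfreezing from block 4 rather than block 5 doubles the number of fine-tuned convolutional layers for both networks.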
This model, named “first-model”, is saved in /results and is used by vgg16_1, vgg16_2, vgg19_1 and vgg19_2.
We run the tuning on the Google Cloud console.
For vgg16_1, we use the vgg16 pretrained model and unfreeze the convolutional base from block 5 onwards (block5_conv1), keeping the first four blocks frozen.
Below, we display the table of hyperparameter combinations and their respective accuracies.
After tuning, we find that flag_learning_rate = 0.0001 is the best value, as it gives the highest validation accuracy (0.9968).
flag_learning_rate | metric_val_acc | samples | epochs | epochs_completed |
---|---|---|---|---|
1e-04 | 0.9968 | 2800 | 15 | 12 |
1e-05 | 0.9929 | 2800 | 15 | 13 |
1e-03 | 0.9789 | 2800 | 15 | 10 |
1e-02 | 0.5000 | 2800 | 15 | 7 |
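With a single flag, the grid search amounts to one run per candidate learning rate. This sketch enumerates the grid as a Cartesian product, the same way a grid with several flags would be enumerated. It then fills in the validation accuracies reported for vgg16_1 in the table above and keeps the run with the highest value. Note that the 0.5000 obtained with 0.01 is chance level for two balanced classes.

```python
from itertools import product

# Candidate values for the single tuned flag: the grid has 4 runs.
grid = {"learning_rate": [1e-5, 1e-4, 1e-3, 1e-2]}
runs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(runs))  # 4

# Validation accuracies reported for vgg16_1 in the tuning table.
val_acc = {1e-4: 0.9968, 1e-5: 0.9929, 1e-3: 0.9789, 1e-2: 0.5000}
for run in runs:
    run["metric_val_acc"] = val_acc[run["learning_rate"]]

best = max(runs, key=lambda run: run["metric_val_acc"])
print(best["learning_rate"], best["metric_val_acc"])  # 0.0001 0.9968
```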
Finally, by evaluating our model using the test set, we get an accuracy of 0.9959.
flag_learning_rate | eval_acc | samples | epochs | epochs_completed |
---|---|---|---|---|
1e-04 | 0.9959 | 2800 | 15 | 12 |
Then, for vgg16_2, we use the vgg16 pretrained model as in vgg16_1, but unfreeze the convolutional base from block 4 onwards (block4_conv1), keeping the first three blocks frozen.
Below, we display the table of hyperparameter combinations and their respective accuracies.
After tuning, we find that flag_learning_rate = 0.00001 is the best value, as it gives the highest validation accuracy (0.9961).
flag_learning_rate | metric_val_acc | samples | epochs | epochs_completed |
---|---|---|---|---|
1e-05 | 0.9961 | 2800 | 15 | 14 |
1e-03 | 0.9948 | 2800 | 15 | 7 |
1e-04 | 0.9945 | 2800 | 15 | 8 |
1e-02 | 0.5000 | 2800 | 15 | 6 |
Finally, by evaluating our model using the test set, we get an accuracy of 0.9957.
flag_learning_rate | eval_acc | samples | epochs | epochs_completed |
---|---|---|---|---|
1e-05 | 0.9957 | 2800 | 15 | 14 |
For vgg19_1, we use the vgg19 pretrained model and unfreeze the convolutional base from block 5 onwards (block5_conv1), keeping the first four blocks frozen.
After tuning, we find that flag_learning_rate = 0.00001 is the best value, as it gives the highest validation accuracy (0.9952).
flag_learning_rate | metric_val_acc | samples | epochs | epochs_completed |
---|---|---|---|---|
1e-05 | 0.9952 | 2800 | 15 | 6 |
1e-04 | 0.9950 | 2800 | 15 | 8 |
1e-03 | 0.9921 | 2800 | 15 | 8 |
1e-02 | 0.5000 | 2800 | 15 | 10 |
Finally, evaluating on the test set gives a best accuracy of 0.9967, obtained with a learning rate of 0.0001 rather than the 0.00001 selected during tuning. This mismatch is not meaningful. The two learning rates differ by only 0.0002 in validation accuracy, and by the same margin on the test set.
flag_learning_rate | eval_acc | samples | epochs | epochs_completed |
---|---|---|---|---|
1e-04 | 0.9967 | 2800 | 15 | 8 |
For vgg19_2, we again use the vgg19 pretrained model as in vgg19_1. This time, we unfreeze the convolutional base from block 4 onwards (block4_conv1), keeping the first three blocks frozen.
After tuning, we find that flag_learning_rate = 0.0001 is the best value, as it gives the highest validation accuracy (0.9982).
flag_learning_rate | metric_val_acc | samples | epochs | epochs_completed |
---|---|---|---|---|
1e-04 | 0.9982 | 2800 | 15 | 7 |
1e-05 | 0.9957 | 2800 | 15 | 10 |
1e-03 | 0.9955 | 2800 | 15 | 7 |
1e-02 | 0.5000 | 2800 | 15 | 7 |
By evaluating our model using the test set, we get an accuracy of 0.9971.
flag_learning_rate | eval_acc | samples | epochs | epochs_completed |
---|---|---|---|---|
1e-04 | 0.9971 | 2800 | 15 | 7 |
For the second model, we follow the same methods as for the first model, except that we regularise the model with dropout, as was done in a previous study. Dropout helps prevent overfitting. As in the first model, we use a pretrained model (vgg16 or vgg19) and fine-tuning for transfer learning. We unfreeze the weights from block 5 or block 4 (block5_conv1 or block4_conv1). However, we modify the architecture of our model as follows:
Type | Maps | Size | Receptive field | Activation |
---|---|---|---|---|
Fully connected | - | 2 | - | Softmax |
Dropout | - | 128 | - | ReLU |
Fully connected | - | 128 | - | ReLU |
Dropout | 64 | 56x56 | - | ReLU |
Max pooling | 64 | 56x56 | 2x2 | ReLU |
Convolution | 64 | 113x113 | 3x3 | ReLU |
Convolution | 64 | 113x113 | 3x3 | ReLU |
Dropout | 32 | 113x113 | - | ReLU |
Max pooling | 32 | 113x113 | 2x2 | ReLU |
Convolution | 32 | 227x227 | 3x3 | ReLU |
Convolution | 32 | 227x227 | 3x3 | ReLU |
Input | 1 | 227x227 | - | - |
The first dropout rate is 0.2, the second is 0.4 and the last is 0.5. Because we compile the model with an RMSprop optimizer, we tune its learning rate to find the best one. As we fix the dropout rates instead of including them in the grid search, we again use the tuning.yml file.
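Dropout can be illustrated on a plain list of activations. The sketch below implements inverted dropout, the variant used by Keras' Dropout layer. During training, each unit is zeroed with probability `rate` and the surviving units are scaled by 1 / (1 - rate), so the expected activation is unchanged. At inference time, the input passes through untouched. The activation values are hypothetical.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout on a list of activations."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # inference: no units are dropped
    keep = 1.0 - rate
    # Keep each unit with probability `keep` and rescale it; otherwise zero it.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
units = [1.0] * 10
for rate in (0.2, 0.4, 0.5):  # the three dropout rates of the second model
    print(rate, dropout(units, rate, rng=rng))
```

The higher rates sit closer to the dense layers, where most of the trainable parameters of the classifier head are.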
Again, we tune the model with a grid search over the same candidate learning rates as for the first model (0.00001, 0.0001, 0.001, 0.01).
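To show where the tuned learning rate acts, here is a minimal sketch of one common formulation of the RMSprop update for a single scalar weight. RMSprop keeps a running average `v` of squared gradients and divides each gradient by its square root, so the learning rate sets the scale of every step. The values `rho = 0.9` and `eps = 1e-7` are assumptions: they match common library defaults, but the report does not state them. Implementations also differ slightly in where `eps` enters.

```python
import math

def rmsprop_step(w, grad, v, learning_rate, rho=0.9, eps=1e-7):
    """One RMSprop update; returns the new weight and the new squared-gradient average."""
    v = rho * v + (1.0 - rho) * grad ** 2  # running average of squared gradients
    w = w - learning_rate * grad / (math.sqrt(v) + eps)  # normalised gradient step
    return w, v

# First step from w = 0 with a unit gradient, for each candidate learning rate.
for lr in (1e-5, 1e-4, 1e-3, 1e-2):
    w, _ = rmsprop_step(0.0, grad=1.0, v=0.0, learning_rate=lr)
    print(f"learning rate {lr:g}: first step {w:.3g}")
```

Because the gradient is normalised, the step size depends mostly on the learning rate and little on the raw gradient magnitude. This is why the learning rate is the natural hyperparameter to tune here.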
As for the first model, we use several scripts that vary the pretrained model (vgg16 or vgg19) and the number of unfrozen upper layers of the convolutional base (block4_conv1 or block5_conv1).
This model, named second-model-dropout, is saved in /results and is used by dropout_vgg16_1, dropout_vgg16_2, dropout_vgg19_1 and dropout_vgg19_2.
Again, we run the tuning on the Google Cloud console.
For dropout_vgg16_1, we use the vgg16 pretrained model and unfreeze the convolutional base from block 5 onwards (block5_conv1), keeping the first four blocks frozen.
Below, we display the tuning runs. We find that flag_learning_rate = 0.001 is the best value, as it gives the highest validation accuracy (0.9909).
flag_learning_rate | metric_val_acc | samples | epochs | epochs_completed |
---|---|---|---|---|
1e-03 | 0.9909 | 2800 | 15 | 15 |
1e-05 | 0.9880 | 2800 | 15 | 6 |
1e-04 | 0.9841 | 2800 | 15 | 6 |
1e-02 | 0.5000 | 2800 | 15 | 7 |
We select the hyperparameter whose model has the best validation accuracy.
Finally, we evaluate our model and display its accuracy.
Our accuracy on the test set is 0.9932.
flag_learning_rate | eval_acc | samples | epochs | epochs_completed |
---|---|---|---|---|
0.001 | 0.9932 | 2800 | 15 | 15 |
For dropout_vgg16_2, we use the vgg16 pretrained model and unfreeze the convolutional base from block 4 onwards (block4_conv1), keeping the first three blocks frozen.
We find that flag_learning_rate = 0.001 is the best value, as it gives the highest validation accuracy (0.9943).
flag_learning_rate | metric_val_acc | samples | epochs | epochs_completed |
---|---|---|---|---|
1e-03 | 0.9943 | 2800 | 15 | 7 |
1e-04 | 0.9898 | 2800 | 15 | 6 |
1e-05 | 0.9125 | 2800 | 15 | 15 |
1e-02 | 0.5000 | 2800 | 15 | 7 |
We select the hyperparameter whose model has the best validation accuracy.
Our accuracy on the test set is 0.9931.
flag_learning_rate | eval_acc | samples | epochs | epochs_completed |
---|---|---|---|---|
0.001 | 0.9931 | 2800 | 15 | 7 |
For dropout_vgg19_1, we use the vgg19 pretrained model and unfreeze the convolutional base from block 5 onwards (block5_conv1), keeping the first four blocks frozen.
We find that flag_learning_rate = 0.001 is the best value, as it gives the highest validation accuracy (0.9895).
flag_learning_rate | metric_val_acc | samples | epochs | epochs_completed |
---|---|---|---|---|
1e-03 | 0.9895 | 2800 | 15 | 15 |
1e-05 | 0.9784 | 2800 | 15 | 6 |
1e-04 | 0.9625 | 2800 | 15 | 9 |
1e-02 | 0.5000 | 2800 | 15 | 8 |
We select the hyperparameter whose model has the best validation accuracy.
Our accuracy on the test set is 0.9931.
flag_learning_rate | eval_acc | samples | epochs | epochs_completed |
---|---|---|---|---|
0.001 | 0.9931 | 2800 | 15 | 15 |
For dropout_vgg19_2, we use the vgg19 pretrained model and unfreeze the convolutional base from block 4 onwards (block4_conv1), keeping the first three blocks frozen.
We find that flag_learning_rate = 0.001 is the best value, as it gives the highest validation accuracy (0.9934).
flag_learning_rate | metric_val_acc | samples | epochs | epochs_completed |
---|---|---|---|---|
1e-03 | 0.9934 | 2800 | 15 | 8 |
1e-05 | 0.9823 | 2800 | 15 | 6 |
1e-04 | 0.9807 | 2800 | 15 | 15 |
1e-02 | 0.5000 | 2800 | 15 | 7 |
We select the hyperparameter whose model has the best validation accuracy.
Our accuracy on the test set is 0.992.
flag_learning_rate | eval_acc | samples | epochs | epochs_completed |
---|---|---|---|---|
0.001 | 0.992 | 2800 | 15 | 8 |
In this project, we used two different model architectures and tuned the learning rate of each. For both models, we varied the pretrained model and the number of unfrozen layers.
We would like to point out that we tried and adjusted several models before reaching the two final ones. For example, at the beginning of our study, we trained a model with 20 epochs and a batch size of 2. The results were very close, with a test accuracy of 0.9929 for the first model. However, we chose not to report these runs. We preferred to try two different model architectures and to vary the pretrained models, the unfrozen weights and the learning rates.
Comparing the two models, both reach excellent accuracy, but the first one performs better. As the table of results shows, every run of the first model has a higher test accuracy than every run of the second model. The best model uses vgg19, unfreezes the weights from block 4 and uses a learning rate of 0.0001.
Model | Name | Pretrained model | Unfrozen weights | Learning rate | Accuracy |
---|---|---|---|---|---|
First model | vgg16_1.R | vgg16 | block5_conv1 | 1e-04 | 0.9959 |
First model | vgg16_2.R | vgg16 | block4_conv1 | 1e-05 | 0.9957 |
First model | vgg19_1.R | vgg19 | block5_conv1 | 1e-04 | 0.9967 |
First model | vgg19_2.R | vgg19 | block4_conv1 | 1e-04 | 0.9971 |
Second model | dropout_vgg16_1.R | vgg16 | block5_conv1 | 1e-03 | 0.9932 |
Second model | dropout_vgg16_2.R | vgg16 | block4_conv1 | 1e-03 | 0.9931 |
Second model | dropout_vgg19_1.R | vgg19 | block5_conv1 | 1e-03 | 0.9931 |
Second model | dropout_vgg19_2.R | vgg19 | block4_conv1 | 1e-03 | 0.9920 |
Model | Name | Pretrained model | Unfrozen weights | Learning rate | Accuracy |
---|---|---|---|---|---|
First model | vgg19_2.R | vgg19 | block4_conv1 | 1e-04 | 0.9971 |
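The comparison can be reproduced directly from the reported numbers. This sketch takes the test accuracies from each run's evaluation table, finds the best run, and summarises each model family. It also checks that the weakest first-model run still beats the strongest second-model run.

```python
from statistics import mean

# Test-set accuracies taken from the evaluation tables of this report.
results = [
    ("First model", "vgg16_1.R", 0.9959),
    ("First model", "vgg16_2.R", 0.9957),
    ("First model", "vgg19_1.R", 0.9967),
    ("First model", "vgg19_2.R", 0.9971),
    ("Second model", "dropout_vgg16_1.R", 0.9932),
    ("Second model", "dropout_vgg16_2.R", 0.9931),
    ("Second model", "dropout_vgg19_1.R", 0.9931),
    ("Second model", "dropout_vgg19_2.R", 0.9920),
]

best = max(results, key=lambda r: r[2])
print("Best run:", best[1], best[2])

for family in ("First model", "Second model"):
    accs = [acc for fam, _, acc in results if fam == family]
    print(family, "mean", round(mean(accs), 4), "min", min(accs), "max", max(accs))

# The weakest first-model run still beats the strongest second-model run.
worst_first = min(acc for fam, _, acc in results if fam == "First model")
best_second = max(acc for fam, _, acc in results if fam == "Second model")
print(worst_first > best_second)  # True
```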
We could spend many more hours tuning the models and trying different ones. However, we obtained very good accuracy, which was our main goal.
To conclude, we are convinced that our model can be used in real life. Buildings and infrastructure are photographed by drones or by technicians to establish whether cracks are present. This process does not require urgent action, so our models are suitable for it: our system can analyse the pictures and recognise cracks within a few hours. The building trades can then decide whether to act and renovate damaged infrastructure.