Introduction

The maintenance and monitoring of road infrastructure are becoming more and more complex, and recent events have shown that some structures are dangerously old. To avoid disasters such as the collapse of the bridge in Genoa (Italy), detecting surface cracks is vital.

The concrete of bridges is monitored by drone cameras. After a photographic scan of a structure, a tool that detects whether the concrete is cracked would allow technicians to focus only on damaged structures.

Research question/objective

The purpose of our model, a binary classifier, is to predict whether a photo taken by a drone or by a technician shows a crack.

Previous analysis

In a previous analysis, the authors divided the data set into a training set, a validation set and a test set. They applied data augmentation and analyzed the images with a convolutional neural network and global average pooling. They loaded the vgg16 preprocessing function and froze the layers of the pretrained model. This transfer learning method is commonly called “feature extraction”.

In another study, the authors used a convolutional neural network with max pooling and dropout layers. They did not use any pretrained model or transfer learning.

Data

We obtained our data on Kaggle. The data set contains images of various concrete surfaces, divided into two folders: negative (without crack) and positive (with crack). Each class has 20’000 images, for a total of 40’000 images of 227 x 227 pixels.

Methodology

The data

First, we have to prepare the data. At the beginning, we only have two folders of 20’000 images each.

Negative
Positive

We need to split these images into a training folder and a testing folder. We randomly choose 14’000 images from the “Negative” folder and move them into a training folder; the 6’000 remaining images are moved into a testing folder. We follow the same process for the positive images. These steps are done in the script data-splitting.

The final folder structure of our data is the following:

Test
- Negative
- Positive
Train
- Negative
- Positive
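
The split described above can be sketched as follows. This is only an illustration in Python, not the content of the original splitting script: the function names, the fixed random seed and the exact folder names passed as arguments are assumptions made for the example.

```python
import random
import shutil
from pathlib import Path


def split_class(source_dir, train_dir, test_dir, n_train, seed=0):
    """Move n_train randomly chosen files to train_dir and the rest to test_dir."""
    files = sorted(p for p in Path(source_dir).iterdir() if p.is_file())
    if n_train > len(files):
        raise ValueError("n_train exceeds the number of available files")
    rng = random.Random(seed)  # fixed seed (assumption) so the split is reproducible
    rng.shuffle(files)
    for target, chosen in ((train_dir, files[:n_train]), (test_dir, files[n_train:])):
        target = Path(target)
        target.mkdir(parents=True, exist_ok=True)
        for path in chosen:
            shutil.move(str(path), str(target / path.name))
    return n_train, len(files) - n_train


def split_dataset(root, n_train=14_000, classes=("Negative", "Positive")):
    """Build root/Train/<class> and root/Test/<class> from root/<class>."""
    root = Path(root)
    return {
        name: split_class(root / name, root / "Train" / name, root / "Test" / name, n_train)
        for name in classes
    }
```

With 20’000 images per class and n_train = 14’000, each call to split_class leaves 6’000 images for the testing folder, as in the description above.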

The models

We do not perform any data augmentation, contrary to what we first mentioned in our project proposal, because we already have a substantial number of images in our training set (28’000). In addition, our high-resolution images have a lot of variance in terms of surface finish and illumination conditions, as reported on Kaggle.

To classify our images, we use convolutional neural networks. We try two different models; both use max pooling and a pretrained model (vgg16 or vgg19). In both models, and unlike the previous studies, we use fine-tuning as the transfer learning method: we first freeze the convolutional base, then compile and fit the model; we then unfreeze some of the upper layers of the base, and compile and fit the model again. We explain the models in more detail in the modelling part.

Moreover, we use the Google Cloud console in order to have the resources to train the models.

Modelling part

First model

Before building the model, we split our data into three sets: a training set, a validation set and a testing set. The architecture of our model is the following:

Type	Maps	Size	Receptive field	Activation
Fully connected	-	2	-	Softmax
Fully connected	-	512	-	ReLU
Max pooling	256	12x12	2x2	ReLU
Convolution	256	24x24	3x3	ReLU
Max pooling	128	26x26	2x2	ReLU
Convolution	128	53x53	3x3	ReLU
Max pooling	64	55x55	2x2	ReLU
Convolution	64	111x111	3x3	ReLU
Max pooling	32	113x113	2x2	ReLU
Convolution	32	227x227	3x3	ReLU
Input	1	227x227	-	-
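
The sizes in the table can be checked with a little arithmetic. The sketch below is in Python and rests on an assumption that the report does not state: the first 3x3 convolution keeps the spatial size ("same" padding), the following ones shrink it by 2 ("valid" padding), and each 2x2 max pooling halves the size, dropping any incomplete window. Under that assumption the computed sizes match the table; the function names are hypothetical.

```python
def conv_output(size, kernel=3, padding="valid"):
    """Spatial size after a stride-1 convolution."""
    if padding == "same":
        return size
    return size - kernel + 1


def pool_output(size, window=2):
    """Spatial size after non-overlapping max pooling (incomplete windows are dropped)."""
    return size // window


def trace_sizes(input_size=227, n_blocks=4):
    """Return one (conv_size, pool_size) pair per convolution/pooling block."""
    sizes = []
    size = input_size
    for block in range(n_blocks):
        # Assumption, not stated in the report: only the first convolution is padded.
        padding = "same" if block == 0 else "valid"
        conv = conv_output(size, padding=padding)
        size = pool_output(conv)
        sizes.append((conv, size))
    return sizes


if __name__ == "__main__":
    for conv, pool in trace_sizes():
        print(f"convolution {conv}x{conv} -> max pooling {pool}x{pool}")
```

The last pooling layer therefore outputs 256 maps of 12x12, that is 12 x 12 x 256 = 36’864 values once flattened for the fully connected layer.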

The number of filters increases with each convolutional layer, starting at 32. To compile the model, we choose an RMSprop optimizer, whose learning rate we need to tune. To fit the model, we use at most 15 epochs together with early stopping. Because we tune the learning rate, we create the tuning.yml file to run a grid search, and we use the flag method. A grid search tries all possible combinations of hyperparameter values; here, the values to try for the learning rate are 0.00001, 0.0001, 0.001 and 0.01.

Also, to explore different models, we use several scripts that vary the pretrained model (vgg16 or vgg19) and the layer from which the upper layers of the convolutional base are unfrozen (block4_conv1 or block5_conv1).
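
Stopping the training before the maximum number of epochs can be illustrated as follows. This is a minimal Python sketch of one common stopping rule (stop when the validation accuracy has not improved for a given number of epochs); the patience value and the function name are assumptions, since the report does not give the exact stopping criterion.

```python
def run_with_early_stopping(val_accuracies, max_epochs=15, patience=5):
    """Replay per-epoch validation accuracies and stop when they stop improving.

    Returns (epochs_completed, best_accuracy).
    """
    best = float("-inf")
    epochs_without_improvement = 0
    epochs_completed = 0
    for acc in val_accuracies[:max_epochs]:
        epochs_completed += 1
        if acc > best:
            best = acc
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` epochs in a row
    return epochs_completed, best
```

For example, a run whose validation accuracy stays at 0.5 from the first epoch is stopped after 1 + patience epochs instead of using all 15.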
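
To make the two unfreezing options concrete, the sketch below lists the layer names of the vgg16 convolutional base as exposed by Keras (the input layer is omitted) and marks which layers become trainable when unfreezing starts at a given layer. It is written in plain Python for illustration and does not reproduce the original scripts; the helper name is hypothetical.

```python
VGG16_CONV_BASE = [
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]


def trainable_flags(layer_names, unfreeze_from):
    """Map each layer name to True (trainable) or False (frozen).

    Every layer before `unfreeze_from` stays frozen; that layer and all the
    layers after it are unfrozen.
    """
    if unfreeze_from not in layer_names:
        raise ValueError(f"unknown layer: {unfreeze_from}")
    start = layer_names.index(unfreeze_from)
    return {name: index >= start for index, name in enumerate(layer_names)}


if __name__ == "__main__":
    for first_unfrozen in ("block5_conv1", "block4_conv1"):
        flags = trainable_flags(VGG16_CONV_BASE, first_unfrozen)
        unfrozen = [name for name, trainable in flags.items() if trainable]
        print(first_unfrozen, "->", unfrozen)
```

Starting at block5_conv1 unfreezes three convolutional layers of vgg16, while starting at block4_conv1 unfreezes six; pooling layers have no weights, so their flag has no effect on training.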

This model, named “first-model”, is saved in /results and will be used for vgg16_1, vgg16_2, vgg19_1, vgg19_2.

We run the tuning on the Google Cloud console.

VGG16_1

For this model, we use the vgg16 pretrained model and unfreeze convolution blocks from block 5 while keeping the first four blocks frozen (block5_conv1).

Below, we display the table of hyperparameter combinations and their respective accuracies.

After tuning, we find that flag_learning_rate = 0.0001 is the best hyperparameter value because it gives the best validation accuracy (0.9968).

flag_learning_rate	metric_val_acc	samples	epochs	epochs_completed
1e-04	0.9968	2800	15	12
1e-05	0.9929	2800	15	13
1e-03	0.9789	2800	15	10
1e-02	0.5000	2800	15	7
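
The selection step can be sketched as follows: build the grid of candidate values, then keep the run with the highest validation accuracy. The Python below is only an illustration (the tuning itself is driven by the tuning.yml file and flags); the helper names are hypothetical, and the run values are copied from the table above.

```python
from itertools import product


def expand_grid(**choices):
    """Return every combination of the given hyperparameter values."""
    names = list(choices)
    return [dict(zip(names, combo)) for combo in product(*choices.values())]


def best_run(runs, metric="metric_val_acc"):
    """Return the run with the highest value of `metric`."""
    return max(runs, key=lambda run: run[metric])


# Values copied from the vgg16_1 tuning table.
VGG16_1_RUNS = [
    {"flag_learning_rate": 1e-04, "metric_val_acc": 0.9968, "epochs_completed": 12},
    {"flag_learning_rate": 1e-05, "metric_val_acc": 0.9929, "epochs_completed": 13},
    {"flag_learning_rate": 1e-03, "metric_val_acc": 0.9789, "epochs_completed": 10},
    {"flag_learning_rate": 1e-02, "metric_val_acc": 0.5000, "epochs_completed": 7},
]

if __name__ == "__main__":
    grid = expand_grid(learning_rate=[1e-05, 1e-04, 1e-03, 1e-02])
    print(len(grid), "combinations to try")
    print("selected:", best_run(VGG16_1_RUNS))
```

With a single hyperparameter the grid has one combination per candidate value; adding a second hyperparameter to expand_grid would multiply the number of runs.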

Finally, by evaluating our model using the test set, we get an accuracy of 0.9959.

flag_learning_rate	eval_acc	samples	epochs	epochs_completed
1e-04	0.9959	2800	15	12

VGG16_2

Then, as in vgg16_1, we use the vgg16 pretrained model but unfreeze convolution blocks from block 4 while keeping the first three blocks frozen (block4_conv1).

Below, we display the table of hyperparameter combinations and their respective accuracies.

After tuning, we find that flag_learning_rate = 0.00001 is the best hyperparameter value because it gives the best validation accuracy (0.9961).

flag_learning_rate	metric_val_acc	samples	epochs	epochs_completed
1e-05	0.9961	2800	15	14
1e-03	0.9948	2800	15	7
1e-04	0.9945	2800	15	8
1e-02	0.5000	2800	15	6

Finally, by evaluating our model using the test set, we get an accuracy of 0.9957.

flag_learning_rate	eval_acc	samples	epochs	epochs_completed
1e-05	0.9957	2800	15	14

VGG19_1

Here, we use the vgg19 pretrained model and unfreeze convolution blocks from block 5 while keeping the first four blocks frozen (block5_conv1).

After tuning, we find that flag_learning_rate = 0.00001 is the best hyperparameter value because it gives the best validation accuracy (0.9952).

flag_learning_rate	metric_val_acc	samples	epochs	epochs_completed
1e-05	0.9952	2800	15	6
1e-04	0.9950	2800	15	8
1e-03	0.9921	2800	15	8
1e-02	0.5000	2800	15	10

Finally, by evaluating our model on the test set, we get an accuracy of 0.9967. However, this best accuracy on the testing set is obtained with a learning rate of 0.0001, which is not the value selected when tuning the model (0.00001). This is mainly because the validation accuracies of the two learning rates differ by only 0.0002. The difference on the testing set is the same.

flag_learning_rate	eval_acc	samples	epochs	epochs_completed
1e-04	0.9967	2800	15	8

VGG19_2

As in vgg19_1, we use the vgg19 pretrained model, but this time we unfreeze convolution blocks from block 4 while keeping the first three blocks frozen (block4_conv1).

After tuning, we find that flag_learning_rate = 0.0001 is the best hyperparameter value because it gives the best validation accuracy (0.9982).

flag_learning_rate	metric_val_acc	samples	epochs	epochs_completed
1e-04	0.9982	2800	15	7
1e-05	0.9957	2800	15	10
1e-03	0.9955	2800	15	7
1e-02	0.5000	2800	15	7

By evaluating our model using the test set, we get an accuracy of 0.9971.

flag_learning_rate	eval_acc	samples	epochs	epochs_completed
1e-04	0.9971	2800	15	7

Second model - dropout

For the second model, we follow the same method as for the first model, except that we use dropout to regularise the model, as was done in a previous study. Dropout helps to prevent overfitting. As in the first model, we use a pretrained model (vgg16 or vgg19) and fine-tuning for the transfer learning, and we unfreeze the weights from block 5 or block 4 (block5_conv1 or block4_conv1). However, we modify the architecture of our model as follows:

Type	Maps	Size	Receptive field	Activation
Fully connected	-	2	-	Softmax
Dropout	-	128	-	ReLU
Fully connected	-	128	-	ReLU
Dropout	64	56x56	-	ReLU
Max pooling	64	56x56	2x2	ReLU
Convolution	64	113x113	3x3	ReLU
Convolution	64	113x113	3x3	ReLU
Dropout	32	113x113	-	ReLU
Max pooling	32	113x113	2x2	ReLU
Convolution	32	227x227	3x3	ReLU
Convolution	32	227x227	3x3	ReLU
Input	1	227x227	-	-

The rate of the first dropout layer is 0.2, that of the second is 0.4 and that of the last is 0.5. Because we compile the model with an RMSprop optimizer, we need to tune the learning rate. As we fix the dropout rates rather than including them in the grid search, we reuse the tuning.yml file.

Again, we tune the model using a grid search, with the same learning rates as for the first model (0.00001, 0.0001, 0.001, 0.01).

As for the first model, we use several scripts that vary the pretrained model (vgg16 or vgg19) and the layer from which the upper layers of the convolutional base are unfrozen (block4_conv1 or block5_conv1).
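
As a reminder of what a dropout rate means, here is a minimal Python sketch of the usual "inverted" dropout: during training each value is set to zero with probability equal to the rate, and the surviving values are rescaled so that the expected sum is unchanged; at evaluation time nothing is dropped. The function name and the list-based implementation are assumptions made for illustration only.

```python
import random


def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)  # evaluation mode: the layer does nothing
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


if __name__ == "__main__":
    rng = random.Random(0)
    activations = [1.0] * 10
    for rate in (0.2, 0.4, 0.5):  # the three rates used in the second model
        print(rate, dropout(activations, rate, rng))
```

With a rate of 0.5, about half of the activations are zeroed and the others are doubled.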

This model, named second-model-dropout, is saved in /results and will be used for dropout_vgg16_1, dropout_vgg16_2, dropout_vgg19_1, dropout_vgg19_2.

Then, we use the Google Cloud console to run the tuning.

Dropout_VGG16_1

We use the vgg16 pretrained model and unfreeze convolution blocks from block 5 while keeping the first four blocks frozen (block5_conv1).

Below, we display the runs of the tuning part. We find that flag_learning_rate = 0.001 is the best hyperparameter value because it gives the best validation accuracy (0.9909).

flag_learning_rate	metric_val_acc	samples	epochs	epochs_completed
1e-03	0.9909	2800	15	15
1e-05	0.9880	2800	15	6
1e-04	0.9841	2800	15	6
1e-02	0.5000	2800	15	7

We select the hyperparameter whose model has the best validation accuracy.

Finally, we evaluate our model and display its accuracy.

Our accuracy on the test set is 0.9932.

flag_learning_rate	eval_acc	samples	epochs	epochs_completed
0.001	0.9932	2800	15	15

Dropout_VGG16_2

We use the vgg16 pretrained model and unfreeze convolution blocks from block 4 while keeping the first three blocks frozen (block4_conv1).

We find that flag_learning_rate = 0.001 is the best hyperparameter value because it gives the best validation accuracy (0.9943).

flag_learning_rate	metric_val_acc	samples	epochs	epochs_completed
1e-03	0.9943	2800	15	7
1e-04	0.9898	2800	15	6
1e-05	0.9125	2800	15	15
1e-02	0.5000	2800	15	7

We select the hyperparameter whose model has the best validation accuracy.

Our accuracy on the test set is 0.9931.

flag_learning_rate	eval_acc	samples	epochs	epochs_completed
0.001	0.9931	2800	15	7

Dropout_VGG19_1

We use the vgg19 pretrained model and unfreeze convolution blocks from block 5 while keeping the first four blocks frozen (block5_conv1).

We find that flag_learning_rate = 0.001 is the best hyperparameter value because it gives the best validation accuracy (0.9895).

flag_learning_rate	metric_val_acc	samples	epochs	epochs_completed
1e-03	0.9895	2800	15	15
1e-05	0.9784	2800	15	6
1e-04	0.9625	2800	15	9
1e-02	0.5000	2800	15	8

We select the hyperparameter whose model has the best validation accuracy.

Our accuracy on the test set is 0.9931.

flag_learning_rate	eval_acc	samples	epochs	epochs_completed
0.001	0.9931	2800	15	15

Dropout_VGG19_2

We use the vgg19 pretrained model and unfreeze convolution blocks from block 4 while keeping the first three blocks frozen (block4_conv1).

We find that flag_learning_rate = 0.001 is the best hyperparameter value because it gives the best validation accuracy (0.9934).

flag_learning_rate	metric_val_acc	samples	epochs	epochs_completed
1e-03	0.9934	2800	15	8
1e-05	0.9823	2800	15	6
1e-04	0.9807	2800	15	15
1e-02	0.5000	2800	15	7

We select the hyperparameter whose model has the best validation accuracy.

Our accuracy on the test set is 0.992.

flag_learning_rate	eval_acc	samples	epochs	epochs_completed
0.001	0.992	2800	15	8

Conclusion

In this project, we used two different model architectures and tuned the learning rate of each. For both, we varied the pretrained model and the number of unfrozen layers.

We would like to point out that we tried and adjusted different models before reaching the two final ones. For example, at the beginning of our study, we trained the first model with 20 epochs and a batch size of 2, and the results were very close, with an accuracy on the testing set of 0.9929. However, we judged that it was not useful to show this in the report, as we preferred to try two different model architectures and to vary the pretrained models, the unfrozen weights and the learning rates.

Comparing the two models, both reach an excellent accuracy, but the first one does better: as the table of results shows, its evaluation accuracy is always higher. The best model uses vgg19, unfreezes the weights from block 4 and uses a learning rate of 0.0001.

Table of results
Model	Name	Pretrained model	Unfrozen weights	Learning rate	Accuracy
First model	vgg16_1.R	vgg16	block5_conv1	1e-04	0.9959
First model	vgg16_2.R	vgg16	block4_conv1	1e-05	0.9957
First model	vgg19_1.R	vgg19	block5_conv1	1e-04	0.9967
First model	vgg19_2.R	vgg19	block4_conv1	1e-04	0.9971
Second model	dropout_vgg16_1.R	vgg16	block5_conv1	1e-03	0.9932
Second model	dropout_vgg16_2.R	vgg16	block4_conv1	1e-03	0.9931
Second model	dropout_vgg19_1.R	vgg19	block5_conv1	1e-03	0.9931
Second model	dropout_vgg19_2.R	vgg19	block4_conv1	1e-03	0.992

Best model
Model	Name	Pretrained model	Unfrozen weights	Learning rate	Accuracy
First model	vgg19_2.R	vgg19	block4_conv1	1e-04	0.9971
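
The comparison between the two architectures can be reproduced with a few lines of Python. The sketch below uses the test accuracies reported in the individual model sections; the variable and function names are hypothetical, and the code only summarises those numbers.

```python
from statistics import mean

# (architecture, script name without extension, test accuracy from its section)
RESULTS = [
    ("First model", "vgg16_1", 0.9959),
    ("First model", "vgg16_2", 0.9957),
    ("First model", "vgg19_1", 0.9967),
    ("First model", "vgg19_2", 0.9971),
    ("Second model", "dropout_vgg16_1", 0.9932),
    ("Second model", "dropout_vgg16_2", 0.9931),
    ("Second model", "dropout_vgg19_1", 0.9931),
    ("Second model", "dropout_vgg19_2", 0.992),
]


def accuracies_by_family(results):
    """Group the test accuracies by architecture."""
    grouped = {}
    for family, _name, accuracy in results:
        grouped.setdefault(family, []).append(accuracy)
    return grouped


def best_model(results):
    """Return the (architecture, name, accuracy) tuple with the highest accuracy."""
    return max(results, key=lambda row: row[2])


if __name__ == "__main__":
    for family, values in accuracies_by_family(RESULTS).items():
        print(family, "mean:", round(mean(values), 4), "min:", min(values), "max:", max(values))
    print("best:", best_model(RESULTS))
```

The lowest accuracy of the first model (0.9957) is above the highest accuracy of the second model (0.9932), which is what "always higher" means here.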

We could spend many more hours tuning the models and trying different ones, but we obtained very good results in terms of accuracy, which was our main goal.

To conclude, we are convinced that our model can be used in real life. Buildings and infrastructures are photographed by drones or by technicians in order to check for cracks. This process does not require urgent action, and therefore our models can be used: cracks will be recognized within a few hours by our system analysing the pictures. Building professionals can then decide whether to act and renovate the damaged infrastructure.

Detection of surface cracks

Cédric Vuignier, Gaëtan Lovey

2020-05-27
