The second session of Machine Learning of AI Saturdays Quito focused on image classification. The problem of image classification is described as follows: given a set of images (the training set), each with a label or category (dogs or cats in our case), we are asked to predict these categories for a new set of images (the test set). To solve this problem, we “feed” the computer many images of each category and use an algorithm that observes the images and the characteristics of each class, learning the visual appearance of each category. In other words, the algorithm becomes familiar with each class by being given images and their respective labels; it is then able to differentiate and classify images it has never seen before. Because it learns from labeled examples, this classification is a supervised learning problem.
Convolutional Neural Networks (CNN)
A CNN is perhaps the most popular model for solving image classification problems. Its operation can be described as reading a book page with a magnifying glass: eventually we end up reading the whole page, but only a certain number of words is read at a time. A convolution is a weighted sum of the values of the pixels of an image. Consider an image of 256 × 256 pixels: the convolution algorithm proceeds to “scan” the image using a “patch” or “window” that can be of any size, say 5 × 5 pixels. The result is another image of the same size (256 × 256) where each pixel is the weighted sum of the pixels observed through the patch. This window can be moved over the original image pixel by pixel, or with a larger stride if required.
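To make the “patch” idea concrete, here is a minimal pure-Python sketch of one convolution (a 4 × 4 toy image and a 3 × 3 averaging window instead of 256 × 256 and 5 × 5; the image is zero-padded at the borders so the output keeps the input size, one common convention among several):

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (zero-padded) and return an output of the
    same size, where each pixel is the weighted sum of the pixels seen
    through the window."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2          # padding that keeps the input size
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    if 0 <= y < h and 0 <= x < w:   # zeros outside the image
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out

# A 4x4 "image" and a 3x3 averaging kernel (all weights 1/9)
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
kernel = [[1 / 9] * 3 for _ in range(3)]
smoothed = convolve2d(img, kernel)
print(len(smoothed), len(smoothed[0]))  # -> 4 4 : same size as the input
```

Real CNNs learn the kernel weights during training instead of fixing them by hand, and use fast tensor implementations rather than Python loops, but the weighted-sum-over-a-window operation is exactly this one.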
Data to use
The data used by this algorithm is hosted on Amazon S3 (Amazon Simple Storage Service), an internet storage service offered by Amazon.
Step 1. Jupyter Notebook Configuration
The following commands (Figure 1) are used to display images and graphics inline in the Jupyter notebook instead of in a new window.
We import the packages to be used (Figure 2), in this case FastAI.
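For reference, the setup cells of Figures 1–2 typically look like this in the fastai v1 course notebooks (reproduced here from the standard course setup, since the figures themselves are not shown):

```python
%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai.vision import *
from fastai.metrics import error_rate
```

The first three lines are Jupyter “magics”: they reload edited modules automatically and render plots inside the notebook.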
Step 2. Obtaining the data
The following lines of code import the data. The untar_data function downloads and extracts the data set indicated below (Figure 3). URLs is a data class of fastai.datasets, and PETS is a string that contains the path where the data is stored in Amazon S3. We print the destination path of the downloaded data set.
The command path.ls() shows the contents of the destination path, where the data set is split into images and annotations (Figure 4).
We assign the names path_anno and path_img to the annotations and images paths respectively (Figure 5).
The get_image_files function retrieves the path of each image file and assigns the list to fnames (Figure 6). We can see what fnames contains by running print(fnames), or visualize the first entries with fnames[:20] (Figure 7).
fnames contains the image file names, which include the breed of each dog or cat. To extract the breed name we’ll use regular expressions (Figure 8).
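The pattern used in the fastai course notebook captures everything between the last “/” and the trailing “_<number>.jpg”. A small stdlib sketch with made-up file names in the dataset’s naming style:

```python
import re

# Pattern from the fastai course notebook: the breed is the text between
# the last "/" and the final "_<number>.jpg"
pat = r'/([^/]+)_\d+.jpg$'

fnames = [
    "images/Abyssinian_1.jpg",
    "images/german_shorthaired_105.jpg",
    "images/yorkshire_terrier_12.jpg",
]
breeds = [re.search(pat, f).group(1) for f in fnames]
print(breeds)  # -> ['Abyssinian', 'german_shorthaired', 'yorkshire_terrier']
```

Note how the greedy `[^/]+` backtracks just enough to leave the trailing index number for `\d+`, so multi-word breeds such as `german_shorthaired` survive intact.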
Next, we load the training, validation and test data using the ImageDataBunch function (Figure 9). The images are also transformed and normalized so that they all have the same size and there is no quality loss during the convolution process.
Now we can take a look at some images of the data set with the following command (Figure 10). The rows argument is the number of rows and columns to display, and figsize defines the size of the figure.
If we want to see the labels, we can run the command data.classes and we’ll get a list of all the available classes. The number of classes can be seen with len(data.classes) or data.c (Figure 11).
Step 3. Training the model
To train the model we will use a convolutional neural network. It is important to know that this model takes images as input and outputs the predicted probability for each of the categories. The model will be trained for 4 cycles using our data.
The cnn_learner method takes as parameters the data, the resnet34 model and the list of metrics (in this case error_rate). The first time this method is executed, the pretrained resnet34 weights are downloaded. Pretrained means that the model has already been trained on many images and therefore already recognizes useful visual features. Epoch=4 means that the model will go through the images 4 times (Figure 12).
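Steps 2–3 amount to only a few lines of fastai v1 code. The sketch below follows the API used in the course (untar_data, ImageDataBunch.from_name_re, cnn_learner); the exact arguments (size=224, bs=64) are assumptions, and the code is wrapped in a function so that nothing is downloaded until it is called:

```python
def train_pet_classifier(epochs=4):
    """Sketch of Figures 3-12: download the pets data, build an
    ImageDataBunch, and fit a pretrained ResNet-34 for `epochs` cycles.
    Requires fastai v1 to be installed when called."""
    from fastai.vision import (untar_data, URLs, get_image_files,
                               ImageDataBunch, get_transforms,
                               imagenet_stats, cnn_learner, models)
    from fastai.metrics import error_rate

    path = untar_data(URLs.PETS)            # download + extract (Figure 3)
    path_img = path/'images'
    fnames = get_image_files(path_img)      # image paths (Figure 6)
    pat = r'/([^/]+)_\d+.jpg$'              # breed from the file name

    # Resize/transform and normalize so all images match (Figure 9)
    data = ImageDataBunch.from_name_re(
        path_img, fnames, pat,
        ds_tfms=get_transforms(), size=224, bs=64
    ).normalize(imagenet_stats)

    learn = cnn_learner(data, models.resnet34, metrics=error_rate)
    learn.fit_one_cycle(epochs)             # 4 cycles, as in Figure 12
    learn.save('stage-1')                   # weights saved as .pth (Figure 14)
    return learn
```

Calling `train_pet_classifier()` in a notebook with fastai v1 installed reproduces the flow described above end to end.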
The training results are shown in Figure 13.
Then we save the model, that is, the weights calculated by the algorithm, in .pth format (Figure 14).
The code in Figure 15 lets us see the model results.
Step 4. Analysis of the results
We classify the images using the classification model (Figure 16).
Figure 17 shows the predictions and compares them with the real label of each pet. Each image is annotated with four values: the predicted breed, the real breed, the loss value, and the probability assigned to the real class. To get more information we use the line of code shown in Figure 18.
We also use a confusion matrix (Figure 19), which allows us to evaluate the performance of the classification algorithm by counting the hits and misses for each class. Since there are 37 classes, the confusion matrix has size 37×37, and the numbers on the diagonal correspond to the correct classifications for each class. The matrix shows the real category names on the left side and the predicted names at the bottom (Figure 20).
The most_confused function (Figure 21) is useful when the data contains a large number of classes, which makes the confusion matrix difficult to read. Each row of its output contains two breed names followed by a number: the first name is the real breed, the second is the breed it was misclassified as, and the number indicates how many errors of that type occurred.
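The bookkeeping behind both tools is simple to sketch in plain Python (toy labels invented for illustration, not the real 37-class output):

```python
from collections import Counter

classes = ['beagle', 'basset_hound', 'sphynx']
actual    = ['beagle', 'beagle', 'basset_hound', 'sphynx', 'beagle', 'basset_hound']
predicted = ['beagle', 'basset_hound', 'basset_hound', 'sphynx', 'basset_hound', 'basset_hound']

idx = {c: i for i, c in enumerate(classes)}
# rows = actual class, columns = predicted class
matrix = [[0] * len(classes) for _ in classes]
for a, p in zip(actual, predicted):
    matrix[idx[a]][idx[p]] += 1

hits = sum(matrix[i][i] for i in range(len(classes)))  # the diagonal
print(hits)  # -> 4

# most_confused-style output: off-diagonal (actual, predicted) pairs,
# with the most frequent confusion first
errors = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
print(errors.most_common())  # -> [(('beagle', 'basset_hound'), 2)]
```

With 37 classes the matrix is 37×37 and hard to scan visually, which is exactly why the sorted off-diagonal counts of most_confused are more readable.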
Step 5. Improving the model
The previous model was built on pretrained weights, which is why the execution process is quite fast. But what we really want is to train the whole model, that is, all of its layers rather than only the final ones. For this we apply the unfreeze function, which makes every layer trainable; we can do it with only one epoch (Figure 22).
We can see that the error rate is now higher than in the previous model. Roughly speaking, the early layers of the network recognize simple things such as diagonal lines or gradients, while the later layers recognize finer details such as the eyes of a cat or a dog, and retraining everything at the same rate disturbs those early layers. So, to improve things, we return to the previously saved model with the following command (Figure 23):
The next thing we are going to do is observe how fast we can train the model without generating errors (Figure 24); for this we plot the learning-rate graph for the model. The Y axis shows the value of the loss (error) as the learning rate on the X axis increases.
As seen in Figure 25, the optimal learning rate, for which the loss is lowest, lies a little beyond 1e-4. Therefore, to set an optimal learning rate (so that we do not incur greater losses), we add this parameter to the learn.fit_one_cycle function. The chosen learning-rate interval is [1e-6, 1e-4] (Figure 26).
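Under the same fastai v1 assumptions as before, the unfreeze-and-retrain flow of Figures 22–26 can be sketched as a function (again only defined here, not executed; `learn` is the Learner returned by the earlier training step):

```python
def fine_tune_pet_classifier(learn):
    """Sketch of Figures 22-26: restore the saved model, unfreeze all
    layers, inspect the learning-rate curve, and retrain with a
    discriminative learning-rate slice."""
    learn.load('stage-1')     # go back to the saved model (Figure 23)
    learn.unfreeze()          # make every layer trainable (Figure 22)
    learn.lr_find()           # loss vs. learning rate (Figure 24)
    learn.recorder.plot()     # the curve shown in Figure 25
    # slice(1e-6, 1e-4): early layers get the low end of the interval,
    # later layers the high end (Figure 26)
    learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))
    return learn
```

The slice is what protects the early, general-purpose layers: they are updated much more gently than the task-specific final layers.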
Detecting teddy bears using Google images
We will train the previously seen image classification model on images found on the internet, which we will download from https://images.google.com/. We will also put our classifier to work and run small prediction tests with evaluation images that are not in the original database.
We download images of teddy bears (teddys), black bears (black) and grizzly bears (grizzly). For this we search for “black bears” on images.google.com and press F12 (in Google Chrome) to open the console, where we write the code that downloads the URLs of these images (Figure 27).
As seen in Figure 28 (bottom left corner), the file has been downloaded. We must make sure it has the .txt format and rename it; in this case we call it URLs_black. We do the same for the other sets of images, that is, grizzly bears and teddy bears. The names assigned to these downloaded files are URLs_grizzly and URLs_teddys.
Once the files are downloaded, we must create folders in Google Colab to host the downloaded images; for this we enter a few lines of code (Figures 29–30).
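Under the same fastai v1 assumptions, the download-and-load flow for the bears data can be sketched as follows (folder layout assumed: one urls_<class>.txt file and one folder per class under `path`; defined as a function so nothing runs until called):

```python
def build_bears_databunch(path):
    """Sketch of Figures 29-33: fetch the images listed in each
    urls_*.txt file, drop broken downloads, and build the data set.
    `path` is a pathlib.Path holding black/, grizzly/ and teddys/."""
    from fastai.vision import (download_images, verify_images,
                               ImageDataBunch, get_transforms,
                               imagenet_stats)
    classes = ['black', 'grizzly', 'teddys']
    for c in classes:
        # download up to 200 images per class into its own folder
        download_images(path/f'urls_{c}.txt', path/c, max_pics=200)
        # delete files that cannot be opened as images
        verify_images(path/c, delete=True, max_size=500)
    # hold out 20% of the images for validation; resize and normalize
    data = ImageDataBunch.from_folder(
        path, train='.', valid_pct=0.2,
        ds_tfms=get_transforms(), size=224
    ).normalize(imagenet_stats)
    return data
```

Because there is no predefined validation set for scraped images, valid_pct=0.2 simply carves one out at random.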
Prior to executing the previous lines of code, we must ensure that the file urls_black.txt is hosted in the folder called “black”. To upload this file, we must right click on the “black” folder and click on “upload” as shown in Figure 31.
Next, we must fix the name (label) of each of the classes and assign them to our downloaded images (Figure 32). We process the data in size and normalization and then observe a sample of them (Figure 33).
Next, we run the resnet34 model (Figure 34) and plot the learning-rate curve in order to choose the rate that optimizes the model (the one that produces the lowest loss). In this case, as shown in Figure 35, we choose the interval [1e-02, 1e-01] as the optimal learning rate and run the model for 5 epochs, showing the results (Figure 36).
We save and load the previously executed model and show the confusion matrix (Figure 37).
Now we will perform some prediction tests using images that are not in the original database. For this we must load the image (“peluche.jpg”) selected in the “bears” folder (Figure 38) and assign it the name “img1” (Figure 39).
We can see that the prediction made for this image is correct, since the result of the learn.predict command is “teddys”, as Figure 40 shows.
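A single-image prediction along the lines of Figures 38–40 can be sketched as follows (the path 'bears/peluche.jpg' and the helper name are assumptions; `learn` is the trained Learner):

```python
def predict_bear(learn, image_path='bears/peluche.jpg'):
    """Sketch of Figures 38-40: open one image and classify it.
    Returns the predicted class name as a string."""
    from fastai.vision import open_image
    img = open_image(image_path)
    # learn.predict returns (category, category index, class probabilities)
    pred_class, pred_idx, probs = learn.predict(img)
    return str(pred_class)
```

For the teddy-bear photo of Figure 39 this would return 'teddys', matching the result described above.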
Next, we perform tests with other images and observe the results (Figures 41–43).
In the following example we can see that the prediction is not correct since the chosen image corresponds to a “grizzly” bear (Figure 44) but the prediction labels it as a “black” bear (Figure 45).
Finally, we feed our model a dog image to be able to see what is the result of the prediction (Figure 46).
We can see that the prediction is not good. However, we can fine-tune our model; in the next post we will see how to do it.
ML section written by Martha San Andrés.
DL section written by David Vivas.
ML and DL sections translated and edited by David Francisco Dávila Ortega, MSc. — Eng.
Reviewed by Paolo Paste, Co-founder of AI Saturdays