In recent years, the computational power available to developers has increased to the point where, for a set of complex problems, it is now feasible to create numerical models that represent our understanding of the subject. These models are built using machine learning algorithms, and this represents a new type of challenge in code auditing and application security.

**Understanding the target**

As a case study, we have chosen the problem of classifying what type of object is in an image. Solving this kind of complex problem no longer requires years of development and expertise to specify declaratively the context of the problem and all the possible nuances to which the software is going to be exposed. What once required whole libraries written to classify an image (involving Computer Vision algorithms such as Canny edge detection and the Hough Line Transform, plus the heuristic hacks that developers often use to approximate the expected solution) can now be achieved by training a **Convolutional Neural Network (CNN)**.

A **CNN** is a type of neural network commonly used for computer vision applications. Without going into much detail, a **CNN** consists of an input layer (in our case the representation of the image), an output layer (in our case the classification of the image), and multiple hidden layers. The hidden layers convolve the input and apply an **activation function** (commonly **ReLU**). The convolution is a specialized kind of linear operation applied with the intention of extracting the relevant features of the original image. Once this is done, a “Pooling” layer reduces the dimension of the data and passes it to the next hidden layer.
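To make the convolve, activate, pool sequence concrete, here is a minimal pure-Python sketch of one hidden layer. The function names, the 5x5 toy image and the 2x2 edge kernel are illustrative assumptions and are not taken from any library or from our implementation; real frameworks perform the same steps on tensors with learned kernels.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution (no padding), computed as cross-correlation
    the way CNN libraries usually do."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Activation function: negative responses are clamped to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the largest value of each non-overlapping size x size window."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, w - size + 1, size)
        ]
        for i in range(0, h - size + 1, size)
    ]


# Hypothetical 5x5 grayscale image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

features = relu(conv2d(image, kernel))  # 4x4 feature map
pooled = max_pool(features)             # 2x2 map handed to the next layer
print(pooled)
```

The pooled map is non-zero only in the windows that contain the edge, which is the "relevant feature" this particular kernel extracts. In a trained network the kernel values are not chosen by hand; they are the weights learned during training.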

As we can see, a **CNN** has a lot of flexibility in its definition, and different neural network architectures have been proposed. But the effectiveness of the algorithm depends at its core on how the features are extracted, and this is highly dependent on how the convolution is done. To see why, we must understand how **Neural Networks** are trained.

To train a **Convolutional Neural Network (CNN)** in a supervised manner, we need a fully labeled dataset that is representative of the problem to solve and large enough that we can test and correct the model until it predicts the expected results.

In every convolution, our **CNN** applies internal **weights** and a **bias** to the processed input. Learning involves adjusting these **weights** to improve accuracy. This is done by computing a **cost function** and minimizing it, commonly with **gradient descent** [1].
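As a rough illustration of that training loop, the sketch below fits a single weight and bias with gradient descent on a mean squared error cost. The toy dataset, the learning rate and the `train` helper are assumptions made for this example; a real **CNN** has millions of weights and computes the gradients by backpropagation, but the update rule has the same shape.

```python
def train(samples, lr=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        # Partial derivatives of the cost with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        # Step against the gradient to reduce the cost.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical labeled dataset generated from y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = train(samples)
print(round(w, 3), round(b, 3))
```

Note that the gradient here is taken with respect to the weights while the input stays fixed. The attack discussed later in this post reuses the same cost function but takes the gradient with respect to the input instead.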

**Misclassification attack**

As we have seen, our program now holds a numerical representation of knowledge, so much of the understanding of the problem comes from understanding the underlying mathematical concepts. This alone is a hint that attacks against a **CNN** will most likely abuse the algorithm chosen to create it.

In this practical example, we will take a look at the paper “*Explaining and Harnessing Adversarial Examples*” by Ian J. Goodfellow, Jonathon Shlens & Christian Szegedy [2], where the “**fast gradient sign method (FGSM)**” is defined. This method proposes the creation of a perturbation vector:

**η = ε · sign(∇ₓ J(θ, x, y))**

where **θ** are the parameters of the model, **x** is the original image, **y** is the label associated with **x**, **J(θ, x, y)** is the **cost function** used to train the neural network, and **ε** is a small constant that bounds how much each pixel may change. This comes as the solution to maximizing the growth of the activation **wᵀX = wᵀx + wᵀη**, where **w** is a weight vector and **X** is an alteration of the original image **x**, **X = x + η**, subject to the constraint that no component of **η** exceeds **ε** in magnitude.
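The effect can be reproduced without a deep network. The following is a minimal sketch of the fast gradient sign method against a hypothetical 100-dimensional logistic regression model, written in plain Python. The weights, the input, the step size `epsilon` and the helper names are all assumptions chosen for illustration; this is not the GoogLeNet attack from our repository.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability that x belongs to class 1 under a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, epsilon):
    """Return (x + eta, eta) with eta = epsilon * sign(dJ/dx)."""
    p = predict(w, b, x)
    # For the cross-entropy cost on a logistic model, dJ/dx = (p - y) * w.
    grad_x = [(p - y) * wi for wi in w]
    eta = [epsilon * sign(g) for g in grad_x]
    return [xi + ei for xi, ei in zip(x, eta)], eta


# Hypothetical model: 100 inputs with alternating weights.
w = [0.5, -0.5] * 50
b = 0.0

# Hypothetical input that the model confidently assigns to class 1.
x = [0.6 if wi > 0 else 0.4 for wi in w]
y = 1

epsilon = 0.15  # no input component moves by more than this amount
x_adv, eta = fgsm(w, b, x, y, epsilon)

p_clean = predict(w, b, x)
p_adv = predict(w, b, x_adv)
print(round(p_clean, 3), round(p_adv, 3))
```

Each component of the input moves by only 0.15, yet the activation shifts by 0.15 times the sum of the absolute weights, which is enough to flip the predicted class. Because that sum grows with the number of inputs, high-dimensional inputs such as images can be pushed across the decision boundary with much smaller per-pixel changes.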

PyTorch provides implementations of several CNN architectures, all pre-trained on the widely used ImageNet dataset [3]. All of them are vulnerable to this type of misclassification attack.

Our implementation [4] of this attack uses **PyTorch’s** pre-trained **GoogLeNet** to generate a misclassified image of a giant panda whose alteration is not discernible to humans. The image of the misclassification vector is amplified for demonstration purposes.

**Attack mitigations**

Although these kinds of attacks are inherent to the type of algorithm used to train the **NN**, there are methods to detect and protect against them. Studies have shown that the statistical distribution of the data in these and other types of adversarial examples can be detected, which makes it possible to increase the accuracy and reliability of our solutions.

Sources

[1] https://en.wikipedia.org/wiki/Gradient_descent

[2] https://arxiv.org/pdf/1412.6572.pdf

[3] https://pytorch.org/docs/stable/torchvision/models.html

[4] https://github.com/pucarasec/Fast_gradient_sign_method
