Verification and Validation for AI: Learning process verification
Testing and Understanding Model Performance
The model was trained using fast gradient sign method (FGSM) adversarial training, a technique for training networks so that they are robust to adversarial examples. After training, and particularly after adversarial training, it is crucial to assess the model's accuracy on an independent test set. The model we developed achieved an accuracy above 90%, which meets our predefined requirement and surpasses the benchmarks reported in the original research for comparable neural networks. To gain a more nuanced understanding of the model's performance, we examine the confusion matrix, which sheds light on the types of errors the model makes.

Figure 2: Confusion chart for the adversarially trained model.

Explainability techniques such as Grad-CAM offer a visual understanding of the influential regions in the input image that drive the model's predictions, enhancing interpretability and trust in the model's decision-making. Grad-CAM highlights the regions of the input image that contributed most to the final prediction.

Figure 3: Understanding network predictions using Gradient-weighted Class Activation Mapping (Grad-CAM).
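As a rough illustration of this evaluation step, the sketch below computes the test accuracy, plots the confusion chart, and produces a Grad-CAM map for one test image. It assumes net is the trained dlnetwork and XTest and TTest are the held-out test images and labels (the same variable names used in the verification code later in this post); the exact code behind Figures 2 and 3 may differ.

% Evaluate the adversarially trained network on the held-out test set.
scores = predict(net, dlarray(single(XTest),"SSCB"));
YPred = scores2label(scores, categories(TTest));
accuracy = mean(YPred(:) == TTest(:))           % requirement: accuracy above 0.90

% Confusion chart (as in Figure 2).
confusionchart(TTest, YPred(:));

% Grad-CAM map for a single test image (as in Figure 3).
idx = 1;                                        % arbitrary test image
XOne = dlarray(single(XTest(:,:,:,idx)),"SSCB");
map = gradCAM(net, XOne, YPred(idx));
imshow(XTest(:,:,:,idx), []); hold on
imagesc(map, "AlphaData", 0.5); colormap jet; hold off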
Verify Robustness of Deep Learning Models

Adversarial Examples
Robustness of the AI model is one of the main concerns when deploying neural networks in safety-critical situations. It has been shown that neural networks can misclassify inputs due to small, imperceptible changes. Consider an X-ray image that the model correctly identifies as indicative of pneumonia. When a subtle perturbation is applied to this image (that is, a small change is applied to each pixel), the model's output shifts and it erroneously classifies the X-ray as normal.

Figure 4: Adversarial examples: effect of an input perturbation on image classification.
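To make this concrete, the following sketch crafts a single FGSM-style adversarial example, the same kind of perturbation used during adversarial training. It assumes net is the trained dlnetwork with a final softmax layer and XTest and TTest are the test data (names taken from the verification code below); epsilon and the adversarialGradient helper are illustrative choices, not part of the original workflow.

% Craft an FGSM-style adversarial example for one test image.
epsilon = 0.02;                                  % illustrative perturbation size
X = dlarray(single(XTest(:,:,:,1)),"SSCB");
T = onehotencode(TTest(1), 1, "ClassNames", categories(TTest));

[~, gradX] = dlfeval(@adversarialGradient, net, X, T);
XAdv = X + epsilon*sign(gradX);                  % step in the direction that increases the loss
XAdv = min(max(XAdv, 0), 1);                     % assumes pixel values scaled to [0,1]
labelAdv = scores2label(predict(net, XAdv), categories(TTest))

function [loss, gradX] = adversarialGradient(net, X, T)
    % Loss with respect to the true label, and its gradient with respect to the input.
    Y = forward(net, X);                         % assumes the network outputs class probabilities
    loss = crossentropy(Y, T);
    gradX = dlgradient(loss, X);
end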
L-infinity norm

To understand and quantify these perturbations, we turn to the concept of the l-infinity norm. Imagine you have a chest X-ray image. A perturbation with an l-infinity norm of, say, 5 means adding or subtracting a value between 0 and 5 to any number of pixels. In one scenario, you might add 5 to every pixel within a specific image region. Alternatively, you could adjust various pixels by different values within the range of -5 to 5, or alter just a single pixel.

Figure 5: L-infinity norm: examples of possible input perturbations.

However, the challenge is that we need to account for all possible combinations of perturbations within the -5 to 5 range, which essentially presents us with an infinite number of scenarios to test. To navigate this complexity, we employ formal verification methods, which provide a systematic approach to testing and ensuring the robustness of our neural network against this vast landscape of potential adversarial examples.
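As a concrete check of the definition above, the l-infinity norm of a perturbation is simply the largest absolute change applied to any single pixel. A minimal sketch, where XOriginal and XPerturbed are hypothetical image arrays:

delta = XPerturbed - XOriginal;        % XOriginal and XPerturbed are hypothetical images
linfNorm = max(abs(delta(:)))          % admissible only if this is at most 5 in the example above

The verification code later in this post constructs exactly this kind of bound by adding and subtracting a fixed perturbation value from every pixel of each test image.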
Formal verification

Given one of the images in the test set, we can choose a perturbation that defines a collection of perturbed images for this specific image. It is important to note that this collection of images is extremely large (the images depicted in the volume in Figure 5 are just a representative sample), and it is not practical to test each perturbed image individually. Deep Learning Toolbox Verification Library allows you to verify and test the robustness of deep learning networks using formal verification methods, such as abstract interpretation. The library enables you to verify whether the network you have trained is adversarially robust with respect to the class label given an input perturbation.

Figure 6: Formal verification using abstract interpretation.

Formal verification methods offer a mathematical approach for formally proving the correctness of a system. They allow us to conduct rigorous tests across the entire volume of perturbed images to see if the network's output is affected. There are three potential outcomes for each image:

- Verified - The output label remains consistent.
- Violated - The output label changes.
- Unproven - Further verification effort or model improvement is needed.
First, we define the perturbation and compute the lower and upper bounds of the perturbed input for each test image, formatted as dlarray objects:

perturbation = 0.01;                   % 1% perturbation of each pixel
XLower = XTest - perturbation;
XUpper = XTest + perturbation;
XLower = dlarray(XLower,"SSCB");
XUpper = dlarray(XUpper,"SSCB");

We are now ready to use the verifyNetworkRobustness function. We specify the trained network, the lower and upper bounds, and the ground truth labels for the images.
result = verifyNetworkRobustness(net,XLower,XUpper,TTest);
summary(result)
verified 402
violated 13
unproven 209
The outcome reveals over 400 verified images, 13 violations, and more than 200 unproven results. We will have to go back to the images for which the robustness test returned violated or unproven results and see whether there is anything we can learn from them. But for over 400 images, we were able to formally prove that no adversarial example within a 1% perturbation range alters the network's output, and that is a significant assurance of robustness.

Another question we can answer with formal verification is whether adversarial training contributed to network robustness. In the second post of the series, we began with a reference model and investigated various training techniques, ultimately adopting an adversarially trained model. Had we used the original network, we would have faced unproven results for nearly all images, and in a safety-critical context, you will likely need to treat unproven results as violations. While data augmentation contributed to verification success, adversarial training enabled the verification of substantially more images, resulting in a far more robust network that satisfies our robustness requirements.

Figure 7: Comparing verification results from various trained networks.
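A comparison like the one in Figure 7 can be assembled by running the same verification on each trained network. A rough sketch, assuming netOriginal, netAugmented, and netRobust are hypothetical variable names for the networks trained earlier in the series, and XLower, XUpper, and TTest are defined as above:

% Verify each candidate network against the same perturbation bounds.
networks = {netOriginal, netAugmented, netRobust};   % hypothetical variable names
names = ["Original" "Data augmentation" "Adversarial training"];
counts = zeros(numel(networks), 3);
for i = 1:numel(networks)
    res = verifyNetworkRobustness(networks{i}, XLower, XUpper, TTest);
    counts(i,:) = countcats(res)';                   % counts per outcome, in category order
end
bar(counts, "stacked")
legend(categories(res), "Location", "eastoutside")
xticklabels(names)
ylabel("Number of test images")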