Verification and Validation for AI: Learning process verification

Testing and Understanding Model Performance
The model was trained with fast gradient sign method (FGSM) adversarial training, a technique that makes networks more robust to adversarial examples. After training, and particularly after adversarial training, it is crucial to assess the model's accuracy on an independent test set. The model we developed achieved an accuracy exceeding 90%, which not only meets our predefined requirement but also surpasses the benchmarks reported in the foundational research for comparable neural networks. To gain a more nuanced understanding of the model's performance, we examine the confusion matrix, which sheds light on the types of errors the model makes.
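As a minimal sketch (assuming the trained model is a dlnetwork named net, with test images XTest, test labels TTest, and class names classNames, all hypothetical variable names), the test accuracy and confusion matrix could be computed along these lines:

% Hedged sketch: evaluate the network on an independent test set.
scores = predict(net, dlarray(single(XTest), "SSCB"));   % network scores per test image
YPred = scores2label(scores, classNames);                % convert scores to class labels
accuracy = mean(YPred(:) == TTest(:))                    % overall test accuracy
confusionchart(TTest, YPred)                             % visualize the types of errors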

Verify Robustness of Deep Learning Models
Adversarial Examples
Robustness of the AI model is one of the main concerns when deploying neural networks in safety-critical situations. It has been shown that neural networks can misclassify inputs due to small imperceptible changes. Consider the case of an X-ray image that a model correctly identifies as indicative of pneumonia. When a subtle perturbation is applied to this image (that is, a small change is applied to each pixel of the image), the model’s output shifts, erroneously classifying the X-ray as normal.
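To make the idea concrete, here is a hedged sketch of how such a perturbation could be crafted with the fast gradient sign method mentioned earlier; net, the formatted dlarray input X, and the one-hot target T are assumed to exist, and the epsilon value is purely illustrative:

% Hedged FGSM sketch: perturb an input in the direction that increases the loss.
epsilon = 2;                                    % illustrative perturbation size
[~, gradX] = dlfeval(@inputGradient, net, X, T);
XAdv = X + epsilon .* sign(gradX);              % adversarially perturbed image

function [loss, gradX] = inputGradient(net, X, T)
    Y = predict(net, X);                        % network scores
    loss = crossentropy(Y, T);                  % loss with respect to the true label
    gradX = dlgradient(loss, X);                % gradient of the loss with respect to the input
end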
L-infinity norm
To understand and quantify these perturbations, we turn to the concept of the l-infinity norm. Imagine you have a chest X-ray image. A perturbation with an l-infinity norm of, say, 5 means that no pixel is changed by more than 5 in either direction. In one scenario, you might add 5 to every pixel within a specific image region. Alternatively, you could adjust various pixels by different values within the range of -5 to 5, or alter just a single pixel.
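In other words, the l-infinity norm of a perturbation is simply the largest absolute change applied to any single pixel. A small, purely illustrative sketch:

% Illustrative sketch: a random perturbation with l-infinity norm at most 5.
delta = randi([-5 5], 224, 224);   % change every pixel by a value between -5 and 5
linfNorm = max(abs(delta(:)))      % the l-infinity norm, at most 5 by construction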
Formal verification
Given one of the images in the test set, we can choose a perturbation that defines a collection of perturbed images for this specific image. This collection is extremely large (the images depicted in the volume in Figure 5 are just a representative sample), so it is not practical to test each perturbed image individually. The Deep Learning Toolbox Verification Library lets you verify and test the robustness of deep learning networks using formal verification methods, such as abstract interpretation. It enables you to verify whether the trained network is adversarially robust with respect to the class label, given an input perturbation. For each image, the verification returns one of three results:
- Verified - The output label remains consistent.
- Violated - The output label changes.
- Unproven - Further verification efforts or model improvement is needed.
First, we define the lower and upper bounds of the perturbed input set for each test image.

perturbation = 0.01;
XLower = XTest - perturbation;
XUpper = XTest + perturbation;
XLower = dlarray(XLower,"SSCB");
XUpper = dlarray(XUpper,"SSCB");

We are now ready to use the verifyNetworkRobustness function. We specify the trained network, the lower and upper bounds, and the ground truth labels for the images.
result = verifyNetworkRobustness(net,XLower,XUpper,TTest); summary(result)
verified 402
violated 13
unproven 209
The outcome reveals over 400 images verified, 13 violations, and more than 200 unproven results. We'll have to go back to the images for which the robustness test returned violated or unproven results and see whether there is anything we can learn from them. But for over 400 images, we were able to formally prove that no adversarial example within a 1% perturbation range alters the network's output, and that is a significant assurance of robustness.

Another question that we can answer with formal verification is whether adversarial training contributed to network robustness. In the second post of the series, we began with a reference model and investigated various training techniques, ultimately adopting an adversarially trained model. Had we used the original network, we would have faced unproven results for nearly all images, and in a safety-critical context you'll likely need to treat unproven results as violations. While data augmentation contributed to verification success, adversarial training enabled the verification of substantially more images, resulting in a much more robust network that satisfies our robustness requirements.
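The comparison itself only requires rerunning the same verification on the reference network; in this hedged sketch, netReference stands for the original, non-adversarially trained model (a hypothetical variable name):

% Hedged sketch: verify the reference network under the same perturbation bounds.
resultReference = verifyNetworkRobustness(netReference, XLower, XUpper, TTest);
summary(resultReference)   % compare verified/violated/unproven counts with the adversarially trained model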
Out-of-Distribution Detection
A trustworthy AI system should produce accurate predictions in a known context. However, it should also be able to identify examples that are unknown to the model and reject them, or defer them to a human expert, for safe handling. The Deep Learning Toolbox Verification Library also includes functionality for out-of-distribution (OOD) detection. Consider a sample image from our test set. To evaluate the model's ability to handle OOD data, we can derive new test sets by applying meaningful transformations to the original images, as shown in the following figure.
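As a hedged sketch of how such a check might look with the library's OOD functionality, assuming XTrain holds the in-distribution training images and XShifted one of the transformed test sets (both hypothetical names):

% Hedged sketch: fit a distribution discriminator and flag out-of-distribution inputs.
discriminator = networkDistributionDiscriminator(net, ...
    dlarray(single(XTrain),"SSCB"), dlarray(single(XShifted),"SSCB"), "energy");
tfShifted = isInNetworkDistribution(discriminator, dlarray(single(XShifted),"SSCB"));   % true = treated as in-distribution
fractionFlaggedOOD = mean(~tfShifted)                                                   % share of transformed images rejected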
