Chapter 4

Transfer Learning for the Fault Detection Classifier

For the hex nut classifier, the MobileNet-v2 network is a good choice: it is fast and accurate, with a smaller memory footprint than many of the other pretrained options. We will reuse its feature extraction layers and replace only the classification layers.

Using the Deep Network Designer app, we can complete the whole process:

  1. Drag and drop to add the new layers and create new connections.
  2. View and edit layer properties.
  3. Analyze the network to ensure that the architecture is defined correctly.
  4. Import the image data and select augmentation options.
  5. Visualize and monitor training progress.
  6. Generate plots to assess network accuracy.
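The steps above happen interactively in the app, but the underlying idea of transfer learning—keep the pretrained feature extractor fixed and train only a new classification head—can be sketched in code. The following is a minimal NumPy sketch, not the book's actual workflow: the random vectors stand in for frozen MobileNet-v2 features, and the logistic-regression head stands in for the replaced classification layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for frozen feature-extractor outputs:
# feature vectors for "good" (label 1) and "bad" (label 0) hex nut images.
n, d = 100, 32
good = rng.normal(loc=+1.0, scale=0.5, size=(n, d))
bad = rng.normal(loc=-1.0, scale=0.5, size=(n, d))
X = np.vstack([good, bad])
y = np.concatenate([np.ones(n), np.zeros(n)])

# New classification "head": logistic regression trained on the frozen
# features by plain gradient descent. The pretrained layers are untouched.
w = np.zeros(d)
b = 0.0
lr = 0.1
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(good)
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
accuracy = np.mean(pred == y)
```

Because the two synthetic classes are well separated, the new head reaches near-perfect accuracy on this toy data; only the small head was trained, which is why transfer learning is fast.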

Evaluating Network Performance

After the model classifies the images as good or bad, one way to find out whether it’s accurate is to generate a confusion matrix.

The confusion matrix shows the results of the model for each class and can show if there is any confusion between the classes.

Here we see that the true class matches the predicted class for all 40 test samples (20 good, 20 bad). A confusion matrix yields more insight when there are misclassifications and when there are more than two classes of objects.
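The matrix itself is simple to compute: count, for each pair of classes, how many samples of the true class received each predicted label. A short, self-contained sketch (the labels below mirror the 40-sample test set described in the text, not real data):

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows = true class, columns = predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        m[index[t]][index[p]] += 1
    return m

classes = ["good", "bad"]

# A perfect classifier on 40 samples (20 good, 20 bad), as in the text.
true_labels = ["good"] * 20 + ["bad"] * 20
predicted = ["good"] * 20 + ["bad"] * 20
cm = confusion_matrix(true_labels, predicted, classes)
# All counts land on the diagonal, meaning no confusion between classes.
```

Off-diagonal entries are the misclassifications; with only two classes and a perfect score, every entry outside the diagonal is zero.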

The confusion matrix answers some questions about the network performance, but not all. How do we know that the model is, in fact, classifying the right features? And how does the network ... work?

A deep learning network used to be considered a “black box” that offered little insight into why it predicted what it did.

Explainable AI techniques such as class activation mapping (CAM) and gradient-weighted class activation mapping (Grad-CAM) provide visual explanations of a CNN’s predictions.

Figure 1

Class Activation Mapping

Class activation mapping (CAM) generates a heatmap that highlights those parts of an input image that the network used to make its classification. You can use the results to reinforce correct predictions or to see why the network was “confused.”
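Concretely, CAM forms the heatmap as a weighted sum of the network's final convolutional feature maps, where the weights come from the classification layer for the predicted class. Here is a hedged NumPy sketch of that computation; the feature maps and class weights are random placeholders, since the real values come from a trained network.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for the final conv layer's K feature maps of size H x W.
K, H, W = 8, 7, 7
feature_maps = rng.random((K, H, W))

# Stand-ins for the classification-layer weights for the predicted class
# (one weight per feature map, applied after global average pooling).
# Grad-CAM follows the same recipe but derives these weights from
# gradients of the class score instead.
class_weights = rng.random(K)

# CAM: per-location weighted sum of the feature maps.
cam = np.tensordot(class_weights, feature_maps, axes=1)  # shape (H, W)

# Normalize to [0, 1] so the map can be rendered as a heatmap
# over the (upsampled) input image.
cam = (cam - cam.min()) / (cam.max() - cam.min())
```

The resulting H-by-W map is upsampled to the input image size and overlaid as a heatmap; bright regions are the ones that contributed most to the predicted class.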

For example, here are two CAM visualizations.

Figure 2 Left: the mouse is clearly identifiable by the network as a mouse. Right: the network incorrectly classified the coffee mug as a buckle due to the presence of the watch.

In the figure on the left, the presence of the keyboard helps the model determine that this is a picture of a mouse. That may or may not be a valid way to make this classification.

In the figure on the right, the network classifies the image of a coffee cup as a buckle. The network detects and focuses on the watch wristband, not the coffee cup. Which is correct: the model or the ground truth?

You might assume the model is wrong when, in fact, it is working well but basing its predictions on the wrong features. You can correct bias in the training set by collecting data that is more representative of the scenarios you want the network to focus on.

We use CAM on our hex nut model. On the left is the captured image; on the right, the classification and CAM heatmap.

Figure 3

If the network classifies a nut as bad, it homes in on the defective section. As a result, we see not only that the images are being classified correctly but also why the model made its decisions.