Main Content

View Network Behavior Using tsne

This example shows how to use the tsne function to view activations in a trained network. This view can help you understand how a network works.

The tsne (Statistics and Machine Learning Toolbox) function in Statistics and Machine Learning Toolbox™ implements t-distributed stochastic neighbor embedding (t-SNE) [1]. This technique maps high-dimensional data (such as network activations in a layer) to two dimensions. The technique uses a nonlinear map that attempts to preserve distances. By using t-SNE to visualize the network activations, you can gain an understanding of how the network responds.

You can use t-SNE to visualize how deep learning networks change the representation of input data as it passes through the network layers. You can also use t-SNE to find issues with the input data and to understand which observations the network classifies incorrectly.

For example, t-SNE can reduce the multidimensional activations of a softmax layer to a 2-D representation with a similar structure. Tight clusters in the resulting t-SNE plot correspond to classes that the network usually classifies correctly. The visualization allows you to find points that appear in the wrong cluster, indicating an observation that the network classifies incorrectly. The observation might be labeled incorrectly, or the network might predict that an observation is an instance of a different class because it appears similar to other observations from that class. Note that the t-SNE reduction of the softmax activations uses only those activations, not the underlying observations.

Download Data Set

This example uses the Example Food Images data set, which contains 978 photographs of food in nine classes and is approximately 77 MB in size. Download the data set into your temporary directory by calling the downloadExampleFoodImagesData helper function; the code for this helper function appears at the end of this example.

dataDir = fullfile(tempdir,"ExampleFoodImageDataset");
url = "";

if ~exist(dataDir,"dir")

Downloading MathWorks Example Food Image dataset...
This can take several minutes to download...
Download finished...
Unzipping file...
Unzipping finished...

Train Network to Classify Food Images

Create an imageDatastore containing paths to the image data. Split the datastore into training and validation sets, using 65% of the data for training and the rest for validation. Because the data set is fairly small, overfitting is a significant issue. To minimize overfitting, augment the training set with random flips and scaling.

imds = imageDatastore(dataDir, ...

aug = imageDataAugmenter(RandXReflection=true, ...
    RandYReflection=true, ...
    RandXScale=[0.8 1.2], ...
    RandYScale=[0.8 1.2]);

trainingFraction = 0.65;
[trainImds,valImds] = splitEachLabel(imds,trainingFraction);

augImdsTrain = augmentedImageDatastore([227 227],trainImds, ...
augImdsVal = augmentedImageDatastore([227 227],valImds);
classNames = categories(trainImds.Labels);
numClasses = numel(classNames);

Load a pretrained SqueezeNet network and the corresponding class names. This requires the Deep Learning Toolbox™ Model for SqueezeNet Network support package. If this support package is not installed, then the software provides a download link. For a list of all available networks, see Pretrained Deep Neural Networks. To return a neural network ready for retraining for the new data, also specify the number of classes.

net = imagePretrainedNetwork("squeezenet",NumClasses=numClasses);

Create training options.

opts = trainingOptions("adam", ...
    InitialLearnRate=1e-4, ...
    MaxEpochs=50, ...
    ValidationData=augImdsVal, ...
    Plots="training-progress", ...
rng default

Train the neural network using the trainnet function. For classification, use the cross-entropy loss function. By default, the trainnet function uses a GPU if one is available. Training on a GPU requires a Parallel Computing Toolbox™ license and a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). Otherwise, the trainnet function uses the CPU.

net = trainnet(augImdsTrain,net,"crossentropy",opts);

Classify Validation Data

Classify the test images. To make predictions with multiple observations, use the minibatchpredict function. To convert the prediction scores to labels, use the scores2label function. The minibatchpredict function automatically uses a GPU if one is available. Using a GPU requires a Parallel Computing Toolbox™ license and a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). Otherwise, the function uses the CPU.

scores = minibatchpredict(net,augImdsVal);
YPred = scores2label(scores,classNames);


The network classifies several images well. The network appears to have trouble with sushi images, classifying many as sushi but some as pizza or hamburger. The network does not classify any images into the hot dog class.

Compute Activations for Several Layers

To continue to analyze the network performance, compute activations for every observation in the data set at an early max pooling layer, the final convolutional layer, and the final softmax layer. Output the activations as an NxM matrix, where N is the number of observations and M is the number of dimensions of the activation. M is the product of spatial and channel dimensions. Each row is an observation, and each column is a dimension. At the softmax layer M = 9, because the food data set has nine classes. Each row in the matrix contains nine elements, corresponding to the probabilities that an observation belongs to each of the nine classes of food.

earlyLayerName = "pool1";
finalConvLayerName = "conv10";
softmaxLayerName = "prob_flatten";
pool1Activations = minibatchpredict(net,...
finalConvActivations = minibatchpredict(net,...
softmaxActivations = minibatchpredict(net,...

Reshape activation matrices in 2-D. The t-SNE function requires 2-D inputs.

numValObservations = augImdsVal.numobservations;
pool1Activations = reshape(pool1Activations,numValObservations,[]);
finalConvActivations = reshape(finalConvActivations,numValObservations,[]);
softmaxActivations = reshape(softmaxActivations,numValObservations,[]);

Ambiguity of Classifications

You can use the softmax activations to calculate the image classifications that are most likely to be incorrect. Define the ambiguity of a classification as the ratio of the second-largest probability to the largest probability. The ambiguity of a classification is between zero (nearly certain classification) and 1 (nearly as likely to be classified to the most likely class as the second class). An ambiguity of near 1 means the network is unsure of the class in which a particular image belongs. This uncertainty might be caused by two classes whose observations appear so similar to the network that it cannot learn the differences between them. Or, a high ambiguity can occur because a particular observation contains elements of more than one class, so the network cannot decide which classification is correct. Note that low ambiguity does not necessarily imply correct classification; even if the network has a high probability for a class, the classification can still be incorrect.

[R,RI] = maxk(softmaxActivations,2,2);
ambiguity = R(:,2)./R(:,1);

Find the most ambiguous images.

[ambiguity,ambiguityIdx] = sort(ambiguity,"descend");

View the most probable classes of the ambiguous images and the true classes.

classList = unique(valImds.Labels);
top10Idx = ambiguityIdx(1:10);
top10Ambiguity = ambiguity(1:10);
mostLikely = classList(RI(ambiguityIdx,1));
secondLikely = classList(RI(ambiguityIdx,2));
    VariableNames=["Image #","Ambiguity","Likeliest","Second","True Class"])
ans=10×5 table
    Image #    Ambiguity     Likeliest        Second         True Class 
    _______    _________    ___________    _____________    ____________

      180       0.96684     sashimi        pizza            pizza       
       26       0.95833     sushi          pizza            french_fries
      291       0.92104     sashimi        hamburger        sashimi     
      286       0.91898     pizza          hot_dog          sashimi     
      322       0.88626     hot_dog        sushi            sushi       
       89       0.87801     hamburger      french_fries     hamburger   
      242        0.8727     greek_salad    sushi            pizza       
       84       0.85619     greek_salad    sushi            greek_salad 
      245       0.85266     greek_salad    pizza            pizza       
      145       0.84436     sushi          caprese_salad    hamburger   

The network predicts that image 27 is most likely hamburger or pizza. However, this image is actually French fries. View the image to see why this misclassification might occur.

v = 27;
title(sprintf("Observation: %i\n" + ...
    "Actual: %s. Predicted: %s", v, ...
    string(valImds.Labels(v)),string(YPred(v))), ...

The image contains several distinct regions, some of which might confuse the network.

Compute 2-D Representations of Data Using t-SNE

Calculate a low-dimensional representation of the network data for an early max pooling layer, the final convolutional layer, and the final softmax layer. Use the tsne function to reduce the dimensionality of the activation data from M to 2. The larger the dimensionality of the activations, the longer the t-SNE computation takes. Therefore, computation for the early max pooling layer, where activations have 200,704 dimensions, takes longer than for the final softmax layer. Set the random seed for reproducibility of the t-SNE result.

rng default
pool1tsne = tsne(pool1Activations);
finalConvtsne = tsne(finalConvActivations);
softmaxtsne = tsne(softmaxActivations);

Compare Network Behavior for Early and Later Layers

The t-SNE technique tries to preserve distances so that points near each other in the high-dimensional representation are also near each other in the low-dimensional representation. As shown in the confusion matrix, the network is effective at classifying into different classes. Therefore, images that are semantically similar (or of the same type), such as caesar salad and caprese salad, are near each other in the softmax activations space. t-SNE captures this proximity in a 2-D representation that is easier to understand and plot than the nine-dimensional softmax scores.

Early layers tend to operate on low-level features such as edges and colors. Deeper layers have learned high-level features with more semantic meaning, such as the difference between a pizza and a hot dog. Therefore, activations from early layers do not show any clustering by class. Two images that are similar pixelwise (for example, they both contain a lot of green pixels) are near each other in the high-dimensional space of the activations, regardless of their semantic contents. Activations from later layers tend to cluster points from the same class together. This behavior is most pronounced at the softmax layer and is preserved in the two-dimensional t-SNE representation.

Plot the t-SNE data for the early max pooling layer, the final convolutional layer, and the final softmax layer using the gscatter function. Observe that the early max pooling activations do not exhibit any clustering between images of the same class. Activations of the final convolutional layer are clustered by class to some extent, but less so than the softmax activations. Different colors correspond to observations of different classes.

doLegend = "off";
markerSize = 7;

gscatter(pool1tsne(:,1),pool1tsne(:,2),valImds.Labels, ...
title("Max pooling \newline activations");

gscatter(finalConvtsne(:,1),finalConvtsne(:,2),valImds.Labels, ...
title("Final conv \newline activations");

gscatter(softmaxtsne(:,1),softmaxtsne(:,2),valImds.Labels, ...
title("Softmax \newline activations");

Explore Observations in t-SNE Plot

Create a larger plot of the softmax activations, including a legend labeling each class. From the t-SNE plot, you can understand more about the structure of the posterior probability distribution.

For example, the plot shows a distinct, separate cluster of French fries observations, whereas the sashimi and sushi clusters are not resolved very well. Similar to the confusion matrix, the plot suggests that the network is more accurate at predicting into the French fries class.

numClasses = length(classList);
colors = lines(numClasses);
h = figure;

l = legend;
l.Interpreter = "none";
l.Location = "bestoutside";

You can also use t-SNE to determine which images are misclassified by the network and why. Incorrect observations are often isolated points of the wrong color for their surrounding cluster. For example, a misclassified image of hamburger is very near the French fries region (the green dot nearest the center of the orange cluster). This dot is observation 99. Circle this observation on the t-SNE plot, and display the image with imshow.

obs = 99;
hold on;
hs = scatter(softmaxtsne(obs, 1), softmaxtsne(obs, 2), ...
l.String{end} = "Hamburger";
hold off;
title(sprintf("Observation: %i\n" + ...
    "Actual: %s. Predicted: %s",obs, ...
    string(valImds.Labels(obs)),string(YPred(obs))), ...

If an image contains multiple types of food, the network can get confused. In this case, the network classifies the image as French fries even though the food in the foreground is hamburger. The French fries visible at the edge of the image cause the confusion.

Similarly, the ambiguous image 27 (shown earlier in the example) has multiple regions. Examine the t-SNE plot highlighting the ambiguous aspect of this French fries image.

obs = 27;
hold on;
h = scatter(softmaxtsne(obs, 1), softmaxtsne(obs, 2), ...
l.String{end} = "French Fries";
hold off;

The image is not in a well-defined cluster in the plot, which indicates that the classification is likely incorrect. The image is far from the French fries cluster, and close to the hamburger cluster.

The why of a misclassification must be provided by other information, typically a hypothesis based on the contents of the image. You can then test the hypothesis using other data, or using tools that indicate which spatial regions of an image are important to network classification. For examples, see occlusionSensitivity and Grad-CAM Reveals the Why Behind Deep Learning Decisions.


[1] van der Maaten, Laurens, and Geoffrey Hinton. "Visualizing Data using t-SNE." Journal of Machine Learning Research 9, 2008, pp. 2579–2605.

Helper Function

function downloadExampleFoodImagesData(url, dataDir)
% Download the Example Food Image data set, containing 978 images of
% different types of food split into 9 classes.

% Copyright 2019 The MathWorks, Inc.

fileName = "";
fileFullPath = fullfile(dataDir, fileName);

% Download the .zip file into a temporary directory.
if ~exist(fileFullPath, "file")
    fprintf("Downloading MathWorks Example Food Image dataset...\n");
    fprintf("This can take several minutes to download...\n");
    websave(fileFullPath, url);
    fprintf("Download finished...\n");
    fprintf("Skipping download, file already exists...\n");

% Unzip the file.
% Check if the file has already been unzipped by checking for the presence
% of one of the class directories.
exampleFolderFullPath = fullfile(dataDir, "pizza");
if ~exist(exampleFolderFullPath, "dir")
    fprintf("Unzipping file...\n");
    unzip(fileFullPath, dataDir);
    fprintf("Unzipping finished...\n");
    fprintf("Skipping unzipping, file already unzipped...\n");


See Also

| | | | | | | | | (Statistics and Machine Learning Toolbox)

Related Topics