semanticseg

Semantic image segmentation using deep learning


C = semanticseg(I,network)
[C,score,allScores] = semanticseg(I,network)
[___] = semanticseg(I,network,roi)
pxds = semanticseg(imds,network)
[___] = semanticseg(___,Name,Value)



C = semanticseg(I,network) returns a semantic segmentation of the input image using deep learning. The input network must be either a SeriesNetwork or DAGNetwork object.

This function supports parallel computing using multiple MATLAB® workers when processing an ImageDatastore object. You can enable parallel computing using the Computer Vision System Toolbox™ preferences dialog.

[C,score,allScores] = semanticseg(I,network) returns a semantic segmentation of the input image along with the classification score for each categorical label in C. The scores are returned in an array that corresponds to each pixel in the input image. allScores contains the scores for all label categories that the input network can classify.

[___] = semanticseg(I,network,roi) returns a semantic segmentation for a rectangular subregion of the input image.

pxds = semanticseg(imds,network) returns the semantic segmentation for a collection of images in imds, an ImageDatastore object.

[___] = semanticseg(___,Name,Value) returns semantic segmentation with additional options specified by one or more Name,Value pair arguments.
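The syntaxes above can be sketched together as follows. This is a hedged illustration only; it assumes a pretrained network net, a test image I, and an image datastore imds already exist in the workspace.

```matlab
% Sketch of the main calling syntaxes (net, I, imds assumed to exist).
C = semanticseg(I,net);                                % categorical labels
[C,score,allScores] = semanticseg(I,net);              % with confidence scores
C = semanticseg(I,net,[20 20 100 100]);                % restrict to an roi
pxds = semanticseg(imds,net,'WriteLocation',tempdir);  % datastore input
```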


Examples

Overlay segmentation results on an image and display the results.

Load a pretrained network.

data = load('triangleSegmentationNetwork');
net = data.net
net = 
  SeriesNetwork with properties:

    Layers: [10x1 nnet.cnn.layer.Layer]

List the network layers.

net.Layers
ans = 
  10x1 Layer array with layers:

     1   'imageinput'        Image Input                  32x32x1 images with 'zerocenter' normalization
     2   'conv_1'            Convolution                  64 3x3x1 convolutions with stride [1  1] and padding [1  1  1  1]
     3   'relu_1'            ReLU                         ReLU
     4   'maxpool'           Max Pooling                  2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     5   'conv_2'            Convolution                  64 3x3x64 convolutions with stride [1  1] and padding [1  1  1  1]
     6   'relu_2'            ReLU                         ReLU
     7   'transposed-conv'   Transposed Convolution       64 4x4x64 transposed convolutions with stride [2  2] and output cropping [1  1]
     8   'conv_3'            Convolution                  2 1x1x64 convolutions with stride [1  1] and padding [0  0  0  0]
     9   'softmax'           Softmax                      softmax
    10   'classoutput'       Pixel Classification Layer   Class weighted cross-entropy loss with classes 'triangle' and 'background'

Read and display the test image.

I = imread('triangleTest.jpg');
imshow(I)

Perform semantic image segmentation.

[C,scores] = semanticseg(I,net);

Overlay segmentation results on the image and display the results.

B = labeloverlay(I, C);
imshow(B)

Display the classification scores.

imagesc(scores)
axis square
colorbar

Create a binary mask with only the triangles.

BW = C == 'triangle';

Load a pretrained network.

data = load('triangleSegmentationNetwork');
net = data.net;

Load test images using imageDatastore.

dataDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
testImageDir = fullfile(dataDir,'testImages');
imds = imageDatastore(testImageDir)
imds = 
  ImageDatastore with properties:

       Files: {
              ' .../toolbox/vision/visiondata/triangleImages/testImages/image_001.jpg';
              ' .../toolbox/vision/visiondata/triangleImages/testImages/image_002.jpg';
              ' .../toolbox/vision/visiondata/triangleImages/testImages/image_003.jpg'
               ... and 97 more
              }
    ReadSize: 1
      Labels: {}
     ReadFcn: @readDatastoreImage

Load ground truth test labels.

testLabelDir = fullfile(dataDir,'testLabels');
classNames = ["triangle" "background"];
pixelLabelID = [255 0];
pxdsTruth = pixelLabelDatastore(testLabelDir,classNames,pixelLabelID);

Run semantic segmentation on all of the test images.

pxdsResults = semanticseg(imds,net,'WriteLocation',tempdir);
Running semantic segmentation network
* Processing 100 images.
* Progress: 100.00%

Compare results against ground truth.

metrics = evaluateSemanticSegmentation(pxdsResults,pxdsTruth)
Evaluating semantic segmentation results
[==================================================] 100%
Elapsed time: 00:00:06
Estimated time remaining: 00:00:00
* Finalizing... Done.
* Data set metrics:

    GlobalAccuracy    MeanAccuracy    MeanIoU    WeightedIoU    MeanBFScore
    ______________    ____________    _______    ___________    ___________

    0.90624           0.95085         0.61588    0.87529        0.40652    
metrics = 
  semanticSegmentationMetrics with properties:

              ConfusionMatrix: [2x2 table]
    NormalizedConfusionMatrix: [2x2 table]
               DataSetMetrics: [1x5 table]
                 ClassMetrics: [2x3 table]
                 ImageMetrics: [100x5 table]

Input Arguments


Input image, specified as a single image or a 4-D array of images. The first three dimensions of the array index the height, width, and channels of an image. The fourth dimension indexes the individual images.

Data Types: uint8 | uint16 | int16 | double | single | logical

Network, specified as either a SeriesNetwork or a DAGNetwork object.

Region of interest, specified as a 4-element vector in the format [x,y,width,height]. The vector defines a rectangular region of interest fully contained in the input image. Image pixels outside the region of interest are assigned the <undefined> categorical label. If you use a 4-D input image, the function applies the same roi to all images.
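As a hedged sketch of roi usage (net and I are assumed to exist in the workspace):

```matlab
% Segment a 100-by-100 subregion whose top-left corner is at (x,y) = (20,20).
roi = [20 20 100 100];
C = semanticseg(I,net,roi);
% Pixels outside the roi carry the <undefined> categorical label.
numUndefined = nnz(isundefined(C));
```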

Collection of images, specified as an ImageDatastore object. The function returns the semantic segmentation as a categorical array that relates a label to each pixel in the input image.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'ExecutionEnvironment','gpu'


Returned segmentation type, specified as either 'categorical' or 'double'. When you select 'double', the function returns the segmentation results as a label matrix containing label IDs. The IDs are integer values that correspond to the class names defined in the pixelClassificationLayer used in the input network.
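For example, a minimal sketch requesting a numeric label matrix (net and I assumed to exist):

```matlab
% Return a numeric label matrix instead of a categorical array.
L = semanticseg(I,net,'OutputType','double');
% labeloverlay also accepts numeric label matrices, so L can be
% displayed directly over the input image.
imshow(labeloverlay(I,L))
```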

The OutputType property cannot be used with an ImageDatastore object input.

Size of image groups, specified as an integer. Images are grouped and processed together as a batch. Batches are used to process a large collection of images and improve computational efficiency. Increasing the MiniBatchSize value increases efficiency, but it also uses more memory.
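For instance, a hedged sketch of batch processing a datastore (imds and net assumed to exist):

```matlab
% Process the datastore in batches of 64 images; larger batches can be
% faster but use more memory.
pxds = semanticseg(imds,net,'MiniBatchSize',64,'WriteLocation',tempdir);
```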

Hardware resource used to process images with a network, specified as 'auto', 'gpu', or 'cpu'.

'auto': Use a GPU if one is available. Otherwise, use the CPU. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®-enabled NVIDIA® GPU with compute capability 3.0 or higher.
'gpu': Use the GPU. If a suitable GPU is not available, the function returns an error.
'cpu': Use the CPU.
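A minimal sketch of overriding the default resource selection (I and net assumed to exist):

```matlab
% Force CPU execution, for example on a machine without a supported GPU.
C = semanticseg(I,net,'ExecutionEnvironment','cpu');
```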

Folder location, specified as a string scalar or character vector. The default is pwd (your current working folder). The specified folder must exist and have write permissions.

This property applies only when using an ImageDatastore object input.

Prefix applied to output file names, specified as a string scalar or character vector. The image files are named as follows:

  • prefix_N.png, where N corresponds to the index of the input image file, imds.Files(N).

This property applies only when using an ImageDatastore object input.
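Combining WriteLocation and NamePrefix might look like the following sketch; outDir is a hypothetical folder name chosen for illustration (imds and net assumed to exist):

```matlab
% Write results as seg_1.png, seg_2.png, ... into a custom folder.
% The output folder must already exist.
outDir = fullfile(tempdir,'segResults');
if ~exist(outDir,'dir'); mkdir(outDir); end
pxds = semanticseg(imds,net,'WriteLocation',outDir,'NamePrefix','seg');
```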

Display progress information, specified as true or false.

This property applies only when using an ImageDatastore object input.

Output Arguments


Categorical labels, returned as a 2-D categorical array. The elements of the label array correspond to the pixel elements of the input image. If you selected an ROI, then the labels are limited to the area within the ROI. Image pixels outside the region of interest are assigned the <undefined> categorical label.

Semantic segmentation results, returned as a pixelLabelDatastore object. The object contains the semantic segmentation results for all the images contained in the imds input object. The result for each image is saved as separate uint8 label matrices of PNG images. You can use read(pxds) to return the categorical labels assigned to the images in imds.
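For example, a hedged sketch of reading results back out of the returned datastore (pxds and imds assumed to exist from the calls above):

```matlab
% Read back the labels for the first test image and overlay them.
C1 = readimage(pxds,1);      % categorical labels for image 1
I1 = readimage(imds,1);      % corresponding input image
imshow(labeloverlay(I1,C1))
```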

Classification scores for each categorical label in C, returned as an array. The scores represent the confidence in the predicted labels C. The classification score of pixel (i,j) is score(i,j).

Scores for all label categories the input network is capable of classifying, returned as a 4-D array. The first three dimensions represent the height, width, and number of categories in C. The fourth dimension indexes each individual image.
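As a hedged sketch, the score map for a single class can be pulled out of allScores by its index; 'triangle' is used here for illustration, and the lookup assumes the network ends in a pixelClassificationLayer whose ClassNames property lists the classes (I and net assumed to exist):

```matlab
% Extract and display the per-pixel score map for one class.
[~,~,allScores] = semanticseg(I,net);
idx = find(strcmp(net.Layers(end).ClassNames,'triangle'));
classScores = allScores(:,:,idx);
imagesc(classScores); axis square; colorbar
```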

Introduced in R2017b