Create SegNet layers for semantic segmentation


lgraph = segnetLayers(imageSize,numClasses,model)
lgraph = segnetLayers(imageSize,numClasses,encoderDepth)
lgraph = segnetLayers(imageSize,numClasses,encoderDepth,Name,Value)



lgraph = segnetLayers(imageSize,numClasses,model) returns SegNet layers, lgraph, that are preinitialized with layers and weights from a pretrained model.

SegNet is a convolutional neural network for semantic image segmentation. The network uses a pixelClassificationLayer to predict the categorical label for every pixel in an input image.
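To make "a categorical label for every pixel" concrete, the following minimal sketch picks the highest-scoring class at each pixel of a tiny score array. The `scores` array and its values are made up for illustration. The sketch only mimics the idea of per-pixel classification and is not the internal implementation of pixelClassificationLayer.

```matlab
% Hypothetical per-pixel class scores for a 2-by-2 image and 2 classes.
% scores(:,:,k) holds the score of class k at every pixel.
classNames = ["triangle", "background"];
scores = cat(3, [0.9 0.2; 0.4 0.7], ...   % scores for "triangle"
                [0.1 0.8; 0.6 0.3]);      % scores for "background"

% Take the arg-max along the class (third) dimension.
[~, classIdx] = max(scores, [], 3);

% Map the winning index at each pixel back to a class name.
labelMap = classNames(classIdx);
disp(labelMap)
```

The result has the same height and width as the input, with one class name per pixel.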

Use segnetLayers to create the network architecture for SegNet. You must train the network using the Neural Network Toolbox™ function trainNetwork.

lgraph = segnetLayers(imageSize,numClasses,encoderDepth) returns uninitialized SegNet layers configured using the specified encoder depth.

lgraph = segnetLayers(imageSize,numClasses,encoderDepth,Name,Value) returns SegNet layers with additional options specified by one or more Name,Value pair arguments.


Load training images and pixel labels.

  dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
  imageDir = fullfile(dataSetDir,'trainingImages');
  labelDir = fullfile(dataSetDir,'trainingLabels');

Create an imageDatastore holding the training images.

  imds = imageDatastore(imageDir);

Define the class names and their associated label IDs.

  classNames = ["triangle", "background"];
  labelIDs   = [255 0];

Create a pixelLabelDatastore holding the ground truth pixel labels for the training images.

  pxds = pixelLabelDatastore(labelDir, classNames, labelIDs);
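The association between label IDs and class names can be sketched with plain array operations. The `labelImage` matrix below is a made-up stand-in for one of the label images in labelDir, and the lookup is only an illustration of the mapping, not how pixelLabelDatastore is implemented.

```matlab
% Class names and label IDs as defined in the example.
classNames = ["triangle", "background"];
labelIDs   = [255 0];

% Hypothetical 2-by-3 label image; real label images come from labelDir.
labelImage = uint8([0 255 255; 0 0 255]);

% Find, for each pixel, which entry of labelIDs it matches.
[isKnown, idx] = ismember(double(labelImage), labelIDs);
assert(all(isKnown(:)), 'Label image contains an ID not listed in labelIDs.');

% Look up the class name for every pixel.
pixelClasses = classNames(idx);
disp(pixelClasses)
```

Pixels with the value 255 map to "triangle" and pixels with the value 0 map to "background".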

Create SegNet.

  imageSize = [32 32];
  numClasses = 2;
  lgraph = segnetLayers(imageSize,numClasses,2)
lgraph = 
  LayerGraph with properties:

         Layers: [31×1 nnet.cnn.layer.Layer]
    Connections: [34×2 table]

Create a data source for training a semantic segmentation network.

  datasource = pixelLabelImageSource(imds,pxds);

Set up training options.

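  % MaxEpochs and VerboseFrequency are inferred from the training log below.
  options = trainingOptions('sgdm','InitialLearnRate',1e-3, ...
      'MaxEpochs',20,'VerboseFrequency',10);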

Train network.

  net = trainNetwork(datasource, lgraph, options)
Training on single CPU.
Initializing image normalization.
|     Epoch    |   Iteration  | Time Elapsed |  Mini-batch  |  Mini-batch  | Base Learning|
|              |              |  (seconds)   |     Loss     |   Accuracy   |     Rate     |
|            1 |            1 |       105.55 |       0.7456 |       47.75% |       0.0010 |
|           10 |           10 |       969.67 |       0.7211 |       58.76% |       0.0010 |
|           20 |           20 |      1894.39 |       0.6786 |       70.07% |       0.0010 |
net = 
  DAGNetwork with properties:

         Layers: [31×1 nnet.cnn.layer.Layer]
    Connections: [34×2 table]

Display the network.

  plot(lgraph)


Create SegNet layers with an encoder/decoder depth of 4.

  imageSize = [480 640 3];
  numClasses = 5;
  encoderDepth = 4;
  lgraph = segnetLayers(imageSize,numClasses,encoderDepth)
lgraph = 
  LayerGraph with properties:

         Layers: [59×1 nnet.cnn.layer.Layer]
    Connections: [66×2 table]

Display the network.

  plot(lgraph)


Input Arguments

imageSize: Network input image size, specified as one of the following:

  • 2-element vector in the format [height, width].

  • 3-element vector in the format [height, width, depth]. depth is the number of image channels. Set depth to 3 for RGB images, or 1 for grayscale images.

numClasses: Number of classes in the semantic segmentation, specified as an integer greater than 1.

model: Pretrained network model, specified as 'vgg16' or 'vgg19'. These models have an encoder depth of 5.

encoderDepth: Encoder depth, specified as a positive integer.

SegNet is composed of an encoder and corresponding decoder subnetwork. The depth of these networks determines the number of times the input image is downsampled or upsampled as it is processed. The encoder network downsamples the input image by a factor of 2^D, where D is the value of encoderDepth. The decoder network upsamples the encoder network output by a factor of 2^D.
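The scaling arithmetic can be worked through for the 480-by-640 input used in the example. This is a minimal sketch that assumes each encoder section halves the height and width exactly. The variable names `scale`, `encodedSize`, and `dividesEvenly` are illustrative only.

```matlab
imageSize    = [480 640 3];   % [height width depth]
encoderDepth = 4;

% D encoder sections shrink the height and width by a factor of 2^D
% (assuming each section halves both dimensions).
scale = 2^encoderDepth;

% Spatial size of the encoder output under this simple model.
encodedSize = imageSize(1:2) / scale;

% Check whether the height and width divide evenly by the factor.
dividesEvenly = all(mod(imageSize(1:2), scale) == 0);

fprintf('Downsampling factor: %d\n', scale);
fprintf('Encoder output size: %d-by-%d\n', encodedSize(1), encodedSize(2));
fprintf('Divides evenly: %d\n', dividesEvenly);
```

With an encoder depth of 4 the factor is 16, so the 480-by-640 input is reduced to 30-by-40 before the decoder upsamples it again.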

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'NumConvolutionLayers',1

'NumConvolutionLayers': Number of convolutional layers in each encoder and decoder section, specified as a positive integer or vector of positive integers.

  • scalar: The same number of layers is used for all encoder and decoder sections.

  • vector: The kth element of NumConvolutionLayers is the number of convolution layers in the kth encoder section and corresponding decoder section. Typical values are in the range [1, 3].
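A rough way to see how encoderDepth and NumConvolutionLayers shape the network is to count layers. The sketch below rests on several assumptions:

  • each convolution is followed by batch normalization and ReLU

  • each encoder section ends with one pooling layer, and each decoder section starts with one unpooling layer

  • the network has one input layer and two output layers

These assumptions, the helper names, and the value of 2 convolutions per section are illustrative. They do reproduce the 31 and 59 layer counts reported in the examples for encoder depths 2 and 4.

```matlab
% Hypothetical helper: expand a scalar setting to one value per section.
expandToSections = @(numConv, depth) numConv(:).' .* ones(1, depth);

% Hypothetical helper: estimated number of layers in the layer graph.
% Each section contributes 3 layers per convolution plus one pooling
% (encoder) or unpooling (decoder) layer; 3 layers are input and output.
countSegNetLayers = @(depth, numConv) ...
    3 + sum(2 * (3 * expandToSections(numConv, depth) + 1));

disp(countSegNetLayers(2, 2))          % scalar setting, encoder depth 2: 31
disp(countSegNetLayers(4, 2))          % scalar setting, encoder depth 4: 59
disp(countSegNetLayers(3, [1 2 3]))    % vector setting, one entry per section
```

A scalar is applied to every section, while a vector must supply exactly one entry per encoder section.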

'NumOutputChannels': Number of output channels for each section in the SegNet encoder network, specified as a positive integer or vector of positive integers. segnetLayers sets the number of output channels in the decoder to match the corresponding encoder section.

  • scalar: The same number of output channels is used for all encoder and decoder sections.

  • vector: The kth element of NumOutputChannels is the number of output channels of the kth encoder section and corresponding decoder section.
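As a usage sketch, the call below passes vector values for both options, with one entry per encoder section. The image size, class count, and channel counts are arbitrary illustrative values, and the call needs the toolboxes that provide segnetLayers to be installed.

```matlab
imageSize    = [128 128 3];
numClasses   = 4;
encoderDepth = 3;

% One entry per encoder section (and its matching decoder section).
lgraph = segnetLayers(imageSize, numClasses, encoderDepth, ...
    'NumConvolutionLayers', [1 2 2], ...
    'NumOutputChannels',    [32 64 128]);

disp(lgraph)
```

Here the first section uses a single convolution with 32 output channels, and the two deeper sections use two convolutions each with 64 and 128 output channels.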

'FilterSize': Convolutional layer filter size, specified as a positive odd integer or a 2-element row vector of positive odd integers. Typical values are in the range [3, 7].

  • scalar: The filter is square.

  • 2-element row vector: The filter has the size [height width].

Output Arguments

lgraph: Network layers, returned as a LayerGraph object. Training the layer graph with trainNetwork produces a DAGNetwork.


  • The sections within the SegNet encoder and decoder subnetworks are made up of convolutional, batch normalization, and ReLU layers.

  • All convolutional layers are configured such that the bias term is fixed to zero.

  • Convolution layer weights in the encoder and decoder subnetworks are initialized using the 'MSRA' weight initialization method. For 'vgg16' or 'vgg19' models, only the decoder subnetwork is initialized using MSRA.[1]
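The MSRA scheme from [1] can be sketched for a single convolution layer as follows. This follows the fan-in formulation in the cited paper and is not necessarily the exact code path used inside segnetLayers. The filter size and channel counts are illustrative values.

```matlab
rng(0);                         % fixed seed so the sketch is repeatable

filterSize  = [3 3];            % illustrative values, not library defaults
numInputCh  = 64;
numOutputCh = 64;

% Fan-in: number of inputs feeding each output unit.
fanIn = prod(filterSize) * numInputCh;

% He et al. [1]: zero-mean Gaussian with standard deviation sqrt(2/fanIn).
sigma   = sqrt(2 / fanIn);
weights = sigma * randn([filterSize numInputCh numOutputCh]);

% Bias fixed to zero, matching the note on convolutional layers.
bias = zeros(1, 1, numOutputCh);

fprintf('Target std: %.4f, sample std: %.4f\n', sigma, std(weights(:)));
```

Scaling the standard deviation by the fan-in keeps the variance of activations roughly constant across ReLU layers, which is the motivation given in [1].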


[1] He, K., X. Zhang, S. Ren, and J. Sun. "Delving Deep Into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." Proceedings of the IEEE International Conference on Computer Vision. 2015, 1026–1034.

[2] Badrinarayanan, V., A. Kendall, and R. Cipolla. "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation." arXiv. Preprint arXiv:1511.00561, 2015.

Introduced in R2017b