convolution2dLayer

Create 2-D convolutional layer

Syntax

convlayer = convolution2dLayer(filterSize,numFilters)
convlayer = convolution2dLayer(filterSize,numFilters,Name,Value)

Description

convlayer = convolution2dLayer(filterSize,numFilters) returns a layer for 2-D convolution.

convlayer = convolution2dLayer(filterSize,numFilters,Name,Value) returns the convolutional layer, with additional options specified by one or more Name,Value pair arguments.

Examples

Create a convolutional layer with 96 filters, each with a height and width of 11. Use a stride (step size) of 4 in the horizontal and vertical directions.

convlayer = convolution2dLayer(11,96,'Stride',4);

Create a convolutional layer with 32 filters, each with a height and width of 5. Pad the input image with 2 pixels along its border. Set the learning rate factor for the bias to 2.

layer = convolution2dLayer(5,32,'Padding',2,'BiasLearnRateFactor',2);

Suppose the input has color images. Manually initialize the weights from a Gaussian distribution with standard deviation of 0.0001.

layer.Weights = randn([5 5 3 32])*0.0001;

The size of the local regions in the layer is 5-by-5. The number of color channels for each region is 3. The number of feature maps is 32 (the number of filters). Therefore, there are 5*5*3*32 weights in the layer.

randn([5 5 3 32]) returns a 5-by-5-by-3-by-32 array of values from a Gaussian distribution with a mean of 0 and a standard deviation of 1. Multiplying the values by 0.0001 sets the standard deviation of the Gaussian distribution equal to 0.0001.

Similarly, initialize the biases from a Gaussian distribution with a mean of 1 and a standard deviation of 0.00001.

layer.Bias = randn([1 1 32])*0.00001+1;

There are 32 feature maps, and therefore 32 biases. randn([1 1 32]) returns a 1-by-1-by-32 array of values from a Gaussian distribution with a mean of 0 and a standard deviation of 1. Multiplying the values by 0.00001 sets the standard deviation of values equal to 0.00001, and adding 1 sets the mean of the Gaussian distribution equal to 1.

Suppose the size of the input image is 28-by-28-by-1. Create a convolutional layer with 16 filters, each with a height of 6 and a width of 4, that traverse the image with a stride of 4 both horizontally and vertically. Make sure the convolution covers the image completely.

For the convolution to fully cover the input image, both the horizontal and vertical output dimensions must be integers. For the vertical output dimension to be an integer, one row of zero padding is required on the top and bottom of the image: (28 – 6 + 2*1)/4 + 1 = 7. For the horizontal output dimension to be an integer, no zero padding is required: (28 – 4 + 2*0)/4 + 1 = 7. Construct the convolutional layer as follows:

convlayer = convolution2dLayer([6 4],16,'Stride',4,'Padding',[1 0]);
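You can sanity-check the padding arithmetic above directly at the command line (a quick verification, not part of the layer API):

```matlab
% Output dimensions for a 28-by-28 input, [6 4] filter, stride 4, padding [1 0]
heightOut = (28 - 6 + 2*1)/4 + 1;   % vertical: one row of padding top and bottom
widthOut  = (28 - 4 + 2*0)/4 + 1;   % horizontal: no padding needed
% Both evaluate to 7, so the 7-by-7 output covers the image completely.
```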

Input Arguments

Height and width of the filters, specified as an integer value or a vector of two integer values. filterSize defines the size of the local regions to which the neurons connect in the input.

  • If filterSize is a scalar value, then the filters have the same height and width.

  • If filterSize is a vector, then it must be of the form [h w], where h is the height and w is the width.

Example: [5,5]

Data Types: single | double

Number of filters, specified as an integer value. numFilters represents the number of neurons in the convolutional layer that connect to the same region in the input. This parameter determines the number of channels (feature maps) in the output of the convolutional layer.

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'WeightInitializer',0.05,'WeightLearnRateFactor',1.5,'Name','conv1' specifies the initial value of weights as 0.05, the learning rate for this layer as 1.5 times the global learning rate, and the name of the layer as conv1.

Step size for traversing the input vertically and horizontally, specified as the comma-separated pair consisting of Stride and a scalar value or a vector of two scalar values.

  • If Stride is a scalar value, then the software uses the same value for both dimensions.

  • If Stride is a vector, then it must be of the form [u,v], where u is the vertical stride and v is the horizontal stride.

Example: For 'Stride',[2,3], the filter moves three columns at a time as it traverses the input horizontally. Once it covers the input horizontally, it moves down two rows and traverses the input horizontally again with a stride of 3. It repeats this process until it has traversed the whole input.

Data Types: single | double
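For instance, a layer with a vertical stride of 2 and a horizontal stride of 3 can be constructed as follows (a minimal sketch; the filter size and filter count are arbitrary choices for illustration):

```matlab
% 8 filters of size 3-by-3; moves down 2 rows and across 3 columns at a time
convlayer = convolution2dLayer(3,8,'Stride',[2 3]);
```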

Size of zero padding to apply to input borders vertically and horizontally, specified as the comma-separated pair consisting of Padding and a scalar value or a vector of two scalar values.

  • If Padding is a scalar value, then the software uses the same value for both dimensions.

  • If Padding is a vector, then it must be of the form [a,b], where a is the padding to be applied to the top and the bottom of the input data and b is the padding to be applied to the left and right.

Note that the padding dimensions (Padding) must be less than the filter dimensions (filterSize).

Example: To add one row of zeros to the top and bottom, and one column of zeros to the left and right of the input data, specify 'Padding',[1,1].

Data Types: single | double
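As a minimal sketch (the filter size and filter count here are arbitrary choices for illustration), a layer that pads the input with one row of zeros on the top and bottom and one column on the left and right can be constructed as follows:

```matlab
% 16 filters of size 5-by-5, with one row/column of zero padding on each side
convlayer = convolution2dLayer(5,16,'Padding',[1 1]);
```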

Number of channels (also referred to as feature maps) for each filter, specified as the comma-separated pair consisting of NumChannels and 'auto' or an integer value.

This parameter must equal the number of channels of the input to this convolutional layer. For example, if the input is a color image, then the number of channels for the input is 3. If the convolutional layer prior to the current layer has 16 filters, then the number of channels for this layer is 16.

If 'NumChannels' is 'auto', then the software infers the correct value for the number of channels during training.

Example: 'NumChannels',256

Data Types: single | double | char

Multiplier for the learning rate of the weights, specified as the comma-separated pair consisting of WeightLearnRateFactor and a scalar value.

trainNetwork multiplies this factor with the global learning rate to determine the learning rate for the weights in this layer. It determines the global learning rate based on the settings specified using trainingOptions.

Example: 'WeightLearnRateFactor',2 specifies that the learning rate for the weights in this layer is twice the global learning rate.

Data Types: single | double

Multiplier for the learning rate of the bias, specified as the comma-separated pair consisting of BiasLearnRateFactor and a scalar value.

The software multiplies this factor with the global learning rate to determine the learning rate for the bias in this layer.

The software determines the global learning rate based on the settings specified using the trainingOptions function.

Example: 'BiasLearnRateFactor',2 specifies that the learning rate for the bias in this layer is twice the global learning rate.

Data Types: single | double

L2 regularization factor for the weights, specified as the comma-separated pair consisting of WeightL2Factor and a scalar value.

The software multiplies this factor with the global L2 regularization factor to determine the L2 regularization for the weights in this layer.

You can specify the global L2 regularization factor using the trainingOptions function.

Example: 'WeightL2Factor',2 specifies that the L2 regularization for the weights in this layer is twice the global L2 regularization factor.

Data Types: single | double

L2 regularization factor for the biases, specified as the comma-separated pair consisting of BiasL2Factor and a scalar value.

The software multiplies this factor with the global L2 regularization factor to determine the L2 regularization for the biases in this layer.

You can specify the global L2 regularization factor using the trainingOptions function.

Example: 'BiasL2Factor',2 specifies that the L2 regularization for the bias in this layer is twice the global L2 regularization factor.

Data Types: single | double

Name for the layer, specified as the comma-separated pair consisting of Name and a character vector.

Example: 'Name','conv2'

Data Types: char

Output Arguments

2-D convolutional layer for convolutional neural networks, returned as a Convolution2DLayer object.

For information on concatenating layers to construct convolutional neural network architecture, see Layer.

More About

Convolutional Layer

A convolutional layer consists of neurons that connect to small regions of the input or of the layer before it. The sets of weights applied to these regions are called filters. You can specify the size of these regions using the filterSize input argument.

For each region, the software computes a dot product of the weights and the input, and then adds a bias term. The filter then moves along the input vertically and horizontally, repeating the same computation for each region, i.e., convolving the input. The step size with which it moves is called a stride. You can specify this step size with the Stride name-value pair argument. These local regions that the neurons connect to might overlap depending on the filterSize and Stride.

The number of weights used for a filter is h*w*c, where h is the height, and w is the width of the filter size, and c is the number of channels in the input (for example, if the input is a color image, the number of channels is three). As a filter moves along the input, it uses the same set of weights and bias for the convolution, forming a feature map. The convolution layer usually has multiple feature maps, each with a different set of weights and a bias. The number of feature maps is determined by the number of filters.

The total number of parameters in a convolutional layer is ((h*w*c + 1)*Number of Filters), where 1 is for the bias.
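For example, the parameter count of the 5-by-5, 32-filter layer constructed earlier for a 3-channel input works out as follows (a check of the formula above, not layer code):

```matlab
h = 5; w = 5; c = 3;                  % filter height, filter width, input channels
numFilters = 32;
weightsPerFilter = h*w*c;             % 75 weights per filter
totalParams = (h*w*c + 1)*numFilters; % +1 for the bias: 2432 parameters in total
```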

The output height and width of the convolutional layer is (Input Size – Filter Size + 2*Padding)/Stride + 1. This value must be an integer for the whole image to be fully covered. If these parameters do not cause the image to be fully covered, the software by default ignores the remaining part of the image along the right and bottom edges in the convolution.

The total number of neurons in a feature map, say Map Size, is the product of the output height and width. The total number of neurons (output size) in a convolutional layer, then, is Map Size*Number of Filters.

For example, suppose that the input image is a 28-by-28-by-3 color image. For a convolutional layer with 16 filters and a filter size of 8-by-8, the number of weights per filter is 8*8*3 = 192, and the total number of parameters in the layer is (192 + 1)*16 = 3088. Assuming a stride of 4 in each direction, each feature map is 6-by-6 ((28 – 8 + 2*0)/4 + 1 = 6). Then the total number of neurons in the layer is 6*6*16 = 576. Usually, the results from these neurons pass through some form of nonlinearity, such as a rectified linear unit (ReLU).
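The arithmetic in this example can be reproduced directly at the command line:

```matlab
% 28-by-28-by-3 input, 16 filters of size 8-by-8, stride 4, no padding
weightsPerFilter = 8*8*3;        % 192 weights per filter
totalParams = (192 + 1)*16;      % 3088 parameters, including one bias per filter
mapSide = (28 - 8 + 2*0)/4 + 1;  % 6, so each feature map is 6-by-6
totalNeurons = mapSide^2*16;     % 6*6*16 = 576 neurons in the layer
```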

Algorithms

The default for the initial weights is a Gaussian distribution with mean 0 and standard deviation 0.01. The default for the initial bias is 0. You can manually change the initialization for the weights and bias. See Specify Initial Weight and Biases in Convolutional Layer.

References

[1] LeCun, Y., B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. "Handwritten Digit Recognition with a Back-propagation Network." In Advances in Neural Information Processing Systems, 1990.

[2] LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based Learning Applied to Document Recognition." Proceedings of the IEEE. Vol. 86, pp. 2278–2324, 1998.

[3] Murphy, K. P. Machine Learning: A Probabilistic Perspective. Cambridge, Massachusetts: The MIT Press, 2012.

Introduced in R2016a
