
Group normalization layer

A group normalization layer divides the channels of the input data into groups and normalizes the activations across each group. To speed up training of convolutional neural networks and reduce the sensitivity to network initialization, use group normalization layers between convolutional layers and nonlinearities, such as ReLU layers. You can also perform instance normalization and layer normalization by setting the appropriate number of groups: one group per channel corresponds to instance normalization, and a single group containing all channels corresponds to layer normalization.
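To make the relationship between these normalization types concrete, here is a minimal NumPy sketch (not the MATLAB implementation) that normalizes groups of channels and checks the two special cases. The function name `group_norm` and the tensor layout (channels, height, width) are illustrative assumptions.

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    # x has shape (channels, height, width); normalize each group of
    # channels together with the spatial dimensions (illustrative sketch).
    c, h, w = x.shape
    g = x.reshape(num_groups, c // num_groups, h, w)
    mu = g.mean(axis=(1, 2, 3), keepdims=True)
    var = g.var(axis=(1, 2, 3), keepdims=True)
    return ((g - mu) / np.sqrt(var + eps)).reshape(c, h, w)

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4, 4))

# numGroups equal to the number of channels: each channel is
# normalized on its own, i.e. instance normalization.
inst = (x - x.mean(axis=(1, 2), keepdims=True)) / np.sqrt(
    x.var(axis=(1, 2), keepdims=True) + 1e-5)
assert np.allclose(group_norm(x, 6), inst)

# numGroups equal to 1: all channels are normalized together,
# i.e. layer normalization.
layer = (x - x.mean()) / np.sqrt(x.var() + 1e-5)
assert np.allclose(group_norm(x, 1), layer)
```

The check relies only on the grouping of channels; any intermediate number of groups interpolates between these two extremes.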

You can use a group normalization layer in place of a batch normalization layer. This is particularly useful when training with small batch sizes, as it can increase the stability of training.

The layer first normalizes the activations of each group by subtracting the group mean and dividing by the group standard deviation. Then, the layer shifts the input by a learnable offset *β* and scales it by a learnable scale factor *γ*.

`layer = groupNormalizationLayer(numGroups)` creates a group normalization layer that divides the channels in the layer input into `numGroups` groups and normalizes across each group.

`layer = groupNormalizationLayer(numGroups,Name,Value)` creates a group normalization layer and sets the optional Normalization, Parameters and Initialization, Learn Rate and Regularization, and `Name` properties using one or more name-value pair arguments. You can specify multiple name-value pair arguments. Enclose each property name in quotes.

A group normalization layer normalizes its inputs *x _{i}* by first calculating the group mean *μ _{g}* and group variance *σ _{g}^{2}* over the spatial and grouped channel dimensions, and then computing the normalized activations

$${\widehat{x}}_{i}=\frac{{x}_{i}-{\mu}_{g}}{\sqrt{{\sigma}_{g}^{2}+\epsilon}}$$

Here, *ϵ* (the property `Epsilon`) improves numerical stability when the group variance is very small. To allow for the possibility that inputs with zero mean and unit variance are not optimal for the layer that follows the group normalization layer, the group normalization layer further shifts and scales the activations as

$${y}_{i}=\gamma {\widehat{x}}_{i}+\beta .$$

Here, the offset *β* and scale factor *γ* (the `Offset` and `Scale` properties) are learnable parameters that are updated during network training.
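The two-step computation above can be sketched numerically. The following NumPy code (an illustrative sketch, not the MATLAB implementation; the function name `group_norm_forward` and the per-channel shapes of `gamma` and `beta` are assumptions) applies the normalization formula followed by the learnable shift and scale:

```python
import numpy as np

def group_norm_forward(x, num_groups, gamma, beta, eps=1e-5):
    # Normalize each group to zero mean and unit variance, then apply
    # the learnable per-channel scale (gamma) and offset (beta).
    c, h, w = x.shape
    g = x.reshape(num_groups, c // num_groups, h, w)
    mu = g.mean(axis=(1, 2, 3), keepdims=True)   # group mean
    var = g.var(axis=(1, 2, 3), keepdims=True)   # group variance
    x_hat = ((g - mu) / np.sqrt(var + eps)).reshape(c, h, w)
    return gamma[:, None, None] * x_hat + beta[:, None, None]

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 3, 3))
gamma = np.ones(4)   # scale factor, initialized to ones
beta = np.zeros(4)   # offset, initialized to zeros
y = group_norm_forward(x, 2, gamma, beta)

# With gamma = 1 and beta = 0, each group of the output has
# (approximately) zero mean and unit variance.
g = y.reshape(2, 2, 3, 3)
assert np.allclose(g.mean(axis=(1, 2, 3)), 0, atol=1e-6)
assert np.allclose(g.var(axis=(1, 2, 3)), 1, atol=1e-3)
```

During training, `gamma` and `beta` would be updated by the optimizer, so the output distribution is free to move away from zero mean and unit variance if that benefits the following layer.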

[1] Wu, Yuxin, and Kaiming He. “Group Normalization.” *arXiv:1803.08494 [cs]*, June 11, 2018. http://arxiv.org/abs/1803.08494.

`batchNormalizationLayer` | `convolution2dLayer` | `fullyConnectedLayer` | `reluLayer` | `trainingOptions` | `trainNetwork`