# crossentropy

Cross-entropy loss for classification tasks

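## Syntax

`loss = crossentropy(Y,targets)`

`loss = crossentropy(Y,targets,weights)`

`loss = crossentropy(___,Name=Value)`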

## Description

The cross-entropy operation computes the cross-entropy loss between network predictions and binary or one-hot encoded targets for single-label and multi-label classification tasks.

The `crossentropy` function computes the cross-entropy loss between predictions and targets represented as `dlarray` data. Using `dlarray` objects makes working with high-dimensional data easier by allowing you to label the dimensions. For example, you can label which dimensions correspond to spatial, time, channel, and batch dimensions using the `"S"`, `"T"`, `"C"`, and `"B"` labels, respectively. For unspecified and other dimensions, use the `"U"` label. For `dlarray` object functions that operate over particular dimensions, you can specify the dimension labels by formatting the `dlarray` object directly, or by using the `DataFormat` option.
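As a minimal sketch of dimension labeling (the array sizes and the variable names `X`, `dlX`, `fmt`, and `sz` are illustrative assumptions), this code labels a 3-D array as channel-by-batch-by-time and then queries the labels:

```matlab
% Random data with 3 channels, 4 observations, and 5 time steps.
X = rand(3,4,5);

% Label the dimensions as channel ("C"), batch ("B"), and time ("T").
dlX = dlarray(X,"CBT");

% Query the dimension labels and the size of the formatted array.
fmt = dims(dlX)
sz = size(dlX)
```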

**Note**

To train with cross-entropy loss using the `trainnet` function, set the loss function to `"crossentropy"`.

`loss = crossentropy(Y,targets)` returns the categorical cross-entropy loss between the formatted `dlarray` object `Y` containing the predictions and the target values `targets` for single-label classification tasks. The output `loss` is an unformatted `dlarray` scalar.

For unformatted input data, use the `DataFormat` argument.
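The following sketch, which assumes Deep Learning Toolbox is available, passes the same predictions once as a formatted `dlarray` object and once as an unformatted `dlarray` object with `DataFormat="CB"`. The variable names and the manual column normalization are illustrative choices, not part of the function.

```matlab
numClasses = 10;
numObservations = 12;

% Illustrative predictions: normalize each column so that it sums to 1.
Yraw = rand(numClasses,numObservations);
Yraw = Yraw ./ sum(Yraw,1);

% One-hot encoded targets, created as in the examples on this page.
labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);

% Formatted input: the format is attached to the dlarray object.
lossFormatted = crossentropy(dlarray(Yraw,"CB"),targets);

% Unformatted input: the format is supplied through DataFormat.
lossUnformatted = crossentropy(dlarray(Yraw),targets,DataFormat="CB");
```

Both calls describe the same layout, so the two loss values are expected to agree.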

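`loss = crossentropy(Y,targets,weights)` applies the weights in `weights` to the calculated loss values, for example to weight the contributions of individual classes or observations.

`loss = crossentropy(___,Name=Value)` specifies options using one or more name-value arguments in addition to the input arguments in previous syntaxes. For example, `ClassificationMode="multilabel"` computes the cross-entropy loss for a multi-label classification task.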

## Examples

### Cross-Entropy Loss for Single-Label Classification

Create an array of prediction scores for 12 observations over 10 classes.

```
numClasses = 10;
numObservations = 12;
Y = rand(numClasses,numObservations);
Y = dlarray(Y,"CB");
Y = softmax(Y);
```

View the size and format of the prediction scores.

`size(Y)`

`ans = 1×2`

`10 12`

`dims(Y)`

`ans = 'CB'`

Create an array of targets encoded as one-hot vectors.

`labels = randi(numClasses,[1 numObservations]);`

`targets = onehotencode(labels,1,ClassNames=1:numClasses);`

View the size of the targets.

`size(targets)`

`ans = 1×2`

`10 12`

Compute the cross-entropy loss between the predictions and the targets.

`loss = crossentropy(Y,targets)`

`loss = 1x1 dlarray`

`2.3343`
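As a cross-check, the single-label loss can be reproduced by hand from the formula in the Algorithms section. This sketch assumes the default options (sum reduction, normalized by the number of observations). The name `lossManual` is illustrative, and because the data is random the value differs from run to run.

```matlab
numClasses = 10;
numObservations = 12;

% Predictions and one-hot targets, created as in the example above.
Y = softmax(dlarray(rand(numClasses,numObservations),"CB"));
labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);

loss = crossentropy(Y,targets);

% Manual computation: -(1/N) * sum over observations and classes of T.*ln(Y).
Ydata = extractdata(Y);
lossManual = -sum(targets .* log(Ydata),"all") / numObservations;
```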

### Cross-Entropy Loss for Multi-Label Classification

Create an array of prediction scores for 12 observations over 10 classes.

```
numClasses = 10;
numObservations = 12;
Y = rand(numClasses,numObservations);
Y = dlarray(Y,"CB");
```

View the size and format of the prediction scores.

`size(Y)`

`ans = 1×2`

`10 12`

`dims(Y)`

`ans = 'CB'`

Create a random array of targets encoded as a numeric array of zeros and ones. Each observation can have multiple classes.

`targets = rand(numClasses,numObservations) > 0.75;`

`targets = single(targets);`

View the size of the targets.

`size(targets)`

`ans = 1×2`

`10 12`

Compute the cross-entropy loss between the predictions and the targets. To specify cross-entropy loss for multi-label classification, set the `ClassificationMode` argument to `"multilabel"`.

`loss = crossentropy(Y,targets,ClassificationMode="multilabel")`

`loss = 1x1 single dlarray`

`9.8853`
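The multi-label loss can be checked against the binary cross-entropy formula in the Algorithms section in the same way. In this sketch the predictions are deliberately kept inside the interval [0.05, 0.95], an assumption made here so that the logarithms stay well conditioned in single precision. The name `lossManual` is illustrative.

```matlab
numClasses = 10;
numObservations = 12;

% Predictions kept away from 0 and 1 (an illustrative choice).
Y = dlarray(0.05 + 0.9*rand(numClasses,numObservations),"CB");
targets = single(rand(numClasses,numObservations) > 0.75);

loss = crossentropy(Y,targets,ClassificationMode="multilabel");

% Manual computation: sum the binary cross-entropy over all classes and
% observations, then divide by the number of observations.
Ydata = extractdata(Y);
elementLoss = -(targets .* log(Ydata) + (1 - targets) .* log(1 - Ydata));
lossManual = sum(elementLoss,"all") / numObservations;
```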

### Weighted Cross-Entropy Loss

Create an array of prediction scores for 12 observations over 10 classes.

```
numClasses = 10;
numObservations = 12;
Y = rand(numClasses,numObservations);
Y = dlarray(Y,"CB");
Y = softmax(Y);
```

View the size and format of the prediction scores.

`size(Y)`

`ans = 1×2`

`10 12`

`dims(Y)`

`ans = 'CB'`

Create an array of targets encoded as one-hot vectors.

`labels = randi(numClasses,[1 numObservations]);`

`targets = onehotencode(labels,1,ClassNames=1:numClasses);`

View the size of the targets.

`size(targets)`

`ans = 1×2`

`10 12`

Compute the weighted cross-entropy loss between the predictions and the targets using a vector of class weights. Specify a weights format of `"UC"` (unspecified, channel) using the `WeightsFormat` argument.

```
weights = rand(1,numClasses);
loss = crossentropy(Y,targets,weights,WeightsFormat="UC")
```

`loss = 1x1 dlarray`

`1.1261`
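The class-weighted loss corresponds to the weighted single-label formula in the Algorithms section, where each class weight scales the loss contribution of its class. The following sketch, with the illustrative name `lossManual`, reproduces it under the default normalization by the number of observations.

```matlab
numClasses = 10;
numObservations = 12;

Y = softmax(dlarray(rand(numClasses,numObservations),"CB"));
labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);

% One weight per class, labeled as "UC" (unspecified, channel).
weights = rand(1,numClasses);
loss = crossentropy(Y,targets,weights,WeightsFormat="UC");

% Manual computation: transpose the weights to a column so that each
% class weight multiplies the corresponding row of the targets.
Ydata = extractdata(Y);
lossManual = -sum(weights' .* targets .* log(Ydata),"all") / numObservations;
```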

## Input Arguments

`Y`: Predictions

`dlarray` object | numeric array

Predictions, specified as a formatted or unformatted `dlarray` object, or a numeric array. When `Y` is not a formatted `dlarray`, you must specify the dimension format using the `DataFormat` argument.

If `Y` is a numeric array, `targets` must be a `dlarray` object.

`targets`: Target classification labels

`dlarray` | numeric array

Target classification labels, specified as a formatted or unformatted `dlarray` or a numeric array.

Specify the targets as an array containing one-hot encoded labels with the same size and format as `Y`. For example, if `Y` is a `numObservations`-by-`numClasses` array, then `targets(n,i)` = 1 if observation `n` belongs to class `i`, and `targets(n,i)` = 0 otherwise.

If `targets` is a formatted `dlarray`, then its format must be the same as the format of `Y`, or the same as `DataFormat` if `Y` is unformatted.

If `targets` is an unformatted `dlarray` or a numeric array, then the function applies the format of `Y` or the value of `DataFormat` to `targets`.

**Tip**

Formatted `dlarray` objects automatically permute the dimensions of the underlying data to have the order `"S"` (spatial), `"C"` (channel), `"B"` (batch), `"T"` (time), then `"U"` (unspecified). To ensure that the dimensions of `Y` and `targets` are consistent, when `Y` is a formatted `dlarray`, also specify `targets` as a formatted `dlarray`.
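To illustrate the tip, this sketch stores predictions as observations-by-classes and labels them `"BC"`. The formatted `dlarray` object is then permuted to `"CB"`, so formatting the targets the same way keeps the two arrays aligned. The variable names and the manual one-hot construction are illustrative assumptions.

```matlab
numObservations = 12;
numClasses = 10;

% Predictions stored as observations-by-classes; each row sums to 1.
Yraw = rand(numObservations,numClasses);
Yraw = Yraw ./ sum(Yraw,2);

% One-hot targets in the same observations-by-classes layout.
labels = randi(numClasses,[numObservations 1]);
Traw = double(labels == 1:numClasses);

% Label both arrays as "BC". dlarray permutes the data to "CB" order.
Y = dlarray(Yraw,"BC");
targets = dlarray(Traw,"BC");
fmt = dims(Y)
sz = size(Y)

loss = crossentropy(Y,targets);

% Manual single-label loss computed in the original layout.
lossManual = -sum(Traw .* log(Yraw),"all") / numObservations;
```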

`weights`: Weights

`dlarray` object | numeric array

Weights, specified as a `dlarray` object or a numeric array.

To specify class weights, specify a vector with a `"C"` (channel) dimension with size matching the `"C"` (channel) dimension of `Y` and a singleton `"U"` (unspecified) dimension. Specify the dimensions of the class weights by using a formatted `dlarray` object or by using the `WeightsFormat` argument.

To specify observation weights, specify a vector with a `"B"` (batch) dimension with size matching the `"B"` (batch) dimension of `Y`. Specify the `"B"` (batch) dimension of the observation weights by using a formatted `dlarray` object or by using the `WeightsFormat` argument.

To specify weights for each element of the input independently, specify the weights as an array of the same size as `Y`. In this case, if `weights` is not a formatted `dlarray` object, then the function uses the same format as `Y`. Alternatively, specify the weights format using the `WeightsFormat` argument.
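As a sketch of observation weights, the following code weights each observation individually. It assumes that a weights format of `"UB"` (unspecified, batch) is accepted by analogy with the `"UC"` format used for class weights on this page. The names `obsWeights` and `lossManual` are illustrative.

```matlab
numClasses = 10;
numObservations = 12;

Y = softmax(dlarray(rand(numClasses,numObservations),"CB"));
labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);

% One weight per observation, labeled as "UB" (assumed format).
obsWeights = rand(1,numObservations);
loss = crossentropy(Y,targets,obsWeights,WeightsFormat="UB");

% Manual computation: each observation weight scales the corresponding
% column of the element-wise loss.
Ydata = extractdata(Y);
lossManual = -sum(obsWeights .* targets .* log(Ydata),"all") / numObservations;
```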

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

*Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.*

**Example:** `ClassificationMode="multilabel",DataFormat="CB"` evaluates the cross-entropy loss for multi-label classification tasks and specifies the dimension order of the input data as `"CB"`.

`ClassificationMode`: Type of classification task

`"single-label"` (default) | `"multilabel"`

Type of classification task, specified as one of these values:

- `"single-label"`: Each observation is exclusively assigned one class label (single-label classification). The function computes the loss between the target value for the single category specified by `targets` and the corresponding prediction in `Y`, averaged over the number of observations.
- `"multilabel"`: Each observation can be assigned more than one independent class label (multilabel classification). The function computes the sum of the loss between each category specified by `targets` and the predictions in `Y` for those categories, averaged over the number of observations. Cross-entropy loss for this type of classification task is also known as binary cross-entropy loss.

**Note**

To select the classification mode for binary classification, you must consider the final layer of the network:

- If the final layer has an output size of one, such as with a sigmoid layer, use `"multilabel"`.
- If the final layer has an output size of two, such as with a softmax layer, use `"single-label"`.
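The two binary-classification setups in the note describe the same loss. In this sketch, `p` stands in for the output of a sigmoid layer (output size one) and `[p; 1-p]` for the output of a softmax layer (output size two). All variable names and the synthetic probabilities are assumptions for illustration.

```matlab
numObservations = 8;

% Probability of the positive class for each observation.
p = 0.05 + 0.9*rand(1,numObservations);

% Binary targets: 1 for the positive class, 0 otherwise.
t = double(rand(1,numObservations) > 0.5);

% Output size one (sigmoid-style): use "multilabel".
lossSigmoid = crossentropy(dlarray(p,"CB"),t, ...
    ClassificationMode="multilabel");

% Output size two (softmax-style): use "single-label" with one-hot targets.
lossSoftmax = crossentropy(dlarray([p; 1-p],"CB"),[t; 1-t], ...
    ClassificationMode="single-label");
```

Both expressions reduce to the average of `-(t.*log(p) + (1-t).*log(1-p))`, so the two losses are expected to match.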

`Mask`: Mask indicating which elements to include for loss computation

`dlarray` | logical array | numeric array

Mask indicating which elements to include for loss computation, specified as a `dlarray` object, a logical array, or a numeric array with the same size as `Y`.

The function includes and excludes elements of the input data for loss computation when the corresponding value in the mask is 1 and 0, respectively.

If `Mask` is a formatted `dlarray` object, then its format must match that of `Y`. If `Mask` is not a formatted `dlarray` object, then the function uses the same format as `Y`.

If you specify the `DataFormat` argument, then the function also uses the specified format for the mask.

The size of each dimension of `Mask` must match the size of the corresponding dimension in `Y`. The default value is a logical array of ones.
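As a sketch of masking padded time steps, the following code builds sequence predictions in `"CBT"` format and excludes the last two time steps of the second observation. It assumes the default normalization by the number of observations. The sizes and variable names are illustrative.

```matlab
numClasses = 3;
numObservations = 2;
numTimeSteps = 4;

% Sequence predictions: softmax normalizes over the channel dimension.
Y = softmax(dlarray(rand(numClasses,numObservations,numTimeSteps),"CBT"));

% One-hot targets with the same size as Y.
labels = randi(numClasses,[1 numObservations numTimeSteps]);
targets = double(labels == (1:numClasses)');

% Mask with the same size as Y. Treat time steps 3 and 4 of
% observation 2 as padding and exclude them from the loss.
mask = true(numClasses,numObservations,numTimeSteps);
mask(:,2,3:4) = false;

loss = crossentropy(Y,targets,Mask=mask);

% Manual computation: masked elements contribute zero to the sum.
Ydata = extractdata(Y);
lossManual = -sum(mask .* targets .* log(Ydata),"all") / numObservations;
```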

**Tip**

Formatted `dlarray` objects automatically permute the dimensions of the underlying data to have this order: `"S"` (spatial), `"C"` (channel), `"B"` (batch), `"T"` (time), and `"U"` (unspecified). For example, `dlarray` objects automatically permute the dimensions of data with format `"TSCSBS"` to have format `"SSSCBT"`.

To ensure that the dimensions of `Y` and the mask are consistent, when `Y` is a formatted `dlarray`, also specify the mask as a formatted `dlarray`.

`Reduction`: Loss value array reduction mode

`"sum"` (default) | `"none"`

Loss value array reduction mode, specified as `"sum"` or `"none"`.

If the `Reduction` argument is `"sum"`, then the function sums all elements in the array of loss values. In this case, the output `loss` is a scalar.

If the `Reduction` argument is `"none"`, then the function does not reduce the array of loss values. In this case, the output `loss` is an unformatted `dlarray` object of the same size as `Y`.
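A short sketch of the two reduction modes follows. It only inspects the output sizes, and the variable names are illustrative.

```matlab
numClasses = 10;
numObservations = 12;

Y = softmax(dlarray(rand(numClasses,numObservations),"CB"));
labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);

% Default reduction ("sum"): the output is a scalar.
lossScalar = crossentropy(Y,targets);

% No reduction: the output has one loss value per element of Y.
lossElements = crossentropy(Y,targets,Reduction="none");

sizeScalar = size(lossScalar)
sizeElements = size(lossElements)
```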

`NormalizationFactor`: Divisor for normalizing reduced loss

`"batch-size"` (default) | `"all-elements"` | `"mask-included"` | `"none"`

Divisor for normalizing the reduced loss when `Reduction` is `"sum"`, specified as one of the following:

- `"batch-size"`: Normalize the loss by dividing it by the number of observations in `Y`.
- `"all-elements"`: Normalize the loss by dividing it by the number of elements of `Y`.
- `"mask-included"`: Normalize the loss by dividing the loss values by the product of the number of observations and the number of included elements specified by the mask for each observation independently. To use this option, you must specify a mask using the `Mask` option.
- `"none"`: Do not normalize the loss.
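The relationship between the normalization options can be sketched as follows: the unnormalized loss equals the batch-normalized loss multiplied by the number of observations, and the element-normalized loss multiplied by the number of elements of `Y`. Variable names are illustrative.

```matlab
numClasses = 10;
numObservations = 12;

Y = softmax(dlarray(rand(numClasses,numObservations),"CB"));
labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);

lossBatch = crossentropy(Y,targets);   % default: "batch-size"
lossAll = crossentropy(Y,targets,NormalizationFactor="all-elements");
lossNone = crossentropy(Y,targets,NormalizationFactor="none");

% Undo each normalization to recover the unnormalized sum.
fromBatch = extractdata(lossBatch) * numObservations;
fromAll = extractdata(lossAll) * numel(Y);
```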

`DataFormat`: Description of data dimensions

character vector | string scalar

Description of the data dimensions, specified as a character vector or string scalar.

A data format is a string of characters, where each character describes the type of the corresponding data dimension.

The characters are:

- `"S"`: Spatial
- `"C"`: Channel
- `"B"`: Batch
- `"T"`: Time
- `"U"`: Unspecified

For example, consider an array containing a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can specify that this array has the format `"CBT"` (channel, batch, time).

You can specify multiple dimensions labeled `"S"` or `"U"`. You can use the labels `"C"`, `"B"`, and `"T"` once each, at most. The software ignores singleton trailing `"U"` dimensions after the second dimension.

If the input data is not a formatted `dlarray` object, then you must specify the `DataFormat` option.

For more information, see Deep Learning Data Formats.

**Data Types:** `char` | `string`

`WeightsFormat`: Description of dimensions of weights

character vector | string scalar

Description of the dimensions of the weights, specified as a character vector or string scalar.

A data format is a string of characters, where each character describes the type of the corresponding data dimension.

The characters are:

- `"S"`: Spatial
- `"C"`: Channel
- `"B"`: Batch
- `"T"`: Time
- `"U"`: Unspecified

For example, consider an array containing a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can specify that this array has the format `"CBT"` (channel, batch, time).

You can specify multiple dimensions labeled `"S"` or `"U"`. You can use the labels `"C"`, `"B"`, and `"T"` once each, at most. The software ignores singleton trailing `"U"` dimensions after the second dimension.

If `weights` is a numeric vector and `Y` has two or more nonsingleton dimensions, then you must specify the `WeightsFormat` option.

If `weights` is not a vector, or `weights` and `Y` are both vectors, then the default value of `WeightsFormat` is the same as the format of `Y`.

For more information, see Deep Learning Data Formats.

**Data Types:** `char` | `string`

## Output Arguments

`loss`: Cross-entropy loss

`dlarray`

Cross-entropy loss, returned as an unformatted `dlarray` with the same underlying data type as the input `Y`.

The size of `loss` depends on the `Reduction` argument.

## Algorithms

### Cross-Entropy Loss

For each element $Y_j$ of the input, the `crossentropy` function computes the corresponding cross-entropy element-wise loss values using the formula

$${\text{loss}}_{j}=-\left({T}_{j}\ln{Y}_{j}+(1-{T}_{j})\ln(1-{Y}_{j})\right),$$

where $T_j$ is the target value corresponding to $Y_j$.

To reduce the loss values to a scalar, the function then reduces the element-wise loss using the formula

$$\text{loss}=\frac{1}{N}\sum_{j}{m}_{j}{w}_{j}{\text{loss}}_{j},$$

where $N$ is the normalization factor, $m_j$ is the mask value for element $j$, and $w_j$ is the weight value for element $j$.

If you do not opt to reduce the loss, then the function applies the mask and the weights to the loss values directly:

$${\text{loss}}_{j}^{*}={m}_{j}{w}_{j}{\text{loss}}_{j}$$
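The formulas above can be sketched in plain MATLAB without any toolbox functions. The values of `Y`, `T`, `m`, `w`, and `N` below are made up for illustration.

```matlab
% Illustrative predictions, targets, mask, and weights for two elements.
Y = [0.5 0.25];
T = [1 0];
m = [1 1];
w = [2 1];
N = 2;   % normalization factor (illustrative)

% Element-wise loss: -(T.*ln(Y) + (1-T).*ln(1-Y)).
lossElements = -(T .* log(Y) + (1 - T) .* log(1 - Y));

% Without reduction: apply the mask and the weights directly.
lossStar = m .* w .* lossElements;

% With reduction: sum the masked, weighted values and divide by N.
loss = sum(lossStar) / N
```

Here the first element contributes `2*log(2)` and the second contributes `log(4/3)`, so the reduced loss is half their sum.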

This table shows the loss formulations for different tasks.

| Task | Description | Loss |
|---|---|---|
| Single-label classification | Cross-entropy loss for mutually exclusive classes. This is useful when observations must have a single label only. | $$\text{loss}=-\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K}{T}_{n,i}\ln{Y}_{n,i},$$ where $N$ and $K$ are the numbers of observations and classes, respectively. |
| Multi-label classification | Cross-entropy loss for independent classes. This is useful when observations can have multiple labels. | $$\text{loss}=-\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K}\left({T}_{n,i}\ln{Y}_{n,i}+(1-{T}_{n,i})\ln(1-{Y}_{n,i})\right),$$ where $N$ and $K$ are the numbers of observations and classes, respectively. |
| Single-label classification with weighted classes | Cross-entropy loss with class weights. This is useful for datasets with imbalanced classes. | $$\text{loss}=-\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K}{w}_{i}{T}_{n,i}\ln{Y}_{n,i},$$ where $N$ and $K$ are the numbers of observations and classes, respectively, and $w_i$ is the weight for class $i$. |
| Sequence-to-sequence classification | Cross-entropy loss with masked time-steps. This is useful for ignoring loss values that correspond to padded data. | $$\text{loss}=-\frac{1}{N}\sum_{n=1}^{N}\sum_{t=1}^{S}{m}_{n,t}\sum_{i=1}^{K}{T}_{n,t,i}\ln{Y}_{n,t,i},$$ where $N$, $S$, and $K$ are the numbers of observations, time steps, and classes, respectively, and $m_{n,t}$ is the mask value for time step $t$ of observation $n$. |
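As a worked instance of the single-label formulation, consider a made-up case with $N=2$ observations and $K=3$ classes. Suppose the predictions are $Y_1=(0.7,\,0.2,\,0.1)$ and $Y_2=(0.1,\,0.8,\,0.1)$, and the true classes are class 1 and class 2, respectively. Only the terms where $T_{n,i}=1$ survive, so

$$\text{loss}=-\frac{1}{2}\left(\ln 0.7+\ln 0.8\right)\approx-\frac{1}{2}\left(-0.3567-0.2231\right)\approx 0.2899.$$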

### Deep Learning Array Formats

Most deep learning networks and functions operate on different dimensions of the input data in different ways.

For example, an LSTM operation iterates over the time dimension of the input data, and a batch normalization operation normalizes over the batch dimension of the input data.

To provide input data with labeled dimensions or input data with additional layout information, you can use *data formats*.

A data format is a string of characters, where each character describes the type of the corresponding data dimension.

The characters are:

- `"S"`: Spatial
- `"C"`: Channel
- `"B"`: Batch
- `"T"`: Time
- `"U"`: Unspecified

For example, consider an array containing a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can specify that this array has the format `"CBT"` (channel, batch, time).

To create formatted input data, create a `dlarray` object and specify the format using the second argument.

To provide additional layout information with unformatted data, specify the formats using the `DataFormat` and `WeightsFormat` arguments.

For more information, see Deep Learning Data Formats.

## Extended Capabilities

### GPU Arrays

Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

The `crossentropy` function supports GPU array input with these usage notes and limitations:

When at least one of these input arguments is a `gpuArray` or a `dlarray` with underlying data of type `gpuArray`, this function runs on the GPU:

- `Y`
- `targets`
- `weights`
- `Mask`

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
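As a minimal sketch, the following code moves the predictions to the GPU only when one is available and otherwise runs on the CPU. It assumes the `canUseGPU` function is available in your release. The variable names are illustrative.

```matlab
numClasses = 10;
numObservations = 12;

Yraw = rand(numClasses,numObservations);

% Move the data to the GPU only if a supported GPU is available.
% This requires Parallel Computing Toolbox.
if canUseGPU
    Yraw = gpuArray(Yraw);
end

Y = softmax(dlarray(Yraw,"CB"));
labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);

loss = crossentropy(Y,targets);

% Bring the loss value back to host memory as a plain numeric scalar.
lossValue = gather(extractdata(loss));
```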

## Version History

**Introduced in R2019b**

### R2023b: `TargetCategories` is not recommended

`TargetCategories` is not recommended. Use `ClassificationMode` instead. To update your code, replace all instances of `TargetCategories="exclusive"` with `ClassificationMode="single-label"` and all instances of `TargetCategories="independent"` with `ClassificationMode="multilabel"`. There are no differences between the arguments that require additional updates to your code. The default behavior of the `crossentropy` function remains the same.

## See Also

`dlarray` | `dlgradient` | `dlfeval` | `indexcrossentropy` | `softmax` | `sigmoid` | `huber` | `l1loss` | `l2loss`
