# rlDiscreteCategoricalActor

Stochastic categorical actor with a discrete action space for reinforcement learning agents

*Since R2022a*

## Description

This object implements a function approximator to be used as a stochastic actor within a reinforcement learning agent with a discrete action space. A discrete categorical actor takes an environment observation as input and returns as output a random action sampled from a categorical (also known as Multinoulli) probability distribution, thereby implementing a parametrized stochastic policy. After you create an `rlDiscreteCategoricalActor` object, use it to create a suitable agent, such as `rlACAgent` or `rlPGAgent`. For more information on creating actors and critics, see Create Policies and Value Functions.

## Creation

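### Syntax

`actor = rlDiscreteCategoricalActor(net,observationInfo,actionInfo)`

`actor = rlDiscreteCategoricalActor({basisFcn,W0},observationInfo,actionInfo)`

`actor = rlDiscreteCategoricalActor(___,Name=Value)`

### Description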

`actor = rlDiscreteCategoricalActor(net,observationInfo,actionInfo)` creates a stochastic actor with a discrete action space, using the deep neural network `net` as the underlying approximation model. For this actor, `actionInfo` must specify a discrete action space. The network input layers are automatically associated with the environment observation channels according to the dimension specifications in `observationInfo`. The network must have a single output layer with as many elements as the number of possible discrete actions, as specified in `actionInfo`. This function sets the `ObservationInfo` and `ActionInfo` properties of `actor` to the inputs `observationInfo` and `actionInfo`, respectively.

`actor = rlDiscreteCategoricalActor({basisFcn,W0},observationInfo,actionInfo)` creates a discrete space stochastic actor using a custom basis function as the underlying approximation model. The first input argument is a two-element cell array whose first element is the handle `basisFcn` to a custom basis function and whose second element is the initial weight matrix `W0`. This function sets the `ObservationInfo` and `ActionInfo` properties of `actor` to the inputs `observationInfo` and `actionInfo`, respectively.

`actor = rlDiscreteCategoricalActor(___,Name=Value)` specifies names of the observation input layers (for network-based approximators) or sets the `UseDevice` property using one or more name-value arguments. Specifying the input layer names allows you to explicitly associate the layers of your network approximator with specific environment channels. For all types of approximators, you can specify the device where computations for `actor` are executed, for example `UseDevice="gpu"`.

### Input Arguments

`net` — Deep neural network

array of `Layer` objects | `layerGraph` object | `DAGNetwork` object | `SeriesNetwork` object | `dlnetwork` object (preferred)

Deep neural network used as the underlying approximator within the actor, specified as one of the following:

- Array of `Layer` objects
- `layerGraph` object
- `DAGNetwork` object
- `SeriesNetwork` object
- `dlnetwork` object

**Note**

Among the different network representation options, `dlnetwork` is preferred, since it has built-in validation checks and supports automatic differentiation. If you pass another network object as an input argument, it is internally converted to a `dlnetwork` object. However, best practice is to convert other representations to `dlnetwork` explicitly *before* using the network to create a critic or an actor for a reinforcement learning agent. You can do so using `dlnet=dlnetwork(net)`, where `net` is any Deep Learning Toolbox™ neural network object. The resulting `dlnet` is the `dlnetwork` object that you use for your critic or actor. This practice allows a greater level of insight and control for cases in which the conversion is not straightforward and might require additional specifications.

The network must have as many input layers as the number of environment observation channels (with each input layer receiving input from an observation channel), and a single output layer with as many elements as the number of possible discrete actions. Since the actor must return the probability of executing each possible action, the software automatically adds a `softmaxLayer` as a final output layer if you do not specify it explicitly. When computing the action, the actor then randomly samples this distribution to return an action.
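To make the sampling step concrete, the following minimal sketch draws actions from a categorical distribution by inverse-CDF lookup. It is an illustration of the technique only, not the toolbox's internal implementation; the probability vector is hypothetical (it reuses values of the kind `evaluate` returns in the examples below), and only base MATLAB functions are used.

```matlab
% Minimal sketch of categorical sampling by inverse-CDF lookup.
% Illustrates the idea only; this is not the toolbox's internal code.
actionElements = [-10 0 10];        % possible actions, as in rlFiniteSetSpec([-10 0 10])
p = [0.3736; 0.1875; 0.4389];       % hypothetical action probabilities (a softmax output)

c = cumsum(p) / sum(p);             % cumulative distribution function
c(end) = 1;                         % guard against floating-point round-off

% Draw one action: pick the first bin whose CDF value reaches u.
u = rand;
idx = find(u <= c, 1, 'first');
action = actionElements(idx);

% Draw many actions at once and compare empirical frequencies with p.
N = 1e5;
U = rand(N,1);
idxAll = sum(U > c(:).', 2) + 1;    % N-by-1 vector of indices in 1..numel(p)
freq = accumarray(idxAll, 1, [numel(p) 1]) / N;
disp([p freq])
```

With many draws, the empirical frequencies approach the probabilities in `p`, which is the behavior you should expect from repeated calls to `getAction` with the same observation.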

`rlDiscreteCategoricalActor` objects support recurrent deep neural networks. For an example, see Create Discrete Categorical Actor from Deep Recurrent Neural Network.

The learnable parameters of the actor are the weights of the deep neural network. For a list of deep neural network layers, see List of Deep Learning Layers. For more information on creating deep neural networks for reinforcement learning, see Create Policies and Value Functions.

`basisFcn` — Custom basis function

function handle

Custom basis function, specified as a function handle to a user-defined MATLAB function. The user-defined function can be either an anonymous function or a function on the MATLAB path. The action that the actor returns for the current observation is randomly sampled from a categorical distribution with probabilities `p = softmax(W'*B)`, where `W` is a weight matrix containing the learnable parameters and `B` is the column vector returned by the custom basis function. Each element of `p` represents the probability of executing the corresponding action from the observed state.

Your basis function must have the following signature.

```
B = myBasisFunction(obs1,obs2,...,obsN)
```

Here, `obs1` to `obsN` are inputs in the same order and with the same data type and dimensions as the environment observation channels defined in `observationInfo`.

**Example:**

```
@(obs1,obs2,obs3) [obs3(2)*obs1(1)^2;
abs(obs2(5)+obs3(1))]
```
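As a sketch of the probability computation `p = softmax(W'*B)` described above, the following code evaluates a hypothetical two-channel basis function, builds a weight matrix with one row per basis element and one column per action, and applies a numerically stable softmax by hand. The basis function, observation values, and weights are illustrative assumptions; the code uses only base MATLAB (it does not call the Deep Learning Toolbox `softmax` function).

```matlab
% Sketch of how a custom basis function and a weight matrix yield action
% probabilities p = softmax(W'*B). The basis function, the observation
% values, and the weights are hypothetical and only for illustration.
basisFcn = @(obs1,obs2) [obs1(1)^2; abs(obs2(1)); obs1(2)*obs2(2); 1];

numActions = 3;
B0 = basisFcn(zeros(2,1), [0 0]);      % evaluate once to get the basis length
W = rand(numel(B0), numActions);       % one row per basis element, one column per action

obs1 = rand(2,1);                      % observation for channel 1
obs2 = [0 1];                          % observation for channel 2
B = basisFcn(obs1, obs2);              % column vector of basis values

z = W'*B;                              % one score (logit) per action
p = exp(z - max(z));                   % subtracting max(z) avoids overflow
p = p / sum(p);                        % softmax: positive entries summing to 1
disp(p')
```

The size of `W` here follows the rule for `W0`: as many rows as the length of `B` and as many columns as the number of possible actions.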

`W0` — Initial value of the basis function weights

matrix

Initial value of the basis function weights `W`, specified as a matrix having as many rows as the length of the vector returned by the basis function and as many columns as the number of possible actions.

**Name-Value Arguments**

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

**Example:** `UseDevice="gpu"`

`ObservationInputNames` — Network input layer names corresponding to the environment observation channels

string array | cell array of strings | cell array of character vectors

Network input layer names corresponding to the environment observation channels, specified as a string array or a cell array of strings or character vectors. The function assigns, in sequential order, each environment observation channel specified in `observationInfo` to each layer whose name is specified in the array assigned to this argument. Therefore, the specified network input layers, ordered as indicated in this argument, must have the same data type and dimensions as the observation channels, as ordered in `observationInfo`.

This name-value argument is supported only when the approximation model is a deep neural network.

**Example:** `ObservationInputNames={"obsInLyr1_airspeed","obsInLyr2_altitude"}`

## Properties

`ObservationInfo` — Observation specifications

`rlFiniteSetSpec` object | `rlNumericSpec` object | array

Observation specifications, specified as an `rlFiniteSetSpec` or `rlNumericSpec` object or an array containing a mix of such objects. Each element in the array defines the properties of an environment observation channel, such as its dimensions, data type, and name.

When you create the approximator object, the constructor function sets the `ObservationInfo` property to the input argument `observationInfo`.

You can extract `observationInfo` from an existing environment, function approximator, or agent using `getObservationInfo`. You can also construct the specifications manually using `rlFiniteSetSpec` or `rlNumericSpec`.

**Example:**

```
[rlNumericSpec([2 1])
rlFiniteSetSpec([3,5,7])]
```

`ActionInfo` — Action specifications

`rlFiniteSetSpec` object

Action specifications, specified as an `rlFiniteSetSpec` object. This object defines the properties of the environment action channel, such as its dimensions, data type, and name.

**Note**

Only one action channel is allowed.

When you create the approximator object, the constructor function sets the `ActionInfo` property to the input argument `actionInfo`.

You can extract `actionInfo` from an existing environment, approximator object, or agent using `getActionInfo`. You can also construct the specification manually using `rlFiniteSetSpec`.

**Example:** `rlFiniteSetSpec([-10 0 10])`

`Normalization` — Normalization method

`"none"` (default) | string array

Normalization method, returned as an array in which each element (one for each input channel defined in the `observationInfo` and `actionInfo` properties, in that order) is one of the following values:

- `"none"` — Do not normalize the input of the function approximator object.
- `"rescale-zero-one"` — Normalize the input by rescaling it to the interval between 0 and 1. The normalized input *Y* is (*U* – `LowerLimit`)./(`UpperLimit` – `LowerLimit`), where *U* is the nonnormalized input. Note that nonnormalized input values lower than `LowerLimit` result in normalized values lower than 0. Similarly, nonnormalized input values higher than `UpperLimit` result in normalized values higher than 1. Here, `UpperLimit` and `LowerLimit` are the corresponding properties defined in the specification object of the input channel.
- `"rescale-symmetric"` — Normalize the input by rescaling it to the interval between –1 and 1. The normalized input *Y* is 2(*U* – `LowerLimit`)./(`UpperLimit` – `LowerLimit`) – 1, where *U* is the nonnormalized input. Note that nonnormalized input values lower than `LowerLimit` result in normalized values lower than –1. Similarly, nonnormalized input values higher than `UpperLimit` result in normalized values higher than 1. Here, `UpperLimit` and `LowerLimit` are the corresponding properties defined in the specification object of the input channel.
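The two rescaling formulas can be checked with a few lines of base MATLAB. This sketch applies them directly, elementwise, to a hypothetical two-element input channel whose `LowerLimit` and `UpperLimit` values are made up for illustration; it does not use `rlNormalizer` or any toolbox object.

```matlab
% Sketch of the two rescaling formulas, applied elementwise.
% LowerLimit and UpperLimit are hypothetical limits of one input channel.
LowerLimit = [-2; 0];
UpperLimit = [ 2; 10];

% Each column is one nonnormalized input U; the last column lies
% partly outside the limits.
U = [-2 0  2  3;
      0 5 10 -1];

Y01  = (U - LowerLimit) ./ (UpperLimit - LowerLimit);        % "rescale-zero-one"
Ysym = 2*(U - LowerLimit) ./ (UpperLimit - LowerLimit) - 1;  % "rescale-symmetric"

disp(Y01)    % rows: [0 0.5 1 1.25] and [0 0.5 1 -0.1]
disp(Ysym)   % rows: [-1 0 1 1.5] and [-1 0 1 -1.2]
```

The last column shows that inputs outside the limits map outside the target interval rather than being clipped, and that the symmetric result is always `2*Y01 - 1`.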

**Note**

When you specify the `Normalization` property of `rlAgentInitializationOptions`, normalization is applied only to the approximator input channels corresponding to `rlNumericSpec` specification objects in which both the `UpperLimit` and `LowerLimit` properties are defined. After you create the agent, you can use `setNormalizer` to assign normalizers that use any normalization method. For more information on normalizer objects, see `rlNormalizer`.

**Example:** `"rescale-symmetric"`

`UseDevice` — Computation device used for training and simulation

`"cpu"` (default) | `"gpu"`

Computation device used to perform operations such as gradient computation, parameter update, and prediction during training and simulation, specified as either `"cpu"` or `"gpu"`.

The `"gpu"` option requires both Parallel Computing Toolbox™ software and a CUDA® enabled NVIDIA® GPU. For more information on supported GPUs, see GPU Computing Requirements (Parallel Computing Toolbox).

You can use `gpuDevice` (Parallel Computing Toolbox) to query or select a local GPU device to be used with MATLAB®.

**Note**

Training or simulating an agent on a GPU involves device-specific numerical round-off errors. These errors can produce different results compared to performing the same operations using a CPU.

To speed up training by using parallel processing over multiple cores, you do not need to use this argument. Instead, when training your agent, use an `rlTrainingOptions` object in which the `UseParallel` option is set to `true`. For more information about training using multicore processors and GPUs, see Train Agents Using Parallel Computing and GPUs.

**Example:** `"gpu"`

`Learnables` — Learnable parameters of the approximator object

cell array of `dlarray` objects

Learnable parameters of the approximator object, specified as a cell array of `dlarray` objects. This property contains the learnable parameters of the approximation model used by the approximator object.

**Example:** `{dlarray(rand(256,4)),dlarray(rand(256,1))}`

`State` — State of the approximator object

cell array of `dlarray` objects

State of the approximator object, specified as a cell array of `dlarray` objects. For `dlnetwork`-based models, this property contains the `Value` column of the `State` property table of the `dlnetwork` model. The elements of the cell array are the state of the recurrent neural network used in the approximator (if any), as well as the state for the batch normalization layer (if used).

For model types that are not based on a `dlnetwork` object, this property is an empty cell array, since these model types do not support states.

**Example:** `{dlarray(rand(256,1)),dlarray(rand(256,1))}`

## Object Functions

| Function | Description |
| --- | --- |
| `rlACAgent` | Actor-critic (AC) reinforcement learning agent |
| `rlPGAgent` | Policy gradient (PG) reinforcement learning agent |
| `rlPPOAgent` | Proximal policy optimization (PPO) reinforcement learning agent |
| `getAction` | Obtain action from agent, actor, or policy object given environment observations |
| `evaluate` | Evaluate function approximator object given observation (or observation-action) input data |
| `gradient` | (Not recommended) Evaluate gradient of function approximator object given observation and action input data |
| `accelerate` | (Not recommended) Option to accelerate computation of gradient for approximator object based on neural network |
| `getLearnableParameters` | Obtain learnable parameter values from agent, function approximator, or policy object |
| `setLearnableParameters` | Set learnable parameter values of agent, function approximator, or policy object |
| `setModel` | Set approximation model in function approximator object |
| `getModel` | Get approximation model from function approximator object |

## Examples

### Create Discrete Categorical Actor from Deep Neural Network

Create an observation specification object (or alternatively use `getObservationInfo` to extract the specification object from an environment). For this example, define the observation space as a continuous four-dimensional space, so that there is a single observation channel that carries a column vector containing four doubles.

```
obsInfo = rlNumericSpec([4 1]);
```

Create an action specification object (or alternatively use `getActionInfo` to extract the specification object from an environment). For this example, define the action space as consisting of three actions, labeled -10, 0, and 10.

```
actInfo = rlFiniteSetSpec([-10 0 10]);
```

A discrete categorical actor implements a parametrized stochastic policy for a discrete action space. This actor takes an observation as input and returns as output a random action sampled (among the finite number of possible actions) from a categorical probability distribution.

To model the probability distribution within the actor, use a neural network with one input layer (which receives the content of the environment observation channel, as specified by `obsInfo`) and one output layer.

The output layer must return a vector of probabilities of taking each possible action, as specified by `actInfo`. Therefore, each element of the output vector must be between 0 and 1. Using softmax as the output layer enforces this requirement (the software automatically adds a `softmaxLayer` as a final output layer if you do not specify it explicitly).

Note that `prod(obsInfo.Dimension)` returns the total number of elements in an observation, regardless of whether the observation is arranged as a column vector, row vector, or matrix, while `numel(actInfo.Elements)` returns the number of possible actions in the discrete action space.

Define the network as an array of layer objects.

```
net = [
    featureInputLayer(prod(obsInfo.Dimension))
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(numel(actInfo.Elements))
    ];
```

Convert the network to a `dlnetwork` object and display the number of learnable parameters.

```
net = dlnetwork(net);
summary(net)
```

```
Initialized: true
Number of learnables: 403
Inputs:
   1   'input'   4 features
```
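The learnable-parameter count reported by `summary` can be checked by hand. This sketch assumes that each `fullyConnectedLayer` with `nIn` inputs and `nOut` outputs holds an `nOut`-by-`nIn` weight matrix plus `nOut` biases, and that `reluLayer` and `softmaxLayer` have no learnable parameters; `fcParams` is a hypothetical helper, not a toolbox function.

```matlab
% Hand count of learnable parameters for stacks of fully connected layers.
% fcParams is a hypothetical helper: nOut-by-nIn weights plus nOut biases.
fcParams = @(nIn,nOut) nOut*nIn + nOut;

nObs = 4;   % prod(obsInfo.Dimension)
nAct = 3;   % numel(actInfo.Elements)

% Network in this example: 4 -> 16 -> 16 -> 3
total = fcParams(nObs,16) + fcParams(16,16) + fcParams(16,nAct);
fprintf("Number of learnables: %d\n", total);    % 80 + 272 + 51 = 403

% Network with a single hidden layer of 32 units: 4 -> 32 -> 3
total2 = fcParams(nObs,32) + fcParams(32,nAct);  % 160 + 99 = 259
```

The second count corresponds to the network used in Create Discrete Categorical Actor from Deep Neural Network Specifying Input Layer Name.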

Create the actor with `rlDiscreteCategoricalActor`, using the network and the observation and action specification objects. When the network has multiple input layers, they are automatically associated with the environment observation channels according to the dimension specifications in `obsInfo`.

```
actor = rlDiscreteCategoricalActor(net,obsInfo,actInfo);
```

To check your actor, use `getAction` to return an action from a random observation vector, given the current network weights.

```
act = getAction(actor,{rand(obsInfo.Dimension)});
act
```

```
act = 1x1 cell array
    {[-10]}
```

To return the probability distribution of the actions, given an observation, use `evaluate`.

```
prb = evaluate(actor,{rand(obsInfo.Dimension)});
prb{1}
```

```
ans = 3x1 single column vector
    0.3736
    0.1875
    0.4389
```

You can now use the actor (along with a critic) to create an agent for the environment described by the given observation specification object. Examples of agents that support a continuous observation space and a discrete action space, and that can use a discrete categorical actor, are `rlACAgent`, `rlPGAgent`, `rlPPOAgent`, and `rlTRPOAgent`.

For more information on creating approximator objects such as actors and critics, see Create Policies and Value Functions.

### Create Discrete Categorical Actor from Deep Neural Network Specifying Input Layer Name

Create an observation specification object (or alternatively use `getObservationInfo` to extract the specification object from an environment). For this example, define the observation space as a continuous four-dimensional space, so that there is a single observation channel that carries a column vector containing four doubles.

```
obsInfo = rlNumericSpec([4 1]);
```

Create an action specification object (or alternatively use `getActionInfo` to extract the specification object from an environment). For this example, define the action space as consisting of three actions, labeled -10, 0, and 10.

```
actInfo = rlFiniteSetSpec([-10 0 10]);
```

A discrete categorical actor implements a parametrized stochastic policy for a discrete action space. This actor takes an observation as input and returns as output a random action sampled (among the finite number of possible actions) from a categorical probability distribution.

To model the probability distribution within the actor, use a neural network with one input layer (which receives the content of the environment observation channel, as specified by `obsInfo`) and one output layer. The output layer must return a vector of probabilities for each possible action, as specified by `actInfo`. Therefore, each element of the output vector must be between 0 and 1. Using softmax as the output layer enforces this requirement (the software automatically adds a `softmaxLayer` as a final output layer if you do not specify it explicitly).

Note that `prod(obsInfo.Dimension)` returns the total number of elements in an observation, regardless of whether the observation is arranged as a column vector, row vector, or matrix, while `numel(actInfo.Elements)` returns the number of possible actions in the discrete action space.

Define the network as an array of layer objects. Specify a name for the input layer, so you can later explicitly associate it with the observation channel.

```
net = [
    featureInputLayer( ...
        prod(obsInfo.Dimension), ...
        Name="netObsIn")
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(numel(actInfo.Elements))
    softmaxLayer(Name="actionProb")
    ];
```

Convert the network to a `dlnetwork` object and display the number of learnable parameters (weights).

```
net = dlnetwork(net);
summary(net)
```

```
Initialized: true
Number of learnables: 259
Inputs:
   1   'netObsIn'   4 features
```

Create the actor with `rlDiscreteCategoricalActor`, using the network, the observation and action specification objects, and the name of the network input layer.

```
actor = rlDiscreteCategoricalActor(net, ...
    obsInfo,actInfo, ...
    ObservationInputNames="netObsIn");
```

To validate your actor, use `getAction` to return an action from a random observation, given the current network weights.

```
act = getAction(actor,{rand(obsInfo.Dimension)});
act{1}
```

```
ans = -10
```

To return the probability distribution of the possible actions as a function of a random observation, and given the current network weights, use `evaluate`.

```
prb = evaluate(actor,{rand(obsInfo.Dimension)})
```

```
prb = 1x1 cell array
    {3x1 single}
```

```
prb{1}
```

```
ans = 3x1 single column vector
    0.3038
    0.2658
    0.4304
```

You can now use the actor (along with a critic) to create an agent for the environment described by the given observation specification object. Examples of agents that support a continuous observation space and a discrete action space, and that can use a discrete categorical actor, are `rlACAgent`, `rlPGAgent`, `rlPPOAgent`, and `rlTRPOAgent`.

For more information on creating approximator objects such as actors and critics, see Create Policies and Value Functions.

### Create Discrete Categorical Actor from Custom Basis Function

Create an observation specification object (or alternatively use `getObservationInfo` to extract the specification object from an environment). For this example, define the observation space as consisting of two channels, the first carrying a two-dimensional vector in a continuous space, the second carrying a two-dimensional vector that can assume only three values, -[1 2], [0 1], and [1 3]. Therefore, a single observation consists of two two-dimensional vectors, one continuous, the other discrete.

```
obsInfo = [rlNumericSpec([2 1]) rlFiniteSetSpec({-[1 2],[0 1],[1 3]})];
```

Create a *discrete action space* specification object (or alternatively use `getActionInfo` to extract the specification object from an environment with a discrete action space). For this example, define the action space as a finite set consisting of three possible actions, labeled 7, 5, and 3.

```
actInfo = rlFiniteSetSpec([7 5 3]);
```

A discrete categorical actor implements a parametrized stochastic policy for a discrete action space. To model the parametrized probability distribution within the actor, use a custom basis function with two inputs (which receive the content of the environment observation channels, as specified by `obsInfo`).

Create a function that returns a vector of four elements, depending on a given observation.

```
myBasisFcn = @(obsC,obsD) [obsC(1)^2-obsD(2)^2; ...
                           obsC(2)^2-obsD(1)^2; ...
                           exp(obsC(2))+abs(obsD(1)); ...
                           exp(obsC(1))+abs(obsD(2))];
```

The actor samples the action randomly, according to the probability distribution `softmax(W'*myBasisFcn(obsC,obsD))`. Here, `W` is a weight matrix containing the learnable parameters, which must have as many rows as the length of the basis function output (for this example, four) and as many columns as the number of possible actions (for this example, three).

Define an initial parameter matrix.

```
W0 = rand(4,3);
```

Create the actor. The first argument is a two-element cell containing both the handle to the custom function and the initial parameter matrix. The second and third arguments are, respectively, the observation and action specification objects.

```
actor = rlDiscreteCategoricalActor({myBasisFcn,W0},obsInfo,actInfo);
```

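To check your actor, use `getAction` to return one of the three possible actions, depending on a given random observation and on the current parameter matrix.

```
getAction(actor,{rand(2,1),[1 1]})
```

```
ans = 1x1 cell array
    {[3]}
```

Note that the discrete set constraint is not enforced: neither `[1 1]` in the previous call nor `[0.5 -0.7]` in the following call belongs to the set of values defined for the second observation channel, yet the actor still returns an action.

```
getAction(actor,{rand(2,1),[0.5 -0.7]})
```

```
ans = 1x1 cell array
    {[3]}
```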
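Because the actor does not check discrete observations against the finite set, you might want to validate them yourself before calling `getAction`. The following sketch uses a hypothetical helper, `isMember`, and hard-codes the same three values used to create the second observation channel; it relies only on base MATLAB functions and is not part of the toolbox.

```matlab
% Sketch: check whether a discrete observation belongs to a finite set.
% The set repeats the values used for rlFiniteSetSpec({-[1 2],[0 1],[1 3]}).
% isMember is a hypothetical helper, not a toolbox function.
elements = {-[1 2], [0 1], [1 3]};
isMember = @(obs) any(cellfun(@(e) isequal(e,obs), elements));

isMember([0 1])         % true:  [0 1] is in the set
isMember([1 1])         % false: not one of the three values
isMember([0.5 -0.7])    % false: not one of the three values
```

`isequal` compares both size and values, so an observation with the right values but the wrong orientation (for example, a column vector) is also rejected.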

To return the probability of each action as a function of a random observation (and given the current weights), use `evaluate`.

```
prb = evaluate(actor, ...
    {rand(obsInfo(1).Dimension), ...
     rand(obsInfo(2).Dimension)})
```

```
prb = 1x1 cell array
    {3x1 single}
```

```
prb{1}
```

```
ans = 3x1 single column vector
    0.3434
    0.2074
    0.4492
```

You can now use the actor (along with a critic) to create an agent for the environment described by the given observation specification object. Examples of agents that support a mixed observation space and a discrete action space, and that can use a discrete categorical actor, are `rlACAgent`, `rlPGAgent`, and `rlPPOAgent`. `rlTRPOAgent` does not support actors or critics that use custom basis functions.

For more information on creating approximator objects such as actors and critics, see Create Policies and Value Functions.

### Create Discrete Categorical Actor from Deep Recurrent Neural Network

This example shows you how to create a stochastic actor with a discrete action space using a recurrent neural network.

For this example, use the same environment used in Train PG Agent to Balance Cart-Pole System. Load the environment and obtain the observation and action specifications.

```
env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env)
```

```
obsInfo = 
  rlNumericSpec with properties:

     LowerLimit: -Inf
     UpperLimit: Inf
           Name: "CartPole States"
    Description: "x, dx, theta, dtheta"
      Dimension: [4 1]
       DataType: "double"
```

```
actInfo = getActionInfo(env)
```

```
actInfo = 
  rlFiniteSetSpec with properties:

       Elements: [-10 10]
           Name: "CartPole Action"
    Description: [0×0 string]
      Dimension: [1 1]
       DataType: "double"
```

A discrete categorical actor implements a parametrized stochastic policy for a discrete action space. This actor takes an observation as input and returns as output a random action sampled (among the finite number of possible actions) from a categorical probability distribution.

To model the probability distribution within the actor, use a neural network with one input layer (which receives the content of the environment observation channel, as specified by `obsInfo`) and one output layer.

The output layer must return a vector of probabilities of taking each possible action, as specified by `actInfo`. Therefore, each element of the output vector must be between 0 and 1. Using softmax as the output layer enforces this requirement (the software automatically adds a `softmaxLayer` as a final output layer if you do not specify it explicitly).

Note that `prod(obsInfo.Dimension)` returns the total number of elements in an observation, regardless of whether the observation is arranged as a column vector, row vector, or matrix, while `numel(actInfo.Elements)` returns the number of possible actions in the discrete action space.

Define the network as an array of layer objects. To create a recurrent network, use a `sequenceInputLayer` as the input layer and include at least one `lstmLayer`.

```
net = [
    sequenceInputLayer( ...
        prod(obsInfo.Dimension), ...
        Name="netObsIn")
    fullyConnectedLayer(8)
    reluLayer
    lstmLayer(8)
    fullyConnectedLayer( ...
        numel(actInfo.Elements))
    ];
```

Convert the network to a `dlnetwork` object and display the number of learnable parameters (weights).

```
net = dlnetwork(net);
summary(net)
```

```
Initialized: true
Number of learnables: 602
Inputs:
   1   'netObsIn'   Sequence input with 4 dimensions
```
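The count of 602 learnables can be reproduced by hand. This sketch assumes that an `lstmLayer` with `H` hidden units and `nIn` inputs stores a `4*H`-by-`nIn` input weight matrix, a `4*H`-by-`H` recurrent weight matrix, and a `4*H`-by-1 bias (one block per gate), and that fully connected layers store `nOut`-by-`nIn` weights plus `nOut` biases. `fcParams` and `lstmParams` are hypothetical helpers.

```matlab
% Hand count of learnables for the recurrent network in this example.
% fcParams and lstmParams are hypothetical helpers, not toolbox functions.
fcParams   = @(nIn,nOut) nOut*nIn + nOut;
% lstmLayer: input weights (4H-by-nIn), recurrent weights (4H-by-H), bias (4H-by-1)
lstmParams = @(nIn,H) 4*H*nIn + 4*H*H + 4*H;

nObs = 4;   % prod(obsInfo.Dimension)
nAct = 2;   % numel(actInfo.Elements)

% Layers: FC(8) -> ReLU -> LSTM(8) -> FC(2)
total = fcParams(nObs,8) + lstmParams(8,8) + fcParams(8,nAct);
fprintf("Number of learnables: %d\n", total);   % 40 + 544 + 18 = 602
```

The same `lstmLayer` also carries a hidden state and a cell state, each with `H` = 8 elements per sequence, which is consistent with the two 8-by-1 state elements shown for `actor.State` in this example.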

Create a discrete categorical actor using the network, the environment specifications, and the name of the network input layer to be associated with the observation channel.

```
actor = rlDiscreteCategoricalActor(net, ...
    obsInfo,actInfo, ...
    ObservationInputNames="netObsIn");
```

To check your actor, use `getAction` to return one of the two possible actions, depending on a given random observation and on the current network weights.

```
act = getAction(actor,{rand(obsInfo.Dimension)});
act{1}
```

```
ans = -10
```

To return the probability of each of the two possible actions, use `evaluate`. Note that the type of the returned numbers is `single`, not `double`.

```
prb = evaluate(actor,{rand(obsInfo.Dimension)});
prb{1}
```

```
ans = 2×1 single column vector
    0.4549
    0.5451
```

You can use dot notation to extract and set the current state of the recurrent neural network in the actor.

```
actor.State
```

```
ans = 2×1 cell array
    {8×1 dlarray}
    {8×1 dlarray}
```

```
actor.State = {
    dlarray(-0.1*rand(8,1))
    dlarray(0.1*rand(8,1))
    };
```

To evaluate the actor using sequential observations, use the sequence length (time) dimension. For example, obtain actions for 5 independent sequences, each one consisting of 9 sequential observations.

```
[action,state] = getAction(actor, ...
{rand([obsInfo.Dimension 5 9])});
```

Display the action corresponding to the seventh element of the observation sequence in the fourth sequence.

```
action = action{1};
action(1,1,4,7)
```

```
ans = 10
```
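The indexing `action(1,1,4,7)` follows from the layout of the batched input: the leading dimensions hold one observation (here `[4 1]`), the next dimension indexes the independent sequences, and the last one indexes time. The following base MATLAB sketch only builds and indexes a random array with that layout; the variable names are illustrative.

```matlab
% Sketch of the layout of a batch of observation sequences:
% [observation dimensions, number of sequences, sequence length].
obsDim = [4 1];       % obsInfo.Dimension for the cart-pole observation
numSeq = 5;           % independent sequences (batch size)
seqLen = 9;           % time steps per sequence

obsBatch = rand([obsDim numSeq seqLen]);
disp(size(obsBatch))              % 4 1 5 9

% Observation at time step 7 of sequence 4 (a 4-by-1 column vector)
o = obsBatch(:,:,4,7);
disp(size(o))                     % 4 1
```

Since each action is a scalar (action dimension `[1 1]`), the returned action array has the same trailing layout, so its element `(1,1,4,7)` is the action for time step 7 of sequence 4.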

Display the updated state of the recurrent neural network.

```
state
```

```
state = 2×1 cell array
    {8×5 single}
    {8×5 single}
```

You can now use the actor (along with a critic) to create an agent for the environment described by the given observation specification object. Examples of agents that support a continuous observation space and a discrete action space, and that can use a discrete categorical actor, are `rlACAgent`, `rlPGAgent`, and `rlPPOAgent`. `rlTRPOAgent` does not support actors or critics with recurrent neural networks.

For more information on input and output format for recurrent neural networks, see the Algorithms section of `lstmLayer`. For more information on creating approximator objects such as actors and critics, see Create Policies and Value Functions.

## Version History

**Introduced in R2022a**

