
Policies and Value Functions

Define policy and value function approximators, such as actors and critics

During training, most agents rely on an actor, a critic, or both. The actor learns the policy that selects which action to take. The critic learns the value (or Q-value) function that estimates the expected long-term reward of following the current policy.

Reinforcement Learning Toolbox™ provides function approximator objects for actors and critics, and policy objects for custom loops and deployment. Approximator objects can internally use different approximation models, such as deep neural networks, linear basis functions, or look-up tables.
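
For example, the following sketch builds a vector Q-value critic from a small deep neural network and derives an epsilon-greedy policy object from it. The observation dimensions, action set, and network sizes are illustrative placeholders rather than values from any particular environment.

% Observation and action specifications (illustrative placeholders)
obsInfo = rlNumericSpec([4 1]);          % 4-element continuous observation
actInfo = rlFiniteSetSpec([-1 0 1]);     % 3 discrete actions

% Small network mapping observations to one Q-value per action
net = dlnetwork([
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(numel(actInfo.Elements))
    ]);

% Critic approximator object and a policy object derived from it
critic = rlVectorQValueFunction(net, obsInfo, actInfo);
policy = rlEpsilonGreedyPolicy(critic);

% Evaluate the critic and sample an action for a random observation
obs     = {rand(obsInfo.Dimension)};
qValues = getValue(critic, obs);
action  = getAction(policy, obs);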

For an introduction to policies, value functions, actors, and critics, see Create Policies and Value Functions.

Blocks

Policy - Reinforcement learning policy (Since R2022b)

Functions


rlTable - Value table or Q table (Since R2019a)
rlValueFunction - Value function approximator object for reinforcement learning agents (Since R2022a)
rlQValueFunction - Q-Value function approximator object for reinforcement learning agents (Since R2022a)
rlVectorQValueFunction - Vector Q-value function approximator for reinforcement learning agents (Since R2022a)
rlContinuousDeterministicActor - Deterministic actor with a continuous action space for reinforcement learning agents (Since R2022a)
rlDiscreteCategoricalActor - Stochastic categorical actor with a discrete action space for reinforcement learning agents (Since R2022a)
rlContinuousGaussianActor - Stochastic Gaussian actor with a continuous action space for reinforcement learning agents (Since R2022a)
getActor - Extract actor from reinforcement learning agent (Since R2019a)
setActor - Set actor of reinforcement learning agent (Since R2019a)
getCritic - Extract critic from reinforcement learning agent (Since R2019a)
setCritic - Set critic of reinforcement learning agent (Since R2019a)
getModel - Get approximation model from function approximator object (Since R2020b)
setModel - Set approximation model in function approximator object (Since R2020b)
getLearnableParameters - Obtain learnable parameter values from agent, function approximator, or policy object (Since R2019a)
setLearnableParameters - Set learnable parameter values of agent, function approximator, or policy object (Since R2019a)
rlOptimizerOptions - Optimization options for actors and critics (Since R2022a)
getGreedyPolicy - Extract greedy (deterministic) policy object from agent (Since R2022a)
getExplorationPolicy - Extract exploratory (stochastic) policy object from agent (Since R2023a)
rlMaxQPolicy - Policy object to generate discrete max-Q actions for custom training loops and application deployment (Since R2022a)
rlEpsilonGreedyPolicy - Policy object to generate discrete epsilon-greedy actions for custom training loops (Since R2022a)
rlDeterministicActorPolicy - Policy object to generate continuous deterministic actions for custom training loops and application deployment (Since R2022a)
rlAdditiveNoisePolicy - Policy object to generate continuous noisy actions for custom training loops (Since R2022a)
rlStochasticActorPolicy - Policy object to generate stochastic actions for custom training loops and application deployment (Since R2022a)
getAction - Obtain action from agent, actor, or policy object given environment observations (Since R2020a); see the sketch after this list
getValue - Obtain estimated value from a critic given environment observations and actions (Since R2020a)
getMaxQValue - Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations (Since R2020a)
evaluate - Evaluate function approximator object given observation (or observation-action) input data (Since R2022a)
gradient - Evaluate gradient of function approximator object given observation and action input data (Since R2022a)
accelerate - Option to accelerate computation of gradient for approximator object based on neural network (Since R2022a)
quadraticLayer - Quadratic layer for actor or critic network (Since R2019a)
scalingLayer - Scaling layer for actor or critic network (Since R2019a)
softplusLayer - Softplus layer for actor or critic network (Since R2020a)
featureInputLayer - Feature input layer (Since R2020b)
reluLayer - Rectified Linear Unit (ReLU) layer
tanhLayer - Hyperbolic tangent (tanh) layer (Since R2019a)
fullyConnectedLayer - Fully connected layer
lstmLayer - Long short-term memory (LSTM) layer for recurrent neural network (RNN)
softmaxLayer - Softmax layer
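
The actor, policy, and evaluation functions listed above combine in a similar way for continuous action spaces. The following sketch (again with placeholder specifications and network sizes, not taken from any particular environment) builds a continuous deterministic actor, wraps it in a deterministic policy object and an additive-noise exploration policy object, and queries an action.

% Observation and action specifications (illustrative placeholders)
obsInfo = rlNumericSpec([3 1]);                               % 3-element observation
actInfo = rlNumericSpec([1 1], LowerLimit=-2, UpperLimit=2);  % bounded scalar action

% Network mapping observations to a bounded action (tanh followed by scaling)
net = dlnetwork([
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(actInfo.Dimension(1))
    tanhLayer
    scalingLayer(Scale=actInfo.UpperLimit)
    ]);

actor = rlContinuousDeterministicActor(net, obsInfo, actInfo);

% Deterministic policy for deployment, noisy policy for exploration
greedyPolicy  = rlDeterministicActorPolicy(actor);
explorePolicy = rlAdditiveNoisePolicy(actor);

% Query an action and inspect the learnable parameters of the actor
obs    = {rand(obsInfo.Dimension)};
action = getAction(explorePolicy, obs);
params = getLearnableParameters(actor);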

Topics