getAction

Obtain action from agent, actor, or policy object given environment observations

Description

Agent

action = getAction(agent,obs) returns the action generated from the policy of a reinforcement learning agent, given environment observations. If agent contains internal states, they are updated.

[action,agent] = getAction(agent,obs) also returns the updated agent as an output argument.

Actor

action = getAction(actor,obs) returns the action generated from the policy represented by the actor actor, given environment observations obs.

[action,nextState] = getAction(actor,obs) also returns the updated state of the actor when it uses a recurrent neural network.

Policy

action = getAction(policy,obs) returns the action generated from the policy object policy, given environment observations.

[action,updatedPolicy] = getAction(policy,obs) also returns the updated policy as an output argument (any internal state of the policy, if used, is updated).

Use Forward

___ = getAction(___,UseForward=useForward) allows you to explicitly call a forward pass when computing gradients.
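
For instance, the call below sketches how you might request a forward pass; here actor and obsInfo are assumed to be an existing actor object and its observation specification, as in the examples that follow.

```matlab
% Sketch (assumes actor and obsInfo already exist): request a forward
% pass so that layers such as dropout and batch normalization use their
% training-mode behavior instead of their inference-mode behavior.
act = getAction(actor,{rand(obsInfo.Dimension)},UseForward=true);
```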

Examples

Create an environment with a discrete action space, and obtain its observation and action specifications. For this example, load the environment used in the example Create DQN Agent Using Deep Network Designer and Train Using Image Observations.

% load predefined environment
env = rlPredefinedEnv("SimplePendulumWithImage-Discrete");

Obtain the observation and action specifications for this environment.

obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

Create a TRPO agent from the environment observation and action specifications.

agent = rlTRPOAgent(obsInfo,actInfo);

Use getAction to return the action from a random observation.

getAction(agent, ...
    {rand(obsInfo(1).Dimension), ...
     rand(obsInfo(2).Dimension)})
ans = 1×1 cell array
    {[1]}

You can also obtain actions for a batch of observations. For example, obtain actions for a batch of 10 observations.

actBatch = getAction(agent, ...
    {rand([obsInfo(1).Dimension 10]), ...
     rand([obsInfo(2).Dimension 10])});
size(actBatch{1})
ans = 1×3

     1     1    10

actBatch{1}(1,1,7)
ans = 
-2

actBatch contains one action for each observation in the batch.

Create observation and action information. You can also obtain these specifications from an environment.

obsinfo = rlNumericSpec([4 1]);
actinfo = rlNumericSpec([2 1]);

Create a deep neural network for the actor.

net = [featureInputLayer(obsinfo.Dimension(1), ...
           "Normalization","none","Name","state")
       fullyConnectedLayer(10,"Name","fc1")
       reluLayer("Name","relu1")
       fullyConnectedLayer(20,"Name","fc2")
       fullyConnectedLayer(actinfo.Dimension(1),"Name","fc3")
       tanhLayer("Name","tanh1")];
net = dlnetwork(net);

Create a deterministic actor representation for the network.

actor = rlContinuousDeterministicActor(net, ...
    obsinfo,actinfo, ...
    ObservationInputNames={"state"});

Obtain an action from this actor for a random batch of 10 observations.

act = getAction(actor,{rand(4,1,10)})
act = 1×1 cell array
    {2×1×10 single}

act is a single-element cell array that contains the 2-by-1 action computed for each of the 10 observations in the batch.

act{1}(:,1,7)
ans = 2×1 single column vector

    0.2643
   -0.2934

Create observation and action specification objects. For this example, define the observation and action spaces as continuous four- and two-dimensional spaces, respectively.

obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);

Alternatively, you can use the getObservationInfo and getActionInfo functions to extract the specification objects from an environment.

Create a continuous deterministic actor. This actor must accept an observation as input and return an action as output.

To approximate the policy function within the actor, use a recurrent deep neural network model. Define the network as an array of layer objects, and get the dimension of the observation and action spaces from the environment specification objects. To create a recurrent network, use a sequenceInputLayer as the input layer (with size equal to the number of dimensions of the observation channel) and include at least one lstmLayer.

layers = [ 
    sequenceInputLayer(obsInfo.Dimension(1))
    lstmLayer(2)
    reluLayer
    fullyConnectedLayer(actInfo.Dimension(1)) 
    ];

Convert the network to a dlnetwork object and display the number of weights.

model = dlnetwork(layers);
summary(model)
   Initialized: true

   Number of learnables: 62

   Inputs:
      1   'sequenceinput'   Sequence input with 4 channels

Create the actor using model, and the observation and action specifications.

actor = rlContinuousDeterministicActor(model,obsInfo,actInfo)
actor = 
  rlContinuousDeterministicActor with properties:

    ObservationInfo: [1×1 rl.util.rlNumericSpec]
         ActionInfo: [1×1 rl.util.rlNumericSpec]
      Normalization: "none"
          UseDevice: "cpu"
         Learnables: {5×1 cell}
              State: {2×1 cell}

Check the actor with a random observation input.

act = getAction(actor,{rand(obsInfo.Dimension)});
act{1}
ans = 2×1 single column vector

    0.0568
    0.0691

Create an additive noise policy object from actor.

policy = rlAdditiveNoisePolicy(actor)
policy = 
  rlAdditiveNoisePolicy with properties:

               Actor: [1×1 rl.function.rlContinuousDeterministicActor]
           NoiseType: "gaussian"
        NoiseOptions: [1×1 rl.option.GaussianActionNoise]
    EnableNoiseDecay: 1
       Normalization: "none"
      UseNoisyAction: 1
     ObservationInfo: [1×1 rl.util.rlNumericSpec]
          ActionInfo: [1×1 rl.util.rlNumericSpec]
          SampleTime: -1

Use dot notation to set the standard deviation decay rate.

policy.NoiseOptions.StandardDeviationDecayRate = 0.9;

Use getAction to generate an action from the policy, given a random observation input.

act = getAction(policy,{rand(obsInfo.Dimension)});
act{1}
ans = 2×1

    0.5922
   -0.3745

Display the state of the recurrent neural network in the policy object.

xNN = policy.Actor.State;
xNN{1}
ans = 
  2×1 single dlarray

     0
     0

Use getAction to also return the updated policy as a second argument.

[act, updatedPolicy] = getAction(policy,{rand(obsInfo.Dimension)});

Display the state of the recurrent neural network in the updated policy object.

xpNN = updatedPolicy.Actor.State;
xpNN{1}
ans = 
  2×1 single dlarray

    0.3327
   -0.2479

As expected, the state is updated.

Input Arguments

Agent, specified as a reinforcement learning agent object, such as rlPPOAgent or rlTRPOAgent.

Note

agent is a handle object, so a function that does not return it as an output argument, such as train, can still update it. For more information about handle objects, see Handle Object Behavior.

For more information on reinforcement learning agents, see Reinforcement Learning Agents.

Example: agent = rlPPOAgent(rlNumericSpec([2 1]),rlNumericSpec([1 1])) creates the default rlPPOAgent object agent for an environment with an observation channel carrying a continuous two-element vector and an action channel carrying a continuous scalar.

Actor, specified as an rlContinuousDeterministicActor, rlDiscreteCategoricalActor, or rlContinuousGaussianActor object.

Example: cda = rlContinuousDeterministicActor(dlnetwork([featureInputLayer(2) fullyConnectedLayer(10) reluLayer fullyConnectedLayer(1)]),rlNumericSpec([2 1]),rlNumericSpec([1 1])) creates the rlContinuousDeterministicActor object cda.

Policy, specified as a policy object, such as rlAdditiveNoisePolicy. For more information on reinforcement learning policies, see Create Actors, Critics, and Policy Objects.

Example: policy = getExplorationPolicy(rlPPOAgent(rlNumericSpec([2 1]),rlNumericSpec([1 1]))) extracts the object that implements the exploration policy from a default PPO agent and assigns it to the variable policy.

Environment observations, specified as a cell array with as many elements as there are observation input channels. Each element of obs contains an array of observations for a single observation input channel.

The dimensions of each element in obs are MO-by-LB-by-LS, where:

  • MO corresponds to the dimensions of the associated observation input channel.

  • LB is the batch size. To specify a single observation, set LB = 1. To specify a batch of observations, specify LB > 1. If your approximator object has multiple observation input channels, then LB must be the same for all elements of obs.

  • LS specifies the sequence length for a recurrent neural network. If your approximator object does not use a recurrent neural network, then LS = 1. If the approximator has multiple observation input channels, then LS must be the same for all elements of obs.

LB and LS must be the same for all the approximator input channels (both observation and, if needed, action).

For more information on input and output formats for recurrent neural networks, see the Algorithms section of lstmLayer.

Example: {rand(8,3,64,1),rand(4,1,64,1)}
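
As an illustrative sketch (assuming a recurrent actor with a single four-dimensional observation channel), a batch of 5 observation sequences, each 3 time steps long, can be passed as follows.

```matlab
% Hypothetical dimensions: MO = 4, batch size LB = 5, sequence length LS = 3.
obsBatch = {rand(4,5,3)};             % each cell element is MO-by-LB-by-LS
actBatch = getAction(actor,obsBatch); % assumes actor is a recurrent actor
size(actBatch{1})                     % MA-by-5-by-3
```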

Option to use a forward pass, specified as a logical value. When you specify UseForward=true, the function calculates its outputs using forward instead of predict. This allows layers such as batch normalization and dropout to appropriately change their behavior for training.

Example: true

Output Arguments

Action, returned as a cell array containing either one element (for discrete or continuous action spaces) or two elements (for hybrid action spaces). Each element of the array contains the action corresponding to obs, which is an array with dimensions MA-by-LB-by-LS, where:

  • MA corresponds to the dimensions of the associated action specification.

  • LB is the batch size.

  • LS is the sequence length for recurrent neural networks. If the agent, actor, or policy calculating the action does not use a recurrent neural network, then LS = 1.

For hybrid action spaces, the first element of the cell array contains the discrete part of the action, while the second element contains the continuous part of the action.

Note

Some continuous action-space actor, policy, and agent objects do not enforce the constraints set by the action specification.

In these cases, you must enforce action space constraints within the environment.

Next state of the actor, returned as a cell array. If actor does not use a recurrent neural network, then nextState is an empty cell array.

You can set the state of the actor to nextState using dot notation. For example:

actor.State = nextState;

Updated agent, returned as the same agent object as the input argument agent. Because agent is a handle object, its internal states (if any) are updated regardless of whether agent is returned as an output argument. For more information about handle objects, see Handle Object Behavior.
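
Because of this handle behavior, the two calls below update the internal state in the same way (a sketch, assuming agent and obs already exist).

```matlab
% agent is a handle object, so its internal state (if any) is updated
% even when the updated agent is not requested as an output.
act = getAction(agent,obs);         % state updated in place
[act,agent] = getAction(agent,obs); % same update, agent returned explicitly
```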

Updated policy object. It is identical to the policy object supplied as a first input argument, except that its internal states (if any) are updated.

Version History

Introduced in R2020a