getGreedyPolicy

Extract greedy (deterministic) policy object from agent

Since R2022a

    Description

    policy = getGreedyPolicy(agent) returns a deterministic policy object from the specified reinforcement learning agent.

    Examples

    For this example, load the PG agent trained in Train PG Agent with Custom Actor Network to Balance Discrete Cart-Pole.

    load("MATLABCartpolePG.mat","agent")

    Extract the agent's greedy policy using getGreedyPolicy.

    policyDtr = getGreedyPolicy(agent)
    policyDtr = 
      rlStochasticActorPolicy with properties:
    
                         Actor: [1×1 rl.function.rlDiscreteCategoricalActor]
        UseMaxLikelihoodAction: 1
                 Normalization: "none"
               ObservationInfo: [1×1 rl.util.rlNumericSpec]
                    ActionInfo: [1×1 rl.util.rlFiniteSetSpec]
                    SampleTime: 1
    
    

    Note that, in the extracted policy object, the UseMaxLikelihoodAction property is set to true. This means that the policy object always generates the maximum likelihood action in response to a given observation, and is therefore greedy (and deterministic).

    Alternatively, you can extract a stochastic policy using getExplorationPolicy.

    policyXpl = getExplorationPolicy(agent)
    policyXpl = 
      rlStochasticActorPolicy with properties:
    
                         Actor: [1×1 rl.function.rlDiscreteCategoricalActor]
        UseMaxLikelihoodAction: 0
                 Normalization: "none"
               ObservationInfo: [1×1 rl.util.rlNumericSpec]
                    ActionInfo: [1×1 rl.util.rlFiniteSetSpec]
                    SampleTime: 1
    
    

    This time, the extracted policy object has the UseMaxLikelihoodAction property set to false. This means that the policy object generates a random action in response to a given observation. The policy is therefore stochastic and useful for exploration.
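    As a quick check, you can pass the same observation to both extracted policies. The following sketch assumes the four-element cart-pole observation used by the agent loaded above; the specific observation values are illustrative.

    ```matlab
    % Sketch: compare the two extracted policies on one observation.
    % The 4-element observation vector matches the cart-pole
    % observation specification; the values are illustrative.
    obs = {[0.1; 0; 0.2; 0]};

    % The greedy policy always returns the maximum likelihood action.
    actGreedy = getAction(policyDtr,obs);

    % The exploration policy samples an action, so repeated calls
    % can return different actions for the same observation.
    actRandom = getAction(policyXpl,obs);
    ```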

    Input Arguments

    Agent, specified as a reinforcement learning agent object. For the agent types that getGreedyPolicy supports, see Output Arguments.

    Note

    agent is a handle object, so a function that does not return it as an output argument, such as train, can still update it. For more information about handle objects, see Handle Object Behavior.

    For more information on reinforcement learning agents, see Reinforcement Learning Agents.

    Example: agent = rlPPOAgent(rlNumericSpec([2 1]),rlNumericSpec([1 1])) creates the default rlPPOAgent object agent for an environment with an observation channel carrying a continuous two-element vector and an action channel carrying a continuous scalar.

    Note

    If agent is an rlMBPOAgent object, to extract the greedy policy, use getGreedyPolicy(agent.BaseAgent).
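    For example, a minimal sketch of the model-based case, assuming agentMBPO is an existing rlMBPOAgent object:

    ```matlab
    % Sketch: for a model-based agent, extract the greedy policy from
    % the model-free base agent it wraps. agentMBPO is assumed to be
    % an existing rlMBPOAgent object.
    policy = getGreedyPolicy(agentMBPO.BaseAgent);
    ```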

    Output Arguments

    Policy object, returned as one of the following:

    • rlMaxQPolicy object — Returned when agent is an rlQAgent, rlSARSAAgent, or rlDQNAgent object.

    • rlDeterministicActorPolicy object — Returned when agent is an rlDDPGAgent or rlTD3Agent object.

    • rlStochasticActorPolicy object, with UseMaxLikelihoodAction set to true — Returned when agent is an rlACAgent, rlPGAgent, rlPPOAgent, rlTRPOAgent, or rlSACAgent object. Because the returned policy object has the UseMaxLikelihoodAction property set to true, it always generates the deterministic maximum likelihood action in response to a given observation.

    • rlHybridStochasticActorPolicy object, with UseMaxLikelihoodAction set to true — Returned when agent is a hybrid rlSACAgent object. Because the returned policy object has the UseMaxLikelihoodAction property set to true, it always generates the deterministic maximum likelihood action in response to a given observation.
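    To see the value-based case from the list above, here is a minimal sketch that builds a default DQN agent from observation and action specifications and extracts its greedy policy. The specification dimensions and the discrete action set are illustrative.

    ```matlab
    % Sketch: a default DQN agent for an environment with a 4-element
    % continuous observation and three discrete actions (illustrative).
    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlFiniteSetSpec([-1 0 1]);
    agent = rlDQNAgent(obsInfo,actInfo);

    % For a DQN agent, getGreedyPolicy returns an rlMaxQPolicy object.
    policy = getGreedyPolicy(agent);
    ```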

    Version History

    Introduced in R2022a