Error appears when setting multi-dimensional actions in a MATLAB environment (Reinforcement Learning Toolbox)

As shown in the following code, three actions with ranges [0.1 10], [0.1 10], and [0 pi], respectively, are defined:
%% main.m
%% Observation
ObservationInfo = rlNumericSpec([7 1]);
ObservationInfo.Name = 'Obstacle Avoidance States';
ObservationInfo.Description = 'delta_x, delta_y, delta_z, delta_L, delta_V, pusi, theta';
%% Action
ActionInfo = rlNumericSpec([3 1],'LowerLimit',[0.1 0.1 0]','UpperLimit',[10 10 pi]');
ActionInfo.Name = 'Action';
%% Environment
env = rlFunctionEnv(ObservationInfo,ActionInfo,'myStepFunction','myResetFunction');
rng(0);
InitialObs = reset(env);
Then, the actions are assigned to three variables in the function myStepFunction(Action,LoggedSignals) as follows:
function [NextObs,Reward,IsDone,LoggedSignals] = myStepFunction(Action,LoggedSignals)
para_rho1 = Action(1);
para_rho2 = Action(2);
para_theta = Action(3);
......
end
Running main.m works without errors.
However, running the following command produces an error:
step(env,10)
Index exceeds the number of array elements (1)
Error: myStepFunction (line 23)
para_rho2 = Action(2);
Why does the dimension of the action change to 1? How can I fix this error?
If I set the variables para_rho2 and para_theta to constants and change the dimension of the action to [1 1] in rlNumericSpec, then the command step(env,10) executes normally.

Answers (1)

Hello, so I've taken a look at the rocket lander environment which MATLAB gives as an example. What they do is scale every action between 0 and 1. The maximum values for the actions are then stored in your environment properties as a vector of values defining the min and max values. When we hop into the step function, the actions get scaled. I know this seems strange, and I'm not sure if it's the best approach, but I'm not a veteran of RL. So, what you should do is the following:
%% main.m
%% Observation
ObservationInfo = rlNumericSpec([7 1]);
ObservationInfo.Name = 'Obstacle Avoidance States';
ObservationInfo.Description = 'delta_x, delta_y, delta_z, delta_L, delta_V, pusi, theta';
%% Action
ActionInfo = rlNumericSpec([3 1 1],'LowerLimit',0,'UpperLimit',1);
ActionInfo.Name = 'Action';
%stuff...
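function [NextObs,Reward,IsDone,LoggedSignals] = myStepFunction(Action,LoggedSignals)
% env is not in scope inside a function-based step function, so the scale
% factors are defined locally here. The rocket lander example keeps them as
% environment properties, so open it from MATLAB and take a look.
borders = [10; 10; pi]; % upper limits of the three actions from the question
para_rho1 = Action(1).*borders(1); % maps [0 1] to [0 10], not [0.1 10]
para_rho2 = Action(2).*borders(2);
para_theta = Action(3).*borders(3);
......
end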
Hope this helps
RC

2 Comments

Many thanks for your reply! But I think I have solved this problem. When I continued designing the corresponding DDPG training process and started training, I found that the whole program runs correctly. So I guess the verification using the function step() may not be necessary.
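A likely explanation for the original error, offered as an assumption rather than something confirmed in this thread: step(env,10) hands the scalar 10 to myStepFunction as Action, so Action has only one element and Action(2) is out of range. During training the agent supplies a 3-by-1 action matching ActionInfo, which would explain why training runs. The sketch below illustrates this; the action values are arbitrary picks inside the limits, and the step call only runs if env from main.m is already in the workspace.

```matlab
% Sketch only: shows why a scalar action fails and a 3-by-1 action works.
% The action values are arbitrary picks inside the limits from ActionInfo.
scalarAction = 10;                  % what step(env,10) passes as Action
nScalar = numel(scalarAction);      % 1, so indexing element 2 is out of range

vectorAction = [1; 1; pi/2];        % one value per action, matching the [3 1] spec
para_rho1  = vectorAction(1);
para_rho2  = vectorAction(2);
para_theta = vectorAction(3);

% If env from main.m is in the workspace, step it with the full action vector.
if exist('env','var')
    [NextObs,Reward,IsDone,LoggedSignals] = step(env,vectorAction);
end
```

With a column vector of three values, the indexing in myStepFunction has an element for each of para_rho1, para_rho2, and para_theta, so step() can still be used as a quick manual check before training.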

Release: R2020a
Asked: 27 May 2020
Commented: 1 Jun 2020
