Using the MATLAB environment template to create a custom environment with a continuous state space and a continuous multi-action space: errors occur during DDPG training

5 views (last 30 days)
I'm creating my own reinforcement learning environment, which is continuous with multiple inputs and multiple outputs. Thank you in advance for your advice and help.
The environment has a continuous observation space with three inputs and a continuous action space with two outputs. The observation and action spaces are defined as follows:
function this = MyEnvironmentCalucateMin()
% Initialize Observation settings
ObservationInfo = rlNumericSpec([3 1]);
ObservationInfo.Name = '+ - * States';
ObservationInfo.Description = 'add, minus, multiply';
% Initialize Action settings
ActionInfo = rlNumericSpec([2 1]);
ActionInfo.Name = 'Action';
% The following line implements built-in functions of RL env
this = this@rl.env.MATLABEnvironment(ObservationInfo,ActionInfo);
% Initialize property values and pre-compute necessary values
updateActionInfo(this);
end
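As a quick sanity check, the specifications can be confirmed from the constructed environment object (a small snippet for illustration, assuming the constructor runs without error):
env = MyEnvironmentCalucateMin();
obsInfo = getObservationInfo(env);   % rlNumericSpec, Dimension should be [3 1]
actInfo = getActionInfo(env);        % rlNumericSpec, Dimension should be [2 1]
disp(obsInfo.Dimension)
disp(actInfo.Dimension)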
The limits of the action space are set as follows:
function updateActionInfo(this)
this.ActionInfo.LowerLimit = [0 0];
this.ActionInfo.UpperLimit = [10 10];
end
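For reference, a minimal sketch of the same limits supplied as column vectors at construction time (an alternative form shown only for illustration, not the code I attached):
% Alternative sketch: column-vector limits set directly when the spec is created
ActionInfo = rlNumericSpec([2 1], ...
    'LowerLimit',[0; 0], ...
    'UpperLimit',[10; 10]);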
My DDPG agent was modified from the example "Quadruped Robot Locomotion Using DDPG Agent". I have attached the files that create the environment and the agent; a rough sketch of the structure is included below.
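Since the attachments are not reproduced in the post, here is a minimal sketch of a critic/actor pair sized for the 3-element observation and 2-element action. The layer names, layer sizes, and the scalingLayer mapping are assumptions for illustration only, not my exact attached code, and featureInputLayer assumes a recent release:
% specs taken from the environment created above
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Critic: Q(s,a) with separate observation and action input paths
obsPath = [
    featureInputLayer(3,'Normalization','none','Name','observation')
    fullyConnectedLayer(128,'Name','obsFC')];
actPath = [
    featureInputLayer(2,'Normalization','none','Name','action')
    fullyConnectedLayer(128,'Name','actFC')];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','relu')
    fullyConnectedLayer(1,'Name','QValue')];
criticNet = layerGraph(obsPath);
criticNet = addLayers(criticNet,actPath);
criticNet = addLayers(criticNet,commonPath);
criticNet = connectLayers(criticNet,'obsFC','add/in1');
criticNet = connectLayers(criticNet,'actFC','add/in2');
critic = rlQValueRepresentation(criticNet,obsInfo,actInfo, ...
    'Observation',{'observation'},'Action',{'action'});

% Actor: 2 continuous actions, tanh output rescaled from [-1,1] to [0,10]
actorNet = [
    featureInputLayer(3,'Normalization','none','Name','observation')
    fullyConnectedLayer(128,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(2,'Name','fc2')
    tanhLayer('Name','tanh')
    scalingLayer('Name','scale','Scale',5,'Bias',5)];
actor = rlDeterministicActorRepresentation(actorNet,obsInfo,actInfo, ...
    'Observation',{'observation'},'Action',{'scale'});

agent = rlDDPGAgent(actor,critic);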
When I started training, the following warning first appeared:
Warning: Error executing listener callback for event EpisodeFinished defined for class MyEnvironmentCalucateMin:
Input data dimensions must match the dimensions specified in the corresponding observation and action info specifications.
Error in rl.representation.rlAbstractRepresentation/evaluate (line 116)
    validateInputData(this,Data);
Error in rl.representation.qDelegate.QSingleOutputDelegate/getValueImpl (line 30)
    [QValue,State,BatchSize,SequenceLength] = evaluate(QRep,[Observation,Action]);
Error in rl.representation.qDelegate.AbstractQDelegate/getValue (line 19)
    [QValue,State,BatchSize,SequenceLength] = getValueImpl(this,QRep,varargin{:});
Error in rl.representation.rlQValueRepresentation/getValueImpl (line 86)
    [QValue,State] = getValue(this.Delegate,this,varargin{:});
Error in rl.representation.rlAbstractValueRepresentation/getValue (line 59)
    [Value,State] = getValueImpl(this,varargin{:});
Error in rl.agent.rlDDPGAgent/evaluateQ0Impl (line 159)
    Q0 = getValue(this.Critic,observation,action);
Error in rl.agent.AbstractAgent/evaluateQ0 (line 275)
    Q0 = evaluateQ0Impl(this);
Error in rl.train.TrainingManager/update (line 135)
    Q0 = evaluateQ0(this.Agents(agentIdx),epinfo(agentIdx).InitialObservation);
Error in rl.train.TrainingManager>@(info)update(this,info) (line 440)
    trainer.FinishedEpisodeFcn = @(info) update(this,info);
Error in rl.train.Trainer/notifyEpisodeFinishedAndCheckStopTrain (line 56)
    stopTraining = this.FinishedEpisodeFcn(info);
Error in rl.train.SeriesTrainer>iUpdateEpisodeFinished (line 31)
    notifyEpisodeFinishedAndCheckStopTrain(this,ed.Data);
Error in rl.train.SeriesTrainer>@(src,ed)iUpdateEpisodeFinished(this,ed) (line 17)
    @(src,ed) iUpdateEpisodeFinished(this,ed));
Error in rl.env.AbstractEnv/notifyEpisodeFinished (line 325)
    notify(this,'EpisodeFinished',ed);
Error in rl.env.MATLABEnvironment/simWithPolicyImpl (line 110)
    notifyEpisodeFinished(env,epinfo,siminfos{simCount},simCount);
Error in rl.env.AbstractEnv/simWithPolicy (line 83)
    [experiences,varargout{1:(nargout-1)}] = simWithPolicyImpl(this,policy,opts,varargin{:});
Error in rl.task.SeriesTrainTask/runImpl (line 33)
    [varargout{1},varargout{2}] = simWithPolicy(this.Env,this.Agent,simOpts);
Error in rl.task.Task/run (line 21)
    [varargout{1:nargout}] = runImpl(this);
Error in rl.task.TaskSpec/internal_run (line 166)
    [varargout{1:nargout}] = run(task);
Error in rl.task.TaskSpec/runDirect (line 170)
    [this.Outputs{1:getNumOutputs(this)}] = internal_run(this);
Error in rl.task.TaskSpec/runScalarTask (line 194)
    runDirect(this);
Error in rl.task.TaskSpec/run (line 69)
    runScalarTask(task);
Error in rl.train.SeriesTrainer/run (line 24)
    run(seriestaskspec);
Error in rl.train.TrainingManager/train (line 424)
    run(trainer);
Error in rl.train.TrainingManager/run (line 215)
    train(this);
Error in rl.agent.AbstractAgent/train (line 77)
    trainingStatistics = run(trainMgr);
Error in LiveEditorEvaluationHelperE153176539 (line 82)
    trainingStats = train(agent,env,trainOpts);
Then the following error occurs:
Error using rl.policy.AbstractPolicy/step (line 242)
Invalid input argument type or size such as observation, reward, isdone or loggedSignals.
Error in rl.env.MATLABEnvironment/simLoop (line 258)
    action = step(policy,observation,reward,isdone);
Error in rl.env.MATLABEnvironment/simWithPolicyImpl (line 106)
    [expcell{simCount},epinfo,siminfos{simCount}] = simLoop(env,policy,opts,simCount,usePCT);
Error in rl.env.AbstractEnv/simWithPolicy (line 83)
    [experiences,varargout{1:(nargout-1)}] = simWithPolicyImpl(this,policy,opts,varargin{:});
Error in rl.task.SeriesTrainTask/runImpl (line 33)
    [varargout{1},varargout{2}] = simWithPolicy(this.Env,this.Agent,simOpts);
Error in rl.task.Task/run (line 21)
    [varargout{1:nargout}] = runImpl(this);
Error in rl.task.TaskSpec/internal_run (line 166)
    [varargout{1:nargout}] = run(task);
Error in rl.task.TaskSpec/runDirect (line 170)
    [this.Outputs{1:getNumOutputs(this)}] = internal_run(this);
Error in rl.task.TaskSpec/runScalarTask (line 194)
    runDirect(this);
Error in rl.task.TaskSpec/run (line 69)
    runScalarTask(task);
Error in rl.train.SeriesTrainer/run (line 24)
    run(seriestaskspec);
Error in rl.train.TrainingManager/train (line 424)
    run(trainer);
Error in rl.train.TrainingManager/run (line 215)
    train(this);
Error in rl.agent.AbstractAgent/train (line 77)
    trainingStatistics = run(trainMgr);
Caused by:
    Error using reshape
    The number of elements must not change. Use [] as one of the size inputs to automatically calculate the appropriate size for that dimension.
Note that the DDPG agent does train when I set the action space to a single action, for example:
ActionInfo = rlNumericSpec([1 1]);
Does the DDPG agent not support a continuous multi-action space in a MATLAB environment? (A quick dimension check for the two-action version is sketched below.)
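For comparison, this is the kind of check I would run on the two-action environment (the action values are made up for illustration; a 2-by-1 column vector is what rlNumericSpec([2 1]) expects):
env = MyEnvironmentCalucateMin();
% validateEnvironment simulates a few steps with random inputs and errors
% out if the observations or actions do not match the specs
validateEnvironment(env)

% manual single step with a [2 1] column-vector action
obs0 = reset(env);
act  = [1; 2];
[obs1,reward,isdone,loggedSignals] = step(env,act);
size(obs1)    % expected: 3 1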
Thank you again for your answer.
YIFENG.

Answers (0)
