Using the MATLAB environment template to create a custom environment with a continuous state space and a continuous multi-action space: errors occur during DDPG training

5 views (last 30 days)
I'm creating my own reinforcement learning environment, which is continuous with multiple inputs and multiple outputs. Thank you in advance for your advice and help.
The environment has a continuous observation space with three inputs and a continuous action space with two outputs. The observation and action spaces are defined as follows:
function this = MyEnvironmentCalucateMin()
% Initialize Observation settings
ObservationInfo = rlNumericSpec([3 1]);
ObservationInfo.Name = '+ - * States';
ObservationInfo.Description = 'add, minus, multiply';
% Initialize Action settings
ActionInfo = rlNumericSpec([2 1]);
ActionInfo.Name = 'Action';
% The following line implements built-in functions of RL env
this = this@rl.env.MATLABEnvironment(ObservationInfo,ActionInfo);
% Initialize property values and pre-compute necessary values
updateActionInfo(this);
end
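As a quick sanity check, the specifications can be confirmed from the constructed environment object (a small snippet for illustration, assuming the constructor runs without error):
env = MyEnvironmentCalucateMin();
obsInfo = getObservationInfo(env);   % rlNumericSpec, Dimension should be [3 1]
actInfo = getActionInfo(env);        % rlNumericSpec, Dimension should be [2 1]
disp(obsInfo.Dimension)
disp(actInfo.Dimension)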
The limits of the action space are set as follows:
function updateActionInfo(this)
this.ActionInfo.LowerLimit = [0 0];
this.ActionInfo.UpperLimit = [10 10];
end
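For reference, a minimal sketch of the same limits supplied as column vectors at construction time (an alternative form shown only for illustration, not the code I attached):
% Alternative sketch: column-vector limits set directly when the spec is created
ActionInfo = rlNumericSpec([2 1], ...
    'LowerLimit',[0; 0], ...
    'UpperLimit',[10; 10]);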
My DDPG agent was modified from the example "Quadruped Robot Locomotion Using DDPG Agent". I have attached the files that create the environment and the agent; a rough sketch of the structure is included below.
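Since the attachments are not reproduced in the post, here is a minimal sketch of a critic/actor pair sized for the 3-element observation and 2-element action. The layer names, layer sizes, and the scalingLayer mapping are assumptions for illustration only, not my exact attached code, and featureInputLayer assumes a recent release:
% specs taken from the environment created above
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Critic: Q(s,a) with separate observation and action input paths
obsPath = [
    featureInputLayer(3,'Normalization','none','Name','observation')
    fullyConnectedLayer(128,'Name','obsFC')];
actPath = [
    featureInputLayer(2,'Normalization','none','Name','action')
    fullyConnectedLayer(128,'Name','actFC')];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','relu')
    fullyConnectedLayer(1,'Name','QValue')];
criticNet = layerGraph(obsPath);
criticNet = addLayers(criticNet,actPath);
criticNet = addLayers(criticNet,commonPath);
criticNet = connectLayers(criticNet,'obsFC','add/in1');
criticNet = connectLayers(criticNet,'actFC','add/in2');
critic = rlQValueRepresentation(criticNet,obsInfo,actInfo, ...
    'Observation',{'observation'},'Action',{'action'});

% Actor: 2 continuous actions, tanh output rescaled from [-1,1] to [0,10]
actorNet = [
    featureInputLayer(3,'Normalization','none','Name','observation')
    fullyConnectedLayer(128,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(2,'Name','fc2')
    tanhLayer('Name','tanh')
    scalingLayer('Name','scale','Scale',5,'Bias',5)];
actor = rlDeterministicActorRepresentation(actorNet,obsInfo,actInfo, ...
    'Observation',{'observation'},'Action',{'scale'});

agent = rlDDPGAgent(actor,critic);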
When I started training, the following warning first appeared:
Warning: Error executing listener callback for event EpisodeFinished defined for class MyEnvironmentCalucateMin:
Input data dimensions must match the dimensions specified in the corresponding observation and action info specifications.
Error in rl.representation.rlAbstractRepresentation/evaluate (line 116)
    validateInputData(this,Data);
Error in rl.representation.qDelegate.QSingleOutputDelegate/getValueImpl (line 30)
    [QValue,State,BatchSize,SequenceLength] = evaluate(QRep,[Observation,Action]);
Error in rl.representation.qDelegate.AbstractQDelegate/getValue (line 19)
    [QValue,State,BatchSize,SequenceLength] = getValueImpl(this,QRep,varargin{:});
Error in rl.representation.rlQValueRepresentation/getValueImpl (line 86)
    [QValue,State] = getValue(this.Delegate,this,varargin{:});
Error in rl.representation.rlAbstractValueRepresentation/getValue (line 59)
    [Value,State] = getValueImpl(this,varargin{:});
Error in rl.agent.rlDDPGAgent/evaluateQ0Impl (line 159)
    Q0 = getValue(this.Critic,observation,action);
Error in rl.agent.AbstractAgent/evaluateQ0 (line 275)
    Q0 = evaluateQ0Impl(this);
Error in rl.train.TrainingManager/update (line 135)
    Q0 = evaluateQ0(this.Agents(agentIdx),epinfo(agentIdx).InitialObservation);
Error in rl.train.TrainingManager>@(info)update(this,info) (line 440)
    trainer.FinishedEpisodeFcn = @(info) update(this,info);
Error in rl.train.Trainer/notifyEpisodeFinishedAndCheckStopTrain (line 56)
    stopTraining = this.FinishedEpisodeFcn(info);
Error in rl.train.SeriesTrainer>iUpdateEpisodeFinished (line 31)
    notifyEpisodeFinishedAndCheckStopTrain(this,ed.Data);
Error in rl.train.SeriesTrainer>@(src,ed)iUpdateEpisodeFinished(this,ed) (line 17)
    @(src,ed) iUpdateEpisodeFinished(this,ed));
Error in rl.env.AbstractEnv/notifyEpisodeFinished (line 325)
    notify(this,'EpisodeFinished',ed);
Error in rl.env.MATLABEnvironment/simWithPolicyImpl (line 110)
    notifyEpisodeFinished(env,epinfo,siminfos{simCount},simCount);
Error in rl.env.AbstractEnv/simWithPolicy (line 83)
    [experiences,varargout{1:(nargout-1)}] = simWithPolicyImpl(this,policy,opts,varargin{:});
Error in rl.task.SeriesTrainTask/runImpl (line 33)
    [varargout{1},varargout{2}] = simWithPolicy(this.Env,this.Agent,simOpts);
Error in rl.task.Task/run (line 21)
    [varargout{1:nargout}] = runImpl(this);
Error in rl.task.TaskSpec/internal_run (line 166)
    [varargout{1:nargout}] = run(task);
Error in rl.task.TaskSpec/runDirect (line 170)
    [this.Outputs{1:getNumOutputs(this)}] = internal_run(this);
Error in rl.task.TaskSpec/runScalarTask (line 194)
    runDirect(this);
Error in rl.task.TaskSpec/run (line 69)
    runScalarTask(task);
Error in rl.train.SeriesTrainer/run (line 24)
    run(seriestaskspec);
Error in rl.train.TrainingManager/train (line 424)
    run(trainer);
Error in rl.train.TrainingManager/run (line 215)
    train(this);
Error in rl.agent.AbstractAgent/train (line 77)
    trainingStatistics = run(trainMgr);
Error in LiveEditorEvaluationHelperE153176539 (line 82)
    trainingStats = train(agent,env,trainOpts);
Then the following error occurs:
Error using rl.policy.AbstractPolicy/step (line 242)
Invalid input argument type or size such as observation, reward, isdone or loggedSignals.
Error in rl.env.MATLABEnvironment/simLoop (line 258)
    action = step(policy,observation,reward,isdone);
Error in rl.env.MATLABEnvironment/simWithPolicyImpl (line 106)
    [expcell{simCount},epinfo,siminfos{simCount}] = simLoop(env,policy,opts,simCount,usePCT);
Error in rl.env.AbstractEnv/simWithPolicy (line 83)
    [experiences,varargout{1:(nargout-1)}] = simWithPolicyImpl(this,policy,opts,varargin{:});
Error in rl.task.SeriesTrainTask/runImpl (line 33)
    [varargout{1},varargout{2}] = simWithPolicy(this.Env,this.Agent,simOpts);
Error in rl.task.Task/run (line 21)
    [varargout{1:nargout}] = runImpl(this);
Error in rl.task.TaskSpec/internal_run (line 166)
    [varargout{1:nargout}] = run(task);
Error in rl.task.TaskSpec/runDirect (line 170)
    [this.Outputs{1:getNumOutputs(this)}] = internal_run(this);
Error in rl.task.TaskSpec/runScalarTask (line 194)
    runDirect(this);
Error in rl.task.TaskSpec/run (line 69)
    runScalarTask(task);
Error in rl.train.SeriesTrainer/run (line 24)
    run(seriestaskspec);
Error in rl.train.TrainingManager/train (line 424)
    run(trainer);
Error in rl.train.TrainingManager/run (line 215)
    train(this);
Error in rl.agent.AbstractAgent/train (line 77)
    trainingStatistics = run(trainMgr);
Caused by:
    Error using reshape
    The number of elements must not change. Use [] as one of the size inputs to automatically calculate the appropriate size for that dimension.
Note that the DDPG agent does train when I set the action space to a single action, for example:
ActionInfo = rlNumericSpec([1 1]);
Does the DDPG agent not support a continuous multi-action space in a MATLAB environment? (A quick dimension check for the two-action version is sketched below.)
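For comparison, this is the kind of check I would run on the two-action environment (the action values are made up for illustration; a 2-by-1 column vector is what rlNumericSpec([2 1]) expects):
env = MyEnvironmentCalucateMin();
% validateEnvironment simulates a few steps with random inputs and errors
% out if the observations or actions do not match the specs
validateEnvironment(env)

% manual single step with a [2 1] column-vector action
obs0 = reset(env);
act  = [1; 2];
[obs1,reward,isdone,loggedSignals] = step(env,act);
size(obs1)    % expected: 3 1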
Thank you again for your answer.
YIFENG.

Answers (0)
