Reinforcement Learning Toolbox: Actor-Critic training issues

Anthony on 7 Oct 2019
I am having issues training my Actor-Critic agent. I can train this agent when I use a single set of discrete actions, but I get an error when I move on to multiple sets of discrete actions using the advice given in this MATLAB Answers thread:
The advice given above got me past the error I was working on at the time, but I am now receiving a new error:
Invalid input argument type or size such as observation, reward, isdone or loggedSignals.
I went through the construction of my environment; all of my related code is listed below, and I'm at a loss for what could be causing the error at this point. Among my attempts at fixing the error, I tried the solution presented in the following MATLAB Answers post, but I don't think it is applicable to my environment and it didn't work:
Any help would be greatly appreciated!
My reset function
function [InitialObservation, LoggedSignal] = DriveResetFunction()
% Reset the environment state to the origin at the start of each episode.
LoggedSignal.State = [0;0];
InitialObservation = LoggedSignal.State;
end
My step function
function [NextObs,Reward,IsDone,LoggedSignals] = DriveStepFunction(Action,LoggedSignals)
% clear a
% a=arduino('com6','Uno');
%
% FrontRight(a,Action(1));
% FrontLeft(a,Action(2));
% RearRight(a,Action(3));
% RearLeft(a,Action(4));
% pause(0.5);
% clear a
State=LoggedSignals.State;
x=mean(Action);
y=mean(Action);
LoggedSignals.State=[State(1)+x; State(2)+y];
NextObs=LoggedSignals.State;
%% Reward function
if y < 0
    Reward = -(10*y)*(10*y);
else
    Reward = (10*y)*(10*y);
end
% Terminate the episode (and apply a bonus reward) once the mean action reaches 1.
if y == 1
    IsDone = 1;
    Reward = 1000;
else
    IsDone = 0;
end
display(Reward)
display(Action)
end
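As a sanity check before wiring these functions into the toolbox, they can be called directly with one candidate action to confirm each output has the size and type the error message complains about. A minimal sketch, assuming both functions above are saved on the MATLAB path:
[obs0, logged] = DriveResetFunction();
sampleAction = [-1; -1; -1; -1];   % one 4x1 action from the set defined below
[nextObs, reward, isDone, logged] = DriveStepFunction(sampleAction, logged);
size(nextObs)    % should be 2 1, matching rlNumericSpec([2 1])
class(reward)    % should be a numeric scalar (e.g. 'double')
isDone           % should be a scalar 0/1 or logical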
The rest of my learning code
%% Develop Environment
% Observations
obsInfo = rlNumericSpec([2 1]);
obsInfo.Name = 'observations';
vectors = { [-1 -.8 -.6 -.4 0 .4 .6 .8 1]', ...
[-1 -.8 -.6 -.4 0 .4 .6 .8 1]', ...
[-1 -.8 -.6 -.4 0 .4 .6 .8 1]', ...
[-1 -.8 -.6 -.4 0 .4 .6 .8 1]'};
%input data: cell array of vectors
n = numel(vectors); % number of vectors
combs = cell(1,n); % pre-define to generate comma-separated list
[combs{end:-1:1}] = ndgrid(vectors{end:-1:1}); % the reverse order in these two
% comma-separated lists is needed to produce the rows of the result matrix
combs = cat(n+1, combs{:}); %concat the n n-dim arrays along dimension n+1
combs = reshape(combs,[],n);%reshape to obtain desired matrix
combs=combs';
NumAction=length(combs);
% Actions
actInfo = rlFiniteSetSpec(num2cell(combs,1));
actInfo.Name={'action1', 'action2', 'action3', 'action4'};
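As a quick check that the discrete action set came out as intended (a sketch using only the variables defined above), the size of combs and one element of the spec can be inspected:
size(combs)           % expect 4 x 6561 (9 values per wheel, 9^4 combinations)
actInfo.Elements{1}   % each element should be a 4x1 column vector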
%% Build Custom Environment
env=rlFunctionEnv(obsInfo,actInfo,'DriveStepFunction','DriveResetFunction');
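If your release of Reinforcement Learning Toolbox includes it (R2019a or later), validateEnvironment runs the reset and step functions once against obsInfo and actInfo and reports size or type mismatches like the one in the error above, which can help localize the problem before calling train:
validateEnvironment(env)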
%% Extract Data from Environment
obsInfo = getObservationInfo(env);
numObs = obsInfo.Dimension(1);
actInfo = getActionInfo(env);
%% Develop Critic
criticNetwork = [
imageInputLayer([numObs 1 1],'Normalization','none','Name','state')
fullyConnectedLayer(1,'Name','CriticFC')];
criticOpts = rlRepresentationOptions('LearnRate',.05,'GradientThreshold',1);
critic = rlRepresentation(criticNetwork,obsInfo,'Observation',{'state'},criticOpts);
%% Develop Actor
actorNetwork = [
imageInputLayer([numObs 1 1],'Normalization','none','Name','state')
fullyConnectedLayer(NumAction,'Name','action')];
actorOpts = rlRepresentationOptions('LearnRate',.05,'GradientThreshold',1);
actor = rlRepresentation(actorNetwork,obsInfo,actInfo,...
'Observation',{'state'},'Action',{'action'},actorOpts);
%% Develop Agent
agentOpts = rlACAgentOptions(...
'NumStepsToLookAhead',1,...
'EntropyLossWeight',0.01,...
'DiscountFactor',0.99);
agent = rlACAgent(actor,critic,agentOpts);
%% Train Agent
trainOpts = rlTrainingOptions(...
'MaxEpisodes',300, ...
'MaxStepsPerEpisode',1, ...
'Verbose',false, ...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',101,...
'ScoreAveragingWindowLength',10);
trainingStats = train(agent,env,trainOpts);

Answers (0)
