# Reinforcement Learning Toolbox: Actor Critic Error "Invalid input argument type or size such as observation, reward, isdone or loggedSignals"

Anthony on 1 Nov 2019
I am having issues training my Actor Critic agent. I can train this agent when I use a single set of discrete actions, but I get an error when I move on to multiple sets of discrete actions using the advice given in this MATLAB Answers thread:
The advice given above got me past the error I was working on at the time, but I am now receiving a new error:
Invalid input argument type or size such as observation, reward, isdone or loggedSignals.
I went through the construction of my environment, all of my related code is listed below, and I'm at a loss for what could be causing the error at this point. Among my attempts at fixing the error, I tried the solution presented in the following MATLAB Answers post, but I don't think it is applicable to my environment and it didn't work:
Any help would be greatly appreciated!
clear
clc
pause(1)
global z
s=1;
save('Success.mat','s')
i=1;
tt=1;
%% Develop Environment
z=1;
% Observations
obsInfo = rlNumericSpec([1 1]);
obsInfo.Name = 'observations';
% Actions
% input data: cell array of action-value vectors
vectors = {round(linspace(0,20,10),2)', ...
           round(linspace(0,20,10),2)'};
n = numel(vectors);        % number of vectors
combs = cell(1,n);         % pre-define to generate comma-separated list
% the reverse order in these two comma-separated lists is needed to
% produce the rows of the result matrix
[combs{end:-1:1}] = ndgrid(vectors{end:-1:1});
combs = cat(n+1, combs{:}); % concat the n n-dim arrays along dimension n+1
combs = reshape(combs,[],n);% reshape to obtain desired matrix
combs = combs';
actInfo = rlFiniteSetSpec(num2cell(combs,1));
actInfo.Name = 'action';   % Name must be a character vector, not a cell array
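For reference, a quick sanity check of the action space built above (assuming the same two 10-value vectors) confirms that each element of the spec is a 2x1 column and that there are 10*10 = 100 combinations:

```matlab
% sanity check of the discrete action space
disp(size(combs))              % 2x100: one column per action combination
disp(numel(actInfo.Elements))  % 100 discrete actions
disp(actInfo.Elements{1})      % first action, a 2x1 column vector
```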
%% Build Custom Environment
env=rlFunctionEnv(obsInfo,actInfo,'GabeStepFunction','GabeResetFunction');
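One way to localize this class of error (a sketch; validateEnvironment is part of the Reinforcement Learning Toolbox) is to check the environment against its specs right after constructing it, before involving the agent at all:

```matlab
% runs the reset and step functions once and checks that the outputs
% match obsInfo/actInfo; an error here points at the environment
% functions rather than at the agent or training loop
validateEnvironment(env)
```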
%% Extract Data from Environment
obsInfo = getObservationInfo(env);
numObs = obsInfo.Dimension(1);
actInfo = getActionInfo(env);
numActions = size(combs,2); % number of action combinations (columns of combs)
%%
% criticOpts/actorOpts were not defined in the script as posted;
% typical representation options are shown here
criticOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
criticNetwork = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(52,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(1,'Name','CriticFC')];
critic = rlRepresentation(criticNetwork,obsInfo,'Observation',{'state'},criticOpts);
actorOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actorNetwork = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(52,'Name','ActorStateFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(numActions,'Name','action')];
actor = rlRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation',{'state'},'Action',{'action'},actorOpts);
agentOpts = rlACAgentOptions(...
    'EntropyLossWeight',0.7,...
    'DiscountFactor',0.99);
agent = rlACAgent(actor,critic,agentOpts);
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',130,...
    'MaxStepsPerEpisode',128,...
    'Verbose',false,...
    'Plots','training-progress',...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',500,...
    'ScoreAveragingWindowLength',10);
trainingStats = train(agent,env,trainOpts);
Step Function
function [NextObs,Reward,IsDone,LoggedSignals] = GabeStepFunction(Action,~)
global z              % z is used below, so it must be declared global here too
persistent h12 h21    % retain found solutions between calls
%% Oscillator Function
% phase = Matsuoka_2_weights(Action(1),Action(2));
phase = 1;
%% some kind of phase calc
if phase < 0
    phase = 360 + phase;
end
LoggedSignals.State = phase;
NextObs = LoggedSignals.State;
%% Reward function
Reward = -abs(50 - phase);
if Reward >= -2
    IsDone = 1;
else
    IsDone = 0;
end
if IsDone
    Reward = 1000;
    fprintf('solution found: h12= %.2f \n',Action(1))
    fprintf('                h21= %.2f \n',Action(2))
    h12(z) = Action(1);
    h21(z) = Action(2);
    z = z + 1;
    % 's' was undefined inside this function as posted; any error thrown in
    % the step function can surface as the generic "Invalid input..." error
    s = 1;
    save('Success.mat','s')
end
end
Reset Function
function [InitialObservation, LoggedSignal] = GabeResetFunction()
LoggedSignal.State = 0;
InitialObservation = LoggedSignal.State;
end
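A hand-driven episode is another way to reproduce the problem outside of train, since calling the functions directly reports any error with its real message and stack trace instead of the generic one. This sketch assumes the actInfo spec built above:

```matlab
% drive one reset/step cycle by hand to surface the underlying error
[obs,logged] = GabeResetFunction();
testAction = actInfo.Elements{1};   % any valid 2x1 action from the spec
[nextObs,reward,isdone,logged] = GabeStepFunction(testAction,logged);
fprintf('obs=%g reward=%g isdone=%d\n',nextObs,reward,isdone)
```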