Sudden increase in RAM usage and an out-of-memory error while training a reinforcement learning agent

(Increasing the page file size and upgrading the RAM from 32 GB to 64 GB did not help.)
I use MATLAB R2020b on Windows 10; the GPU is a 2080 Ti and the machine has 64 GB of RAM.
I created a Simulink environment and imported it with the rlSimulinkEnv function.
I used MATLAB Function blocks to calculate the observations and rewards in Simulink; could this be why the memory keeps increasing?
(Also, the larger the experience buffer length, the later the memory growth occurs: with an experience buffer length of 1e3, training stops after about 100 episodes; with 1e7, it stops after about 1,900 episodes.)
The actor and critic structures are shown below. I train on the GPU with Parallel Computing Toolbox. (The same error occurs when I train on the CPU.)
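For a rough sense of how much of this the replay buffer itself can account for, here is a back-of-the-envelope sketch (it assumes double-precision storage of each experience; the toolbox's internal layout may differ):
% Approximate replay-buffer footprint for an [8 6] observation and [2 1] action
obsBytes    = 8*6*8;                        % observation, 8 bytes per double
actBytes    = 2*1*8;                        % action
perExpBytes = 2*obsBytes + actBytes + 2*8;  % obs, next obs, action, reward, isDone
fprintf('1e3 experiences: %.2f MB\n', 1e3*perExpBytes/2^20);
fprintf('1e7 experiences: %.2f GB\n', 1e7*perExpBytes/2^30);
Even 1e7 experiences of this size come to roughly 8 GB, and the 20000-length buffer configured in the code below is only about 15 MB, so the buffer alone does not obviously explain exhausting 64 GB.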
  4 Comments
eunhwa lee on 23 Oct 2020
Sorry for the late reply. It also occurs in R2020a, and the same error appears whether I use parallel computing, the GPU, or the CPU.
The main code is below; the attached ZIP contains the entire project.
I use Driving Scenario Designer to create the autonomous driving environment and train through the Simulink RL Agent block.
The environment is created with rlSimulinkEnv:
%% Bus Creation
% Create the bus of actors from the scenario reader
ModelName = 'new_lpp_sim_matfile'; % simulink model name
wasModelLoaded = bdIsLoaded(ModelName);
if ~wasModelLoaded
    load_system(ModelName)
end
blk=find_system(ModelName,'System','driving.scenario.internal.ScenarioReader');
s = get_param(blk{1},'PortHandles');
get(s.Outport(1),'SignalHierarchy');
[scenario,egoCar,actor_profiles] = helperSessionToScenario('temp_lpp_env.mat');
ego_x = egoCar.x0;
ego_y = egoCar.y0;
ego_v = egoCar.v0;
ego_yaw = egoCar.yaw0;
m  = 1575;  % vehicle mass (kg)
lf = 1.2;   % distance from center of gravity to front axle (m)
lr = 1.6;   % distance from center of gravity to rear axle (m)
lz = 2875;  % yaw moment of inertia (kg*m^2)
%% open simulink and define state, action, env
open_system(ModelName)
agent_Block = [ModelName, '/RL Agent'];
Tf = 10;  % simulation duration (seconds)
Ts = 0.1; % sample time per step (seconds)
ObservationInfo = rlNumericSpec([8 6]);
ObservationInfo.Name = 'ENV_States';
ActionInfo = rlNumericSpec([2 1], ...
    'LowerLimit',[0; -180], ...
    'UpperLimit',[10; 180]);
ActionInfo.Name = 'ENV_Action';
env = rlSimulinkEnv(ModelName,agent_Block, ObservationInfo, ActionInfo);
%env.ResetFcn = @(in) LocalPathPlanningFcn(in);
env.UseFastRestart = 'on';
rng(0);
%% DDPG critic Network
statePath = [
    imageInputLayer([8 6 1],'Normalization','none','Name','state')
    fullyConnectedLayer(256,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(128,'Name','CriticStateFC2')
    reluLayer('Name','CriticRelu2')
    fullyConnectedLayer(64,'Name','CriticStateFC3')
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticRelu3')
    fullyConnectedLayer(1,'Name','CriticStateFC4')];
actionPath = [
    imageInputLayer([2 1 1],'Normalization','none','Name','action')
    fullyConnectedLayer(64,'Name','CriticActionFC1')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
figure
plot(criticNetwork)
criticOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,ObservationInfo,ActionInfo,'Observation',{'state'},'Action',{'action'},criticOpts);
%% Actor network
actorNetwork = [
    imageInputLayer([8 6 1],'Normalization','none','Name','state')
    fullyConnectedLayer(256,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(512,'Name','ActorFC2')
    reluLayer('Name','ActorRelu2')
    fullyConnectedLayer(24,'Name','ActorFC3')
    reluLayer('Name','ActorRelu3')
    fullyConnectedLayer(2,'Name','ActorFC4')
    tanhLayer('Name','tanh1')];
actorOpts = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,ObservationInfo,ActionInfo,...
'Observation',{'state'},'Action',{'tanh1'},actorOpts);
%% define agent
agentOptions = rlDDPGAgentOptions(...
    'SampleTime',Ts,...
    'TargetSmoothFactor',1e-3,...
    'ExperienceBufferLength',20000,...
    'DiscountFactor',0.98,...
    'MiniBatchSize',256);
agentOptions.NoiseOptions.Variance = 1e-1;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-6;
agentOptions.SaveExperienceBufferWithAgent = true; % Default false
%agentOptions.ResetExperienceBufferBeforeTraining = false; % Default true
agent = rlDDPGAgent(actor,critic,agentOptions);
%% Training options
maxsteps = ceil(Tf/Ts) + 50;
trainingOptions = rlTrainingOptions(...
    'MaxEpisodes',10000,...
    'MaxStepsPerEpisode',maxsteps,...
    'StopOnError',"on",...
    'Verbose',false,...
    'Plots',"training-progress",...
    'StopTrainingCriteria',"AverageReward",...
    'StopTrainingValue',45,...
    'ScoreAveragingWindowLength',20,...
    'SaveAgentCriteria',"AverageReward",...
    'SaveAgentValue',45);
% trainingOptions.UseParallel = true;
% trainingOptions.ParallelizationOptions.Mode = "async";
% trainingOptions.ParallelizationOptions.DataToSendFromWorkers = "experiences";
% trainingOptions.ParallelizationOptions.StepsUntilDataIsSent = 24;
%% Train the agent.
doTraining = true;
if doTraining
    trainingStats = train(agent,env,trainingOptions);
end
save(fullfile(trainingOptions.SaveAgentDirectory,"finalAgent.mat"),'agent') % fullfile so the file lands inside the save directory
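One way to narrow down where the memory goes is to log MATLAB's memory use while training runs, for example with a timer (a sketch: the memory function is Windows-only, and the 60-second period is arbitrary):
% Print MATLAB's memory use every 60 seconds during training
memTimer = timer('ExecutionMode','fixedRate','Period',60, ...
    'TimerFcn',@(~,~) fprintf('MATLAB memory in use: %.1f GB\n', ...
    getfield(memory,'MemUsedMATLAB')/2^30));
start(memTimer)
% ... run train(agent,env,trainingOptions) here ...
stop(memTimer); delete(memTimer)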
Guru Bhargava Khandavalli on 20 Dec 2020
Hello, I have a similar issue, but my code runs properly on the CPU. It also runs without problems when using the GPU without parpool. Where could the issue be?
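For reference, the GPU-plus-parpool combination being described is roughly the following (a sketch: selecting the GPU via the 'UseDevice' option of rlRepresentationOptions is an assumption, and the parallel settings mirror the commented-out lines in the code above):
% Representations evaluated on the GPU
repOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1, ...
    'UseDevice','gpu');
% Parallel training with parpool workers
trainOpts = rlTrainingOptions('MaxEpisodes',10000,'UseParallel',true);
trainOpts.ParallelizationOptions.Mode = "async";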


Answers (0)
