How to create multi-dimensional discrete actions for MATLAB reinforcement learning?

w on 24 Oct 2024
% Enumerate every combination of the two discrete action values (41 x 41 = 1681 pairs)
action1_values = 0:1:40;
action2_values = 0:1:40;
[action1, action2] = ndgrid(action1_values, action2_values);
discreteActions = [action1(:), action2(:)];        % 1681-by-2 matrix of [a1 a2] pairs
actionCellArray = num2cell(discreteActions, 2);    % one 1-by-2 action per cell element
actInfo = rlFiniteSetSpec(actionCellArray);        % finite action set specification
The code above is what I created; it generates a cell array with 1681 elements. What I am unsure about is whether the agent can learn the way it would in a continuous action space, that is, by understanding how changing the magnitude of the numbers affects the outcome.
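As a quick check (a minimal sketch, assuming only the code above has been run): rlFiniteSetSpec treats each cell element as one unordered category, so a discrete actor selects among the 1681 pairs without exploiting their numeric ordering.
% Inspect the finite action set: each element is a 1-by-2 pair, and a
% discrete actor picks one of these 1681 categories per step.
numel(actInfo.Elements)   % 1681
actInfo.Elements{1}       % [0 0]
actInfo.Elements{end}     % [40 40]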
  2 Comments
Mr. Pavl M. on 24 Oct 2024
Answer to the question:
Suppose the agent learns on the grid by taking steps, memorizing the vicinity of the current step, with the learning steps connected as a Markov chain.
What is the motivation, from real-world trials, for tending toward a continuous space?
In MPPL, NCE, TCE and in any computerized environment the quantities are numeric and discrete; a continuous space can be approximated by a discrete one by letting the sampling step (grid_res) tend to 0, and the resulting continuous-to-discrete approximation error depends on the agent's learning process and the learning environment. That is why I added the for loop below. A continuous space can also be simulated with symbolic math.
What are the objective and the fitness function of the desired learning, in a detailed symbolic or numeric formulation?
Have you tried the following:
clc
clear all
close all
Grid_dim = 40;
Grid_step = 0.1;
start = 0;
for grid_res = 0.0000001:0.0000001:0.5
    % Action space:
    action1_values = start:grid_res:Grid_dim;
    action2_values = start:grid_res:Grid_dim;
    [action1, action2] = ndgrid(action1_values, action2_values);
    discreteActions = [action1(:), action2(:)];
    actionCellArray = num2cell(discreteActions, 2);
    actInfo = rlFiniteSetSpec(actionCellArray);
    % Learning environment space:
    % Agent learning logic code:
    % Fitness function:
end
% ...
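A practical caveat (an illustrative sketch, not part of the suggestion above): the size of the finite action set grows quadratically with the grid resolution, so very small grid_res values over the 0 to 40 range quickly become infeasible for rlFiniteSetSpec.
% Number of discrete actions for a few grid resolutions over [0, 40]
for grid_res = [1 0.5 0.1]
    n = numel(0:grid_res:40);
    fprintf('grid_res = %.2f -> %d x %d = %d actions\n', grid_res, n, n, n*n);
end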
w on 24 Oct 2024
My goal is for the agent to learn the fastest way to transfer from one tank with a capacity of 800 into two other tanks, each with a maximum capacity of 400, where the maximum transfer rate is 40. By common sense, the optimal solution is for both of the agent's actions to be 40, completing the transfer within 10 s. With a continuous action space I get a reasonable result, but with a discrete action space I cannot achieve it, hence my question above. My code is below.
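As a sanity check (a minimal sketch, assuming the discreteActions and actionCellArray variables from the code in the question), the expected optimal pair [40 40] is indeed one of the 1681 discrete elements:
% Locate the expected optimal action [40 40] in the discrete action set
idx = find(discreteActions(:,1) == 40 & discreteActions(:,2) == 40);
actionCellArray{idx}   % displays [40 40]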
% Actor network: one output per element of the discrete action set (1681 logits)
net = [
    featureInputLayer(prod(obsInfo.Dimension))
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(numel(actInfo.Elements))
    ];
net = dlnetwork(net);
% Categorical actor: selects one of the 1681 discrete [a1 a2] pairs per step
actor = rlDiscreteCategoricalActor(net,obsInfo,actInfo);
rng(0,"twister");
% Critic network: vector Q-value function, one Q-value per discrete action element
layers = [
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(numel(actInfo.Elements))
    ];
dnn = dlnetwork(layers);
summary(dnn)
critic = rlVectorQValueFunction(dnn,obsInfo,actInfo);
criticOpts = rlOptimizerOptions(LearnRate=1e-4,GradientThreshold=10);
agentOpts = rlSACAgentOptions( ...
SampleTime = Ts, ...
ExperienceBufferLength = 1e6, ...
NumWarmStartSteps = 1e3, ...
MiniBatchSize = 300);
agentOpts.ActorOptimizerOptions.Algorithm = "adam";
agentOpts.ActorOptimizerOptions.LearnRate = 1e-4;
agentOpts.ActorOptimizerOptions.GradientThreshold = 1;
for ct = 1:2
agentOpts.CriticOptimizerOptions(ct).Algorithm = "adam";
agentOpts.CriticOptimizerOptions(ct).LearnRate = 5e-4;
agentOpts.CriticOptimizerOptions(ct).GradientThreshold = 1;
end
agent = rlSACAgent(actor,critic,agentOpts)
trainOpts = rlTrainingOptions(...
MaxEpisodes=1000,...
MaxStepsPerEpisode=Tf/Ts,...
Verbose=false,...
Plots="training-progress",...
StopTrainingCriteria="AverageReward",...
StopTrainingValue=5000,...
ScoreAveragingWindowLength=10,...
SaveAgentCriteria="EpisodeReward",...
SaveAgentValue=500);
trainingStats = train(agent,env,trainOpts);
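After training, one way to spot-check the learned policy (a minimal sketch, assuming the trained agent and obsInfo from above) is to query it with a sample observation and confirm it returns one of the 1681 discrete [a1 a2] pairs:
% Query the trained agent with a random observation and display the chosen action
sampleObs = {rand(obsInfo.Dimension)};
act = getAction(agent,sampleObs);
disp(act{1})   % one of the rows of discreteActions, ideally [40 40]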


Answers (0)
