I am writing code for a reinforcement learning problem.
The goal is to train a simple pendulum to throw a ball at a target point.
However, the figure below shows how training is progressing.
The episode reward does not look right to me.
Is this because the episodes are not being updated, that is, the observations are not being updated?
Or is there some other cause?
Below is the code that updates the observations.
function [Observation,Reward,IsDone,LoggedSignals] = step(this,Action)
    LoggedSignals = [];

    % Map the action to the applied force
    Force = getForce(this,Action);

    % Unpack the current state: angle and angular velocity
    theta = this.State(1);
    w = this.State(2);

    IsDone = false;
    R = 0;

    % Advance the pendulum by one sample time Ts
    q2 = w - (this.g/this.L)*theta*this.Ts - this.b*this.Ts - Force*this.Ts;
    q1 = theta + w*this.Ts;

    % Ball position at release, free-fall time from height abs(ball_y), and reach
    ball_x = this.L*sin(q1);
    ball_y = -this.L*cos(q1);
    ball_time = sqrt(2*abs(ball_y)/9.8);
    ball_reach = ball_x + abs(q2).*ball_time;

    % Signed error between the landing point and the target
    ball_gosa = ball_reach - this.Target;
    q3 = ball_gosa;

    % End the episode when the error lies in (0, 1)
    if 0 < q3 && q3 < 1
        IsDone = true;
        R = this.RewardForStrike;
    else
        R = this.RewardForNotFalling;
    end

    Observation = [q1 q2 q3 Force]';
    this.State = Observation;
    this.IsDone = IsDone;
    Reward = getReward(this,R);
end
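
For reference, here is my reading of what step computes, written as equations. The symbols are shorthand introduced only for this summary and are not names from the code: \theta_k and \omega_k stand for this.State(1) and this.State(2), F_k for Force, T_s for this.Ts, x_target for this.Target, and e for q3. This transcribes the code as posted and is not a derived pendulum model.

```latex
\begin{aligned}
\theta_{k+1} &= \theta_k + \omega_k T_s \\
\omega_{k+1} &= \omega_k - \frac{g}{L}\,\theta_k T_s - b\,T_s - F_k T_s \\
x &= L \sin\theta_{k+1}, \qquad y = -L \cos\theta_{k+1} \\
t &= \sqrt{\frac{2\,|y|}{9.8}} \\
x_{\text{reach}} &= x + |\omega_{k+1}|\, t \\
e &= x_{\text{reach}} - x_{\text{target}}
\end{aligned}
```

The episode ends with RewardForStrike when 0 < e < 1. Otherwise the step uses RewardForNotFalling and the episode continues.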