Can the range of an RL agent's Action specification be changed within the step function?
In my problem, the range of discrete Actions is 1:32.
Each Action can only be used once except for Action 1.
As each Action is used, I would like to remove it from the Action space until only Action 1 remains.
I currently end an episode if the step function is called with an Action that was used previously.
The Reward returned is 1 if step is called with a new Action and -1 if any of the following 3 constraints is satisfied.
- Action Sequence: [..., 1, 1, ...]
- Action Sequence: [..., Action_N, ..., Action_N, ...]
- Sum(Weights([Action_1, ..., Action_N])) > 100 where Action Sequence = [..., 1, Action_1, ..., Action_N, 1, ...]
If any of the three constraints is satisfied, the episode is terminated.
My question addresses Constraint 2 (an Action other than 1 being used a second time).
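
To make the rules above concrete, here is a rough Python sketch of the reward and termination logic as I understand it. It is only an illustration, not my actual environment: the class name `SketchEnv`, the `weights` mapping, and the helper `remaining_actions` are hypothetical. It also assumes that the weight sum in Constraint 3 is checked as soon as it exceeds 100 rather than only at the closing Action 1, that Actions taken before the first Action 1 count toward that sum, and that a non-consecutive repeat of Action 1 earns the normal reward of 1.

```python
from typing import Dict, List, Optional, Set, Tuple

NUM_ACTIONS = 32      # Actions are numbered 1..32, as in the question
WEIGHT_LIMIT = 100    # threshold from Constraint 3


class SketchEnv:
    """Hypothetical environment mirroring the reward and termination rules."""

    def __init__(self, weights: Dict[int, float]) -> None:
        # weights: assumed mapping from Action number to its weight
        self.weights = weights
        self.reset()

    def reset(self) -> None:
        self.used: Set[int] = set()               # non-1 Actions already taken
        self.last_action: Optional[int] = None    # previous Action, for Constraint 1
        self.segment_weight = 0.0                 # weight accumulated since the last Action 1

    def remaining_actions(self) -> List[int]:
        """Actions that would not trigger Constraint 2 if chosen next."""
        return [a for a in range(1, NUM_ACTIONS + 1)
                if a == 1 or a not in self.used]

    def step(self, action: int) -> Tuple[int, bool]:
        """Return (reward, done) for one Action."""
        if action == 1:
            violated = self.last_action == 1      # Constraint 1: [..., 1, 1, ...]
            self.segment_weight = 0.0             # a new segment starts after Action 1
        else:
            violated = action in self.used        # Constraint 2: repeated Action
            self.used.add(action)
            self.segment_weight += self.weights[action]
            if self.segment_weight > WEIGHT_LIMIT:
                violated = True                   # Constraint 3: segment too heavy
        self.last_action = action
        return (-1, True) if violated else (1, False)
```

What I am after is for the agent to only ever choose from something like `remaining_actions()`, so that the Constraint 2 branch in `step` can never be reached.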