How to input action in reinforcement learning template environment?

Question

I have modified the template environment to fit my scenario. My current action consists of two vectors. The action configuration is as follows.
function this = EdgeEnvironment()
         % Initialize Observation settings
         ObservationInfo(1) = rlNumericSpec([1 10]);
         ObservationInfo(1).Name = 'schedule';
         ObservationInfo(1).Description = 'schedule';
         ObservationInfo(2) = rlNumericSpec([1 20]);
         ObservationInfo(2).Name = 'ppath';
         ObservationInfo(2).Description = 'ppath';
         ObservationInfo(3) = rlNumericSpec([1 1]);
         ObservationInfo(3).Name = 'completionTime';
         ObservationInfo(3).Description = 'completionTime';
         ObservationInfo(4) = rlNumericSpec([1 1]);
         ObservationInfo(4).Name = 'computeDuring';
         ObservationInfo(4).Description = 'computeDuring';

         % Initialize Action settings
         ActionInfo(1) = rlNumericSpec([1 10]);
         ActionInfo(1).Name = 'schedule';
         ActionInfo(2) = rlNumericSpec([1 20]);
         ActionInfo(2).Name = 'ppath';
         
         % The following line implements built-in functions of RL env
         this = this@rl.env.MATLABEnvironment(ObservationInfo, ActionInfo);
end
The step function is designed as follows.
        function [Observation,Reward,IsDone,LoggedSignals] = step(this, Action)
            LoggedSignals = [];
            % distance
            node_distance = zeros(this.device_count, this.device_count);
            distance = getDistance(this, node_distance);
            % parameter list
            parameter_list = getstruct(this, distance);
            % the parameter list of device
            device_list = get_device_list(this);
            % Extract action
            [schedule_act, ppath_act]=get_act(Action);
%             schedule_act = Action{1,1};
%             ppath_act = Action{1,2};
            % Unpack state vector
            last_schedule = schedule_act;
            last_ppath = ppath_act;
            last_completionTime = this.State{1,3};
            last_computeDuring = this.State{1,4};
            % Update system states
            [schedule, stay_node_list, completionTime] = ComScheduling(last_completionTime,...
                last_schedule, last_ppath, device_list, parameter_list);
            [ppath, stay_node_list, completionTime, computeDuring] = PathPlanning(last_completionTime,...
                last_ppath, schedule, stay_node_list, device_list, parameter_list);
            prob = 1 / (1 + exp((completionTime - last_completionTime)/parameter_list.omega));
            dice = rand(1);
            if dice <= prob
               last_ppath = ppath;
               last_schedule = schedule;
               last_stay_node_list = stay_node_list;
               last_completionTime = completionTime;
               last_computeDuring = computeDuring;
               completionTime_iter(end + 1) = completionTime;
            else
                completionTime_iter(end + 1) = last_computeDuring;
            end
            ppath = last_ppath;
            schedule = last_schedule;
            stay_node_list = last_stay_node_list;
            completionTime = last_completionTime;
            computeDuring = last_computeDuring;
            Observation = {schedule, ppath, completionTime, computeDuring};
            this.State = Observation;
            
            % Check terminal condition
            completionTime = Observation{3};
            computeDuring = Observation{4};
            IsDone = completionTime < this.completionTime_threshold || computeDuring < this.computeDuring_threshold;
            this.IsDone = IsDone;
            
            % Get reward
            Reward = -completionTime;
       
        end
We extract the action values with the following function.
       function [schedule_act, ppath_act] = get_act(action)
                schedule_act = action{1,1};
                ppath_act = action{1,2};
       end
When I run the validateEnvironment function, I get the following error.

I want to know how to fix it.

Emmanouil Tzorakoleftherakis · Accepted Answer

The easiest thing you can do is add a breakpoint and display what the "action" variable is. It's obviously not a cell array, so you cannot access it with braces {} in the "get_act" function. That's why you are getting the error.
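
A minimal sketch of this advice, under stated assumptions: the version of get_act below checks what it actually receives before indexing into it. The numeric branch assumes the action arrives as a single 30-element vector with the 10 'schedule' values first and the 20 'ppath' values after them. That layout is a guess, not something the thread confirms, so check it against what the breakpoint shows. The error identifier is made up for illustration.

```matlab
function [schedule_act, ppath_act] = get_act(action)
% Defensive variant of get_act (illustrative sketch, not the toolbox API).
% Handles a two-element cell array or, as an ASSUMPTION, a 30-element
% numeric vector laid out as [schedule (10 values), ppath (20 values)].
    if iscell(action)
        % Cell input: brace indexing is valid here.
        schedule_act = action{1};
        ppath_act = action{2};
    elseif isnumeric(action) && numel(action) == 30
        % Numeric input: flatten to a row, then split by position.
        action = reshape(action, 1, []);
        schedule_act = action(1:10);
        ppath_act = action(11:30);
    else
        % Report what was actually received so the mismatch is visible.
        error('EdgeEnvironment:unexpectedAction', ...
            'Unexpected action of class %s with size %s.', ...
            class(action), mat2str(size(action)));
    end
end
```

To see the real value without editing the code, you can run `dbstop if error` before calling validateEnvironment. MATLAB then pauses at the failing line, where `class(action)` and `size(action)` show what the environment is being given.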
