Code covered by the BSD License  

Highlights from
Markov Decision Processes (MDP) Toolbox

from Markov Decision Processes (MDP) Toolbox by Marie-Josee Cros
Functions related to the resolution of discrete-time Markov Decision Processes.

MDP Toolbox for MATLAB

mdp_relative_value_iteration

Solves an average-reward MDP with the relative value iteration algorithm.

Syntax

[U, policy, g, cpu_time] = mdp_relative_value_iteration (P, R)
[U, policy, g, cpu_time] = mdp_relative_value_iteration (P, R, epsilon)
[U, policy, g, cpu_time] = mdp_relative_value_iteration (P, R, epsilon, max_iter)

Description

mdp_relative_value_iteration applies the relative value iteration algorithm to solve an MDP under the average reward criterion. The algorithm solves the optimality equations iteratively.
Iteration stops when an epsilon-optimal policy is found or after a specified number (max_iter) of iterations.
This function supports verbose and silent modes. In verbose mode, it displays the span of (Un+1-Un), that is, the difference between its largest and smallest components, at each iteration.
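
The procedure described above can be sketched in plain MATLAB as follows. This is a minimal illustration on the example data from this page, not the toolbox source: the normalization (pinning the last state at 0), the midpoint estimate of the gain, and all variable names are assumptions made for the sketch.

```matlab
% Sketch of relative value iteration (illustrative, not the toolbox code).
P = zeros(2, 2, 2);
P(:,:,1) = [0.5 0.5; 0.8 0.2];
P(:,:,2) = [0 1; 0.1 0.9];
R = [5 10; -1 2];            % R(s,a): reward for action a in state s

epsilon  = 0.01;             % stopping threshold on the span
max_iter = 1000;
[S, A] = size(R);

U = zeros(S, 1);
spans = [];                  % span recorded at each iteration
for iter = 1:max_iter
    % One-step lookahead: Q(s,a) = R(s,a) + sum over s' of P(s,s',a)*U(s')
    Q = zeros(S, A);
    for a = 1:A
        Q(:, a) = R(:, a) + P(:, :, a) * U;
    end
    [TU, policy] = max(Q, [], 2);         % greedy value and action per state

    delta = TU - U;                       % change before renormalizing
    variation = max(delta) - min(delta);  % span of the change
    spans(end+1) = variation;             %#ok<SAGROW>

    U = TU - TU(S);                       % assumption: pin last state at 0

    if variation < epsilon
        break                             % epsilon-optimal policy found
    end
end

% Assumption: estimate the gain as the midpoint of the bounds min/max(delta).
g = (max(delta) + min(delta)) / 2;
```

Under these assumptions the first recorded spans are 8, 3.4 and 2.72. Subtracting a constant from U at each step does not change the span, so other normalizations would give the same stopping behaviour.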

Arguments

  • P : transition probability array.
P can be a 3-dimensional array (SxSxA) or a cell array (1xA), each cell containing a sparse matrix (SxS), where S is the number of states and A the number of actions.
  • R : reward array.
R can be a 3-dimensional array (SxSxA), a cell array (1xA) with each cell containing a sparse matrix (SxS), or a 2D array (SxA), possibly sparse.
  • epsilon (optional) : search for an epsilon-optimal policy.
epsilon is a real number in [0; 1].
By default, epsilon is set to 0.01.
  • max_iter (optional) : maximum number of iterations.
max_iter is an integer greater than 0.
By default, max_iter is set to 1000.
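
The accepted shapes for P and R can be illustrated with a short plain-MATLAB sketch. It builds the same transitions as a 3-dimensional array and as a cell array of sparse matrices, and reduces a per-transition reward array to the (SxA) form by taking the expected immediate reward. The variable names (Pcell, R3, PR) are hypothetical, and whether the toolbox performs exactly this reduction internally is an assumption.

```matlab
% Illustrative only: equivalent input shapes for P and R.
P = zeros(2, 2, 2);
P(:,:,1) = [0.5 0.5; 0.8 0.2];
P(:,:,2) = [0 1; 0.1 0.9];
[S, ~, A] = size(P);

% Each row of each P(:,:,a) should be a probability distribution.
row_sums = sum(P, 2);                        % S x 1 x A
assert(all(abs(row_sums(:) - 1) < 1e-10));

% Same transitions as a (1xA) cell array of sparse (SxS) matrices.
Pcell = cell(1, A);
for a = 1:A
    Pcell{a} = sparse(P(:, :, a));
end

% A per-transition reward R3(s,s',a), reduced to the expected immediate
% reward PR(s,a) = sum over s' of P(s,s',a) * R3(s,s',a).
R3 = zeros(S, S, A);
R3(:,:,1) = [5 5; -1 -1];
R3(:,:,2) = [10 10; 2 2];
PR = zeros(S, A);
for a = 1:A
    PR(:, a) = sum(P(:, :, a) .* R3(:, :, a), 2);
end
```

Here R3 does not depend on the arrival state, so PR reduces to the (SxA) array [5 10; -1 2].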

Evaluations

  • U : optimal relative value function.
U is a (Sx1) vector.
  • policy : optimal policy.
policy is a (Sx1) vector. Each element is an integer corresponding to an action which maximizes the value function.
  • g : gain of the optimal policy.
g is a real number.
  • cpu_time : CPU time used to run the program.
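
As a cross-check on the meaning of g, the gain of a fixed policy can be computed from the stationary distribution of the Markov chain that the policy induces. The sketch below does this in plain MATLAB for the policy [2; 1] on the example data from this page. It assumes the induced chain has a single recurrent class, and the names Ppol, rpol, mu and g_check are hypothetical.

```matlab
% Illustrative check: gain of a fixed policy via its stationary distribution.
P = zeros(2, 2, 2);
P(:,:,1) = [0.5 0.5; 0.8 0.2];
P(:,:,2) = [0 1; 0.1 0.9];
R = [5 10; -1 2];            % R(s,a): reward for action a in state s
policy = [2; 1];             % action chosen in each state
S = numel(policy);

% Transition matrix and reward vector induced by the policy.
Ppol = zeros(S, S);
rpol = zeros(S, 1);
for s = 1:S
    Ppol(s, :) = P(s, :, policy(s));
    rpol(s)    = R(s, policy(s));
end

% Stationary distribution mu: mu' * Ppol = mu' and sum(mu) = 1,
% written as an overdetermined but consistent linear system.
M  = [Ppol' - eye(S); ones(1, S)];
b  = [zeros(S, 1); 1];
mu = M \ b;

% Long-run average reward per step under the policy.
g_check = mu' * rpol;
```

For this policy the exact gain is 35/9, about 3.8889, which an epsilon-optimal g should approximate.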

Example
The iteration log (Iteration, U_variation) is displayed only in verbose mode.

>> P(:,:,1) = [ 0.5 0.5;   0.8 0.2 ];
>> P(:,:,2) = [ 0 1;   0.1 0.9 ];
>> R = [ 5 10;   -1 2 ];

>> [U, policy, g, cpu_time] = mdp_relative_value_iteration(P, R)
   Iteration U_variation
        1             8
        2             3.4
        3             2.72
        4             2.176
        5             1.7408
        6             1.3926
        7             1.1141
        8             0.89129
        9             0.71303
        10             0.57043
        11             0.45634
        12             0.36507
        13             0.29206
        14             0.23365
        15             0.18692
        16             0.14953
        17             0.11963
        18             0.095701
        19             0.076561
        20             0.061249
        21             0.048999
        22             0.039199
        23             0.031359
        24             0.025088
        25             0.02007
        26             0.016056
        27             0.012845
        28             0.010276
        29             0.0082207
MDP Toolbox : iterations stopped, epsilon-optimal policy found
U =
   6.1065
   0
policy =
   2
   1
g =
   3.8852
cpu_time =
   0.1200

In the above example, P can be a cell array containing sparse matrices:
>> P{1} = sparse([ 0.5 0.5;  0.8 0.2 ]);
>> P{2} = sparse([ 0 1;  0.1 0.9 ]);
The function call is unchanged.





MDPtoolbox/documentation/mdp_relative_value_iteration.html
Page created on July 31, 2001. Last update on August 31, 2009.
