mdp_finite_horizon
Solves a finite-horizon MDP with the backwards induction algorithm.
Syntax
[V, policy, cpu_time] = mdp_finite_horizon (P, R, discount, N)
[V, policy, cpu_time] = mdp_finite_horizon (P, R, discount, N, h)
Description
mdp_finite_horizon applies the backwards induction algorithm to a finite-horizon MDP. The optimality equations make it possible to compute the value function recursively, starting from the terminal stage and working back to stage 1.
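For reference, the optimality equations can be written as follows. Here gamma stands for discount, R(s,a) for the (SxA) form of the reward, and P(s,s',a) for the probability of moving from state s to state s' under action a. The symbols Q_n and pi_n are our own notation, not toolbox names: V_n(s) corresponds to V(s,n) and pi_n(s) to policy(s,n). These equations reproduce the numbers printed in the Example section.

```latex
\begin{aligned}
V_{N+1}(s) &= h(s), \\
Q_n(s,a) &= R(s,a) + \gamma \sum_{s'=1}^{S} P(s,s',a)\, V_{n+1}(s'), \\
V_n(s) &= \max_{a} Q_n(s,a), \qquad \pi_n(s) = \arg\max_{a} Q_n(s,a), \qquad n = N, N-1, \dots, 1.
\end{aligned}
```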
This function supports verbose and silent modes. In verbose mode, it displays each stage and the corresponding optimal policy.
Arguments
- P : transition probability array.
P can be a 3-dimensional array (SxSxA) or a cell array (1xA), each cell containing a sparse matrix (SxS).
- R : reward array.
R can be a 3-dimensional array (SxSxA), a cell array (1xA) with each cell containing a sparse matrix (SxS), or a 2D array (SxA), possibly sparse.
- discount : discount factor.
discount is a real number in ]0; 1].
- N : number of stages.
N is an integer greater than 0.
- h (optional) : terminal reward.
h is a (Sx1) vector.
By default, h = [0; 0; ... 0].
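When R is given as an (SxSxA) array, the reward also depends on the next state, and the recursion only needs its expectation R(s,a) = sum over s' of P(s,s',a) R(s,s',a). The sketch below performs that reduction by hand. Whether mdp_finite_horizon does exactly this internally is an assumption on our part; the variable R3 and its values are hypothetical, chosen so that the result equals the (SxA) reward used in the Example section.

```matlab
% Transition probabilities from the Example section: P(s, s', a).
S = 2; A = 2;
P = zeros(S, S, A);
P(:,:,1) = [0.5 0.5; 0.8 0.2];
P(:,:,2) = [0 1; 0.1 0.9];

% Hypothetical transition-dependent reward R3(s, s', a).
R3 = zeros(S, S, A);
R3(:,:,1) = [4 6; -2 3];
R3(:,:,2) = [0 10; 2 2];

% Expected immediate reward: R(s, a) = sum over s' of P(s, s', a) * R3(s, s', a).
R = zeros(S, A);
for a = 1:A
    R(:, a) = sum(P(:,:,a) .* R3(:,:,a), 2);   % row sums of the elementwise product
end
disp(R)   % approximately [5 10; -1 2]
```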
Evaluations
- V : optimal value function.
V is a (Sx(N+1)) matrix.
Each column n is the optimal value function at stage n, for n = 1, ..., N.
V(:,N+1) is the terminal reward.
- policy : optimal policy.
policy is a (SxN) matrix. Each element is an integer corresponding to an
action, and each column n is the optimal policy at stage n.
- cpu_time : CPU time used to run the program.
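To make the two outputs concrete: V(s,n) should equal the immediate reward of action policy(s,n) plus the discounted expected value of column n+1. The sketch below checks this relation on the numbers printed in the Example section; it calls no toolbox function, and the names S, N and maxErr are our own.

```matlab
% Inputs and outputs copied from the Example section (discount = 0.9, N = 3).
P = zeros(2, 2, 2);
P(:,:,1) = [0.5 0.5; 0.8 0.2];
P(:,:,2) = [0 1; 0.1 0.9];
R = [5 10; -1 2];
discount = 0.9;
V = [15.9040 11.8000 10.0000 0; 8.6768 6.5600 2.0000 0];
policy = [2 2 2; 1 1 2];

[S, Np1] = size(V);
N = Np1 - 1;
maxErr = 0;
for n = 1:N
    for s = 1:S
        a = policy(s, n);                                    % action chosen in state s at stage n
        q = R(s, a) + discount * P(s, :, a) * V(:, n + 1);   % value of taking a, then following the policy
        maxErr = max(maxErr, abs(V(s, n) - q));
    end
end
disp(maxErr)   % close to 0: each column is consistent with the next one
```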
Example
The lines beginning with "stage:" are displayed only in verbose mode.
>> P(:,:,1) = [ 0.5 0.5;   0.8 0.2 ];
>> P(:,:,2) = [ 0 1;   0.1 0.9 ];
>> R = [ 5 10;   -1 2 ];
>> [V, policy, cpu_time] = mdp_finite_horizon(P, R, 0.9, 3)
stage:3 policy transpose : 2 2
stage:2 policy transpose : 2 1
stage:1 policy transpose : 2 1
V =
   15.9040   11.8000   10.0000         0
    8.6768    6.5600    2.0000         0
policy =
     2     2     2
     1     1     2
cpu_time =
   0.0400
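The backward induction behind this output can be reproduced in a few lines of plain MATLAB. The loop below is our own sketch, not the toolbox source: it assumes R is in the (SxA) form, uses the default terminal reward h = 0, and breaks ties by taking the lowest-numbered action (the behaviour of max), which may differ from the toolbox. On this example it yields the same V and policy as above.

```matlab
% Example data: P(s, s', a) and R(s, a).
P = zeros(2, 2, 2);
P(:,:,1) = [0.5 0.5; 0.8 0.2];
P(:,:,2) = [0 1; 0.1 0.9];
R = [5 10; -1 2];
discount = 0.9;
N = 3;

[S, ~, A] = size(P);
h = zeros(S, 1);            % default terminal reward
V = zeros(S, N + 1);
policy = zeros(S, N);
V(:, N + 1) = h;

for n = N:-1:1              % backwards from the last stage
    Q = zeros(S, A);
    for a = 1:A
        % Q(s, a) = R(s, a) + discount * sum over s' of P(s, s', a) * V(s', n+1)
        Q(:, a) = R(:, a) + discount * P(:,:,a) * V(:, n + 1);
    end
    [V(:, n), policy(:, n)] = max(Q, [], 2);   % best value and best action per state
end
disp(V)
disp(policy)
```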
In the above example, P can be a cell array containing sparse matrices:
>> P{1} = sparse([ 0.5 0.5;  0.8 0.2 ]);
>> P{2} = sparse([ 0 1;  0.1 0.9 ]);
The function call is unchanged.
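For larger models, the cell array can be built in a loop from an existing (SxSxA) array instead of by hand. In the sketch below, P3 is a hypothetical name for that array; each action's (SxS) slice is converted with sparse.

```matlab
% Existing 3-dimensional transition array (hypothetical name P3).
P3 = zeros(2, 2, 2);
P3(:,:,1) = [0.5 0.5; 0.8 0.2];
P3(:,:,2) = [0 1; 0.1 0.9];

A = size(P3, 3);
P = cell(1, A);                 % (1xA) cell array
for a = 1:A
    P{a} = sparse(P3(:,:,a));   % one sparse (SxS) matrix per action
end
```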
Page created on July 31, 2001. Last update on August 31, 2009.