Code covered by the BSD License  

Highlights from
Markov Decision Processes (MDP) Toolbox

from Markov Decision Processes (MDP) Toolbox by Marie-Josee Cros
Functions related to the resolution of discrete-time Markov Decision Processes.

MDP Toolbox for MATLAB

mdp_finite_horizon

Solves a finite-horizon MDP with the backward induction algorithm.

Syntax

[V, policy, cpu_time] = mdp_finite_horizon (P, R, discount, N)
[V, policy, cpu_time] = mdp_finite_horizon (P, R, discount, N, h)

Description

mdp_finite_horizon applies the backward induction algorithm to a finite-horizon MDP. The optimality equations make it possible to evaluate the value function recursively, starting from the terminal stage and working back to the first stage.
This function supports verbose and silent modes. In verbose mode, the function displays the current stage and the corresponding optimal policy.
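
The recursion described above can be sketched in a few lines of plain MATLAB. This is only an illustration under stated assumptions, not the toolbox's source code: it assumes P is a dense (SxSxA) array, R is a (SxA) matrix, and the terminal reward is zero. The variable Q is a hypothetical name introduced for the sketch.

```matlab
% Illustrative backward induction; NOT the toolbox implementation.
% Assumes a dense SxSxA transition array P and an SxA reward matrix R.
P = zeros(2, 2, 2);
P(:,:,1) = [0.5 0.5; 0.8 0.2];
P(:,:,2) = [0 1; 0.1 0.9];
R = [5 10; -1 2];
discount = 0.9;
N = 3;

S = size(P, 1);
A = size(P, 3);
V = zeros(S, N + 1);      % V(:, N+1) holds the terminal reward (zero here)
policy = zeros(S, N);

for n = N:-1:1
    Q = zeros(S, A);      % Q(s, a): value of taking action a in state s at stage n
    for a = 1:A
        % Immediate reward plus discounted expected value of the next stage
        Q(:, a) = R(:, a) + discount * P(:, :, a) * V(:, n + 1);
    end
    % Best value and the index of the best action, state by state
    [V(:, n), policy(:, n)] = max(Q, [], 2);
end

disp(V);
disp(policy);
```

For this input, the sketch yields the same V and policy as the Example section of this page.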

Arguments

  • P : transition probability array.
P can be a 3-dimensional array (SxSxA) or a cell array (1xA) in which each cell contains a sparse matrix (SxS).
  • R : reward array.
R can be a 3-dimensional array (SxSxA), a cell array (1xA) in which each cell contains a sparse matrix (SxS), or a 2D array (SxA), possibly sparse.
  • discount : discount factor.
discount is a real number in the interval ]0; 1], that is, greater than 0 and at most 1.
  • N : number of stages.
N is an integer greater than 0.
  • h (optional) : terminal reward.
h is a (Sx1) vector.
By default, h = [0; 0; ... 0].
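
As a rough illustration of the constraints listed above, the following sketch checks a set of inputs before the call. The checks and the variable names (is_square, rows_sum_one, and so on) are hypothetical and not part of the toolbox. The sketch assumes the dense (SxSxA) form of P and the (SxA) form of R, and the terminal reward h is an arbitrary value chosen for illustration.

```matlab
% Hypothetical pre-call checks; none of these names belong to the toolbox.
P = zeros(2, 2, 2);
P(:,:,1) = [0.5 0.5; 0.8 0.2];
P(:,:,2) = [0 1; 0.1 0.9];
R = [5 10; -1 2];
discount = 0.9;
N = 3;
h = [1; 0];               % arbitrary terminal reward, for illustration only

[S, S2, A] = size(P);
is_square    = (S == S2);
% Every row of every transition matrix should sum to 1 (up to rounding)
row_sums     = reshape(sum(P, 2), [], 1);
rows_sum_one = all(abs(row_sums - 1) < 1e-10);
r_shape_ok   = isequal(size(R), [S A]);
discount_ok  = (discount > 0) && (discount <= 1);
n_ok         = (N > 0) && (N == round(N));
h_ok         = isequal(size(h), [S 1]);

inputs_ok = is_square && rows_sum_one && r_shape_ok && ...
            discount_ok && n_ok && h_ok;

% With the toolbox on the MATLAB path, the five-argument form would then be:
% [V, policy, cpu_time] = mdp_finite_horizon(P, R, discount, N, h);
```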

Evaluations

  • V : value function.
V is a (Sx(N+1)) matrix. Each column n is the optimal value function at stage n, with n = 1, ... N.
V(:,N+1) is the terminal reward.
  • policy : optimal policy.
policy is a (SxN) matrix. Each element is an integer corresponding to an action and each column n is the optimal policy at stage n.
  • cpu_time : CPU time used to run the program.
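
To make the layout of the outputs concrete, the following sketch indexes into V and policy. The matrices are copied from the Example section of this page rather than computed, and names such as first_stage_actions and is_stationary are hypothetical.

```matlab
% Output values copied from the Example section (S = 2 states, N = 3 stages).
V = [15.9040 11.8000 10.0000 0;
      8.6768  6.5600  2.0000 0];
policy = [2 2 2;
          1 1 2];

N = size(policy, 2);

first_stage_actions = policy(:, 1);   % action to take in each state at stage 1
first_stage_values  = V(:, 1);        % optimal value of each state at stage 1
terminal_reward     = V(:, N + 1);    % last column of V is the terminal reward

% A finite-horizon policy may change from stage to stage;
% this flag is true only when every column of policy is identical.
is_stationary = all(all(diff(policy, 1, 2) == 0));
```

Here is_stationary is false, because state 2 uses action 1 at stages 1 and 2 but action 2 at stage 3.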

Example
The lines beginning with "stage:" are the verbose mode display.

>> P(:,:,1) = [ 0.5 0.5;   0.8 0.2 ];
>> P(:,:,2) = [ 0 1;   0.1 0.9 ];
>> R = [ 5 10;   -1 2 ];

>> [V, policy, cpu_time] = mdp_finite_horizon(P, R, 0.9, 3)
stage:3 policy transpose : 2 2
stage:2 policy transpose : 2 1
stage:1 policy transpose : 2 1
V =
   15.9040   11.8000   10.0000         0
    8.6768    6.5600    2.0000         0
policy =
   2 2 2
   1 1 2
cpu_time =
   0.0400

In the above example, P can be a cell array containing sparse matrices:
>> P{1} = sparse([ 0.5 0.5;  0.8 0.2 ]);
>> P{2} = sparse([ 0 1;  0.1 0.9 ]);
The function call is unchanged.
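
One possible way to move between the two representations is sketched below. The names P3 and Pc are hypothetical and used only to keep the dense and sparse forms apart. If P already exists in the workspace as a numeric array, as it would after the dense example, assigning to P{1} raises an error in MATLAB, so the sketch builds the cell array under a separate name (running clear P first is an alternative).

```matlab
% Dense SxSxA form, as in the first part of the example
P3 = zeros(2, 2, 2);
P3(:,:,1) = [0.5 0.5; 0.8 0.2];
P3(:,:,2) = [0 1; 0.1 0.9];

% Equivalent 1xA cell array of sparse SxS matrices
A = size(P3, 3);
Pc = cell(1, A);
for a = 1:A
    Pc{a} = sparse(P3(:, :, a));
end

% Converting back to dense recovers the original matrices
same_content = isequal(full(Pc{1}), P3(:,:,1)) && ...
               isequal(full(Pc{2}), P3(:,:,2));

% With the toolbox on the MATLAB path, the call takes the same form:
% [V, policy, cpu_time] = mdp_finite_horizon(Pc, [5 10; -1 2], 0.9, 3);
```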




MDPtoolbox/documentation/mdp_finite_horizon.html
Page created on July 31, 2001. Last update on August 31, 2009.
