# createMDP

Create Markov decision process model

## Syntax

``MDP = createMDP(states,actions)``

## Description


`MDP = createMDP(states,actions)` creates a Markov decision process model with the specified states and actions.

## Examples


Create an MDP model with eight states and two possible actions.

`MDP = createMDP(8,["up";"down"]);`

Specify the state transitions and their associated rewards.

```matlab
% State 1 Transition and Reward
MDP.T(1,2,1) = 1;
MDP.R(1,2,1) = 3;
MDP.T(1,3,2) = 1;
MDP.R(1,3,2) = 1;

% State 2 Transition and Reward
MDP.T(2,4,1) = 1;
MDP.R(2,4,1) = 2;
MDP.T(2,5,2) = 1;
MDP.R(2,5,2) = 1;

% State 3 Transition and Reward
MDP.T(3,5,1) = 1;
MDP.R(3,5,1) = 2;
MDP.T(3,6,2) = 1;
MDP.R(3,6,2) = 4;

% State 4 Transition and Reward
MDP.T(4,7,1) = 1;
MDP.R(4,7,1) = 3;
MDP.T(4,8,2) = 1;
MDP.R(4,8,2) = 2;

% State 5 Transition and Reward
MDP.T(5,7,1) = 1;
MDP.R(5,7,1) = 1;
MDP.T(5,8,2) = 1;
MDP.R(5,8,2) = 9;

% State 6 Transition and Reward
MDP.T(6,7,1) = 1;
MDP.R(6,7,1) = 5;
MDP.T(6,8,2) = 1;
MDP.R(6,8,2) = 1;

% State 7 Transition and Reward
MDP.T(7,7,1) = 1;
MDP.R(7,7,1) = 0;
MDP.T(7,7,2) = 1;
MDP.R(7,7,2) = 0;

% State 8 Transition and Reward
MDP.T(8,8,1) = 1;
MDP.R(8,8,1) = 0;
MDP.T(8,8,2) = 1;
MDP.R(8,8,2) = 0;
```

Specify the terminal states of the model.

`MDP.TerminalStates = ["s7";"s8"];`

## Input Arguments


`states`: Model states, specified as one of the following:

• Positive integer — Specify the number of model states. In this case, each state has a default name, such as `"s1"` for the first state.

• String vector — Specify the state names. In this case, the total number of states is equal to the length of the vector.

`actions`: Model actions, specified as one of the following:

• Positive integer — Specify the number of model actions. In this case, each action has a default name, such as `"a1"` for the first action.

• String vector — Specify the action names. In this case, the total number of actions is equal to the length of the vector.
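The two input forms can be compared side by side. The following is a minimal sketch, assuming Reinforcement Learning Toolbox is installed and that the state and action names are exposed through the `States` and `Actions` properties of the returned object; the names `"idle"`, `"busy"`, `"done"`, `"start"`, and `"stop"` are purely illustrative.

```matlab
% Numeric inputs: three states and two actions with default names.
MDP1 = createMDP(3,2);
disp(MDP1.States)    % default state names (assumed property name: States)
disp(MDP1.Actions)   % default action names (assumed property name: Actions)

% String-vector inputs: the names come from the vectors themselves,
% and the vector lengths set the number of states and actions.
MDP2 = createMDP(["idle";"busy";"done"],["start";"stop"]);
disp(size(MDP2.T))   % size of the S-by-S-by-A transition array
```

Named states and actions tend to make later assignments easier to read, while numeric inputs are convenient for quick experiments.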

## Output Arguments


`MDP`: MDP model, returned as a `GenericMDP` object with the following properties.

`CurrentState`: Name of the current state, specified as a string.

`States`: State names, specified as a string vector with length equal to the number of states.

`Actions`: Action names, specified as a string vector with length equal to the number of actions.

`T`: State transition matrix, specified as a 3-D array, which determines the possible movements of the agent in an environment. `T` is a probability array that gives the likelihood of the agent moving from the current state `s` to each possible next state `s'` when it performs action `a`. `T` is an S-by-S-by-A array, where S is the number of states and A is the number of actions. It is given by:

$$T(s, s', a) = \Pr(s' \mid s, a)$$

The transition probabilities out of a nonterminal state `s` following a given action must sum to one. Therefore, all stochastic transitions out of a given state must be specified at the same time.

For example, to indicate that in state `1` following action `4` there is an equal probability of moving to states `2` or `3`, use the following:

`MDP.T(1,[2 3],4) = [0.5 0.5];`

You can also specify that, following an action, there is some probability of remaining in the same state. For example:

`MDP.T(1,[1 2 3 4],1) = [0.25 0.25 0.25 0.25];`
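The sum-to-one requirement can be checked on a plain numeric array before the values are assigned to a model. The following is a minimal sketch that uses only core MATLAB; the sizes `S` and `A` and the variable names are illustrative assumptions, not part of the `createMDP` interface.

```matlab
S = 4;  % number of states (illustrative)
A = 2;  % number of actions (illustrative)
T = zeros(S,S,A);

% Action 1: from state 1, stay put or move to states 2-4 with equal probability.
T(1,[1 2 3 4],1) = [0.25 0.25 0.25 0.25];
% Action 2: from state 1, move to state 2 or 3 with equal probability.
T(1,[2 3],2) = [0.5 0.5];

% Sum over next states (dimension 2) to get one total per (state, action) pair.
rowSums = squeeze(sum(T,2));   % S-by-A matrix

% Only state 1 has transitions defined here, so check that row,
% using a small tolerance for floating-point error.
isValid = all(abs(rowSums(1,:) - 1) < 1e-12);
```

The same check applied to every nonterminal state would flag any state-action pair whose probabilities were only partially specified.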

`R`: Reward transition matrix, specified as a 3-D array, which determines how much reward the agent receives after performing an action in the environment. `R` has the same shape and size as the state transition matrix `T`. The reward for moving from state `s` to state `s'` by performing action `a` is given by:

$$r = R(s, s', a)$$

`TerminalStates`: Terminal state names in the model, specified as a string vector of state names.
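To illustrate how the transition and reward arrays work together, the expected immediate reward of a state-action pair can be computed by weighting each transition reward by its probability. The following is a minimal sketch in core MATLAB on plain arrays; the sizes, the reward values, and the variable name `expectedR` are illustrative assumptions rather than anything the model object computes for you.

```matlab
S = 3;  % number of states (illustrative)
A = 1;  % number of actions (illustrative)
T = zeros(S,S,A);
R = zeros(S,S,A);

% From state 1, action 1 leads to state 2 or state 3 with equal probability.
T(1,[2 3],1) = [0.5 0.5];
% Rewards attached to those two transitions.
R(1,[2 3],1) = [4 10];

% Expected immediate reward: sum over s' of T(s,s',a) * R(s,s',a).
expectedR = squeeze(sum(T .* R,2));  % S-by-A matrix (S-by-1 here)
```

Here the entry for state 1 evaluates to 0.5*4 + 0.5*10 = 7, while states with no defined transitions contribute zero.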