Code covered by the BSD License  

Highlights from
Sutton's Mountain Car Problem with Value Iteration


Implementation of Sutton's mountain car problem using value iteration.

function [XStar, UStar, TStar] = traceBack(predecessorP, predecessorV, ...
    policy, x0, gridSize)

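% Starting from the given initial state x0 = [position velocity], this
% function follows the predecessor matrices and the policy matrix. It
% returns the visited states (XStar), the action the policy selects at each
% of them (UStar), and the number of transitions traced (TStar).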

P_MIN = -1.2;
P_MAX = 0.5;
V_MIN = -0.07;
V_MAX = 0.07;

gridPos = gridSize(1);
gridVel = gridSize(2);

p = x0(1);
v = x0(2);

index = 1;

% Start from initial condition
pIdx = snapToGrid(p, P_MIN, P_MAX, gridPos);
vIdx = snapToGrid(v, V_MIN, V_MAX, gridVel);

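while true
    % Record the current state and the action the policy selects there
    XStar(index, :) = [p v];
    UStar(index) = policy(pIdx, vIdx);
    
    % Look up the grid indices stored for this cell in the predecessor tables
    pPredIdx = predecessorP(pIdx, vIdx);
    vPredIdx = predecessorV(pIdx, vIdx);
    
    % Stop once the position index reaches gridPos + 1, which the
    % conversion below would map to P_MAX. The current state and action
    % have already been recorded above.
    if (pPredIdx == gridPos + 1)
        break;
    end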
    
    % Convert index back to actual value
    p = (pPredIdx - 1) / gridPos * (P_MAX - P_MIN) + P_MIN;  
    v = (vPredIdx - 1) / gridVel * (V_MAX - V_MIN) + V_MIN;
    
    index = index + 1;
    
    pIdx = pPredIdx;
    vIdx = vPredIdx;
    
end

TStar = index - 1;
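
The listing above calls snapToGrid, which is not shown here. The sketch below illustrates how the index-to-value formula used in traceBack pairs with a value-to-index mapping. The helper names idxToValue and valueToIdx, the rounding and clamping behaviour, and the grid resolution of 100 are all assumptions made for illustration; the actual snapToGrid may work differently.

```matlab
% Hypothetical helpers for illustration; not part of the original submission.
P_MIN = -1.2;
P_MAX = 0.5;
gridPos = 100;   % assumed grid resolution

% Index -> value: the same formula traceBack uses.
idxToValue = @(idx, lo, hi, n) (idx - 1) / n * (hi - lo) + lo;

% Value -> index: an assumed inverse that clamps x to [lo, hi] and
% rounds to the nearest grid point, giving indices 1 .. n + 1.
valueToIdx = @(x, lo, hi, n) round((min(max(x, lo), hi) - lo) / (hi - lo) * n) + 1;

% Round trip for one position: p is the grid value nearest to -0.5.
pIdx = valueToIdx(-0.5, P_MIN, P_MAX, gridPos);
p = idxToValue(pIdx, P_MIN, P_MAX, gridPos);

% The two ends of the grid map to indices 1 and gridPos + 1.
firstIdx = valueToIdx(P_MIN, P_MIN, P_MAX, gridPos);
lastIdx = valueToIdx(P_MAX, P_MIN, P_MAX, gridPos);
```

Under this assumed mapping, index gridPos + 1 corresponds to P_MAX, which matches the stopping test in the while loop of traceBack.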
