# Documentation

## Algorithms for Recursive Estimation

### Types of Recursive Estimation Algorithms

You can choose from the following four recursive estimation algorithms:

You specify the type of recursive estimation algorithms as arguments in the recursive estimation commands.

For detailed information about these algorithms, see the corresponding chapter in System Identification: Theory for the User by Lennart Ljung (Prentice Hall PTR, Upper Saddle River, NJ, 1999).

### General Form of Recursive Estimation Algorithm

The general recursive identification algorithm is given by the following equation:

$\stackrel{^}{\theta }\left(t\right)=\stackrel{^}{\theta }\left(t-1\right)+K\left(t\right)\left(y\left(t\right)-\stackrel{^}{y}\left(t\right)\right)$

$\stackrel{^}{\theta }\left(t\right)$ is the parameter estimate at time t. y(t) is the observed output at time t and $\stackrel{^}{y}\left(t\right)$ is the prediction of y(t) based on observations up to time t-1. The gain, K(t), determines how much the current prediction error $y\left(t\right)-\stackrel{^}{y}\left(t\right)$ affects the update of the parameter estimate. The estimation algorithms minimize the prediction-error term $y\left(t\right)-\stackrel{^}{y}\left(t\right)$.

The gain has the following general form:

$K\left(t\right)=Q\left(t\right)\psi \left(t\right)$

The recursive algorithms supported by the System Identification Toolbox™ product differ based on different approaches for choosing the form of Q(t) and computing $\psi \left(t\right)$, where $\psi \left(t\right)$ represents the gradient of the predicted model output $\stackrel{^}{y}\left(t|\theta \right)$ with respect to the parameters $\theta$.

The simplest way to visualize the role of the gradient $\psi \left(t\right)$ of the parameters, is to consider models with a linear-regression form:

$y\left(t\right)={\psi }^{T}\left(t\right){\theta }_{0}\left(t\right)+e\left(t\right)$

In this equation, $\psi \left(t\right)$ is the regression vector that is computed based on previous values of measured inputs and outputs. ${\theta }_{0}\left(t\right)$ represents the true parameters. e(t) is the noise source (innovations), which is assumed to be white noise. The specific form of $\psi \left(t\right)$ depends on the structure of the polynomial model.

For linear regression equations, the predicted output is given by the following equation:

$\stackrel{^}{y}\left(t\right)={\psi }^{T}\left(t\right)\stackrel{^}{\theta }\left(t-1\right)$

For models that do not have the linear regression form, it is not possible to compute exactly the predicted output and the gradient $\psi \left(t\right)$ for the current parameter estimate $\stackrel{^}{\theta }\left(t-1\right)$. To learn how you can compute approximation for $\psi \left(t\right)$ and $\stackrel{^}{\theta }\left(t-1\right)$ for general model structures, see the section on recursive prediction-error methods in System Identification: Theory for the User by Lennart Ljung (Prentice Hall PTR, Upper Saddle River, NJ, 1999).

### Kalman Filter Algorithm

#### Mathematics of the Kalman Filter Algorithm

The following set of equations summarizes the Kalman filter adaptation algorithm:

$\stackrel{^}{\theta }\left(t\right)=\stackrel{^}{\theta }\left(t-1\right)+K\left(t\right)\left(y\left(t\right)-\stackrel{^}{y}\left(t\right)\right)$

$\stackrel{^}{y}\left(t\right)={\psi }^{T}\left(t\right)\stackrel{^}{\theta }\left(t-1\right)$

$K\left(t\right)=Q\left(t\right)\psi \left(t\right)$

$Q\left(t\right)=\frac{P\left(t-1\right)\psi \left(t\right)}{{R}_{2}+{\psi }^{T}\left(t\right)P\left(t-1\right)\psi \left(t\right)}$

$P\left(t\right)=P\left(t-1\right)+{R}_{1}-\frac{P\left(t-1\right)\psi \left(t\right)\psi {\left(t\right)}^{T}P\left(t-1\right)}{{R}_{2}+\psi {\left(t\right)}^{T}P\left(t-1\right)\psi \left(t\right)}$

This formulation assumes the linear-regression form of the model:

$y\left(t\right)={\psi }^{T}\left(t\right){\theta }_{0}\left(t\right)+e\left(t\right)$

The Kalman filter is used to obtain Q(t).

This formulation also assumes that the true parameters ${\theta }_{0}\left(t\right)$ are described by a random walk:

${\theta }_{0}\left(t\right)={\theta }_{0}\left(t-1\right)+w\left(t\right)$

w(t) is Gaussian white noise with the following covariance matrix, or drift matrix R1:

$Ew\left(t\right){w}^{T}\left(t\right)={R}_{1}$

R2 is the variance of the innovations e(t) in the following equation:

$y\left(t\right)={\psi }^{T}\left(t\right){\theta }_{0}\left(t\right)+e\left(t\right)$

The Kalman filter algorithm is entirely specified by the sequence of data y(t), the gradient $\psi \left(t\right)$, R1, R2, and the initial conditions $\theta \left(t=0\right)$ (initial guess of the parameters) and $P\left(t=0\right)$ (covariance matrix that indicates parameters errors).

 Note:   To simplify the inputs, you can scale R1, R2, and $P\left(t=0\right)$ of the original problem by the same value such that R2 is equal to 1. This scaling does not affect the parameters estimates.

#### Using the Kalman Filter Algorithm

The general syntax for the command described in Algorithms for Recursive Estimation is the following:

To specify the Kalman filter algorithm, set adm to 'kf' and adg to the value of the drift matrix R1 (described in Mathematics of the Kalman Filter Algorithm).

### Forgetting Factor Algorithm

#### Mathematics of the Forgetting Factor Algorithm

The following set of equations summarizes the forgetting factor adaptation algorithm:

$\stackrel{^}{\theta }\left(t\right)=\stackrel{^}{\theta }\left(t-1\right)+K\left(t\right)\left(y\left(t\right)-\stackrel{^}{y}\left(t\right)\right)$

$\stackrel{^}{y}\left(t\right)={\psi }^{T}\left(t\right)\stackrel{^}{\theta }\left(t-1\right)$

$K\left(t\right)=Q\left(t\right)\psi \left(t\right)$

$Q\left(t\right)=\frac{P\left(t-1\right)}{\lambda +{\psi }^{T}\left(t\right)P\left(t-1\right)\psi \left(t\right)}$

$P\left(t\right)=\frac{1}{\lambda }\left(P\left(t-1\right)-\frac{P\left(t-1\right)\psi \left(t\right)\psi {\left(t\right)}^{T}P\left(t-1\right)}{\lambda +\psi {\left(t\right)}^{T}P\left(t-1\right)\psi \left(t\right)}\right)$

To obtain Q(t), the following function is minimized at time t:

${\sum }_{k=1}^{t}{\lambda }^{t-k}{\left(y\left(k\right)-\stackrel{^}{y}\left(k\right)\right)}^{2}$

This approach discounts old measurements exponentially such that an observation that is $\tau$ samples old carries a weight that is equal to ${\lambda }^{\tau }$ times the weight of the most recent observation. $\tau =1}{1-\lambda }$ represents the memory horizon of this algorithm. Measurements older than $\tau =1}{1-\lambda }$ typically carry a weight that is less than about 0.3.

$\lambda$ is called the forgetting factor and typically has a positive value between 0.97 and 0.995.

 Note:   In the linear regression case, the forgetting factor algorithm is known as the recursive least-squares (RLS) algorithm. The forgetting factor algorithm for $\lambda$ = 1 is equivalent to the Kalman filter algorithm with R1=0 and R2=1. For more information about the Kalman filter algorithm, see Kalman Filter Algorithm.

#### Using the Forgetting Factor Algorithm

The general syntax for the command described in Algorithms for Recursive Estimation is the following:

To specify the forgetting factor algorithm, set adm to 'ff' and adg to the value of the forgetting factor $\lambda$ (described in Mathematics of the Forgetting Factor Algorithm).

 Tip   $\lambda$ typically has a positive value from 0.97 to 0.995.

### Unnormalized and Normalized Gradient Algorithms

#### Mathematics of the Unnormalized and Normalized Gradient Algorithm

In the linear regression case, the gradient methods are also known as the least mean squares (LMS) methods.

$\stackrel{^}{\theta }\left(t\right)=\stackrel{^}{\theta }\left(t-1\right)+K\left(t\right)\left(y\left(t\right)-\stackrel{^}{y}\left(t\right)\right)$

$\stackrel{^}{y}\left(t\right)={\psi }^{T}\left(t\right)\stackrel{^}{\theta }\left(t-1\right)$

$K\left(t\right)=Q\left(t\right)\psi \left(t\right)$

In the unnormalized gradient approach, Q(t) is the product of the gain $\gamma$ and the identity matrix:

$Q\left(t\right)=\gamma *\psi \left(t\right)$

In the normalized gradient approach, Q(t) is the product of the gain $\gamma$, and the identity matrix is normalized by the magnitude of the gradient $\psi \left(t\right)$:

$Q\left(t\right)=\frac{\gamma *\psi \left(t\right)}{{|\psi \left(t\right)|}^{2}}$

These choices of Q(t) update the parameters in the negative gradient direction, where the gradient is computed with respect to the parameters.

#### Using the Unnormalized and Normalized Gradient Algorithms

The general syntax for the command described in Algorithms for Recursive Estimation is the following:

To specify the unnormalized gain algorithm, set adm to 'ug' and adg to the value of the gain$\gamma$ (described in Mathematics of the Unnormalized and Normalized Gradient Algorithm).
To specify the normalized gain algorithm, set adm to 'ng' and adg to the value of the gain$\gamma$.