
## Recursive Algorithms for Online Estimation

### General Form of Recursive Estimation

The general form of the recursive estimation algorithm is as follows:

$\hat{\theta}(t) = \hat{\theta}(t-1) + K(t)\left(y(t) - \hat{y}(t)\right)$

$\hat{\theta}(t)$ is the parameter estimate at time t. y(t) is the observed output at time t, and $\hat{y}(t)$ is the prediction of y(t) based on observations up to time t-1. The gain, K(t), determines how much the current prediction error $y(t)-\hat{y}(t)$ affects the update of the parameter estimate. The estimation algorithms minimize the prediction-error term $y(t)-\hat{y}(t)$.

The gain has the following form:

$K(t) = Q(t)\psi(t)$

The recursive algorithms supported by the System Identification Toolbox™ product differ in how they choose the form of Q(t) and compute $\psi(t)$, where $\psi(t)$ represents the gradient of the predicted model output $\hat{y}(t|\theta)$ with respect to the parameters $\theta$.

The simplest way to visualize the role of the gradient $\psi(t)$ is to consider models with a linear-regression form:

$y(t) = \psi^T(t)\theta_0(t) + e(t)$

In this equation, $\psi(t)$ is the regression vector that is computed based on previous values of measured inputs and outputs. $\theta_0(t)$ represents the true parameters. e(t) is the noise source (innovations), which is assumed to be white noise. The specific form of $\psi(t)$ depends on the structure of the polynomial model.

For linear regression equations, the predicted output is given by the following equation:

$\hat{y}(t) = \psi^T(t)\hat{\theta}(t-1)$

For models that do not have the linear-regression form, it is not possible to compute the predicted output and the gradient $\psi(t)$ exactly for the current parameter estimate $\hat{\theta}(t-1)$. To learn how you can compute approximations of $\psi(t)$ and $\hat{y}(t)$ for general model structures, see the section on recursive prediction-error methods in [1].
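To make the linear-regression case concrete, here is a one-step sketch in Python. All numbers are illustrative, and the gain vector K(t) is simply assumed here; in practice it comes from one of the estimation algorithms described in the next section.

```python
# One step of the generic recursive update for a linear-regression model.
# All values (psi, theta_hat, K) are made up for illustration; K(t) is
# normally produced by the chosen estimation algorithm.

psi = [1.0, -0.5]        # regression vector psi(t)
theta_hat = [2.0, 1.0]   # previous estimate theta_hat(t-1)
y = 1.8                  # observed output y(t)

# Predicted output: y_hat(t) = psi(t)' * theta_hat(t-1)
y_hat = sum(p * th for p, th in zip(psi, theta_hat))

# Update: theta_hat(t) = theta_hat(t-1) + K(t) * (y(t) - y_hat(t))
K = [0.4, -0.2]          # assumed gain vector K(t) (algorithm-dependent)
theta_hat = [th + k * (y - y_hat) for th, k in zip(theta_hat, K)]

print(y_hat)             # 1.5
print(theta_hat)
```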

### Types of Recursive Estimation Algorithms

The System Identification Toolbox software provides the following recursive estimation algorithms for online estimation:

- Forgetting factor
- Kalman filter
- Normalized and unnormalized gradient

The forgetting factor and Kalman filter formulations are more computationally intensive than the normalized and unnormalized gradient methods. However, they have much better convergence properties.

#### Forgetting Factor

The following set of equations summarizes the forgetting factor adaptation algorithm:

$\hat{\theta}(t) = \hat{\theta}(t-1) + K(t)\left(y(t) - \hat{y}(t)\right)$

$\hat{y}(t) = \psi^T(t)\hat{\theta}(t-1)$

$K(t) = Q(t)\psi(t)$

$Q(t) = \frac{P(t-1)}{\lambda + \psi^T(t)P(t-1)\psi(t)}$

$P(t) = \frac{1}{\lambda}\left(P(t-1) - \frac{P(t-1)\psi(t)\psi^T(t)P(t-1)}{\lambda + \psi^T(t)P(t-1)\psi(t)}\right)$

Q(t) is obtained by minimizing the following function at time t:

$\sum_{k=1}^{t} \lambda^{t-k}\left(y(k) - \hat{y}(k)\right)^2$

See section 11.2 in [1] for details.

This approach discounts old measurements exponentially, such that an observation that is $\tau$ samples old carries a weight that is equal to $\lambda^{\tau}$ times the weight of the most recent observation. $\tau = \frac{1}{1-\lambda}$ represents the memory horizon of this algorithm. Measurements older than $\tau = \frac{1}{1-\lambda}$ typically carry a weight that is less than about 0.3.

$\lambda$ is called the forgetting factor and typically has a positive value between 0.97 and 0.995.

**Note:** The forgetting factor algorithm for $\lambda = 1$ is equivalent to the Kalman filter algorithm with R1 = 0 and R2 = 1. For more information about the Kalman filter algorithm, see Kalman Filter.
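As a rough illustration of these equations, the following Python sketch implements the scalar-parameter case, where the covariance P(t) and gain reduce to scalars. The regressors, initial values, and true parameter are made up, and the measurements are noise-free so the estimate can be checked against the known value.

```python
# Scalar-parameter recursive least squares with a forgetting factor.
# A minimal sketch: data, initial values, and theta0 are illustrative,
# and the matrix equations reduce to scalar arithmetic here.

def rls_forgetting(data, lam=0.98, theta=0.0, P=100.0):
    """data: iterable of (psi, y) pairs; returns the final estimate."""
    for psi, y in data:
        y_hat = psi * theta                     # y_hat(t) = psi(t)*theta(t-1)
        denom = lam + psi * P * psi             # lambda + psi' P psi
        K = (P / denom) * psi                   # K(t) = Q(t)*psi(t)
        theta = theta + K * (y - y_hat)         # parameter update
        P = (P - (P * psi) ** 2 / denom) / lam  # covariance update
    return theta

# Noise-free data generated from theta0 = 2.5; the estimate approaches 2.5.
theta0 = 2.5
data = [(psi, psi * theta0) for psi in (1.0, -0.5, 2.0, 1.5, -1.0, 0.7)]
print(rls_forgetting(data))
```

A large initial P makes the first few updates aggressive, which is the usual choice when the initial parameter guess is poor.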

#### Kalman Filter

The following set of equations summarizes the Kalman filter adaptation algorithm:

$\hat{\theta}(t) = \hat{\theta}(t-1) + K(t)\left(y(t) - \hat{y}(t)\right)$

$\hat{y}(t) = \psi^T(t)\hat{\theta}(t-1)$

$K(t) = Q(t)\psi(t)$

$Q(t) = \frac{P(t-1)}{R_2 + \psi^T(t)P(t-1)\psi(t)}$

$P(t) = P(t-1) + R_1 - \frac{P(t-1)\psi(t)\psi^T(t)P(t-1)}{R_2 + \psi^T(t)P(t-1)\psi(t)}$

This formulation assumes the linear-regression form of the model:

$y(t) = \psi^T(t)\theta_0(t) + e(t)$

Q(t) is computed using a Kalman filter.

This formulation also assumes that the true parameters $\theta_0(t)$ are described by a random walk:

$\theta_0(t) = \theta_0(t-1) + w(t)$

w(t) is Gaussian white noise with the following covariance matrix, or drift matrix R1:

$E\,w(t)w^T(t) = R_1$

R2 is the variance of the innovations e(t) in the following equation:

$y(t) = \psi^T(t)\theta_0(t) + e(t)$

The Kalman filter algorithm is entirely specified by the sequence of data y(t), the gradient $\psi(t)$, R1, R2, and the initial conditions $\theta(t=0)$ (the initial guess of the parameters) and $P(t=0)$ (the covariance matrix that indicates parameter errors).

**Note:** It is assumed that the R1 and P(t = 0) matrices are scaled such that R2 = 1. This scaling does not affect the parameter estimates.
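These equations can also be sketched for a scalar parameter. In the following Python example, the initial values, regressors, and true parameter are made up; R1 = 0 and R2 = 1 are used, and the noise-free data lets the result be checked against the known parameter.

```python
# Scalar-parameter Kalman filter estimator: a minimal sketch with
# made-up values. R1 is the drift covariance of the random walk, and
# the equations are scaled so that R2 = 1, as noted in the text.

def kalman_step(theta, P, psi, y, R1=0.0, R2=1.0):
    """One update; returns the new (theta, P)."""
    y_hat = psi * theta                     # y_hat(t) = psi(t)*theta(t-1)
    denom = R2 + psi * P * psi              # R2 + psi' P psi
    K = (P / denom) * psi                   # K(t) = Q(t)*psi(t)
    theta = theta + K * (y - y_hat)
    P = P + R1 - (P * psi) ** 2 / denom
    return theta, P

theta, P = 0.0, 100.0                       # initial guess and covariance
theta0 = -1.2                               # "true" parameter (illustrative)
for psi in (1.0, 0.5, -2.0, 1.5, 0.8, -0.6):
    theta, P = kalman_step(theta, P, psi, psi * theta0)
print(theta)
```

With R1 = 0 the covariance P only shrinks, which matches the forgetting factor algorithm at $\lambda = 1$; a nonzero R1 keeps the filter responsive to parameter drift.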

#### Normalized and Unnormalized Gradient

In the linear regression case, the gradient methods are also known as the least mean squares (LMS) methods. The following set of equations summarizes the normalized and unnormalized gradient adaptation algorithm:

$\hat{\theta}(t) = \hat{\theta}(t-1) + K(t)\left(y(t) - \hat{y}(t)\right)$

$\hat{y}(t) = \psi^T(t)\hat{\theta}(t-1)$

$K(t) = Q(t)\psi(t)$

In the unnormalized gradient approach, Q(t) is given by:

$Q(t) = \gamma$

In the normalized gradient approach, Q(t) is given by:

$Q(t) = \frac{\gamma}{|\psi(t)|^2}$

These choices of Q(t) update the parameters in the negative gradient direction, where the gradient is computed with respect to the parameters. See pg. 372 in [1] for details.
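A minimal Python sketch of both gradient variants for a scalar parameter follows. The adaptation gain $\gamma$ and the data are made-up illustrative values, the measurements are noise-free, and the regressors are nonzero because the normalized update divides by $|\psi(t)|^2$.

```python
# Scalar LMS: unnormalized (Q = gamma) and normalized (Q = gamma/|psi|^2).
# gamma and the data are illustrative; psi must be nonzero for the
# normalized variant.

def lms(data, gamma, theta=0.0, normalized=False):
    for psi, y in data:
        y_hat = psi * theta                   # y_hat(t) = psi(t)*theta(t-1)
        Q = gamma / (psi * psi) if normalized else gamma
        theta = theta + Q * psi * (y - y_hat) # K(t) = Q(t)*psi(t)
    return theta

theta0 = 3.0                                  # "true" parameter (illustrative)
psis = [1.0, -0.8, 1.2, 0.6] * 10             # 40 noise-free samples
data = [(p, p * theta0) for p in psis]

print(lms(data, gamma=0.5))                   # converges toward 3.0
print(lms(data, gamma=0.1, normalized=True))  # converges toward 3.0
```

Note the trade-off the normalization illustrates: the unnormalized step size scales with $|\psi(t)|^2$ (so it can diverge for large regressors), while the normalized step depends only on $\gamma$.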

## References

[1] Ljung, L. System Identification: Theory for the User. Upper Saddle River, NJ: Prentice-Hall PTR, 1999.