
The general form of the recursive estimation algorithm is as follows:

$$\widehat{\theta}\left(t\right)=\widehat{\theta}\left(t-1\right)+K\left(t\right)\left(y\left(t\right)-\widehat{y}\left(t\right)\right)$$

$$\widehat{\theta}\left(t\right)$$ is the parameter estimate at
time *t*. *y(t)* is the observed
output at time *t* and $$\widehat{y}\left(t\right)$$ is the prediction of *y(t)* based
on observations up to time *t-1*. The gain, *K(t)*,
determines how much the current prediction error $$y\left(t\right)-\widehat{y}\left(t\right)$$ affects the update of the parameter
estimate. The estimation algorithms minimize the prediction-error
term $$y\left(t\right)-\widehat{y}\left(t\right)$$.

The gain has the following form:

$$K\left(t\right)=Q\left(t\right)\psi \left(t\right)$$

The recursive algorithms supported by the System
Identification Toolbox™ product
differ in how they choose the form of *Q(t)* and
compute $$\psi \left(t\right)$$, where $$\psi \left(t\right)$$ represents the gradient of the
predicted model output $$\widehat{y}\left(t|\theta \right)$$ with respect to
the parameters $$\theta $$.

The simplest way to visualize the role of the gradient $$\psi \left(t\right)$$ with respect to the parameters is to consider models with a linear-regression form:

$$y\left(t\right)={\psi}^{T}\left(t\right){\theta}_{0}\left(t\right)+e\left(t\right)$$

In this equation, $$\psi \left(t\right)$$ is the *regression
vector* that is computed based on previous values of measured
inputs and outputs. $${\theta}_{0}\left(t\right)$$ represents the
true parameters. *e(t)* is the noise source (*innovations*),
which is assumed to be white noise. The specific form of $$\psi \left(t\right)$$ depends on the structure of
the polynomial model.

For linear regression equations, the predicted output is given by the following equation:

$$\widehat{y}\left(t\right)={\psi}^{T}\left(t\right)\widehat{\theta}\left(t-1\right)$$
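
As a minimal numeric illustration of this prediction equation (the values of the regression vector and previous estimate below are arbitrary, not from the text), the predicted output is just the inner product of the regression vector with the previous parameter estimate:

```python
import numpy as np

# Illustrative values: a 2-element regression vector psi(t)
# and the previous parameter estimate theta_hat(t-1)
psi = np.array([1.5, -0.5])
theta_prev = np.array([2.0, -1.0])

# Predicted output y_hat(t) = psi(t)' * theta_hat(t-1)
y_hat = psi @ theta_prev
print(y_hat)  # 1.5*2.0 + (-0.5)*(-1.0) = 3.5
```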

For models that do not have the linear-regression form, it is not possible to compute the predicted output and the gradient $$\psi \left(t\right)$$ exactly for the current parameter estimate $$\widehat{\theta}\left(t-1\right)$$. To learn how you can compute approximations of $$\psi \left(t\right)$$ and $$\widehat{y}\left(t\right)$$ for general model structures, see the section on recursive prediction-error methods in [1].

The System Identification Toolbox software provides the following recursive estimation algorithms for online estimation:

- Forgetting factor
- Kalman filter
- Normalized and unnormalized gradient

The forgetting factor and Kalman filter formulations are more computationally intensive than the normalized and unnormalized gradient methods. However, they typically have better convergence properties.

The following set of equations summarizes the *forgetting
factor* adaptation algorithm:

$$\widehat{\theta}\left(t\right)=\widehat{\theta}\left(t-1\right)+K\left(t\right)\left(y\left(t\right)-\widehat{y}\left(t\right)\right)$$

$$\widehat{y}\left(t\right)={\psi}^{T}\left(t\right)\widehat{\theta}\left(t-1\right)$$

$$K\left(t\right)=Q\left(t\right)\psi \left(t\right)$$

$$Q\left(t\right)=\frac{P\left(t-1\right)}{\lambda +{\psi}^{T}\left(t\right)P\left(t-1\right)\psi \left(t\right)}$$

$$P\left(t\right)=\frac{1}{\lambda}\left(P\left(t-1\right)-\frac{P\left(t-1\right)\psi \left(t\right){\psi}^{T}\left(t\right)P\left(t-1\right)}{\lambda +{\psi}^{T}\left(t\right)P\left(t-1\right)\psi \left(t\right)}\right)$$
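
The five equations above translate directly into code. The following is an illustrative NumPy sketch (the function name, initialization `P0`, and test data are assumptions; the toolbox itself uses a square-root update of *P*, which this sketch does not reproduce):

```python
import numpy as np

def rls_forgetting(y, Psi, lam=0.99, P0=1e4):
    """Recursive least squares with forgetting factor lam (illustrative sketch).

    y   : (N,) measured outputs
    Psi : (N, d) regression vectors psi(t)
    Returns the (N, d) trajectory of parameter estimates.
    """
    N, d = Psi.shape
    theta = np.zeros(d)
    P = P0 * np.eye(d)          # large P0 = low confidence in theta(0)
    history = np.empty((N, d))
    for t in range(N):
        psi = Psi[t]
        y_hat = psi @ theta                      # predicted output psi' theta
        denom = lam + psi @ P @ psi              # lam + psi' P psi (scalar)
        K = (P @ psi) / denom                    # gain K(t) = Q(t) psi(t)
        theta = theta + K * (y[t] - y_hat)       # parameter update
        P = (P - np.outer(P @ psi, psi @ P) / denom) / lam
        history[t] = theta
    return history

# Identify theta0 = [2, -1] from noisy linear-regression data
rng = np.random.default_rng(0)
Psi = rng.normal(size=(500, 2))
y = Psi @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=500)
est = rls_forgetting(y, Psi, lam=0.99)
print(np.round(est[-1], 2))   # final estimate, close to [2, -1]
```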

The software ensures *P(t)* is a positive-definite
matrix by using a square-root algorithm to update it [2].
The software computes *P* assuming that the residuals
(difference between estimated and measured outputs) are white noise,
and the variance of these residuals is 1. *R*_{2}/2 times *P* is
approximately equal to the covariance matrix of the estimated parameters,
where *R*_{2} is the true variance of the residuals. *Q(t)* is
obtained by minimizing the following
function at time *t*:

$$\sum _{k=1}^{t}{\lambda}^{t-k}{\left(y\left(k\right)-\widehat{y}\left(k\right)\right)}^{2}$$

See section 11.2 in [1] for details.

This approach discounts old measurements exponentially such
that an observation that is $$\tau $$ samples
old carries a weight that is equal to $${\lambda}^{\tau}$$ times the weight of the most
recent observation. $$\tau =1/\left(1-\lambda \right)$$ represents the *memory
horizon* of this algorithm. Measurements older than $$\tau =1/\left(1-\lambda \right)$$ typically carry a weight that
is less than about 0.3.
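
For example, with $$\lambda =0.99$$ the memory horizon is $$\tau =1/\left(1-0.99\right)=100$$ samples, and an observation that old carries a weight of $${\lambda}^{\tau}\approx 0.37$$ relative to the most recent one:

```python
lam = 0.99
tau = 1 / (1 - lam)      # memory horizon: about 100 samples
weight = lam ** tau      # weight of a tau-samples-old observation
print(round(tau), round(weight, 3))  # 100 0.366
```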

$$\lambda $$ is called the forgetting factor
and typically has a positive value between `0.98` and `0.995`.
Set $$\lambda =1$$ to
estimate time-invariant (constant) parameters. Set $$\lambda <1$$ to
estimate time-varying parameters.

The forgetting factor algorithm for $$\lambda $$ = 1 is equivalent to the Kalman
filter algorithm with *R*_{1}=0
and *R*_{2}=1. For more information
about the Kalman filter algorithm, see Kalman Filter.

The following set of equations summarizes the *Kalman
filter* adaptation algorithm:

$$\widehat{\theta}\left(t\right)=\widehat{\theta}\left(t-1\right)+K\left(t\right)\left(y\left(t\right)-\widehat{y}\left(t\right)\right)$$

$$\widehat{y}\left(t\right)={\psi}^{T}\left(t\right)\widehat{\theta}\left(t-1\right)$$

$$K\left(t\right)=Q\left(t\right)\psi \left(t\right)$$

$$Q\left(t\right)=\frac{P\left(t-1\right)}{{R}_{2}+{\psi}^{T}\left(t\right)P\left(t-1\right)\psi \left(t\right)}$$

$$P\left(t\right)=P\left(t-1\right)+{R}_{1}-\frac{P\left(t-1\right)\psi \left(t\right){\psi}^{T}\left(t\right)P\left(t-1\right)}{{R}_{2}+{\psi}^{T}\left(t\right)P\left(t-1\right)\psi \left(t\right)}$$
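
As with the forgetting factor, these updates translate directly into code. The sketch below is an illustrative NumPy version (the function name, the choice of *R*_{1}, and the initialization are assumptions, and it omits the square-root update of *P*); it tracks a slowly drifting parameter vector:

```python
import numpy as np

def kalman_adapt(y, Psi, R1, R2=1.0, P0=1e4):
    """Kalman-filter parameter adaptation (illustrative sketch).

    y   : (N,) measured outputs
    Psi : (N, d) regression vectors psi(t)
    R1  : (d, d) drift covariance of the random-walk parameters
    """
    N, d = Psi.shape
    theta = np.zeros(d)
    P = P0 * np.eye(d)
    history = np.empty((N, d))
    for t in range(N):
        psi = Psi[t]
        denom = R2 + psi @ P @ psi                 # R2 + psi' P psi
        K = (P @ psi) / denom                      # K(t) = Q(t) psi(t)
        theta = theta + K * (y[t] - psi @ theta)   # parameter update
        P = P + R1 - np.outer(P @ psi, psi @ P) / denom
        history[t] = theta
    return history

# Track parameters theta0(t) that drift as a random walk around [1, -2]
rng = np.random.default_rng(1)
N, d = 2000, 2
theta0 = np.cumsum(0.01 * rng.normal(size=(N, d)), axis=0) + np.array([1.0, -2.0])
Psi = rng.normal(size=(N, d))
y = np.sum(Psi * theta0, axis=1) + rng.normal(size=N)   # e(t) with R2 = 1
est = kalman_adapt(y, Psi, R1=1e-4 * np.eye(d))
print(np.round(est[-1] - theta0[-1], 2))                # final tracking error
```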

The software ensures *P(t)* is a positive-definite
matrix by using a square-root algorithm to update it [2].
The software computes *P* assuming that the residuals
(difference between estimated and measured outputs) are white noise,
and the variance of these residuals is 1. *R*_{2} times *P* is
the covariance matrix of the estimated parameters. This formulation
assumes the linear-regression form of the model:

$$y\left(t\right)={\psi}^{T}\left(t\right){\theta}_{0}\left(t\right)+e\left(t\right)$$

*Q(t)* is computed using a Kalman filter.

This formulation also assumes that the true parameters $${\theta}_{0}\left(t\right)$$ are described by a random walk:

$${\theta}_{0}\left(t\right)={\theta}_{0}\left(t-1\right)+w\left(t\right)$$

*w(t)* is Gaussian white noise with the following
covariance matrix, or *drift matrix* *R*_{1}:

$$Ew\left(t\right){w}^{T}\left(t\right)={R}_{1}$$

*R*_{2} is the variance
of the innovations *e(t)* in the following equation:

$$y\left(t\right)={\psi}^{T}\left(t\right){\theta}_{0}\left(t\right)+e\left(t\right)$$

The Kalman filter algorithm is entirely specified by the sequence
of data *y(t)*, the gradient $$\psi \left(t\right)$$, *R*_{1}, *R*_{2},
and the initial conditions $$\theta \left(t=0\right)$$ (initial
guess of the parameters) and $$P\left(t=0\right)$$ (covariance
matrix that indicates parameters errors).

It is assumed that the *R*_{1} and *P*(t
= 0) matrices are scaled such that *R*_{2} =
1. This scaling does not affect the parameter estimates.

In the linear regression case, the gradient methods are also
known as the *least mean squares* (LMS) methods.

The following set of equations summarizes the *unnormalized
gradient* and *normalized gradient* adaptation
algorithm:

$$\widehat{y}\left(t\right)={\psi}^{T}\left(t\right)\widehat{\theta}\left(t-1\right)$$

$$K\left(t\right)=Q\left(t\right)\psi \left(t\right)$$

In the unnormalized gradient approach, *Q(t)* is
given by:

$$Q(t)=\gamma $$

In the normalized gradient approach, *Q(t)* is
given by:

$$Q\left(t\right)=\frac{\gamma}{{\left|\psi \left(t\right)\right|}^{2}+Bias}$$

The normalized gradient algorithm scales the adaptation gain, *γ*,
at each step by the square of the two-norm of the gradient vector.
If the gradient is close to zero, this can cause jumps in the estimated
parameters. To prevent these jumps, a bias term is introduced in the
scaling factor.
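
In the linear-regression case, these two choices of *Q(t)* give the unnormalized and normalized LMS updates. The following compact NumPy sketch is illustrative only (the function name, step size, and bias value are assumptions):

```python
import numpy as np

def lms(y, Psi, gamma=0.1, normalized=True, bias=1e-6):
    """Gradient / LMS adaptation (illustrative sketch).

    y   : (N,) measured outputs
    Psi : (N, d) regression vectors psi(t)
    """
    N, d = Psi.shape
    theta = np.zeros(d)
    for t in range(N):
        psi = Psi[t]
        err = y[t] - psi @ theta                 # prediction error
        # Q(t) = gamma (unnormalized) or gamma / (|psi|^2 + bias) (normalized)
        Q = gamma / (psi @ psi + bias) if normalized else gamma
        theta = theta + Q * psi * err            # negative-gradient step
    return theta

# Recover theta0 = [2, -1] from lightly noisy linear-regression data
rng = np.random.default_rng(2)
Psi = rng.normal(size=(2000, 2))
y = Psi @ np.array([2.0, -1.0]) + 0.05 * rng.normal(size=2000)
print(np.round(lms(y, Psi), 2))
```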

These choices of *Q(t)* for the gradient
algorithms update the parameters in the negative gradient direction,
where the gradient is computed with respect to the parameters. See
pg. 372 in [1] for details.

[1] Ljung, L. *System Identification:
Theory for the User*. Upper Saddle River, NJ: Prentice-Hall
PTR, 1999.

[2] Carlson, N.A. "Fast triangular formulation
of the square root filter." *AIAA Journal*, Vol.
11, Number 9, 1973, pp. 1259-1265.

Recursive Least Squares Estimator | Recursive Polynomial Model Estimator | `recursiveAR` | `recursiveARMA` | `recursiveARMAX` | `recursiveARX` | `recursiveBJ` | `recursiveLS` | `recursiveOE`