Note: This page has been translated by MathWorks. Please click here

To view all translated materals including this page, select Japan from the country navigator on the bottom of this page.

To view all translated materals including this page, select Japan from the country navigator on the bottom of this page.

*Regularization* is the technique for specifying
constraints on the flexibility of a model, thereby reducing uncertainty
in the estimated parameter values.

Model parameters are obtained by fitting measured data to the
predicted model response, such as a transfer function with three poles
or a second-order state-space model. The model order is a measure
of its flexibility — higher the order, the greater the flexibility.
For example, a model with three poles is more flexible than one with
two poles. Increasing the order causes the model to fit the observed
data with increasing accuracy. However, the increased flexibility
comes with the price of higher uncertainty in the estimates, measured
by a higher value of random or *variance* error.
On the other hand, choosing a model with too low an order leads to
larger systematic errors. Such errors cannot be attributed to measurement
noise and are also known as *bias* error.

Ideally, the parameters of a good model should minimize the *mean
square error (MSE)*, given by a sum of systematic error
(bias) and random error (variance):

MSE = |Bias|^{2} + Variance

The minimization is thus a tradeoff in constraining the model. A flexible (high order) model gives small bias and large variance, whereas a simpler (low order) model results in larger bias and smaller variance errors. Typically, you can investigate this tradeoff between bias and variance errors by cross-validation tests on a set of models of increasing flexibility. However, such tests do not always give full control in managing the parameter estimation behavior. For example:

You cannot use the known (

*a priori*) information about the model to influence the quality of the fits.In grey-box and other structured models, the order is fixed by the underlying ODEs and cannot be changed. If the data is not rich enough to capture the full range of dynamic behavior, this typically leads to high uncertainty in the estimated values.

Varying the model order does not let you explicitly shape the variance of the underlying parameters.

Regularization gives you a better control over the bias versus
variance tradeoff by introducing an additional term in the minimization
criterion that penalizes the model flexibility. Without regularization,
for a model with one output and no weighting, the parameter estimates
are obtained by minimizing a weighted quadratic norm of the prediction
errors * ε*(

$$\begin{array}{l}{V}_{N}\left(\theta \right)=\frac{1}{N}{\displaystyle \sum _{t=1}^{N}{\epsilon}^{2}(t,\theta )}\\ \end{array}$$

where * t* is the time variable,

Regularization modifies the cost function by adding a term proportional
to the square of the norm of the parameter vector * θ*,
so that the parameters

${\widehat{V}}_{N}\left(\theta \right)=\frac{1}{N}{\displaystyle \sum _{t=1}^{N}{\epsilon}^{2}\left(t,\theta \right)+\frac{1}{N}\lambda {\Vert \theta \Vert}^{2}}$

where * λ* is a positive constant that
has the effect of trading variance error in V

_{1}-norm, known as lasso regularization.
However, System Identification Toolbox™ supports only the 2-norm
based penalty, known as L_{2} regularization,
as shown in the previous equation. |

The penalty term is made more effective by using a positive
definite matrix * R*, which allows weighting and/or
rotation of the parameter vector:

$${\widehat{V}}_{N}\left(\theta \right)=\frac{1}{N}{\displaystyle \sum _{t=1}^{N}{\epsilon}^{2}\left(t,\theta \right)+\text{}}\frac{1}{N}\lambda {\theta}^{T}R\theta $$

The square matrix * R* gives additional freedom
for:

Shaping the penalty term to meet the required constraints, such as keeping the model stable

Adding known information about the model parameters, such as reliability of the individual parameters in the

vector*θ*

For structured models such as grey-box models, you may want to keep the estimated parameters close to their guess values to maintain the physical validity of the estimated model. This can be achieved by generalizing the penalty term to $$\lambda {\left(\theta -{\theta}^{*}\right)}^{T}R\left(\theta -{\theta}^{*}\right)$$, such that the cost function becomes:

$${\widehat{V}}_{N}\left(\theta \right)=\frac{1}{N}{\displaystyle \sum _{t=1}^{N}{\epsilon}^{2}\left(t,\theta \right)+\frac{1}{N}\lambda {\left(\theta -{\theta}^{*}\right)}^{T}R\left(\theta -{\theta}^{*}\right)}$$

Minimizing this cost function has the effect of estimating * θ* such
that their values remain close to initial guesses

In regularization:

represents prior knowledge about the unknown parameters.*θ****λ*represents the confidence in the prior knowledge of the unknown parameters. This implies that the larger the value, the higher the confidence.*R*

A formal interpretation in a Bayesian setting is that θ
has a prior distribution that is Gaussian with mean θ* and covariance
matrix $${\sigma}^{2}/\lambda {R}^{-1}$$,
where σ^{2} is the variance of ε(t).
The use of regularization can therefore be linked to some prior information
about the system. This could be quite soft, such as the system is
stable.

You can use the regularization variables * λ* and

`Lambda`

, `R`

,
and `Nominal`

, respectively in the toolbox. You can
specify their values in the estimation option sets for both linear
and nonlinear models. In the System Identification app, click Use regularization for:

Imposing

*A Priori*Knowledge of Model Parameters in Structured ModelsIncorporating Knowledge of System Behavior in ARX and FIR Models

Over-parameterized models are rich in parameters. Their estimation
typically yields parameter values with a high level of uncertainty.
Over-parameterization is common for nonlinear ARX (`idnlarx`

) models and can also be for linear
state-space models using free parameterization.

In such cases, regularization improves the numerical conditioning
of the estimation. You can explore the bias-vs.-variance tradeoff
using various values of the regularization constant `Lambda`

.
Typically, the `Nominal`

option is its default value
of `0`

, and `R`

is an identity matrix
such that the following cost function is minimized:

${\widehat{V}}_{N}\left(\theta \right)=\frac{1}{N}{\displaystyle \sum _{t=1}^{N}{\epsilon}^{2}\left(t,\theta \right)+\frac{1}{N}\lambda {\Vert \theta \Vert}^{2}}$

In the following example, a nonlinear ARX model estimation using a large number of neurons leads to an ill-conditioned estimation problem.

% Load estimation data. load regularizationExampleData.mat nldata % Estimate model without regularization. Orders = [1 2 1]; NL = sigmoidnet('NumberOfUnits',30); sys = nlarx(nldata,Orders,NL); compare(nldata,sys)

Applying even a small regularizing penalty produces a good fit for the model to the data.

```
% Estimate model using regularization constant λ = 1e-8.
opt = nlarxOptions;
opt.Regularization.Lambda = 1e-8;
sysr = nlarx(nldata,Orders,NL,opt);
compare(nldata,sysr)
```

In models derived from differential equations, the parameters have physical significance. You may have a good guess for typical values of those parameters even if the reliability of the guess may be different for each parameter. Because the model structure is fixed in such cases, you cannot simplify the structure to reduce variance errors.

Using the regularization constant `Nominal`

,
you can keep the estimated values close to their initial guesses.
You can also design `R`

to reflect the confidence
in the initial guesses of the parameters. For example, if θ
is a 2-element vector and you can guess the value of the first element
with more confidence than the second one, set `R`

to
be a diagonal matrix of size 2-by-2 such that R(1,1) >> R(2,2).

In the following example, a model of a DC motor is parameterized
by static gain `G`

and time constant τ. From
prior knowledge, suppose you know that `G`

is about
4 and τ is about 1. Also, assume that you have more confidence
in the value of τ than `G`

and would like to
guide the estimation to remain close to the initial guess.

% Load estimation data. load regularizationExampleData.mat motorData % Create idgrey model for DC motor dynamics. mi = idgrey(@DCMotorODE,{'G',4;'Tau',1},'cd',{}, 0); mi = setpar(mi,'label','default'); % Configure Regularization options. opt = greyestOptions; opt.Regularization.Lambda = 100; % Specify that the second parameter better known than the first. opt.Regularization.R = [1, 1000]; % Specify initial guess as Nominal. opt.Regularization.Nominal = 'model'; % Estimate model. sys = greyest(motorData,mi,opt) getpar(sys)

In many situations, you may know the shape of the system impulse
response from impact tests. For example, it is quite common for stable
systems to have an impulse response that is smooth and exponentially
decaying. You can use such prior knowledge of system behavior to derive
good values of regularization constants for linear-in-parameter models
such as ARX and FIR structure models using the `arxRegul`

command.

For black-box models of arbitrary structure, it is often difficult
to determine the optimal values of `Lambda`

and `R`

that
yield the best bias-vs.-variance tradeoff. Therefore, it is recommended
that you start by obtaining the regularized estimate of an ARX or
FIR structure model. Then, convert the model to a state-space, transfer
function or polynomial model using the `idtf`

, `idss`

, or `idpoly`

commands,
followed by order reduction if required.

In the following example, direct estimation of a 15th order continuous-time transfer function model fails due to numerical ill-conditioning.

% Load estimation data. load dryer2 Dryer = iddata(y2,u2,0.08); Dryerd = detrend(Dryer,0); Dryerde = Dryerd(1:500); xe = Dryerd(1:500); ze = Dryerd(1:500); zv = Dryerd(501:end); % Estimate model without regularization. sys1 = tfest(ze,15);

Therefore, use regularized ARX estimation and then convert the model to transfer function structure.

% Specify regularization constants. [L, R] = arxRegul(ze,[15 15 1]); optARX = arxOptions; optARX.Regularization.Lambda = L; optARX.Regularization.R = R; % Estimate ARX model. sysARX = arx(ze,[15 15 1],optARX); % Convert model to continuous time. sysc = d2c(sysARX); % Convert model to transfer function. sys2 = idtf(sysc); % Validate the models sys1 and sys2. compare(zv,sys1,sys2)

A guideline for selecting the regularization constants λ
and `R`

is in the Bayesian interpretation. The added
penalty term is an assumption that the parameter vector θ is
a Gaussian random vector with mean θ* and covariance matrix $${\sigma}^{2}/\lambda {R}^{-1}$$.

You can relate naturally to such an assumption for a grey-box
model, where the parameters are of known physical interpretation.
In other cases, this may be more difficult. Then, you have to use
ridge regression (`R`

= 1; θ* = 0) and tune
λ by trial and error.

Use the following techniques for determining λ and `R`

values:

Tuning the regularization constants for ARX models in `arxRegul`

is based on simple assumptions
about the properties of the true impulse responses.

In the case of an FIR model, the parameter vector contains the
impulse response coefficients b_{k} for the system.
From prior knowledge of the system, it is often known that the impulse
response is smooth and exponentially decaying:

$$E{\left[{b}_{k}\right]}^{2}=C{\mu}^{k},\text{}corr\left\{{b}_{k}{b}_{k-1}\right\}=\rho $$

where *corr* means correlation. The equation
is a parameterization of the regularization constants in terms of
coefficients C, μ, and ρ and the chosen shape (decaying
polynomial) is called a *kernel*. The kernel
thus contains information about parameterization of the prior covariance
of the impulse response coefficients.

You can estimate the parameters of the kernel by adjusting them
to the measured data using the `RegulKernel`

input
of the `arxRegul`

command. For
example, the `DC`

kernel estimates all three parameters
while the `TC`

kernel links $$\rho =\sqrt{\mu}$$.
This technique of tuning kernels applies to all linear-in-parameter
models such as ARX and FIR models.

A general way to test and evaluate any regularization parameters
is to estimate a model based on certain parameters on an estimation
data set, and evaluate the model fit for another validation data set.
This is known as *cross-validation*.

Cross-validation is entirely analogous to the method for selecting model order:

Generate a list of candidate λ and

`R`

values to be tested.Estimate a model for each candidate regularization constant set.

Compare the model fit to the validation data.

Use the constants that give the best fit to the validation data.

For example:

% Create estimation and validation data sets. ze = z(1:N/2); zv = z(N/2:end); % Specify regularization options and estimate models. opt = ssestOptions; for tests = 1:M opt.Regularization.Lambda = Lvalue(test); opt.Regularization.R = Rvalue(test); m{test} = ssest(ze,order,opt); end % Compare models with validation data for model fit. [~,fit] = compare(zv,m{:))

[1] L. Ljung. "Some Classical and Some
New Ideas for Identification of Linear Systems." *Journal
of Control, Automation and Electrical Systems.* April 2013,
Volume 24, Issue 1-2, pp 3-10.

[2] L. Ljung, and T. Chen. "What can regularization
offer for estimation of dynamical systems?" *In Proceedings
of IFAC International Workshop on Adaptation and Learning in Control
and Signal Processing*, ALCOSP13, Caen, France, July 2013.

[3] L. Ljung, and T. Chen. "Convexity issues in system
identification." *In Proceedings of the 10th IEEE
International Conference on Control & Automation*,
ICCA 2013, Hangzhou, China, June 2013.

- Estimate Regularized ARX Model Using System Identification App
- Regularized Identification of Dynamic Systems

Was this topic helpful?