The System Identification Toolbox™ software estimates model parameters by minimizing the error between the model output and the measured response. This error measure, called the loss function or cost function, is a positive function of the prediction errors e(t). In general, this function is a weighted sum of squares of the errors. For a model with ny outputs, the loss function V(θ) has the following general form:

$$V(\theta) = \frac{1}{N}\sum_{t=1}^{N} e^{T}(t,\theta)\,W(\theta)\,e(t,\theta)$$

where:
N is the number of data samples.
e(t,θ) is the ny-by-1 error vector at a given time t, parameterized by the parameter vector θ.
W(θ) is the weighting matrix, specified as a positive semidefinite matrix. If W is a diagonal matrix, you can think of it as a way to control the relative importance of outputs during multi-output estimations. When W is a fixed or known weight, it does not depend on θ.
The software determines the parameter values by minimizing V(θ) with respect to θ.
For notational convenience, V(θ) is expressed in its matrix form:

$$V(\theta) = \frac{1}{N}\,\mathrm{trace}\!\left(E^{T}(\theta)\,E(\theta)\,W(\theta)\right)$$

E(θ) is the error matrix of size N-by-ny. The ith row of E(θ) represents the error value at time t = i.
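As a rough numerical illustration (a sketch in plain Python, not the toolbox's implementation), the following computes a weighted sum-of-squares loss for a two-output error sequence in two ways: as the average over samples of e(t)ᵀ W e(t), and as trace(Eᵀ E W)/N. The function names and the sample values are hypothetical and chosen only for illustration; W is assumed to be fixed and symmetric.

```python
def loss_sum_form(E, W):
    """V = (1/N) * sum over t of e(t)^T W e(t); E is a list of N rows of ny errors."""
    N = len(E)
    total = 0.0
    for e in E:
        ny = len(e)
        total += sum(e[i] * W[i][j] * e[j] for i in range(ny) for j in range(ny))
    return total / N


def loss_trace_form(E, W):
    """V = (1/N) * trace(E^T E W)."""
    N = len(E)
    ny = len(E[0])
    # Gram matrix E^T E (ny-by-ny).
    gram = [[sum(row[i] * row[j] for row in E) for j in range(ny)] for i in range(ny)]
    # trace(gram * W) = sum over i, j of gram[i][j] * W[j][i].
    return sum(gram[i][j] * W[j][i] for i in range(ny) for j in range(ny)) / N


# Hypothetical errors for N = 3 samples and ny = 2 outputs.
E = [[1.0, -2.0], [0.5, 0.0], [-1.0, 1.0]]
# Diagonal weight: the second output counts four times as much as the first.
W = [[1.0, 0.0], [0.0, 4.0]]

v_sum = loss_sum_form(E, W)
v_trace = loss_trace_form(E, W)
print(v_sum, v_trace)
```

Both forms give the same value for a symmetric W, which is why the matrix form can be used as a notational shorthand for the sum over samples.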
The exact form of V(θ) depends on the model structure and on the estimator and estimation options that you use.
You can configure the loss function for your application needs. The following estimation options, when available for the estimator, configure the loss function:
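Focus: Selects the error that is minimized. When Focus is 'prediction', the software minimizes the 1-step-ahead prediction error $e_p(t) = H^{-1}(q)\left[y(t) - G(q)u(t)\right]$. When Focus is 'simulation', it minimizes the simulation error $e_s(t) = y(t) - G(q)u(t)$. For models whose noise component is trivial (H(q) = 1), ep(t) and es(t) are equivalent.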
WeightingFilter: When you specify a weighting filter, the software minimizes the prefiltered prediction or simulation error $e_f(t) = \mathcal{L}(e(t))$, where $\mathcal{L}(\cdot)$ is a linear filter.
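ErrorThreshold: The error threshold specifies when the weight of large errors changes from quadratic to linear. For errors whose magnitude exceeds ρ times σ, where ρ is the error threshold and σ is the estimated standard deviation of the error, e(t,θ) is replaced in the loss function by the error v(t,θ). The error v(t,θ) is defined as:

$$v(t,\theta) = e(t,\theta)\,\frac{\sigma\sqrt{\rho}}{\sqrt{\left|e(t,\theta)\right|}}$$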
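Regularization: The loss function is set up with the goal of minimizing the prediction errors. It does not include specific constraints on the variance (a measure of reliability) of the estimated parameters. This can sometimes lead to models with large uncertainty in the estimated parameters, especially when the model has many parameters. Regularization adds a penalty term to the loss function:

$$V(\theta) = \frac{1}{N}\sum_{t=1}^{N} e^{T}(t,\theta)\,W(\theta)\,e(t,\theta) + \frac{1}{N}\,\lambda\,(\theta-\theta^{*})^{T} R\,(\theta-\theta^{*})$$

The second term is a weighted (R) and scaled (λ) variance of the estimated parameter set θ about its nominal value θ*.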
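The effect of the penalty term can be sketched numerically. The following plain-Python example is only an illustration under stated assumptions: a single output with unit output weight, and a penalty of the form (λ/N)(θ − θ*)ᵀ R (θ − θ*). The function name `regularized_loss` and all values are hypothetical, and this is not the toolbox's implementation.

```python
def regularized_loss(errors, theta, theta_nominal, R, lam):
    """Mean squared error plus a quadratic penalty on theta - theta_nominal.

    Assumed form (single output, unit output weight):
        V = (1/N) * sum(e^2) + (lam/N) * d^T R d,  with d = theta - theta_nominal
    """
    N = len(errors)
    fit_term = sum(e * e for e in errors) / N
    d = [t - t0 for t, t0 in zip(theta, theta_nominal)]
    n = len(d)
    penalty = sum(d[i] * R[i][j] * d[j] for i in range(n) for j in range(n))
    return fit_term + lam * penalty / N


# Hypothetical prediction errors and parameter values.
errors = [0.2, -0.1, 0.3, -0.4]
theta = [1.5, -0.5]
theta_nominal = [1.0, 0.0]
R = [[1.0, 0.0], [0.0, 2.0]]  # penalize the second parameter twice as much

v_plain = regularized_loss(errors, theta, theta_nominal, R, lam=0.0)
v_reg = regularized_loss(errors, theta, theta_nominal, R, lam=0.1)
print(v_plain, v_reg)
```

With λ = 0 the loss reduces to the plain mean squared error. A larger λ or a larger entry in R makes deviations of the corresponding parameter from its nominal value more costly.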
Effect of Focus and WeightingFilter Options on the Loss Function
The Focus option can be interpreted as a weighting filter in the loss function. The WeightingFilter option is an additional custom weighting filter that is applied to the loss function.
To understand the effect of Focus and WeightingFilter, consider a linear single-input single-output model:

$$y(t) = G(q,\theta)\,u(t) + H(q,\theta)\,e(t)$$

where G(q,θ) is the measured transfer function, H(q,θ) is the noise model, and e(t) represents the additive disturbances modeled as white Gaussian noise. q is the time-shift operator.
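In the frequency domain, the linear model can be represented as:

$$Y(\omega) = G(\omega,\theta)\,U(\omega) + H(\omega,\theta)\,E(\omega)$$

where Y(ω), U(ω), and E(ω) are the Fourier transforms of the output, input, and output error, respectively, and G(ω,θ) and H(ω,θ) are the frequency responses of the measured and noise transfer functions.
The loss function to be minimized for the SISO model is given by:

$$V(\theta) = \frac{1}{N}\sum_{t=1}^{N} e^{T}(t)\,e(t)$$

Using Parseval's identity, the loss function in the frequency domain is:

$$V(\theta,\omega) = \frac{1}{N}\left\|E(\omega)\right\|^{2}$$

Substituting for E(ω) gives:

$$V(\theta,\omega) = \frac{1}{N}\left\|\frac{Y(\omega)}{U(\omega)} - G(\omega,\theta)\right\|^{2}\frac{\left\|U(\omega)\right\|^{2}}{\left\|H(\omega,\theta)\right\|^{2}}$$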
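Thus, you can interpret minimizing the loss function V as fitting G(ω,θ) to the empirical transfer function Y(ω)/U(ω), using $\left\|U(\omega)\right\|^{2}/\left\|H(\omega,\theta)\right\|^{2}$ as a weighting filter. This corresponds to specifying Focus as 'prediction'. The estimation emphasizes frequencies where the input has more power ($\left\|U(\omega)\right\|^{2}$ is greater) and de-emphasizes frequencies where noise is significant ($\left\|H(\omega,\theta)\right\|^{2}$ is large).
When Focus is specified as 'simulation', the inverse weighting with $\left\|H(\omega,\theta)\right\|^{2}$ is not used. That is, only the input spectrum is used to weigh the relative importance of the estimation fit in a specific frequency range.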
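The weighting interpretation can be checked at a single frequency with a few complex numbers. The sketch below assumes the residual spectrum is E = (Y − G·U)/H and confirms that |E|² equals |Y/U − G|²·|U|²/|H|². The first-order G and H, the frequency, and the spectrum samples are hypothetical values chosen only for illustration.

```python
import cmath


def prediction_weighted_misfit(Y, U, G, H):
    """|Y/U - G|^2 * |U|^2 / |H|^2 at a single frequency."""
    return abs(Y / U - G) ** 2 * abs(U) ** 2 / abs(H) ** 2


def simulation_weighted_misfit(Y, U, G):
    """|Y/U - G|^2 * |U|^2 at a single frequency (no 1/|H|^2 weighting)."""
    return abs(Y / U - G) ** 2 * abs(U) ** 2


# Hypothetical first-order model and noise filter at w = 0.5 rad/sample.
w = 0.5
z_inv = cmath.exp(-1j * w)            # q^-1 evaluated on the unit circle
G = 0.5 * z_inv / (1 - 0.8 * z_inv)   # hypothetical measured transfer function
H = 1 / (1 - 0.9 * z_inv)             # hypothetical noise model

U = 2.0 + 1.0j                        # hypothetical input spectrum sample
Y = 1.0 - 0.5j                        # hypothetical output spectrum sample

E = (Y - G * U) / H                   # residual spectrum from Y = G*U + H*E
direct = abs(E) ** 2
weighted = prediction_weighted_misfit(Y, U, G, H)
unweighted = simulation_weighted_misfit(Y, U, G)
print(direct, weighted, unweighted)
```

At this frequency |H| is greater than 1, so the prediction-focus weighting makes the same misfit count for less than it does under simulation focus, where only the input spectrum weights the fit.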
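When you specify a linear filter as WeightingFilter, it is used as an additional custom weighting in the loss function:

$$V(\theta,\omega) = \frac{1}{N}\left\|\frac{Y(\omega)}{U(\omega)} - G(\omega,\theta)\right\|^{2}\frac{\left\|U(\omega)\right\|^{2}}{\left\|H(\omega,\theta)\right\|^{2}}\left\|\mathcal{L}(\omega)\right\|^{2}$$

Here $\mathcal{L}(\omega)$ is the frequency response of the filter. Use $\mathcal{L}(\omega)$ to enhance the fit of the model response to observed data in certain frequencies, such as to emphasize the fit close to system resonant frequencies.
The estimated value of the input-output transfer function G is the same as what you get if you instead first prefilter the estimation data with $\mathcal{L}(\cdot)$ using idfilt, and then estimate the model without specifying WeightingFilter. However, the effect of $\mathcal{L}(\cdot)$ on the estimated noise model H depends on the choice of Focus: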
Focus is 'prediction': The software minimizes the weighted prediction error $e_f(t) = \mathcal{L}(e_p(t))$, and the estimated model has the form:

$$y(t) = G(q)\,u(t) + H_1(q)\,e(t), \qquad H_1(q) = \frac{H(q)}{\mathcal{L}(q)}$$

Thus, the estimation with prediction focus creates a biased estimate of H. This is the same estimated noise model you get if you instead first prefilter the estimation data with $\mathcal{L}(\cdot)$ using idfilt, and then estimate the model.
When H is parameterized independently of G, you can treat the filter $\mathcal{L}(\cdot)$ as a way of affecting the estimation bias distribution. That is, you can shape the trade-off between fitting G to the system frequency response and fitting $H/\mathcal{L}$ to the disturbance spectrum when minimizing the loss function. For more details, see Section 14.4 in System Identification: Theory for the User, Second Edition, by Lennart Ljung, Prentice Hall PTR, 1999.
Focus is 'simulation': The software first estimates G by minimizing the weighted simulation error $e_f(t) = \mathcal{L}(e_s(t))$, where $e_s(t) = y(t) - G(q)u(t)$. Once G is estimated, the software fixes it and computes H by minimizing pure prediction errors e(t) using unfiltered data. The estimated model has the form:

$$y(t) = G(q)\,u(t) + H(q)\,e(t)$$

If you prefilter the data first, and then estimate the model, you get the same estimate for G but get a biased noise model $H/\mathcal{L}$.
Thus, WeightingFilter has the same effect as prefiltering the estimation data for estimation of G. For estimation of H, the effect of WeightingFilter depends upon the choice of Focus. A prediction focus estimates a biased version of the noise model, $H/\mathcal{L}$, while a simulation focus estimates H. Prefiltering the estimation data, and then estimating the model, always gives $H/\mathcal{L}$ as the noise model.
After you estimate a model, use model quality metrics to assess the quality of identified models, compare different models, and pick the best one. The Report.Fit property of an identified model stores various metrics such as FitPercent, LossFcn, FPE, MSE, AIC, nAIC, AICc, and BIC values.

FitPercent, LossFcn, and MSE are measures of the actual quantity that is minimized during the estimation. For example, if Focus is 'simulation', these quantities are computed for the simulation error es(t). Similarly, if you specify the WeightingFilter option, the corresponding quantities are computed using filtered residuals ef(t).

FPE, AIC, nAIC, AICc, and BIC measures are computed as properties of the output disturbance according to the relationship:

$$y(t) = G(q)\,u(t) + H(q)\,e(t)$$

G(q) and H(q) represent the measured and noise components of the estimated model.
Regardless of how the loss function is configured, the error vector e(t) is computed as the 1-step-ahead prediction error using a given model and a given dataset. This implies that even when the model is obtained by minimizing the simulation error es(t), the FPE and various AIC values are still computed using the prediction error ep(t). The actual value of ep(t) is determined using the pe command with a prediction horizon of 1 and using the initial conditions specified for the estimation.
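These metrics contain two terms: one that describes the model accuracy and another that describes its complexity. For example, in FPE, $\det\left(\frac{1}{N}E^{T}E\right)$ describes the model accuracy and $\frac{1 + n_p/N}{1 - n_p/N}$ describes the model complexity, where $n_p$ is the number of estimated parameters.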
By comparing models using these criteria, you can pick a model that gives the best (smallest criterion value) trade-off between accuracy and complexity.
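A minimal sketch of such a comparison, in plain Python, is shown below. It assumes a single-output model, for which a normalized AIC-style criterion can be written as log(MSE) + 2·np/N. The candidate names, their mean squared prediction errors, and their parameter counts are hypothetical.

```python
import math


def naic(mse, n_params, n_samples):
    """Accuracy term log(mse) plus complexity term 2*np/N (single output assumed)."""
    return math.log(mse) + 2.0 * n_params / n_samples


N = 100  # hypothetical number of data samples

# Hypothetical candidates: name -> (mean squared prediction error, number of parameters).
candidates = {
    "order 1": (0.50, 2),
    "order 2": (0.20, 4),
    "order 6": (0.195, 12),
}

scores = {name: naic(mse, n_params, N) for name, (mse, n_params) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

In this made-up case the sixth-order candidate has a slightly smaller error than the second-order one, but its complexity term outweighs that gain, so the second-order candidate has the smallest criterion value.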
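FitPercent: Normalized root mean squared error (NRMSE) expressed as a percentage, defined as:

$$\mathrm{FitPercent} = 100\left(1 - \frac{\left\|y_{measured} - y_{model}\right\|}{\left\|y_{measured} - \overline{y_{measured}}\right\|}\right)$$

where $y_{measured}$ is the measured output data, $\overline{y_{measured}}$ is its mean, and $y_{model}$ is the response of the model.

LossFcn: Value of the loss function when the estimation completes. It contains the effects of error thresholds, output weight, and regularization used for estimation.

MSE: Mean squared error measure, defined as:

$$\mathrm{MSE} = \frac{1}{N}\sum_{t=1}^{N} e^{T}(t)\,e(t)$$

FPE: Akaike's Final Prediction Error, defined as:

$$\mathrm{FPE} = \det\left(\frac{1}{N}E^{T}E\right)\frac{1 + n_p/N}{1 - n_p/N}$$

AIC: A raw measure of Akaike's Information Criterion, defined as:

$$\mathrm{AIC} = N\log\left(\det\left(\frac{1}{N}E^{T}E\right)\right) + 2n_p + N\left(n_y\left(\log(2\pi) + 1\right)\right)$$

AICc: Small-sample-size corrected Akaike's Information Criterion, defined as:

$$\mathrm{AICc} = \mathrm{AIC} + \frac{2n_p(n_p + 1)}{N - n_p - 1}$$

This metric is often more reliable for picking a model of optimal complexity from a list of candidate models when the data size N is small.

nAIC: Normalized measure of Akaike's Information Criterion, defined as:

$$\mathrm{nAIC} = \log\left(\det\left(\frac{1}{N}E^{T}E\right)\right) + \frac{2n_p}{N}$$

BIC: Bayesian Information Criterion, defined as:

$$\mathrm{BIC} = N\log\left(\det\left(\frac{1}{N}E^{T}E\right)\right) + N\left(n_y\log(2\pi) + 1\right) + n_p\log(N)$$

In these definitions, N is the number of data samples, E is the N-by-ny matrix of prediction errors, $n_p$ is the number of estimated parameters, and $n_y$ is the number of outputs.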
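To make these definitions concrete, the following plain-Python sketch evaluates them for a single-output model (ny = 1), where det((1/N)EᵀE) reduces to the mean squared error. It assumes the commonly documented forms FPE = MSE·(1 + np/N)/(1 − np/N), AIC = N·log(MSE) + 2·np + N·(log(2π) + 1), AICc = AIC + 2·np·(np + 1)/(N − np − 1), nAIC = log(MSE) + 2·np/N, and BIC = N·log(MSE) + N·(log(2π) + 1) + np·log(N). As a simplification, it treats the difference between measured and model output as the prediction error. The function name `fit_metrics` and the data are hypothetical, and this is not the toolbox's implementation.

```python
import math


def fit_metrics(y_measured, y_model, n_params):
    """Quality metrics for a single-output model (ny = 1).

    With one output, det((1/N) E^T E) reduces to the mean squared error.
    """
    N = len(y_measured)
    residuals = [ym - yh for ym, yh in zip(y_measured, y_model)]
    mse = sum(e * e for e in residuals) / N
    y_mean = sum(y_measured) / N
    norm_err = math.sqrt(sum(e * e for e in residuals))
    norm_dev = math.sqrt(sum((ym - y_mean) ** 2 for ym in y_measured))
    gauss_const = N * (math.log(2 * math.pi) + 1)  # constant term for ny = 1
    aic = N * math.log(mse) + 2 * n_params + gauss_const
    return {
        "FitPercent": 100.0 * (1.0 - norm_err / norm_dev),
        "MSE": mse,
        "FPE": mse * (1 + n_params / N) / (1 - n_params / N),
        "AIC": aic,
        "AICc": aic + 2 * n_params * (n_params + 1) / (N - n_params - 1),
        "nAIC": math.log(mse) + 2 * n_params / N,
        "BIC": N * math.log(mse) + gauss_const + n_params * math.log(N),
    }


# Hypothetical data: 8 samples, model with 2 estimated parameters.
y_measured = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_model = [0.9, 2.1, 2.9, 4.1, 4.9, 6.1, 6.9, 8.1]
metrics = fit_metrics(y_measured, y_model, n_params=2)
for name, value in metrics.items():
    print(name, value)
```

Because N is small here, the AICc correction term is comparable in size to the complexity term 2·np itself, which illustrates why the corrected criterion matters for short data records.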