The System Identification Toolbox™ software estimates model parameters by minimizing the error between the model output and the measured response. This error, called the loss function or cost function, is a positive function of the prediction errors e(t). In general, this function is a weighted sum of squares of the errors. For a model with ny outputs, the loss function V(θ) has the following general form:
$V(\theta )=\frac{1}{N}{\displaystyle \sum}_{t=1}^{N}{e}^{T}\left(t,\theta \right)W\left(\theta \right)e\left(t,\theta \right)$
where:
N is the number of data samples.
e(t,θ) is an ny-by-1 error vector at a given time t, parameterized by the parameter vector θ.
W(θ) is the weighting matrix, specified as a positive semidefinite matrix. If W is a diagonal matrix, you can think of it as a way to control the relative importance of outputs during multioutput estimations. When W is a fixed or known weight, it does not depend on θ.
The software determines the parameter values by minimizing V(θ) with respect to θ.
For notational convenience, V(θ) is expressed in its matrix form:
$V\left(\theta \right)=\frac{1}{N}trace\left({E}^{T}\left(\theta \right)E\left(\theta \right)W(\theta )\right)$
E(θ) is the error matrix of size N-by-ny. The ith row of E(θ) represents the error value at time t = i.
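As a quick numeric illustration (a Python sketch, not toolbox code), the sum form and the trace form of V(θ) can be checked against each other; the error matrix and the weighting matrix below are made-up values:

```python
import numpy as np

# Hypothetical 2-output estimation errors; row t of E is e(t)^T.
rng = np.random.default_rng(0)
N, ny = 100, 2                      # samples, outputs
E = rng.standard_normal((N, ny))    # error matrix
W = np.diag([1.0, 0.5])             # diagonal weight: output 1 counts double

# Sum form: V = (1/N) * sum_t e(t)^T W e(t)
V_sum = sum(E[t] @ W @ E[t] for t in range(N)) / N

# Matrix form: V = (1/N) * trace(E^T E W)
V_trace = np.trace(E.T @ E @ W) / N

print(V_sum, V_trace)               # the two forms agree
```

The equality follows from Σ e(t)e(t)ᵀ = EᵀE and the cyclic property of the trace.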
The exact form of V(θ) depends on the model structure, the estimation data, and the estimation options you choose. You can configure the loss function for your application needs. The following estimation options, when available for the estimator, configure the loss function:
Focus
Specifies whether the minimized error is the prediction error e_{p}(t) ('prediction') or the simulation error e_{s}(t) ('simulation').
Note: For models whose noise component is trivial (H(q) = 1), e_{p}(t) and e_{s}(t) are equivalent.

WeightingFilter
When you specify a weighting filter, the prefiltered prediction or simulation error is minimized: $${e}_{f}(t)=\mathcal{L}(e(t))$$ where $\mathcal{L}(.)$ is a linear filter. The filter acts as an additional frequency weighting in the loss function.

OutputWeight
Specifies the weighting matrix W(θ) that controls the relative importance of the outputs during multioutput estimations. When OutputWeight is 'noise', W(θ) is the inverse of the estimated noise covariance.

ErrorThreshold
Specifies the threshold ρ for when to adjust the weight of large errors from quadratic to linear. Errors larger than ρ times the estimated standard deviation of the errors contribute linearly to the loss function:

$V(\theta )=\frac{1}{N}\left({\displaystyle \sum}_{t\in I}{e}^{T}\left(t,\theta \right)W\left(\theta \right)e\left(t,\theta \right)+{\displaystyle \sum}_{t\in J}{v}^{T}\left(t,\theta \right)W\left(\theta \right)v\left(t,\theta \right)\right)$

where:
I is the set of time instants at which $\left|e\left(t,\theta \right)\right|\le \sigma \rho$.
J is the set of remaining time instants, at which the errors exceed the threshold.
σ is the estimated standard deviation of the errors.
The error v(t,θ) is defined as: $v\left(t,\theta \right)=e\left(t,\theta \right)\sqrt{\frac{\sigma \rho }{\left|e\left(t,\theta \right)\right|}}$ so that the contribution of a large error grows linearly, rather than quadratically, with its magnitude.
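The two-regime loss can be sketched numerically. This is an illustrative Python sketch, not toolbox code; it assumes the Huber-style reading in which errors beyond the threshold contribute σρ|e| (linear in |e|), and sigma and rho are made-up values:

```python
# Robust loss contribution for a scalar error, assuming the Huber-style
# form: quadratic within sigma*rho, linear beyond it via
# v = e * sqrt(sigma*rho / |e|), so v^2 = sigma*rho*|e|.
sigma, rho = 0.5, 2.0               # hypothetical noise std and threshold factor
thresh = sigma * rho

def contribution(e):
    if abs(e) <= thresh:
        return e * e                # quadratic regime
    return sigma * rho * abs(e)     # linear regime: v^T v = sigma*rho*|e|

# Both regimes give (sigma*rho)^2 at the threshold, so the loss is
# continuous there, and large errors are damped relative to e^2.
print(contribution(thresh), contribution(10.0))
```

The continuity at |e| = σρ is what makes the square-root form of v(t,θ) consistent with a smooth transition from quadratic to linear weighting.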
The loss function is set up with the goal of minimizing the prediction errors. It does not include specific constraints on the variance (a measure of reliability) of estimated parameters. This can sometimes lead to models with large uncertainty in estimated model parameters, especially when the model has many parameters.
$V\left(\theta \right)=\frac{1}{N}{\displaystyle \sum}_{t=1}^{N}{e}^{T}\left(t,\theta \right)W\left(\theta \right)e\left(t,\theta \right)+\frac{1}{N}\lambda {\left(\theta -{\theta}^{*}\right)}^{T}R\left(\theta -{\theta}^{*}\right)$

The second term is a weighted (R) and scaled (λ) penalty on the deviation of the estimated parameter set θ from its nominal value θ*. Use the Regularization option to add this term, which reduces the variance of the estimated parameters at the cost of introducing some bias.
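The effect of the penalty term can be sketched with a small linear least-squares problem. This is an illustrative Python sketch, not toolbox code; λ, R, θ*, and the data are made-up values:

```python
import numpy as np

# Hypothetical linear regression y = X theta + noise.
rng = np.random.default_rng(1)
N = 50
X = rng.standard_normal((N, 2))
theta_true = np.array([1.0, -0.5])
y = X @ theta_true + 0.1 * rng.standard_normal(N)

lam = 10.0                          # scaling lambda
R = np.eye(2)                       # weighting matrix R
theta_star = np.zeros(2)            # nominal parameter values theta*

# Minimizer of (1/N)||y - X theta||^2 + (lambda/N)(theta - theta*)^T R (theta - theta*):
# setting the gradient to zero gives the normal equations below.
theta_reg = np.linalg.solve(X.T @ X + lam * R, X.T @ y + lam * R @ theta_star)
theta_ls = np.linalg.solve(X.T @ X, X.T @ y)

# The regularized estimate is shrunk toward theta* = 0.
print(np.linalg.norm(theta_reg), np.linalg.norm(theta_ls))
```

The shrinkage toward θ* is the mechanism that trades a small bias for reduced parameter variance.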

Focus and WeightingFilter Options on the Loss Function

The Focus option can be interpreted as a weighting filter in the loss function. The WeightingFilter option is an additional custom weighting filter that is applied to the loss function.

To understand the effect of Focus and WeightingFilter, consider a linear single-input single-output model:
$$y(t)=G(q,\theta )\text{}u(t)+H(q,\theta )\text{}e(t)$$
where G(q,θ) is the measured transfer function, H(q,θ) is the noise model, and e(t) represents the additive disturbances modeled as white Gaussian noise. q is the time-shift operator.
In the frequency domain, the linear model can be represented as:
$$Y(\omega )=G(\omega ,\theta )U(\omega )+H(\omega ,\theta )E(\omega )$$
The loss function to be minimized for the SISO model is given by:
$V(\theta )=\frac{1}{N}{\displaystyle \sum}_{t=1}^{N}{e}^{T}\left(t,\theta \right)e\left(t,\theta \right)$
Using Parseval's identity, the loss function in the frequency domain is:
$$V(\theta ,\omega )=\frac{1}{N}{\Vert E(\omega )\Vert}^{2}$$
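Parseval's identity can be checked numerically. The sketch below is illustrative Python, not toolbox code; note that the exact 1/N factors depend on the FFT normalization convention (NumPy's unnormalized forward FFT satisfies Σₜ|e(t)|² = (1/N)Σₖ|Eₖ|²):

```python
import numpy as np

# A made-up residual sequence e(t).
rng = np.random.default_rng(2)
N = 256
e = rng.standard_normal(N)

V_time = np.sum(e ** 2) / N                 # (1/N) sum_t e(t)^2
E = np.fft.fft(e)
V_freq = np.sum(np.abs(E) ** 2) / N ** 2    # (1/N^2) ||E||^2 under NumPy's convention

print(V_time, V_freq)                       # identical up to round-off
```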
Substituting for E(ω) gives:
$$V(\theta ,\omega )=\frac{1}{N}{\left\Vert \frac{Y(\omega )}{U(\omega )}-G(\theta ,\omega )\right\Vert}^{2}\frac{{\Vert U(\omega )\Vert}^{2}}{{\Vert H(\theta ,\omega )\Vert}^{2}}$$
Thus, you can interpret minimizing the loss function V as fitting G(θ,ω) to the empirical transfer function $$Y(\omega )/U(\omega )$$, using $$\frac{{\Vert U(\omega )\Vert}^{2}}{{\Vert H(\theta ,\omega )\Vert}^{2}}$$ as a weighting filter. This corresponds to specifying Focus as 'prediction'. The estimation emphasizes frequencies where the input has more power ($${\Vert U(\omega )\Vert}^{2}$$ is greater) and deemphasizes frequencies where the noise is significant ($${\Vert H(\theta ,\omega )\Vert}^{2}$$ is large).
When Focus is specified as 'simulation', the inverse weighting with $${\Vert H(\theta ,\omega )\Vert}^{2}$$ is not used. That is, only the input spectrum is used to weigh the relative importance of the estimation fit in a specific frequency range.
When you specify a linear filter $\mathcal{L}$ as WeightingFilter, it is used as an additional custom weighting in the loss function:

$$V(\theta )=\frac{1}{N}{\left\Vert \frac{Y(\omega )}{U(\omega )}-G(\theta )\right\Vert}^{2}\frac{{\Vert U(\omega )\Vert}^{2}}{{\Vert H(\theta )\Vert}^{2}}{\Vert \mathcal{L}(\omega )\Vert}^{2}$$
Here $$\mathcal{L}(\omega )$$ is the frequency response of the filter. Use $$\mathcal{L}(\omega )$$ to enhance the fit of the model response to observed data in certain frequencies, such as to emphasize the fit close to system resonant frequencies.
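The effect of such a weighting can be sketched numerically. The following illustrative Python sketch (not toolbox code) applies a made-up frequency weighting |L(ω)| to two residual signals of equal power; the residual whose energy lies where |L(ω)| is large incurs the larger weighted loss:

```python
import numpy as np

# Two residuals with equal power but different frequency content.
N = 512
t = np.arange(N)
e_low = np.sqrt(2) * np.cos(2 * np.pi * 5 * t / N)     # low-frequency error
e_high = np.sqrt(2) * np.cos(2 * np.pi * 100 * t / N)  # high-frequency error

freqs = np.fft.fftfreq(N)           # normalized frequencies in [-0.5, 0.5)
L = 4 * np.abs(freqs)               # made-up weighting that grows with frequency

def weighted_loss(e):
    # Frequency-domain loss with custom weighting |L(w)|^2 applied.
    E = np.fft.fft(e)
    return np.sum(np.abs(L * E) ** 2) / N ** 2

# The weighting emphasizes the high-frequency residual.
print(weighted_loss(e_low), weighted_loss(e_high))
```

A weighting concentrated near a resonance would, in the same way, make misfit near that resonance dominate the loss.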
The estimated value of the input-output transfer function G is the same as what you get if you instead first prefilter the estimation data with $\mathcal{L}(.)$ using idfilt, and then estimate the model without specifying WeightingFilter. However, the effect of $\mathcal{L}(.)$ on the estimated noise model H depends on the choice of Focus:
Focus is 'prediction' — The software minimizes the weighted prediction error $${e}_{f}(t)=\mathcal{L}({e}_{p}(t))$$, and the estimated model has the form:

$$y(t)=G(q)u(t)+{H}_{1}(q)e(t)$$

where $${H}_{1}(q)=H(q)/\mathcal{L}(q)$$. Thus, the estimation with prediction focus creates a biased estimate of H. This is the same estimated noise model you get if you instead first prefilter the estimation data with $\mathcal{L}(.)$ using idfilt, and then estimate the model.
When H is parameterized independently of G, you can treat the filter $\mathcal{L}(.)$ as a way of affecting the estimation bias distribution. That is, you can shape the tradeoff between fitting G to the system frequency response and fitting $$H/\mathcal{L}$$ to the disturbance spectrum when minimizing the loss function. For more details, see section 14.4 of System Identification: Theory for the User, Second Edition, by Lennart Ljung, Prentice Hall PTR, 1999.
Focus is 'simulation' — The software first estimates G by minimizing the weighted simulation error $${e}_{f}(t)=\mathcal{L}({e}_{s}(t))$$, where ${e}_{s}\left(t\right)={y}_{measured}\left(t\right)-G(q){u}_{measured}\left(t\right)$. Once G is estimated, the software fixes it and computes H by minimizing pure prediction errors e(t) using unfiltered data. The estimated model has the form:

$$y(t)=G(q)u(t)+H(q)e(t)$$
If you prefilter the data first, and then estimate the model, you get the same estimate for G but get a biased noise model $$H/\mathcal{L}$$.
Thus, WeightingFilter has the same effect as prefiltering the estimation data for the estimation of G. For the estimation of H, the effect of WeightingFilter depends upon the choice of Focus. A prediction focus estimates a biased version of the noise model, $$H/\mathcal{L}$$, while a simulation focus estimates H. Prefiltering the estimation data and then estimating the model always gives $$H/\mathcal{L}$$ as the noise model.
After you estimate a model, use model quality metrics to assess the quality of identified models, compare different models, and pick the best one. The Report.Fit property of an identified model stores various metrics such as FitPercent, LossFcn, FPE, MSE, AIC, nAIC, AICc, and BIC values.
FitPercent, LossFcn, and MSE are measures of the actual quantity that is minimized during the estimation. For example, if Focus is 'simulation', these quantities are computed for the simulation error e_{s}(t). Similarly, if you specify the WeightingFilter option, then LossFcn, FPE, and MSE are computed using the filtered residuals e_{f}(t).
FPE, AIC, nAIC, AICc, and BIC measures are computed as properties of the output disturbance according to the relationship:

$y\left(t\right)=G\left(q\right)u\left(t\right)+H\left(q\right)e(t)$

where G(q) and H(q) represent the measured and noise components of the estimated model.
Regardless of how the loss function is configured, the error vector e(t) is computed as the 1-step-ahead prediction error using a given model and a given dataset. This implies that even when the model is obtained by minimizing the simulation error e_{s}(t), the FPE and various AIC values are still computed using the prediction error e_{p}(t). The actual value of e_{p}(t) is determined using the pe command with a prediction horizon of 1 and the initial conditions specified for the estimation.
These metrics contain two terms — one for describing the model accuracy and another to describe its complexity. For example, in FPE, $det\left(\frac{1}{N}{E}^{T}E\right)$ describes the model accuracy and $\frac{1+\frac{{n}_{p}}{N}}{1-\frac{{n}_{p}}{N}}$ describes the model complexity.
By comparing models using these criteria, you can pick a model that gives the best (smallest criterion value) tradeoff between accuracy and complexity.
FitPercent
Normalized Root Mean Squared Error (NRMSE) expressed as a percentage, defined as: $FitPercent=100\left(1-\frac{\Vert {y}_{measured}-{y}_{model}\Vert}{\Vert {y}_{measured}-\overline{{y}_{measured}}\Vert}\right)$ where y_{measured} is the measured output data, y_{model} is the corresponding response of the identified model, and $\overline{{y}_{measured}}$ is the mean of the measured output.

LossFcn
Value of the loss function when the estimation completes. It contains effects of error thresholds, output weight, and regularization used for estimation.

MSE
Mean Squared Error measure, defined as: $MSE=\frac{1}{N}{\displaystyle \sum}_{t=1}^{N}{e}^{T}\left(t\right)e\left(t\right)$ where e(t) is the error vector and N is the number of data samples.

FPE
Akaike's Final Prediction Error (FPE), defined as: $FPE=det\left(\frac{1}{N}{E}^{T}E\right)\left(\frac{1+\frac{{n}_{p}}{N}}{1-\frac{{n}_{p}}{N}}\right)$ where E is the N-by-ny error matrix and n_{p} is the number of estimated parameters.

AIC
A raw measure of Akaike's Information Criterion, defined as: $AIC=N\ast log\left(det\left(\frac{1}{N}{E}^{T}E\right)\right)+2\ast {n}_{p}+N\left({n}_{y}\ast \mathrm{log}\left(2\pi \right)+1\right)$

AICc
Small-sample-size corrected Akaike's Information Criterion, defined as: $$AICc=AIC+2\ast {n}_{p}\ast \frac{({n}_{p}+1)}{(N-{n}_{p}-1)}$$ This metric is often more reliable for picking a model of optimal complexity from a list of candidate models when the data size N is small.

nAIC
Normalized measure of Akaike's Information Criterion, defined as: $nAIC=log\left(det\left(\frac{1}{N}{E}^{T}E\right)\right)+\frac{2\ast {n}_{p}}{N}$

BIC
Bayesian Information Criterion, defined as: $BIC=N\ast log\left(det\left(\frac{1}{N}{E}^{T}E\right)\right)+N\ast \left({n}_{y}\ast \mathrm{log}\left(2\pi \right)+1\right)+{n}_{p}\ast \text{log}(N)$
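The information criteria defined above can be reproduced directly from a residual matrix. The following Python sketch is illustrative, not toolbox code; the residuals and the parameter count n_p are made-up values:

```python
import numpy as np

# Hypothetical single-output residuals and parameter count.
rng = np.random.default_rng(3)
N, ny, n_p = 200, 1, 4
E = 0.3 * rng.standard_normal((N, ny))      # 1-step prediction errors

det_term = np.linalg.det(E.T @ E / N)       # accuracy term det((1/N) E^T E)
FPE = det_term * (1 + n_p / N) / (1 - n_p / N)
AIC = N * np.log(det_term) + 2 * n_p + N * (ny * np.log(2 * np.pi) + 1)
AICc = AIC + 2 * n_p * (n_p + 1) / (N - n_p - 1)
nAIC = np.log(det_term) + 2 * n_p / N
BIC = N * np.log(det_term) + N * (ny * np.log(2 * np.pi) + 1) + n_p * np.log(N)

print(FPE, AIC, AICc, nAIC, BIC)
```

Note how the complexity terms relate: AICc exceeds AIC by the small-sample correction, FPE exceeds the raw accuracy term, and nAIC is AIC with the constant N(ny·log(2π)+1) removed and the result divided by N.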
aic | fpe | goodnessOfFit | nparams | pe | predict | sim