Data acquisition failures sometimes result in missing measurements in both the input and the output signals. When you import data that contains missing values using the MATLAB® Import Wizard, these values are automatically set to NaN. NaN serves as a flag for nonexistent or undefined data. When you plot data that contains missing values on a time plot, gaps appear on the plot where data is missing.
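Before plotting, it can help to check how many values are flagged and where they sit. The following is a minimal sketch using a made-up signal y; the variable names are illustrative only.

```matlab
% Hypothetical output signal with two missing samples flagged as NaN
y = [1.0; 1.2; NaN; 1.5; NaN; 1.9];

missingMask = isnan(y);           % logical vector, true where data is missing
numMissing  = sum(missingMask);   % number of missing samples
missingIdx  = find(missingMask);  % sample indices of the missing values

fprintf('%d missing samples at indices %s\n', numMissing, mat2str(missingIdx'));
```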
You can use misdata to estimate missing values. This command first linearly interpolates the missing values and uses the result to estimate an initial model. Then, it uses this model to estimate the missing data as parameters by minimizing the output prediction errors obtained from the reconstructed data. You can specify the model structure you want to use in the misdata argument or estimate a default-order model using the n4sid method. For more information, see the misdata reference page.
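The linear interpolation step can be illustrated in isolation. The sketch below is not the internal code of misdata; it only shows, on a made-up signal, how interior NaN values might be filled by linear interpolation with interp1 before a first model is estimated.

```matlab
% Hypothetical signal sampled every 0.2 s with missing values flagged as NaN
Ts = 0.2;
y  = [0; 1; NaN; 3; NaN; NaN; 6];
t  = (0:numel(y)-1)' * Ts;        % time vector in seconds

known   = ~isnan(y);              % samples that were actually measured
yInterp = y;
% Fill each missing sample from its measured neighbors on a straight line
yInterp(~known) = interp1(t(known), y(known), t(~known), 'linear');
```

This sketch handles interior gaps only. Missing values at the very start or end of the record have no neighbor on one side, so interp1 would leave them as NaN unless you choose an extrapolation rule.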
Note:
You can only use |
For example, suppose y and u are output and input signals that contain NaNs. This data is sampled at 0.2 s. The following syntax creates a new iddata object with these input and output signals.
dat = iddata(y,u,0.2) % y and u contain NaNs
                      % representing missing data
Apply the misdata command to the new data object. For example:
dat1 = misdata(dat);
plot(dat,dat1)        % Check how the missing data
                      % was estimated on a time plot
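To try these commands without measured data, you can generate a synthetic record. The following sketch assumes System Identification Toolbox is installed; the first-order system, the noise level, and the positions of the missing samples are arbitrary choices made for illustration.

```matlab
% Synthetic example data (assumption: a simple first-order system)
rng(0);                                          % reproducible random numbers
N = 200;
u = randn(N,1);                                  % input signal
y = filter(0.5, [1 -0.8], u) + 0.01*randn(N,1);  % output signal with noise

% Flag a few samples as missing in both signals
y([20 21 75]) = NaN;
u(120) = NaN;

dat  = iddata(y, u, 0.2);   % sample time of 0.2 s
dat1 = misdata(dat);        % reconstruct missing values with a default model

% Check that no NaNs remain in the reconstructed data
nanLeft = any(isnan(dat1.OutputData)) || any(isnan(dat1.InputData));

plot(dat, dat1)             % compare original and reconstructed data
```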
Malfunctions can produce errors in measured values, called outliers. Such outliers might be caused by signal spikes or by measurement malfunctions. If you do not remove outliers from your data, they can adversely affect the estimated models.
To identify the presence of outliers, perform one of the following tasks:
Before estimating a model, plot the data on a time plot and identify values that appear out of range.
After estimating a model, plot the residuals and identify unusually large values. For more information about plotting residuals, see Residual Analysis. Evaluate the original data that is responsible for large residuals. For example, for the model Model and validation data Data, you can use the following commands to plot the residuals:
% Compute the residuals
E = resid(Model,Data)
% Plot the residuals
plot(E)
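Beyond visual inspection, one possible way to shortlist "unusually large" residuals is a robust threshold based on the median absolute deviation (MAD). The sketch below is an assumed rule of thumb, not a toolbox feature. It operates on a made-up residual vector e; in practice you might take e from E.OutputData, assuming E is the iddata object returned by resid.

```matlab
% Hypothetical residual sequence for a single output
e = [0.1; -0.2; 0.05; 4.0; -0.1; 0.15; -3.5; 0.0];

% Robust scale estimate from the median absolute deviation (MAD).
% The factor 1.4826 makes the MAD comparable to a standard deviation
% for Gaussian data.
center = median(e);
sigma  = 1.4826 * median(abs(e - center));

threshold  = 3 * sigma;                        % assumed cutoff
suspectIdx = find(abs(e - center) > threshold);

fprintf('Suspect samples: %s\n', mat2str(suspectIdx'));
```

The indices in suspectIdx point back to the samples of the original data that deserve a closer look.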
Next, try these techniques for removing or minimizing the effects of outliers:
Extract the informative data portions into segments and merge them into one multiexperiment data set (see Extract and Model Specific Data Segments). For more information about selecting and extracting data segments, see Selecting Subsets of Data.
Tip The inputs in each of the data segments must consistently excite the system. Splitting data into meaningful segments for steady-state data results in minimum information loss. Avoid making data segments too small.
Manually replace outliers with NaNs and then use the misdata command to reconstruct flagged data. This approach treats outliers as missing data and is described in Handling Missing Data. Use this method when your data contains several inputs and outputs, and when you have difficulty finding reliable data segments in all variables.
Remove outliers by prefiltering the data for high-frequency content because outliers often result from abrupt changes. For more information about filtering, see Filtering Data.
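As an illustration of the second technique, the following sketch flags spikes in a made-up output signal and replaces them with NaN. The detection rule (deviation from the median compared against a scaled median absolute deviation) is an assumption chosen for this example; any criterion that suits your data can take its place.

```matlab
% Hypothetical output signal with two spikes
y = [1.0; 1.1; 0.9; 25.0; 1.05; 0.95; -18.0; 1.0];

% Flag samples that deviate strongly from the median (assumed rule)
dev     = abs(y - median(y));
isSpike = dev > 3 * 1.4826 * median(dev);

yFlagged = y;
yFlagged(isSpike) = NaN;    % treat outliers as missing data

% With System Identification Toolbox and a matching input signal u,
% the flagged samples can then be reconstructed like any missing data:
%   dat  = iddata(yFlagged, u, 0.2);
%   dat1 = misdata(dat);
```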
Note:
The estimation algorithm can handle outliers by assigning a
smaller weight to outlier data. A robust error criterion applies an
error penalty that is quadratic for small and moderate prediction
errors, and is linear for large prediction errors. Because outliers
produce large prediction errors, this approach gives a smaller weight
to the corresponding data points during model estimation. Set the |
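A penalty that is quadratic for small errors and linear for large ones can be sketched as a Huber-style function. The function rho and the threshold value below are illustrative assumptions and do not reproduce the toolbox's internal criterion; the sketch only shows why large prediction errors carry less weight than under a purely quadratic penalty.

```matlab
% Huber-style penalty: quadratic up to the threshold, linear beyond it.
% The threshold value is an illustrative assumption.
threshold = 1.5;
rho = @(e) (abs(e) <= threshold) .* (0.5 * e.^2) + ...
           (abs(e) >  threshold) .* (threshold * abs(e) - 0.5 * threshold^2);

e = [0.5; 1.5; 10];            % small, moderate, and large prediction errors
quadraticCost = 0.5 * e.^2;    % ordinary least-squares penalty
robustCost    = rho(e);        % robust penalty

disp(table(e, quadraticCost, robustCost));
```

For the two smaller errors both penalties agree, while for the large error the robust penalty grows only linearly and is therefore much smaller.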
This example shows how to create a multiexperiment, time-domain data set by merging only the accurate data segments and ignoring the rest.
Assume that the data has poor or no measurements for some sample ranges (for example, 341–499). You cannot simply concatenate the good data segments because the transients at the connection points compromise the model. Instead, you must create a multiexperiment iddata object, where each experiment corresponds to a good segment of data, as follows:
% Plot the data in a MATLAB Figure window
plot(data)
% Create multiexperiment data set
% by merging data segments
datam = merge(data(1:340),...
              data(500:897),...
              data(1001:1200),...
              data(1550:2000));
% Model the multiexperiment data set
% using "experiments" 1, 2, and 4
m = n4sid(getexp(datam,[1,2,4]))
% Validate the model by comparing its output to
% the output data of experiment 3
compare(getexp(datam,3),m)
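The sample ranges passed to merge have to come from somewhere. If you already have a logical mask that marks unreliable samples, the boundaries of the good segments can be computed rather than read off a plot. The following sketch uses a made-up mask named bad; how you build that mask is up to you.

```matlab
% Hypothetical quality mask: true where a sample is unreliable
N   = 20;
bad = false(N,1);
bad(6:8)   = true;
bad(15:16) = true;

good = double(~bad);

% Pad with zeros so that segments touching either end are detected
d = diff([0; good; 0]);      % +1 at a segment start, -1 just after its end
segStart = find(d == 1);
segEnd   = find(d == -1) - 1;
segments = [segStart segEnd] % one row per good segment: [first last]
```

Each row of segments gives one index range, for example data(segments(1,1):segments(1,2)), that can be supplied to merge. Discard segments that are too short to be informative before merging.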
To learn more about the theory of handling missing data and outliers, see the chapter on preprocessing data in System Identification: Theory for the User, Second Edition, by Lennart Ljung, Prentice Hall PTR, 1999.