Handling Missing Data and Outliers

Handling Missing Data

Data acquisition failures sometimes result in missing measurements in the input signals, the output signals, or both. When you import data that contains missing values using the MATLAB Import Wizard, these values are automatically set to NaN ("Not-a-Number"), which serves as a flag for nonexistent or undefined data. When you plot data that contains missing values on a time plot, gaps appear where the data is missing.

You can use misdata to estimate missing values. This command linearly interpolates missing values to estimate the first model. Then, it uses this model to estimate the missing data as parameters by minimizing the output prediction errors obtained from the reconstructed data. You can specify the model structure you want to use in the misdata argument or estimate a default-order model using the n4sid method. For more information, see the misdata reference page.

For example, suppose y and u are output and input signals that contain NaNs, sampled with a sample time of 0.2 s. The following syntax creates a new iddata object from these input and output signals.

dat = iddata(y,u,0.2) % y and u contain NaNs 
                      % representing missing data

Apply the misdata command to the new data object. For example:

dat1 = misdata(dat);
plot(dat,dat1)        % Check how the missing data
                      % was estimated on a time plot
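To try these commands without measured data, the following sketch builds synthetic signals, flags a few samples as NaN, and checks that the data set returned by misdata no longer contains NaNs. The signal model, the NaN positions, and the variable names are assumptions made only for illustration.

```matlab
% Synthetic input/output data (illustrative assumption, not measured data)
rng(0);                                   % reproducible random numbers
N = 300;                                  % number of samples
u = randn(N,1);                           % white-noise input
y = filter([0 1 0.5],[1 -0.7],u) ...      % simple discrete-time system
    + 0.05*randn(N,1);                    % plus small measurement noise

% Flag some samples as missing
y([50 51 120]) = NaN;                     % three missing outputs
u(200) = NaN;                             % one missing input

dat  = iddata(y,u,0.2);                   % sample time 0.2 s, as in the text
dat1 = misdata(dat);                      % reconstruct the missing samples

% Count NaNs before and after reconstruction
nMissingBefore = nnz(isnan(dat.OutputData))  + nnz(isnan(dat.InputData));
nMissingAfter  = nnz(isnan(dat1.OutputData)) + nnz(isnan(dat1.InputData));
fprintf('Missing samples before: %d, after: %d\n', ...
        nMissingBefore, nMissingAfter);
```

Because the true values are known for synthetic data, you can also compare the reconstructed samples against them to judge how well the default model order works for your kind of signal.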

Handling Outliers

Measurement malfunctions and signal spikes can produce erroneous measured values, called outliers. If you do not remove outliers from your data, they can adversely affect the estimated models.

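To identify the presence of outliers, plot the data on a time plot and look for values that appear out of range, or estimate a model and look for unusually large residuals. For a model Model and data Data, the following commands plot the residuals: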

% Compute the residuals
E = resid(Model,Data)
% Plot the residuals
plot(E)

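Next, try to remove or minimize the effects of outliers. One technique, shown in the following example, is to extract the reliable data segments and merge them into a multiexperiment data set.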
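Outliers can also be treated as missing data: replace them with NaN and let misdata reconstruct them, as described in Handling Missing Data. The following is a minimal sketch under stated assumptions: the data is synthetic, the spikes are injected artificially, and the detection rule (distance from the median, scaled by the median absolute deviation, with a threshold of 5) is a generic heuristic chosen for illustration, not a rule prescribed by the toolbox.

```matlab
% Synthetic data with artificial spikes (illustrative assumption)
rng(1);
N  = 400;
Ts = 0.2;                                 % sample time in seconds
u  = randn(N,1);
y  = filter([0 1 0.5],[1 -0.7],u) + 0.05*randn(N,1);
spikeIdx = [80 210 333];                  % hypothetical outlier positions
y(spikeIdx) = y(spikeIdx) + 25;           % inject large spikes

% Flag samples that lie far from the median (robust heuristic)
med   = median(y);
sigma = 1.4826*median(abs(y - med));      % MAD-based scale estimate
isOut = abs(y - med) > 5*sigma;           % threshold 5 is an assumption

% Treat flagged samples as missing data and reconstruct them
yFlagged = y;
yFlagged(isOut) = NaN;
datClean = misdata(iddata(yFlagged,u,Ts));

fprintf('Flagged %d samples as outliers\n', nnz(isOut));
```

A fixed threshold on the raw signal suits data that fluctuates around a constant level. For data with trends or large level changes, inspecting the residuals of an estimated model is usually the more reliable way to locate outliers.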

Example – Extracting and Modeling Specific Data Segments

The following example shows how to create a multiexperiment, time-domain data set by merging only the accurate-data segments and ignoring the rest. Modeling multiexperiment data sets produces an average model for the different experiments.

You cannot simply concatenate the good data segments because the transients at the connection points compromise the model. Instead, you must create a multiexperiment iddata object, where each experiment corresponds to a good segment of data, as follows:

% Plot the data in a MATLAB Figure window
plot(data)

% Create multiexperiment data set
% by merging data segments
datam = merge(data(1:340),...
              data(500:897),...
              data(1001:1200),...
              data(1550:2000));

% Model the multiexperiment data set
% using "experiments" 1, 2, and 4
m = pem(getexp(datam,[1,2,4]))

% Validate the model by comparing its output to
% the output data of experiment 3
compare(getexp(datam,3),m)
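To see what the merged object contains, the following sketch repeats the segmentation on synthetic data and inspects the result. The data, the model order of 2, and the use of n4sid in place of pem are assumptions made so that the snippet is self-contained. The segment boundaries are the ones used in the example.

```matlab
% Synthetic single-experiment data set (illustrative assumption)
rng(2);
N = 2000;
u = randn(N,1);
y = filter([0 1 0.5],[1 -0.7],u) + 0.05*randn(N,1);
data = iddata(y,u,0.2);

% Merge the four segments into one multiexperiment data set
datam = merge(data(1:340), ...
              data(500:897), ...
              data(1001:1200), ...
              data(1550:2000));

% For multiexperiment data, OutputData is a cell array
% with one matrix per experiment
nExp   = numel(datam.ExperimentName);
segLen = cellfun(@(c) size(c,1), datam.OutputData);
fprintf('%d experiments with lengths: %s\n', nExp, mat2str(segLen(:)'));

% Estimate on experiments 1, 2, and 4; hold out experiment 3
mSketch = n4sid(getexp(datam,[1 2 4]), 2);   % order 2 is an assumption
[~, fitPercent] = compare(getexp(datam,3), mSketch);
```

Holding out one segment, as the example does with experiment 3, gives validation data that the estimation never used.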

See Also

To learn more about the theory of handling missing data and outliers, see the chapter on preprocessing data in System Identification: Theory for the User, Second Edition, by Lennart Ljung, Prentice Hall PTR, 1999.

  


© 1984-2009 The MathWorks, Inc.