Analyzing Data Quality

Is Your Data Ready for Modeling?

Before you start estimating models from data, check your data for undesirable characteristics. For example, you might plot the data to identify drifts and outliers. Your plot analysis might lead you to preprocess your data before model estimation.

The following data plots are available in the toolbox:

  • Time plot — Shows data values as a function of time.

      Tip   You can infer time delays, which are required inputs to most parametric models, from time plots. A time delay is the time interval between a change in the input and the corresponding change in the output.

  • Spectral plot — Shows a periodogram that is computed by taking the absolute squares of the Fourier transforms of the data, dividing by the number of data points, and multiplying by the sampling interval.

  • Frequency-response plot — For frequency-response data, shows the amplitude and phase of the frequency-response function on a Bode plot. For time- and frequency-domain data, shows the empirical transfer function estimate (see etfe).
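The time delay mentioned in the time-plot tip can also be estimated numerically and compared with what the plot suggests. The following is a minimal sketch, assuming a synthetic data set whose system, noise level, and variable names are invented for illustration. It uses the delayest command listed under See Also at the end of this topic.

```matlab
% Synthetic example; all signals and values are illustrative assumptions.
rng(0);                               % reproducible random numbers
Ts = 0.1;                             % sampling interval, in seconds
u  = randn(500,1);                    % white-noise input
y  = filter([0 0 0 1],[1 -0.5],u);    % output responds 3 samples after the input
y  = y + 0.01*randn(500,1);           % small measurement noise
data = iddata(y,u,Ts);                % package signals as an iddata object

plot(data)                            % inspect the delay visually on the time plot
nk = delayest(data);                  % estimated delay, in samples
delaySeconds = nk*Ts;                 % the same delay expressed in seconds
```

Treat the estimate as a starting point and confirm it against the time plot, because the result depends on the model orders that delayest assumes.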

See Also

How to Analyze Data Using the advice Command

Ways to Prepare Data for System Identification

Plotting Data in the App Versus at the Command Line

The plots you create using the System Identification app provide options that are specific to the System Identification Toolbox™ product, such as selecting specific channel pairs in a multivariate signal or converting frequency units between Hertz and radians per second. For more information, see How to Plot Data in the App.

The plots you create using plot commands, such as plot and bode, are displayed in the standard MATLAB® Figure window, which provides options for formatting, saving, printing, and exporting plots to a variety of file formats. To learn about plotting at the command line, see How to Plot Data at the Command Line. For more information about working with the Figure window, see Graphics.

How to Plot Data in the App

After importing data into the System Identification app, as described in Represent Data, you can plot the data.

To create one or more plots, select the corresponding check box in the Data Views area of the System Identification app.

The icon of an active data set has a thick line, while the icon of an inactive data set has a thin line. Only active data sets appear on the selected plots. To include or exclude a data set on a plot, click the corresponding icon in the System Identification app. Clicking the data icon updates any plots that are currently open.

When you have several data sets, you can view a different input-output channel pair by selecting that pair from the Channel menu. For more information about selecting different input and output pairs, see Selecting Measured and Noise Channels in Plots.

In this example, data and dataff are active and appear on the three selected plots.

To close a plot, clear the corresponding check box in the System Identification app.

    Tip   To get information about working with a specific plot, select a help topic from the Help menu in the plot window.

Manipulating a Time Plot

The Time plot only shows time-domain data. In this example, data1 is displayed on the time plot because, of the three data sets, it is the only one that contains time-domain input and output.

Time Plot of data1

The following table summarizes options that are specific to time plots, which you can select from the plot window menus. For general information about working with System Identification Toolbox plots, see Working with Plots.

Time Plot Options

Action: Toggle input display between piece-wise constant (zero-order hold) and linear interpolation (first-order hold) between samples.

    Note:   This option only affects the display and not the intersample behavior specified when importing the data.

Command: Select Style > Staircase input for zero-order hold or Style > Regular input for first-order hold.

Manipulating a Data Spectra Plot

The Data spectra plot shows a periodogram or a spectral estimate of data1 and data3fd.

The periodogram is computed by taking the absolute squares of the Fourier transforms of the data, dividing by the number of data points, and multiplying by the sampling interval. The spectral estimate for time-domain data is a smoothed spectrum calculated using spa. For frequency-domain data, the Data spectra plot shows the square of the absolute value of the actual data, normalized by the sampling interval.
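The periodogram computation described above can be sketched with the base MATLAB fft function. This is only an illustration on a synthetic sinusoid. The signal, sampling interval, and variable names are assumptions, and the toolbox's own scaling and frequency grid might differ in detail.

```matlab
% Synthetic test signal; every value here is an illustrative assumption.
Ts = 0.1;                        % sampling interval, in seconds
N  = 512;                        % number of data points
t  = (0:N-1)'*Ts;                % time vector
y  = sin(2*pi*1.25*t);           % 1.25 Hz sinusoid

Y   = fft(y);                    % Fourier transform of the data
Pyy = abs(Y).^2/N*Ts;            % absolute square, divided by N, times Ts
f   = (0:N-1)'/(N*Ts);           % frequency of each FFT bin, in Hz

half  = 1:floor(N/2)+1;          % keep nonnegative frequencies only
[~,k] = max(Pyy(half));          % bin with the largest power
fPeak = f(k);                    % frequency of the spectral peak

semilogy(f(half),Pyy(half))      % logarithmic amplitude scale
xlabel('Frequency (Hz)')
ylabel('Power')
```

Because the sinusoid frequency falls exactly on an FFT bin for these values, the peak of the periodogram is at 1.25 Hz.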

The top axes show the input and the bottom axes show the output. The vertical axis of each plot is labeled with the corresponding channel name.

Periodograms of data1 and data3fd

Data Spectra Plot Options

Action: Toggle display between periodogram and spectral estimate.
Command: Select Options > Periodogram or Options > Spectral analysis.

Action: Change frequency units.
Command: Select Style > Frequency (rad/s) or Style > Frequency (Hz).

Action: Toggle frequency scale between linear and logarithmic.
Command: Select Style > Linear frequency scale or Style > Log frequency scale.

Action: Toggle amplitude scale between linear and logarithmic.
Command: Select Style > Linear amplitude scale or Style > Log amplitude scale.

Manipulating a Frequency Function Plot

For time-domain data, the Frequency function plot shows the empirical transfer function estimate (etfe). For frequency-domain data, the plot shows the ratio of output to input data.

The frequency-response plot shows the amplitude and phase plots of the corresponding frequency response. For more information about frequency-response data, see Frequency-Response Data Representation.
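At the command line, a comparable frequency function can be obtained with etfe and displayed with bode. The following is a minimal sketch, assuming synthetic time-domain data. The system, noise level, smoothing parameter, and variable names are invented for illustration.

```matlab
% Synthetic time-domain data; all values are illustrative assumptions.
rng(0);                               % reproducible random numbers
Ts = 0.1;                             % sampling interval, in seconds
u  = randn(1000,1);                   % white-noise input
y  = filter([0 1],[1 -0.8],u);        % first-order system response
y  = y + 0.05*randn(1000,1);          % measurement noise
data = iddata(y,u,Ts);

g  = etfe(data);                      % empirical transfer function estimate (idfrd)
gs = etfe(data,50);                   % smoothed estimate; 50 is an arbitrary choice

bode(g,gs)                            % amplitude and phase of both estimates
legend('Raw estimate','Smoothed estimate')
```

The raw estimate is typically noisy, so comparing it with a smoothed version can make the underlying frequency response easier to read.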

Frequency Functions of data1 and data3fd

Frequency Function Plot Options

Action: Change frequency units.
Command: Select Style > Frequency (rad/s) or Style > Frequency (Hz).

Action: Toggle frequency scale between linear and logarithmic.
Command: Select Style > Linear frequency scale or Style > Log frequency scale.

Action: Toggle amplitude scale between linear and logarithmic.
Command: Select Style > Linear amplitude scale or Style > Log amplitude scale.

How to Plot Data at the Command Line

The following table summarizes the commands available for plotting time-domain, frequency-domain, and frequency-response data.

Commands for Plotting Data

Command: bode, bodeplot

Description: For frequency-response data only. Shows the magnitude and phase of the frequency response on a Bode plot with a logarithmic frequency scale.

Example: To plot idfrd data:

bode(idfrd_data)

or:

bodeplot(idfrd_data)

Command: plot

Description: The type of plot corresponds to the type of data. For example, plotting time-domain data generates a time plot, and plotting frequency-response data generates a frequency-response plot.

When plotting time- or frequency-domain inputs and outputs, the top axes show the output and the bottom axes show the input.

Example: To plot iddata or idfrd data:

plot(data)

All plot commands display the data in the standard MATLAB Figure window. For more information about working with the Figure window, see Graphics.

To plot portions of the data, you can subreference specific samples (see Select Data Channels, I/O Data and Experiments in iddata Objects and Select I/O Channels and Data in idfrd Objects). For example:

plot(data(1:300))

For time-domain data, to plot only the input data as a function of time, use the following syntax:

plot(data(:,[],:))

When data.intersample = 'zoh', the input is piece-wise constant between sampling points on the plot. For more information about properties, see the iddata reference page.

You can generate plots of the input data in the time domain using:

plot(data.SamplingInstants,data.u)

To plot frequency-domain data, you can use the following syntax:

semilogx(data.Frequency,abs(data.u))
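The semilogx syntax above requires a frequency-domain iddata object. The following is a minimal sketch of how such an object might be obtained from time-domain data with fft and then plotted. The synthetic signals and variable names are assumptions made for illustration.

```matlab
% Synthetic time-domain data; all values are illustrative assumptions.
rng(0);                               % reproducible random numbers
Ts = 0.1;                             % sampling interval, in seconds
u  = randn(512,1);                    % white-noise input
y  = filter([0 1],[1 -0.8],u);        % first-order system response
data = iddata(y,u,Ts);                % time-domain iddata object

dataf = fft(data);                    % frequency-domain iddata object

semilogx(dataf.Frequency,abs(dataf.u))   % input magnitude versus frequency
xlabel('Frequency')                   % units are given by dataf.FrequencyUnit
ylabel('|U|')
```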

When you plot a multivariable iddata object, each input-output combination is displayed one at a time in the same MATLAB Figure window. You must press Enter to update the Figure window and view the next channel combination. To cancel the plotting operation, press Ctrl+C.

    Tip   To plot specific input and output channels, use plot(data(:,ky,ku)), where ky and ku are specific output and input channel indexes or names. For more information about subreferencing channels, see Subreferencing Data Channels.

To plot several iddata sets d1,...,dN, use plot(d1,...,dN). Input-output channels with the same experiment name, input name, and output name are always plotted in the same plot.
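The channel selection from the tip and the multiple-data-set syntax can be sketched as follows. The two-input, two-output data set, its channel names, and the way it is split into d1 and d2 are assumptions made only for illustration.

```matlab
% Synthetic two-input, two-output data; all values are illustrative assumptions.
rng(0);                                   % reproducible random numbers
Ts = 0.1;                                 % sampling interval, in seconds
u  = randn(300,2);                        % two white-noise inputs
y  = [filter([0 1],[1 -0.7],u(:,1)), ...  % output 1 driven by input 1
      filter([0 0 1],[1 -0.3],u(:,2))];   % output 2 driven by input 2
data = iddata(y,u,Ts);
data.InputName  = {'u1','u2'};            % hypothetical channel names
data.OutputName = {'y1','y2'};

figure
plot(data(:,1,2))                         % output 1 and input 2, selected by index

figure
plot(data(:,'y2','u1'))                   % output y2 and input u1, selected by name

d1 = data(1:150,1,1);                     % first half of the y1/u1 pair
d2 = data(151:300,1,1);                   % second half of the same pair

figure
plot(d1,d2)                               % both data sets in one plot
```

Selecting a single channel pair before plotting also avoids stepping through every input-output combination of a multivariable object.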

How to Analyze Data Using the advice Command

You can use the advice command to analyze time- or frequency-domain data before estimating a model. The resulting report informs you about the possible need to preprocess the data and identifies potential restrictions on the model accuracy. You should use these recommendations in combination with plotting the data and validating the models estimated from this data.

    Note:   advice does not support frequency-response data.

Before applying the advice command to your data, you must have represented your data as an iddata object. For more information, see Representing Time- and Frequency-Domain Data Using iddata Objects.

If you are using the System Identification app, you must export your data to the MATLAB workspace before you can use the advice command on this data. For more information about exporting data, see Exporting Models from the App to the MATLAB Workspace.

Use the following syntax to get advice about an iddata object data:

advice(data)

For more information about the advice syntax, see the advice reference page.

The advice command provides guidance for these kinds of questions:

  • Does it make sense to remove constant offsets and linear trends from the data?

  • What are the excitation levels of the signals and how does this affect the model orders?

  • Is there an indication of output feedback in the data? When feedback is present in the system, only prediction-error methods work well for estimating models from closed-loop data.

  • Is there an indication of nonlinearity in the process that generated the data?
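Some of these questions can also be examined individually with the commands listed under See Also. The following is a minimal sketch, assuming a synthetic data set with a constant output offset. The system, offset, noise level, and variable names are invented for illustration.

```matlab
% Synthetic data with a constant output offset; all values are assumptions.
rng(0);                               % reproducible random numbers
Ts = 0.1;                             % sampling interval, in seconds
u  = randn(500,1);                    % white-noise input
y  = filter([0 1],[1 -0.8],u) + 2;    % first-order response plus an offset of 2
y  = y + 0.05*randn(500,1);           % measurement noise
data = iddata(y,u,Ts);

advice(data)                          % print the full advice report

datad = detrend(data);                % remove the means (constant offsets)
Ped   = pexcit(datad);                % degree of excitation of the input
fbck  = feedback(datad);              % indication of output feedback in the data
```

These values complement the advice report and do not replace plotting the data or validating the estimated models.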

See Also

advice

delayest

detrend

feedback

pexcit
