Curve Fitting Toolbox™
You import data sets into Curve Fitting Tool with the Data Sets pane of the Data GUI. Using this pane, you can
Select workspace variables that compose a data set
Display a list of all imported data sets
View, delete, or rename one or more data sets
The Data Sets pane is shown below followed by a description of its features.

Import workspace vectors — All selected variables must be the same length. You can import only vectors, not matrices or scalars. Infs and NaNs are ignored because you cannot fit data containing these values, and only the real part of a complex number is used. To perform any curve-fitting task, you must select at least one vector of data:
X data — Select the predictor data.
Y data — Select the response data.
Weights — Select the weights associated with the response data. If weights are not imported, they are assumed to be 1 for all data points.
Preview — The selected workspace vectors are displayed graphically in the preview window. Weights are not displayed.
Data set name — The name of the imported data set. The toolbox automatically creates a unique name for each imported data set. You can change the name by editing this field. Click the Create data set button to complete the data import process.
Data sets — Lists all data sets added to Curve Fitting Tool. The data sets can be created from workspace variables, or from smoothing an existing imported data set. When you select a data set, you can perform these actions:
Click View to open the View Data Set GUI. Using this GUI, you can view a single data set both graphically and numerically. Additionally, you can display data points to be excluded in a fit by selecting an exclusion rule.
Click Rename to change the name of a single data set.
Click Delete to delete one or more data sets. To select multiple data sets, you can use the Ctrl key and the mouse to select data sets one by one, or you can use the Shift key and the mouse to select a range of data sets.
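The import rules described above can be mimicked at the command line. The following is a minimal sketch under stated assumptions, not the toolbox's own import code: the vectors x, y, and w and the names keep, xFit, yFit, and wFit are hypothetical and chosen only for illustration.

```matlab
% Hypothetical workspace vectors (illustrative values only)
x = [1 2 3 4 5].';
y = [2.1 NaN 6.2+0.5i 8.1 Inf].';
w = [];                                  % no weights supplied

% All selected variables must be vectors of the same length
assert(isvector(x) && isvector(y) && numel(x) == numel(y));

y = real(y);                             % only the real part is used
if isempty(w)
    w = ones(size(y));                   % missing weights are assumed to be 1
end

% Infs and NaNs are ignored
keep = isfinite(x) & isfinite(y);
xFit = x(keep);
yFit = y(keep);
wFit = w(keep);
```

Here three of the five points survive: the NaN and Inf responses are dropped, and the complex response contributes only its real part.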
This example imports the ENSO data set into the Curve Fitting Tool using the Data Sets pane of the Data GUI. The first step is to load the data from the file enso.mat into the MATLAB® workspace.
load enso
The workspace contains two new variables, pressure and month:
pressure is the monthly averaged atmospheric pressure differences between Easter Island and Darwin, Australia. This difference drives the trade winds in the southern hemisphere.
month is the relative time in months.
Alternatively, you can import data by specifying the variable names as arguments to the cftool function.
cftool(month,pressure)
In this case, the Data GUI is not opened.
The data import process is described below:
The predictor and response data are displayed graphically in the Preview window. Weights and data points containing Infs or NaNs are not displayed.
You should specify a meaningful name when you import multiple data sets. If you do not specify a name, the default name, which is constructed from the selected variable names, is used.
Click the Create data set button.
The Data sets list box displays all the data sets added to the toolbox. Note that you can construct data sets from workspace variables, or by smoothing an existing data set.
If your data contains Infs or complex values, a warning message like this appears.

The Data Sets pane shown below displays the imported ENSO data in the Preview window. After you click the Create data set button, the data set enso is added to the Data sets list box. You can then view, rename, or delete enso by selecting it in the list box and clicking the appropriate button.

After you import a data set, it is automatically displayed as a scatter plot in Curve Fitting Tool. The response data is plotted on the vertical axis and the predictor data is plotted on the horizontal axis.
The scatter plot is a powerful tool because it allows you to view the entire data set at once, and it can easily display a wide range of relationships between the two variables. You should examine the data carefully to determine whether preprocessing is required, or to deduce a reasonable fitting approach. For example, it's typically very easy to identify outliers in a scatter plot, and to determine whether you should fit the data with a straight line, a periodic function, a sum of Gaussians, and so on.
Enhancing the Graphical Display. Curve Fitting Toolbox™ software provides several tools for enhancing the graphical display of a data set. These tools are available through the Tools menu, the GUI toolbar, and right-click menus.
You can zoom in or out, turn on or off the grid, and so on using the Tools menu and the GUI toolbar shown below.

You can change the color, line width, line style, and marker type of the displayed data points using the right-click menu shown below. You activate this menu by placing your mouse over a data point and right-clicking. Note that a similar menu is available for fitted curves.

The ENSO data is shown below after the display has been enhanced using several of these tools.
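A comparable scatter plot can be produced outside the GUI with standard MATLAB graphics calls. This is only a sketch of a rough command-line equivalent; the specific color, marker, and line width are arbitrary choices, not the settings used for the figure.

```matlab
load enso                                 % provides month and pressure

% Response on the vertical axis, predictor on the horizontal axis
h = plot(month, pressure, 'o');
xlabel('month');
ylabel('pressure');

% Rough equivalents of the Tools menu and right-click menu options
grid on;
set(h, 'Color', 'red', 'Marker', 'x', 'LineStyle', 'none', 'LineWidth', 1.5);
```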

You can view the numerical values of a data set, as well as data points to be excluded from subsequent fits, with the View Data Set GUI. You open this GUI by selecting a name in the Data sets list box of the Data GUI and clicking the View button.
The View Data Set GUI for the ENSO data set is shown below, followed by a description of its features.

Data set — Lists the names of the viewed data set and the associated variables. The data is displayed graphically below this list.
The index, predictor data (X), response data (Y), and weights (if imported) are displayed numerically in the table. If the data contains Infs or NaNs, those values are labeled "ignored." If the data contains complex numbers, only the real part is displayed.
Exclusion rules — Lists all the exclusion rules that are compatible with the viewed data set. When you select an exclusion rule, the data points marked for exclusion are grayed in the table, and are identified with an "x" in the graphical display. To exclude the data points while fitting, you must create the exclusion rule in the Exclude GUI and select the exclusion rule in the Fitting GUI.
An exclusion rule is compatible with the viewed data set if their lengths are the same, or if it is created by sectioning only.
If your data is noisy, you might need to apply a smoothing algorithm to expose its features, and to provide a reasonable starting approach for parametric fitting. The two basic assumptions that underlie smoothing are
The relationship between the response data and the predictor data is smooth.
The smoothing process results in a smoothed value that is a better estimate of the original value because the noise has been reduced.
The smoothing process attempts to estimate the average of the distribution of each response value. The estimation is based on a specified number of neighboring response values.
You can think of smoothing as a local fit because a new response value is created for each original response value. Therefore, smoothing is similar to some of the nonparametric fit types supported by the toolbox, such as smoothing spline and cubic interpolation. However, this type of fitting is not the same as parametric fitting, which results in a global parameterization of the data.
Note: You should not fit data with a parametric model after smoothing, because the act of smoothing invalidates the assumption that the errors are normally distributed. Instead, you should consider smoothing to be a data exploration technique.
There are two common types of smoothing methods: filtering (averaging) and local regression. Each smoothing method requires a span. The span defines a window of neighboring points to include in the smoothing calculation for each data point. This window moves across the data set as the smoothed response value is calculated for each predictor value. A large span increases the smoothness but decreases the resolution of the smoothed data set, while a small span decreases the smoothness but increases the resolution of the smoothed data set. The optimal span value depends on your data set and the smoothing method, and usually requires some experimentation to find.
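To make the role of the span concrete, here is a minimal sketch of a centered moving average written with base MATLAB only. The data values are hypothetical. The sketch assumes that the window shrinks symmetrically near the ends of the data so that it never extends past the first or last point; that end-point handling is an assumption of this example.

```matlab
y = [1 4 2 8 3 9 4 7].';           % hypothetical response values
span = 5;                           % window size; odd for a moving average
n = numel(y);
ys = zeros(n, 1);

for k = 1:n
    % Half-width of the window, reduced near the ends of the data
    h = min([(span - 1)/2, k - 1, n - k]);
    ys(k) = mean(y(k-h:k+h));       % smoothed value for point k
end
```

Increasing span averages over more neighbors, which gives a smoother but less detailed result. Setting span to 1 returns the original data.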
Curve Fitting Toolbox software supports these smoothing methods:
Moving average filtering — Lowpass filter that takes the average of neighboring data points.
Lowess and loess — Locally weighted scatter plot smooth. These methods use linear least-squares fitting, and a first-degree polynomial (lowess) or a second-degree polynomial (loess). Robust lowess and loess methods that are resistant to outliers are also available.
Savitzky-Golay filtering — A generalized moving average where you derive the filter coefficients by performing an unweighted linear least-squares fit using a polynomial of the specified degree.
Note that you can also smooth data using a smoothing spline. Refer to Nonparametric Fitting for more information.
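The Savitzky-Golay description can be illustrated for a single interior point with base MATLAB functions. This sketch assumes uniformly spaced predictor values and uses hypothetical data. It fits a polynomial to the points inside the window and evaluates it at the window center, which is the idea behind the filter rather than the toolbox's implementation.

```matlab
y = [1 4 2 8 3 9 4 7].';    % hypothetical, uniformly spaced responses
span = 5;                    % window size (odd)
k = 4;                       % interior point to smooth
h = (span - 1)/2;
t = (-h:h).';                % window coordinates centered on point k
win = y(k-h:k+h);            % responses inside the window

% Degree-2 unweighted least-squares fit, evaluated at the window center
p2 = polyfit(t, win, 2);
ySG = polyval(p2, 0);

% With degree 1 on a symmetric window, the result reduces to the plain average
p1 = polyfit(t, win, 1);
yLin = polyval(p1, 0);
```

The degree-1 case shows why the method is described as a generalized moving average: yLin matches mean(win), while the degree-2 fit can follow curvature inside the window.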
You smooth data with the Smooth pane of the Data GUI. The pane is shown below followed by a description of its features.

Original data set — Select the data set you want to smooth.
Smoothed data set — Specify the name of the smoothed data set. Note that the process of smoothing the original data set always produces a new data set containing smoothed response values.
Method — Select the smoothing method. Each response value is replaced with a smoothed value that is calculated by the specified smoothing method.
Moving average — Filter the data by calculating an average.
Lowess — Locally weighted scatter plot smooth using linear least-squares fitting and a first-degree polynomial.
Loess — Locally weighted scatter plot smooth using linear least-squares fitting and a second-degree polynomial.
Savitzky-Golay — Filter the data with an unweighted linear least-squares fit using a polynomial of the specified degree.
Robust Lowess — Lowess method that is resistant to outliers.
Robust Loess — Loess method that is resistant to outliers.
Span — The number of data points used to compute each smoothed value.
For the moving average and Savitzky-Golay methods, the span must be odd. For all locally weighted smoothing methods, a span less than 1 is interpreted as a fraction of the total number of data points (for example, 0.1 uses 10% of the points).
Degree — The degree of the polynomial used in the Savitzky-Golay method. The degree must be smaller than the span.
Smoothed data sets — Lists all the smoothed data sets. You add a smoothed data set to the list by clicking the Create smoothed data set button. When you select a data set from the list, you can perform these actions:
Click View to open the View Data Set GUI. Using this GUI, you can view a single data set both graphically and numerically. Additionally, you can display data points to be excluded in a fit by selecting an exclusion rule.
Click Rename to change the name of a single data set.
Click Delete to delete one or more data sets. To select multiple data sets, you can use the Ctrl key and the mouse to select data sets one by one, or you can use the Shift key and the mouse to select a range of data sets.
Click Save to workspace to save a single data set to a structure.
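The methods listed in the Smooth pane are also reachable from the command line through the toolbox's smooth function. The following sketch assumes the ENSO variables are available via load enso. The span of 5, the fractional span of 0.1, and the output variable names are illustrative choices.

```matlab
load enso                                   % provides month and pressure
span = 5;                                   % number of points per window

yMA     = smooth(pressure, span);                     % moving average
yLowess = smooth(month, pressure, span, 'lowess');    % first-degree local fit
yLoess  = smooth(month, pressure, span, 'loess');     % second-degree local fit
ySG     = smooth(pressure, span, 'sgolay', 2);        % Savitzky-Golay, degree 2

% Robust variants; a span below 1 is treated as a fraction of the data
yRLowess = smooth(month, pressure, 0.1, 'rlowess');
yRLoess  = smooth(month, pressure, 0.1, 'rloess');
```

Each call returns a new vector of smoothed response values with one entry per original data point, leaving pressure unchanged.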
This example smooths the ENSO data set using the moving average, lowess, loess, and Savitzky-Golay methods with the default span. As shown below, the data appears noisy. Smoothing might help you visualize patterns in the data, and provide insight toward a reasonable approach for parametric fitting.

The Smooth pane shown below displays all the new data sets generated by smoothing the original ENSO data set. Whenever you smooth a data set, a new data set of smoothed values is created. The smoothed data sets are automatically displayed in Curve Fitting Tool. You can also display a single data set graphically and numerically by clicking the View button.

Use the Plotting GUI to display only the data sets of interest. As shown below, smoothing the ENSO data set with a moving average filter at the default span exposes its underlying structure. That structure is periodic, which suggests that a reasonable parametric model should include trigonometric functions.

Saving the Results. By clicking the Save to workspace button, you can save a smoothed data set as a structure to the MATLAB workspace. This example saves the moving average results contained in the enso (ma) data set.

The saved structure contains the original predictor data x and the smoothed data y.
smootheddata1
smootheddata1 =
x: [168x1 double]
y: [168x1 double]
If there is justification, you might want to exclude part of a data set from a fit. Typically, you exclude data so that subsequent fits are not adversely affected. For example, if you are fitting a parametric model to measured data that has been corrupted by a faulty sensor, the resulting fit coefficients will be inaccurate.
Curve Fitting Toolbox software provides two methods to exclude data:
Marking Outliers — Outliers are defined as individual data points that you exclude because they are inconsistent with the statistical nature of the bulk of the data.
Sectioning — Sectioning excludes a window of response or predictor data. For example, if many data points in a data set are corrupted by large systematic errors, you might want to section them out of the fit.
For each of these methods, you must create an exclusion rule, which captures the range, domain, or index of the data points to be excluded.
To exclude data while fitting, you use the Fitting GUI to associate the appropriate exclusion rule with the data set to be fit. Refer to Example: Robust Fitting for more information about fitting a data set using an exclusion rule.
You mark data to be excluded from a fit with the Exclude GUI, which you open from Curve Fitting Tool. The GUI is shown below followed by a description of its features.

Exclusion rule name — Specify the name of the exclusion rule that identifies the data points to be excluded from subsequent fits.
Existing exclusion rules — Lists the names of all exclusion rules created during the current session. When you select an existing exclusion rule, you can perform these actions:
Click Copy to copy the exclusion rule. The exclusions associated with the original exclusion rule are recreated in the GUI. You can modify these exclusions and then click Create exclusion rule to save them to the copied rule.
Click Rename to change the name of the exclusion rule.
Click Delete to delete the exclusion rule. To select multiple exclusion rules, you can use the Ctrl key and the mouse to select exclusion rules one by one, or you can use the Shift key and the mouse to select a range of exclusion rules.
Click View to display the exclusion rule graphically. If a data set is associated with the exclusion rule, the data is also displayed.
Select data set — Select the data set from which data points will be marked as excluded. You must select a data set to exclude individual data points.
Exclude graphically — Open a GUI that allows you to exclude individual data points graphically.
Individually excluded data points are marked by an "x" in the GUI, and are automatically identified in the Check to exclude point table.
Check to exclude point — Select individual data points to exclude. You can sort this table by clicking on any of the column headings.
Section — Specify data to be excluded. You do not need to select a data set to create an exclusion rule by sectioning.
Exclude X — Specify beginning and ending intervals in the predictor data to be excluded.
Exclude Y — Specify beginning and ending intervals in the response data to be excluded.
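At the command line, the toolbox's excludedata function offers rough counterparts to these GUI fields. The sketch below uses hypothetical data and variable names. Note that the 'domain' and 'range' methods of excludedata mark points outside the given interval, so the intervals here describe the data to keep.

```matlab
x = (1:10).';
y = [5 6 7 50 6 5 7 6 5 6].';      % hypothetical data; point 4 is an outlier

% Individual point, as in the Check to exclude point table
byIndex  = excludedata(x, y, 'indices', 4);

% Predictor interval to keep; points outside it are excluded (Exclude X)
byDomain = excludedata(x, y, 'domain', [2.5 8.5]);

% Response interval to keep; points outside it are excluded (Exclude Y)
byRange  = excludedata(x, y, 'range', [0 10]);

% Combine the logical vectors into a single rule (true means excluded)
rule = byIndex | byDomain | byRange;
```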
Outliers are defined as individual data points that you exclude from a fit because they are inconsistent with the statistical nature of the bulk of the data, and will adversely affect the fit results. Outliers are often readily identified by a scatter plot of response data versus predictor data.
Marking outliers with Curve Fitting Tool follows these rules:
You must specify a data set before creating an exclusion rule.
In general, you should use the exclusion rule only with the specific data set it was based on. However, the toolbox does not prevent you from using the exclusion rule with another data set provided the size is the same.
Using the Exclude GUI, you can exclude outliers either graphically or numerically.
As described in Parametric Fitting, one of the basic assumptions underlying curve fitting is that the data is statistical in nature and is described by a particular distribution, which is often assumed to be Gaussian. The statistical nature of the data implies that it contains random variations along with a deterministic component.
data = deterministic component + random component
However, your data set might contain one or more data points that are non-statistical in nature, or are described by a different statistical distribution. These data points might be easy to identify, or they might be buried in the data and difficult to identify.
A non-statistical process can involve the measurement of a physical variable such as temperature or voltage in which the random variation is negligible compared to the systematic errors. For example, if your sensor calibration is inaccurate, the data measured with that sensor will be systematically inaccurate. In some cases, you might be able to quantify this non-statistical data component and correct the data accordingly. However, if the scatter plot reveals that a handful of response values are far removed from neighboring response values, these data points are considered outliers and should be excluded from the fit. Outliers are usually difficult to explain away. For example, it might be that your sensor experienced a power surge or someone wrote down the wrong number in a log book.
If you decide there is justification, you should mark outliers to be excluded from subsequent fits—particularly parametric fits. Removing these data points can have a dramatic effect on the fit results because the fitting process minimizes the square of the residuals. If you do not exclude outliers, the resulting fit will be poor for a large portion of your data. Conversely, if you do exclude the outliers and choose the appropriate model, the fit results should be reasonable.
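A small numerical sketch shows how strongly one outlier can pull a least-squares fit. It uses base MATLAB's polyfit and hypothetical noise-free data so that the effect is easy to see. It is a simplified illustration, not the toolbox's fitting procedure.

```matlab
x = (1:10).';
y = 2*x + 1;                 % hypothetical data on the line y = 2x + 1
y(10) = y(10) + 50;          % corrupt the last point to create an outlier

% Least-squares line through all points, including the outlier
pAll = polyfit(x, y, 1);

% Same fit with the outlier excluded
keep = true(size(x));
keep(10) = false;
pClean = polyfit(x(keep), y(keep), 1);

slopeAll   = pAll(1);        % pulled well above the true slope of 2
slopeClean = pClean(1);      % recovers the true slope
```

Because the squared residual of the outlier dominates the objective, the fitted slope more than doubles when the point is left in.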
Because outliers can have a significant effect on a fit, they are considered influential data. However, not all influential data points are outliers. For example, your data set can contain valid data points that are far removed from the rest of the data. The data is valid because it is well described by the model used in the fit. The data is influential because its exclusion will dramatically affect the fit results.
Two types of influential data points are shown below for generated data. Also shown are cubic polynomial fits and a robust fit that is resistant to outliers.

Plot (a) shows that the two influential data points are outliers and adversely affect the fit. Plot (b) shows that the two influential data points are consistent with the model and do not adversely affect the fit. Plot (c) shows that a robust fitting procedure is an acceptable alternative to marking outliers for exclusion.
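The comparison between an ordinary and a robust cubic fit can be sketched with the toolbox's fit function and its 'Robust' option. The generated data, the outlier positions, and the variable names below are hypothetical and are not the data used for the plots.

```matlab
x = (0:0.5:10).';
yTrue = 0.5*x.^3 - 4*x.^2 + 3*x + 10;    % hypothetical cubic trend
y = yTrue + 0.5*sin(7*x);                 % small deterministic perturbation
y([5 15]) = y([5 15]) + 200;              % two outliers far from the bulk

% Ordinary least-squares cubic fit
fLS = fit(x, y, 'poly3');

% Robust cubic fit using bisquare weights
fRobust = fit(x, y, 'poly3', 'Robust', 'Bisquare');

% Largest deviation of each fitted curve from the underlying trend
errLS     = max(abs(fLS(x) - yTrue));
errRobust = max(abs(fRobust(x) - yTrue));
```

Comparing errLS and errRobust gives a quick indication of how much the outliers influence each fit.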
Sectioning involves specifying response or predictor data to exclude. You might want to section a data set because different parts of the data set are described by different models or are corrupted by noise, large systematic errors, and so on.
Sectioning data with Curve Fitting Tool follows these rules:
If you are only sectioning data and not excluding individual data points, then you can create an exclusion rule without specifying a data set name.
You can associate an exclusion rule with any data set provided that the exclusion rule overlaps with the data. This is useful if you have multiple data sets from which you want to exclude data points using the same rule.
Use the Exclude GUI to create the exclusion rule.
You can exclude vertical strips at the edges of the data, horizontal strips at the edges of the data, or a border around the data. Refer to Example: Excluding and Sectioning Data for an example.
To exclude multiple sections of data, you can use the excludedata function from the MATLAB command line.
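One way to exclude several sections is to build each section with excludedata and combine the results with logical operators. This is a sketch with hypothetical data and interval limits. Because the 'domain' method marks points outside the interval, each result is negated to select the points inside it.

```matlab
x = (1:20).';
y = sin(x/3);                       % hypothetical data

% Points inside each section (negate, since 'domain' marks points outside)
secA = ~excludedata(x, y, 'domain', [2.5 5.5]);
secB = ~excludedata(x, y, 'domain', [11.5 14.5]);

% Exclude both sections (true means excluded)
excl = secA | secB;
```

The resulting logical vector can then be passed to fit through its 'Exclude' option.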
Two examples of sectioning by domain are shown below for generated data.

The upper plot shows the data set sectioned by fit type. The section to the left of 4 is fit with a linear polynomial, as shown by the bold, dashed line. The section to the right of 4 is fit with a cubic polynomial, as shown by the bold, solid line.
The lower plot shows the data set sectioned by fit type and by valid data. Here, the right-most section is not part of any fit because the data is corrupted by noise.
Note: For illustrative purposes, the preceding figures have been enhanced to show portions of the curves with bold markers. Curve Fitting Toolbox software does not use bold markers in plots.
This example modifies the ENSO data set to illustrate excluding and sectioning data. First, copy the ENSO response data to a new variable and add two outliers that are far removed from the bulk of the data.
yy = pressure;
% Overwrite two randomly chosen responses with values far above the mean
yy(ceil(length(month)*rand(1))) = mean(pressure)*2.5;
yy(ceil(length(month)*rand(1))) = mean(pressure)*3.0;
Import the variables month and yy as the new data set enso1, and open the Exclude GUI.
Assume that the first and last eight months of the data set are unreliable, and should be excluded from subsequent fits. The simplest way to exclude these data points is to section the predictor data. To do this, specify the data you want to exclude in the Exclude Sections field of the Exclude GUI.

There are two ways to exclude individual data points: using the Check to exclude point table or graphically. For this example, the simplest way to exclude the outliers is graphically. To do this, select the data set name and click the Exclude graphically button, which opens the Select Points for Exclusion Rule GUI.

To mark data points for exclusion in the GUI, place the mouse cursor over the data point and left-click. The excluded data point is marked with a red x. To include an excluded data point, right-click the data point or select the Includes Them radio button and left-click. Included data points are marked with a blue circle. To select multiple data points, click the left mouse button and drag the selection rubber band so that the rubber band box encompasses the desired data points. Note that the GUI identifies sectioned data with gray strips. You cannot graphically include sectioned data.
As shown below, the first and last eight months of data are excluded from the data set by sectioning, and the two outliers are excluded graphically. Note that the graphically excluded data points are identified in the Check to exclude point table. If you decide to include an excluded data point using the table, the graph is automatically updated.

If there are fits associated with the data, you can exclude data points based on the residuals of the fit by selecting the residual data in the Y list.
The Exclude GUI for this example is shown below.

To save the exclusion rule, click the Create exclusion rule button. To exclude the data from a fit, you must select the exclusion rule from the Fitting GUI. Because the exclusion rule created in this example uses individually excluded data points, you can use it only with data sets that are the same size as the ENSO data set.
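A command-line approximation of this example is sketched below. It departs from the GUI steps in a few labeled ways: the two outliers are placed at fixed, hypothetical positions (40 and 120) rather than random ones, month is assumed to be sorted in ascending order, and the 'fourier2' model is only an illustrative choice.

```matlab
load enso                                   % provides month and pressure
n = numel(month);

% Add two outliers at fixed (hypothetical) positions
yy = pressure;
outlierIdx = [40 120];
yy(outlierIdx) = mean(pressure)*[2.5 3.0];

% Section out the first and last eight months; 'domain' keeps [lo hi]
lo = (month(8) + month(9))/2;
hi = (month(n-8) + month(n-7))/2;
sectioned = excludedata(month, yy, 'domain', [lo hi]);

% Mark the two outliers individually
marked = excludedata(month, yy, 'indices', outlierIdx);

% Combined exclusion rule (true means excluded), applied during fitting
rule = sectioned | marked;
f = fit(month, yy, 'fourier2', 'Exclude', rule);
```

Because rule has one entry per data point, it can be reused only with data sets of the same size, which mirrors the restriction on rules that contain individually excluded points.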
Viewing the Exclusion Rule. To view the exclusion rule, select an existing exclusion rule name and click the View button.
The View Exclusion Rule GUI shown below displays the modified ENSO data set and the excluded data points, which are grayed in the table.

Although Curve Fitting Toolbox software ignores Infs and NaNs when fitting data, and you can exclude outliers during the fitting process, you might still want to remove this data from your data set. To do so, you modify the associated data set variables from the MATLAB command line.
For example, when using toolbox functions such as fit from the command line, you must supply predictor and response vectors that contain finite numbers. To remove Infs, you can use the isinf function.
ind = find(isinf(xx) | isinf(yy));   % Infs in either vector
xx(ind) = [];
yy(ind) = [];
To remove NaNs, you can use the isnan function. For examples that remove NaNs and outliers from a data set, refer to Missing Data in the MATLAB documentation.
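A minimal sketch of the NaN case follows. The vectors xx and yy hold hypothetical values, and a point is removed from both vectors if either of its coordinates is NaN, so the vectors stay the same length.

```matlab
xx = [1 2 NaN 4 5].';               % hypothetical predictor data
yy = [10 NaN 30 40 50].';           % hypothetical response data

% Flag any point whose predictor or response is NaN
bad = isnan(xx) | isnan(yy);

% Remove the flagged points from both vectors
xx(bad) = [];
yy(bad) = [];
```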
© 1984-2008 The MathWorks, Inc.