Curve Fitting Toolbox™
You fit data using the Fitting GUI. To open the Fitting GUI, click the Fitting button from Curve Fitting Tool.
The Fitting GUI is shown below for the census data described in Getting Started, followed by the general steps you use when fitting any data set.

Select a data set and fit name.
Select the name of the current fit. When you click New fit or Copy fit, a default fit name is automatically created in the Fit name field. You can specify a new fit name by editing this field.
Select the name of the current data set from the Data set list. All imported and smoothed data sets are listed.
If you want to exclude data from a fit, select an exclusion rule from the Exclusion rule list. The list contains only exclusion rules that are compatible with the current data set. An exclusion rule is compatible with the current data set if their lengths are identical, or if it is created by sectioning only.
Select a fit type and fit options, fit the data, and evaluate the goodness of fit.
The fit type can be a library or custom parametric model, a smoothing spline, or an interpolant.
Select fit options such as the fitting algorithm, and coefficient starting points and constraints. Depending on your data and model, accepting the default fit options often produces an excellent fit.
Fit the data by clicking the Apply button or by selecting the Immediate apply check box.
Examine the fitted curve, residuals, goodness of fit statistics, confidence bounds, and prediction bounds for the current fit.
Compare the current fit and data set to previous fits and data sets by examining the goodness of fit statistics.
Use the Table Options GUI to modify which goodness of fit statistics are displayed in the Table of Fits. You can sort the table by clicking on any column heading.
If the fit is good, save the results as a structure to the MATLAB workspace. Otherwise, modify the fit options or select another model.
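The steps above can also be sketched at the command line. This is a minimal sketch, not the GUI procedure itself: it assumes the toolbox function `fit` is available and uses the `census` data file that ships with MATLAB (variables `cdate` and `pop`). The quadratic library model `poly2` is an illustrative choice only.

```matlab
% Load the census data: cdate (year) and pop (population).
load census

% Fit a library model; 'poly2' is an illustrative choice.
[curve, gof] = fit(cdate, pop, 'poly2');

% Examine the coefficients, their 95% confidence bounds, and the
% goodness-of-fit statistics.
coeffs = coeffvalues(curve);
bounds = confint(curve, 0.95);
fprintf('R-square: %.4f, RMSE: %.4f\n', gof.rsquare, gof.rmse);

% Compute the residuals and plot the data with the fitted curve.
residuals = pop - curve(cdate);
plot(curve, cdate, pop)
```

If the statistics and residuals look acceptable, the `curve` and `gof` variables already hold the results in the workspace; otherwise, try another model name or different fit options.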
Parametric fitting involves finding coefficients (parameters) for one or more models that you fit to data. The data is assumed to be statistical in nature and is divided into two components: a deterministic component and a random component.
data = deterministic component + random component
The deterministic component is given by a parametric model and the random component is often described as error associated with the data.
data = model + error
The model is a function of the independent (predictor) variable and one or more coefficients. The error represents random variations in the data that follow a specific probability distribution (usually Gaussian). The variations can come from many different sources, but are always present at some level when you are dealing with measured data. Systematic variations can also exist, but they can lead to a fitted model that does not represent the data well.
The model coefficients often have physical significance. For example, suppose you have collected data that corresponds to a single decay mode of a radioactive nuclide, and you want to estimate the half-life (T1/2) of the decay. The law of radioactive decay states that the activity of a radioactive substance decays exponentially in time. Therefore, the model to use in the fit is given by
$$y = y_0 e^{-\lambda t}$$
where y0 is the number of nuclei at time t = 0, and λ is the decay constant. The data can be described by
$$y = y_0 e^{-\lambda t} + \varepsilon$$
where ε is the random error. Both y0 and λ are coefficients that are estimated by the fit. Because T1/2 = ln(2)/λ, the fitted value of the decay constant yields the fitted half-life. However, because the data contains some error, the deterministic component of the equation cannot be determined exactly from the data. Therefore, the coefficients and half-life calculation will have some uncertainty associated with them. If the uncertainty is acceptable, then you are done fitting the data. If the uncertainty is not acceptable, then you might have to take steps to reduce it, either by collecting more data, or by reducing measurement error, collecting new data, and repeating the model fit.
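The relationship between the fitted decay constant and the half-life can be illustrated with a short sketch. The data here is synthetic, with assumed values for y0 and λ, and the fit uses a log-linear least-squares fit with `polyfit` rather than the toolbox's nonlinear exponential fit, so it only approximates what the Fitting GUI does.

```matlab
% Synthetic decay data (assumed values, for illustration only).
y0True     = 1000;          % nuclei at t = 0
lambdaTrue = 0.1;           % decay constant
t = (0:30)';                % time
% A deterministic 1% ripple stands in for measurement error.
y = y0True * exp(-lambdaTrue * t) .* (1 + 0.01 * sin(7 * t));

% Taking logs gives ln(y) = ln(y0) - lambda*t, which is linear in t.
p = polyfit(t, log(y), 1);
lambdaFit = -p(1);
y0Fit     = exp(p(2));

% Half-life from the fitted decay constant: T1/2 = ln(2)/lambda.
halfLife = log(2) / lambdaFit;
fprintf('lambda = %.4f, half-life = %.3f\n', lambdaFit, halfLife);
```

Because the data contains error, the recovered half-life is close to, but not exactly, ln(2)/0.1.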
In other situations where there is no theory to dictate a model, you might also modify the model by adding or removing terms, or substitute an entirely different model.
Curve Fitting Toolbox parametric library models are described below.
Exponentials. The toolbox provides a one-term and a two-term exponential model.
$$y = a e^{bx}$$
$$y = a e^{bx} + c e^{dx}$$
Exponentials are often used when the rate of change of a quantity is proportional to the current amount of the quantity. If the coefficient in the exponent (b or d) is negative, y represents exponential decay. If the coefficient is positive, y represents exponential growth.
For example, a single radioactive decay mode of a nuclide is described by a one-term exponential. a is interpreted as the initial number of nuclei, b is the decay constant, x is time, and y is the number of remaining nuclei after a specific amount of time passes. If two decay modes exist, then you must use the two-term exponential model. For each additional decay mode, you add another exponential term to the model.
Examples of exponential growth include contagious diseases for which a cure is unavailable, and biological populations whose growth is uninhibited by predation, environmental factors, and so on.
Fourier Series. The Fourier series is a sum of sine and cosine functions that is used to describe a periodic signal. It is represented in either the trigonometric form or the exponential form. The toolbox provides the trigonometric Fourier series form shown below,
$$y = a_0 + \sum_{i=1}^{n} \left[ a_i \cos(i w x) + b_i \sin(i w x) \right]$$
where a0 models a constant (intercept) term in the data and is associated with the i = 0 cosine term, w is the fundamental frequency of the signal, n is the number of terms (harmonics) in the series, and 1 ≤ n ≤ 8.
For more information about the Fourier series, refer to Fourier Transforms in the MATLAB documentation.
Gaussian. The Gaussian model is used for fitting peaks, and is given by the equation
$$y = \sum_{i=1}^{n} a_i e^{-\left[(x - b_i)/c_i\right]^2}$$
where a is the amplitude, b is the centroid (location), c is related to the peak width, n is the number of peaks to fit, and 1 ≤ n ≤ 8.
Gaussian peaks are encountered in many areas of science and engineering. For example, line emission spectra and chemical concentration assays can be described by Gaussian peaks.
Polynomials. Polynomial models are given by
$$y = \sum_{i=1}^{n+1} p_i x^{n+1-i}$$
where n + 1 is the order of the polynomial, n is the degree of the polynomial, and 1 ≤ n ≤ 9. The order gives the number of coefficients to be fit, and the degree gives the highest power of the predictor variable.
In this guide, polynomials are described in terms of their degree. For example, a third-degree (cubic) polynomial is given by
$$y = p_1 x^3 + p_2 x^2 + p_3 x + p_4$$
Polynomials are often used when a simple empirical model is required. The model can be used for interpolation or extrapolation, or it can be used to characterize data using a global fit. For example, the temperature-to-voltage conversion for a Type J thermocouple in the 0° to 760° temperature range is described by a seventh-degree polynomial.
Note: If you do not require a global parametric fit and want to maximize the flexibility of the fit, piecewise polynomials might provide the best approach. Refer to Nonparametric Fitting for more information.
The main advantages of polynomial fits are reasonable flexibility for data that is not too complicated, and linearity in the coefficients, which makes the fitting process simple. The main disadvantage is that high-degree fits can become unstable. Additionally, polynomials of any degree can provide a good fit within the data range, but can diverge wildly outside that range. Therefore, you should exercise caution when extrapolating with polynomials. Refer to Determining the Best Fit for examples of good and poor polynomial fits to census data.
Note that when you fit with high-degree polynomials, the fitting procedure uses the predictor values as the basis for a matrix with very large values, which can result in scaling problems. To deal with this, you should normalize the data by centering it at zero mean and scaling it to unit standard deviation. You normalize data by selecting the Center and scale X data check box on the Fitting GUI.
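The effect of centering and scaling can be seen by comparing the condition number of the polynomial design matrix before and after normalization. This is a sketch in base MATLAB with assumed predictor values that resemble census years; the degree and the variable names are illustrative and are not taken from the toolbox internals.

```matlab
% Predictor values resembling census years (assumed, for illustration).
x = (1790:10:1990)';
degree = 6;

% Design (Vandermonde) matrix for the raw predictor values.
Vraw = bsxfun(@power, x, degree:-1:0);

% Center at zero mean and scale to unit standard deviation.
z = (x - mean(x)) / std(x);
Vz = bsxfun(@power, z, degree:-1:0);

% The condition number indicates how sensitive the least-squares
% solution is to rounding error; smaller is better.
fprintf('cond(raw)        = %.3g\n', cond(Vraw));
fprintf('cond(normalized) = %.3g\n', cond(Vz));
```

At the command line, passing `'Normalize', 'on'` to `fit` should play the role of the Center and scale X data check box.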
Power Series. The toolbox provides a one-term and a two-term power series model.
$$y = a x^{b}$$
$$y = a x^{b} + c$$
Power series models are used to describe a variety of data. For example, the rate at which reactants are consumed in a chemical reaction is generally proportional to the concentration of the reactant raised to some power.
Rationals. Rational models are defined as ratios of polynomials and are given by
$$y = \frac{\sum_{i=1}^{n+1} p_i x^{n+1-i}}{x^m + \sum_{i=1}^{m} q_i x^{m-i}}$$
where n is the degree of the numerator polynomial and 0 ≤ n ≤ 5, while m is the degree of the denominator polynomial and 1 ≤ m ≤ 5. Note that the coefficient associated with x^m is always 1. This makes the numerator and denominator unique when the polynomial degrees are the same.
In this guide, rationals are described in terms of the degree of the numerator/the degree of the denominator. For example, a quadratic/cubic rational equation is given by
$$y = \frac{p_1 x^2 + p_2 x + p_3}{x^3 + q_1 x^2 + q_2 x + q_3}$$
Like polynomials, rationals are often used when a simple empirical model is required. The main advantage of rationals is their flexibility with data that has complicated structure. The main disadvantage is that they become unstable when the denominator is around zero. For an example that uses rational polynomials of various degrees, refer to Example: Rational Fit.
Sum of Sines. The sum of sines model is used for fitting periodic functions, and is given by the equation
$$y = \sum_{i=1}^{n} a_i \sin(b_i x + c_i)$$
where a is the amplitude, b is the frequency, and c is the phase constant for each sine wave term. n is the number of terms in the series and 1 ≤ n ≤ 8. This equation is closely related to the Fourier series described previously. The main difference is that the sum of sines equation includes the phase constant, and does not include a constant (intercept) term.
Weibull Distribution. The Weibull distribution is widely used in reliability and life (failure rate) data analysis. The toolbox provides the two-parameter Weibull distribution
$$y = a b x^{b-1} e^{-a x^{b}}$$
where a is the scale parameter and b is the shape parameter. Note that there is also a three-parameter Weibull distribution with x replaced by x – c where c is the location parameter. Additionally, there is a one-parameter Weibull distribution where the shape parameter is fixed and only the scale parameter is fitted. To use these distributions, you must create a custom equation.
Curve Fitting Toolbox software does not fit Weibull probability distributions to a sample of data. Instead, it fits curves to response and predictor data such that the curve has the same shape as a Weibull distribution.
Custom Models vs. Library Models. If the toolbox library does not contain a desired parametric equation, you can create your own custom equation. Library models, however, offer the best chance for rapid convergence. This is because:
For most library models, optimal default coefficient starting points are calculated. For custom models, the default starting points are chosen at random on the interval [0,1].
Library models use an analytic Jacobian; custom models use finite differencing.
When using the Analysis GUI, library models use analytic derivatives and integrals if the integral can be expressed in closed form; custom models use numerical approximations.
Creating Custom Models. Create custom equations with the New Custom Equation GUI. Open the GUI in one of two ways:
From Curve Fitting Tool, select Tools > Custom Equation.
From the Fitting GUI, select Custom Equations from the Type of fit list, then click the New button.
The GUI contains two panes: one for creating linear custom equations and one for creating general (nonlinear) custom equations.
Linear Equations. Linear models are linear combinations of (perhaps nonlinear) terms. They are defined by equations that are linear in the parameters. Use the Linear Equations pane on the New Custom Equation GUI to create custom linear equations. Interface controls are described below.

Independent variable — Symbol representing the independent (predictor) variable. The default symbol is x.
Equation — Symbol representing the dependent (response) variable, followed by the linear equation. The default symbol is y.
Unknown Coefficients — The unknown coefficients to be determined by the fit. The default symbols are a, b, c, and so on.
Terms — Functions of the independent variable. These may be nonlinear. Terms may not contain a coefficient to be fitted.
Unknown constant coefficient — If selected, a constant term (y-intercept) is included in the equation. Otherwise, a constant term is not included.
Add a term — Add a term to the equation. An unknown coefficient is automatically added for each new term.
Remove last term — Remove the last term added to the equation.
Equation name — The name of the equation. By default, the name is automatically updated to be identical to the custom equation given by Equation. If you override the default, the name is no longer automatically updated.
General Equations. General models are nonlinear combinations of (perhaps nonlinear) terms. They are defined by equations that may be nonlinear in the parameters. Use the General Equations pane on the New Custom Equation GUI to create custom general equations. Interface controls are described below.

Independent variable — Symbol representing the independent (predictor) variable. The default symbol is x.
Equation — Symbol representing the dependent (response) variable, followed by the general equation. The default symbol is y. As you type in the terms of the equation, the unknown coefficients, associated starting values, and constraints automatically populate the table. By default, the starting values are randomly selected on the interval [0,1] and are unconstrained.
You can immediately change the default starting values and constraints in this table, or you can change them later using the Fit Options GUI.
Equation name — The name of the equation. By default, the name is automatically updated to be identical to the custom equation given by Equation. If you override the default, the name is no longer automatically updated.
Note: If you use the General Equations pane to define a linear equation, a nonlinear fitting procedure is used. While this is allowed, it is inefficient, and can result in less than optimal fitted coefficients. Use the Linear Equations pane to define custom linear equations.
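The distinction between the two panes has a command-line counterpart, sketched here under the assumption that the toolbox functions `fittype` and `islinear` are available. The two models and their coefficient names are hypothetical examples, not equations from this guide.

```matlab
% Linear custom model: y = a*x + b*sin(x) + c. Each cell is a term
% that contains no fitted coefficient; '1' is the constant term.
linearModel = fittype({'x', 'sin(x)', '1'}, ...
    'coefficients', {'a', 'b', 'c'});

% General (nonlinear) custom model: b appears inside the exponential.
generalModel = fittype('a*exp(-b*x) + c', ...
    'independent', 'x', 'coefficients', {'a', 'b', 'c'});

disp(islinear(linearModel))    % fitted with linear least squares
disp(islinear(generalModel))   % fitted with nonlinear least squares
```

Defining a linear model through the cell-array form, rather than as a single expression, lets the fit avoid starting points and iterative fitting.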
Editing and Saving Custom Models. When you click OK on the New Custom Equation GUI, the displayed Equation name is saved for the current session in the Custom Equations list on the Fitting GUI. The list is highlighted in the picture of the Fitting GUI below.

To edit a custom equation, select the equation in the Custom Equations list and click the Edit button. The Edit Custom Equation GUI appears. The Edit Custom Equation GUI is identical to the New Custom Equation GUI, but is pre-populated with the selected equation. After editing an equation in the Edit Custom Equation GUI, click OK to save it back to the Custom Equations list for further use in the current session. A button to Copy and Edit is also available, if you want to save both the original and edited equations for the current session.
To save custom equations for future sessions, select the File > Save Session menu item in Curve Fitting Tool.
Example: Legendre Polynomial. This example fits data using several custom linear equations. The data is generated, and is based on the nuclear reaction ¹²C(e,e′α)⁸Be. The equations use sums of Legendre polynomial terms.
Consider an experiment in which 124 MeV electrons are scattered from ¹²C nuclei. In the subsequent reaction, alpha particles are emitted and produce the residual nuclei ⁸Be. By analyzing the number of alpha particles emitted as a function of angle, you can deduce certain information regarding the nuclear dynamics of ¹²C. The reaction kinematics are shown below.

The data is collected by placing solid state detectors at values of Θα ranging from 10° to 240° in 10° increments.
It is sometimes useful to describe a variable expressed as a function of angle in terms of Legendre polynomials
$$y(x) = \sum_{n=0}^{\infty} a_n P_n(x)$$
where Pn(x) is a Legendre polynomial of degree n, x is cos(Θα), and an are the coefficients of the fit. Refer to the legendre function for information about generating Legendre polynomials.
For the alpha-emission data, you can directly associate the coefficients with the nuclear dynamics by invoking a theoretical model. Additionally, the theoretical model introduces constraints for the infinite sum shown above. In particular, by considering the angular momentum of the reaction, a fourth-degree Legendre polynomial using only even terms should describe the data effectively.
You can generate Legendre polynomials with Rodrigues' formula:
$$P_n(x) = \frac{1}{2^n n!} \frac{d^n}{dx^n}\left(x^2 - 1\right)^n$$
The Legendre polynomials up to fourth degree are given below.
Legendre Polynomials up to Fourth Degree
| n | Pn(x) |
|---|---|
| 0 | 1 |
| 1 | x |
| 2 | (1/2)(3x² – 1) |
| 3 | (1/2)(5x³ – 3x) |
| 4 | (1/8)(35x⁴ – 30x² + 3) |
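The closed forms in the table can be checked against the base MATLAB `legendre` function. This is a small sketch with arbitrarily chosen evaluation points. `legendre(n, x)` returns the associated Legendre functions for all orders m = 0..n, so only the first row (m = 0) corresponds to Pn(x).

```matlab
x = linspace(-1, 1, 5);

% legendre(n, x) returns one row per order m = 0..n; row 1 is P_n(x).
L2 = legendre(2, x);
L4 = legendre(4, x);
P2 = L2(1, :);
P4 = L4(1, :);

% Closed forms from the table.
P2table = (1/2) * (3*x.^2 - 1);
P4table = (1/8) * (35*x.^4 - 30*x.^2 + 3);

fprintf('max |P2 difference| = %.2g\n', max(abs(P2 - P2table)));
fprintf('max |P4 difference| = %.2g\n', max(abs(P4 - P4table)));
```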
The first step is to load the ¹²C alpha-emission data from the file carbon12alpha.mat, which is provided with the toolbox.
load carbon12alpha
The workspace now contains two new variables, angle and counts:
angle is a vector of angles (in radians) ranging from 10° to 240° in 10° increments.
counts is a vector of raw alpha particle counts that correspond to the emission angles in angle.
Import these two variables into Curve Fitting Tool and name the data set C12Alpha.
The Fit Editor for a custom equation fit type is shown below.

Fit the data using a fourth-degree Legendre polynomial with only even terms:
$$y_1(x) = a_0 P_0(x) + a_2 P_2(x) + a_4 P_4(x)$$
Because the Legendre polynomials depend only on the predictor variable and constants, you use the Linear Equations pane on the Create Custom Equation GUI. This pane is shown below for the model given by y1(x). Note that because angle is given in radians, the argument of the Legendre terms is given by cos(Θα).
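Because the even-term Legendre model is linear in its coefficients, it can be fitted with ordinary linear least squares. This sketch uses base MATLAB only. The angles mimic the detector positions, but the counts are synthetic and the coefficient values in `aTrue` are hypothetical, so the numbers do not reproduce the toolbox example.

```matlab
% Synthetic stand-in for the angle/counts data (assumed values).
angle = (10:10:240)' * pi/180;       % radians
x = cos(angle);                      % argument of the Legendre terms

% Even Legendre terms up to fourth degree.
P0 = ones(size(x));
P2 = (1/2) * (3*x.^2 - 1);
P4 = (1/8) * (35*x.^4 - 30*x.^2 + 3);

% Hypothetical coefficients used to generate noise-free counts.
aTrue = [1000; 150; 400];
A = [P0, P2, P4];
counts = A * aTrue;

% Linear least squares: solve the overdetermined system A*a = counts.
aFit = A \ counts;
disp(aFit')
```

No starting points are needed, which is why the Linear Equations pane is preferred for models of this kind.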

The fit and residuals are shown below. The fit appears to follow the trend of the data well, while the residuals appear to be randomly distributed and do not exhibit any systematic behavior.

The numerical fit results are shown below. The 95% confidence bounds indicate that the coefficients associated with P0(x) and P4(x) are known fairly accurately, but that the P2(x) coefficient has a relatively large uncertainty.

To confirm the theoretical argument that the alpha-emission data is best described by a fourth-degree Legendre polynomial with only even terms, fit the data using both even and odd terms:
$$y_2(x) = y_1(x) + a_1 P_1(x) + a_3 P_3(x)$$
The Linear Equations pane of the Create Custom Equation GUI is shown below for the model given by y2(x).

The numerical results indicate that the odd Legendre terms do not contribute significantly to the fit, and the even Legendre terms are essentially unchanged from the previous fit. This confirms that the initial model choice is the best one.

Example: Fourier Series. This example fits the ENSO data using several custom nonlinear equations. The ENSO data consists of monthly averaged atmospheric pressure differences between Easter Island and Darwin, Australia. This difference drives the trade winds in the southern hemisphere.
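As shown in Example: Smoothing Data, the ENSO data is clearly periodic, which suggests it can be described by a Fourier series
$$y(x) = a_0 + \sum_{i=1}^{\infty} \left[ a_i \cos\left(2\pi \frac{x}{c_i}\right) + b_i \sin\left(2\pi \frac{x}{c_i}\right) \right]$$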
where ai and bi are the amplitudes, and ci are the periods (cycles) of the data. The question to be answered in this example is how many cycles exist? As a first attempt, assume a single cycle and fit the data using one sine term and one cosine term.
$$y_1(x) = a_0 + a_1 \cos\left(2\pi \frac{x}{c_1}\right) + b_1 \sin\left(2\pi \frac{x}{c_1}\right)$$
If the fit does not describe the data well, add additional sine and cosine terms with unique period coefficients until a good fit is obtained.
Because there is an unknown coefficient c1 included as part of the trigonometric function arguments, the equation is nonlinear. Therefore, you must specify the equation using the General Equations pane of the Create Custom Equation GUI.
This pane is shown below for the equation given by y1(x).

Note that the toolbox includes the Fourier series as a nonlinear library equation. However, the library equation does not meet the needs of this example because its terms are defined as fixed multiples of the fundamental frequency w. Refer to Fourier Series for more information.
The numerical results shown below indicate that the fit does not describe the data well. In particular, the fitted value for c1 is unreasonably small. Because the starting points are randomly selected, your initial fit results might differ from the results shown here.

As you saw in Example: Smoothing Data, the data include a periodic component with a period of about 12 months. However, with c1 unconstrained and with a random starting point, this fit failed to find that cycle. To assist the fitting procedure, constrain c1 to a value between 10 and 14. To define constraints for unknown coefficients, use the Fit Options GUI, which you open by clicking the Fit options button in the Fitting GUI.
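The constraint on c1 can also be expressed at the command line. This sketch assumes the toolbox functions `fittype`, `fitoptions`, and `fit`, and the `enso` data file with variables `month` and `pressure`. The model expression, with c1 as a period in months, and the starting values are assumptions made for illustration.

```matlab
% Load the ENSO data: month (predictor) and pressure (response).
load enso

% One-cycle model; the coefficient order is fixed explicitly so that
% the bound vectors below line up with [a0 a1 b1 c1].
model = fittype('a0 + a1*cos(2*pi*x/c1) + b1*sin(2*pi*x/c1)', ...
    'independent', 'x', 'coefficients', {'a0', 'a1', 'b1', 'c1'});

% Constrain only c1, to the interval [10, 14]. The starting point
% inside that interval is an assumed value.
opts = fitoptions('Method', 'NonlinearLeastSquares', ...
    'Lower',      [-Inf -Inf -Inf 10], ...
    'Upper',      [ Inf  Inf  Inf 14], ...
    'StartPoint', [10 1 1 12]);

[curve, gof] = fit(month, pressure, model, opts);
fprintf('Fitted period c1 = %.2f months\n', curve.c1);
```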

The fit, residuals, and numerical results are shown below.

The fit appears to be reasonable for some of the data points but clearly does not describe the entire data set very well. As predicted, the numerical results indicate a cycle of approximately 12 months. However, the residuals show a systematic periodic distribution indicating that there are additional cycles that you should include in the fit equation. Therefore, as a second attempt, add an additional sine and cosine term to y1(x)
$$y_2(x) = y_1(x) + a_2 \cos\left(2\pi \frac{x}{c_2}\right) + b_2 \sin\left(2\pi \frac{x}{c_2}\right)$$
and constrain the upper and lower bounds of c2 to be roughly twice the bounds used for c1.
The fit, residuals, and numerical results are shown below.

The fit appears to be reasonable for most of the data points. However, the residuals indicate that you should include another cycle in the fit equation. Therefore, as a third attempt, add an additional sine and cosine term to y2(x)
$$y_3(x) = y_2(x) + a_3 \cos\left(2\pi \frac{x}{c_3}\right) + b_3 \sin\left(2\pi \frac{x}{c_3}\right)$$
and constrain the lower bound of c3 to be roughly three times the value of c1.
The fit, residuals, and numerical results are shown below.

The fit is an improvement over the previous two fits, and appears to account for most of the cycles present in the ENSO data set. The residuals appear random for most of the data, although a pattern is still visible, indicating that additional cycles may be present or that the fitted amplitudes can be improved.
In conclusion, Fourier analysis of the data reveals three significant cycles. The annual cycle is the strongest, but cycles with periods of approximately 44 and 22 months are also present. These cycles correspond to El Niño and the Southern Oscillation (ENSO).
Example: Gaussian with Exponential Background. This example fits two poorly resolved Gaussian peaks on a decaying exponential background using a general (nonlinear) custom model. To get started, load the data from the file gauss3.mat, which is provided with the toolbox.
load gauss3
The workspace now contains two new variables, xpeak and ypeak:
xpeak is a vector of predictor values.
ypeak is a vector of response values.
Import these two variables into Curve Fitting Tool and accept the default data set name ypeak vs. xpeak.
You will fit the data with the following equation
$$y(x) = a e^{-bx} + a_1 e^{-\left[(x - b_1)/c_1\right]^2} + a_2 e^{-\left[(x - b_2)/c_2\right]^2}$$
where ai are the peak amplitudes, bi are the peak centroids, and ci are related to the peak widths. Because there are unknown coefficients included as part of the exponential function arguments, the equation is nonlinear. Therefore, you must specify the equation using the General Equations pane of the Create Custom Equation GUI. This pane is shown below for y(x).

The data, fit, and numerical fit results are shown below. Clearly, the fit is poor.

Because the starting points are randomly selected, your initial fit results might differ from the results shown here.
The results include this warning message.
Fit computation did not converge: Maximum number of function evaluations exceeded. Increasing MaxFunEvals (in fit options) may allow for a better fit, or the current equation may not be a good model for the data.
To improve the fit for this example, specify reasonable starting points for the coefficients. Deducing the starting points is particularly easy for the current model because the Gaussian coefficients have a straightforward interpretation and the exponential background is well defined. Additionally, because the peak amplitudes and widths cannot be negative, constrain a1, a2, c1, and c2 to be greater than zero.
To define starting values and constraints for unknown coefficients, use the Fit Options GUI, which you open by clicking the Fit options button. The starting values and constraints are shown below.
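Starting values and lower bounds of this kind can be sketched at the command line as follows, assuming the toolbox functions `fittype`, `fitoptions`, and `fit`. The data is synthetic, and the peak positions, widths, and starting values are hypothetical; they are not the values used for gauss3.mat.

```matlab
% Synthetic two-peak data on an exponential background (assumed values).
x = (0:0.5:250)';
y = 100*exp(-0.01*x) + 100*exp(-((x-110)/20).^2) ...
    + 75*exp(-((x-140)/20).^2);
y = y .* (1 + 0.01*sin(3*x));     % small deterministic ripple

% Two Gaussian peaks plus a decaying exponential background.
model = fittype(['a*exp(-b*x) + a1*exp(-((x-b1)/c1)^2)' ...
                 ' + a2*exp(-((x-b2)/c2)^2)'], ...
    'independent', 'x', ...
    'coefficients', {'a','b','a1','b1','c1','a2','b2','c2'});

% Starting points read off the plot of the data (assumed), with the
% amplitudes and widths constrained to be nonnegative.
opts = fitoptions('Method', 'NonlinearLeastSquares', ...
    'StartPoint', [90 0.012 90 105 18 70 145 18], ...
    'Lower',      [-Inf -Inf 0 -Inf 0 0 -Inf 0]);

curve = fit(x, y, model, opts);
disp(coeffvalues(curve))
```

The order of the values in `StartPoint` and `Lower` follows the order given in `'coefficients'`, which is why that order is stated explicitly.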

The data, fit, residuals, and numerical results are shown below.

Introduction. You specify fit options with the Fit Options GUI. The fit options for the single-term exponential are shown below. The coefficient starting values and constraints are for the census data.

The available GUI options depend on whether you are fitting your data using a linear model, a nonlinear model, or a nonparametric fit type. All the options described below are available for nonlinear models. Method, Robust, and coefficient constraints (Lower and Upper) are available for linear models. Interpolants and smoothing splines include Method, but no configurable options.
Method — The fitting method.
The method is automatically selected based on the library or custom model you use. For linear models, the method is LinearLeastSquares. For nonlinear models, the method is NonlinearLeastSquares.
Robust — Specify whether to use the robust least-squares fitting method. The values are
Off — Do not use robust fitting (default).
On — Fit with default robust method (bisquare weights).
LAR — Fit by minimizing the least absolute residuals (LAR).
Bisquare — Fit by minimizing the summed square of the residuals, and down-weight outliers using bisquare weights. In most cases, this is the best choice for robust fitting.
Algorithm — Algorithm used for the fitting procedure:
Trust-Region — This is the default algorithm and must be used if you specify coefficient constraints.
Levenberg-Marquardt — If the trust-region algorithm does not produce a reasonable fit, and you do not have coefficient constraints, you should try the Levenberg-Marquardt algorithm.
Gauss-Newton — This algorithm is included for pedagogical reasons and should be the last choice for most models and data sets.
Finite Differencing Parameters.
DiffMinChange — Minimum change in coefficients for finite difference Jacobians. The default value is 10⁻⁸.
DiffMaxChange — Maximum change in coefficients for finite difference Jacobians. The default value is 0.1.
Note that DiffMinChange and DiffMaxChange apply to
Any nonlinear custom equation — that is, a nonlinear equation that you write.
Some, but not all, of the nonlinear equations provided with Curve Fitting Toolbox software.
However, DiffMinChange and DiffMaxChange do not apply to any linear equations.
MaxFunEvals — Maximum number of function (model) evaluations allowed. The default value is 600.
MaxIter — Maximum number of fit iterations allowed. The default value is 400.
TolFun — Termination tolerance used on stopping conditions involving the function (model) value. The default value is 10⁻⁶.
TolX — Termination tolerance used on stopping conditions involving the coefficients. The default value is 10⁻⁶.
Unknowns — Symbols for the unknown coefficients to be fitted.
StartPoint — The coefficient starting values. The default values depend on the model. For rational, Weibull, and custom models, default values are randomly selected within the range [0,1]. For all other nonlinear library models, the starting values depend on the data set and are calculated heuristically.
Lower — Lower bounds on the fitted coefficients. The bounds are used only with the trust region fitting algorithm. The default lower bounds for most library models are -Inf, which indicates that the coefficients are unconstrained. However, a few models have finite default lower bounds. For example, Gaussians have the width parameter constrained so that it cannot be less than 0.
Upper — Upper bounds on the fitted coefficients. The bounds are used only with the trust region fitting algorithm. The default upper bounds for all library models are Inf, which indicates that the coefficients are unconstrained.
For more information about these fit options, refer to the Optimization Toolbox documentation.
The default coefficient starting points and constraints for library and custom models are given below. If the starting points are optimized, then they are calculated heuristically based on the current data set. Random starting points are defined on the interval [0,1] and linear models do not require starting points.
If a model does not have constraints, the coefficients have neither a lower bound nor an upper bound. You can override the default starting points and constraints by providing your own values using the Fit Options GUI.
Default Starting Points and Constraints
| Model | Starting Points | Constraints |
|---|---|---|
| Custom linear | N/A | None |
| Custom nonlinear | Random | None |
| Exponentials | Optimized | None |
| Fourier series | Optimized | None |
| Gaussians | Optimized | ci > 0 |
| Polynomials | N/A | None |
| Power series | Optimized | None |
| Rationals | Random | None |
| Sum of sines | Optimized | bi > 0 |
| Weibull | Random | a, b > 0 |
Note that the sum of sines and Fourier series models are particularly sensitive to starting points, and the optimized values might be accurate for only a few terms in the associated equations.
This example fits measured data using a rational model. The data describes the coefficient of thermal expansion for copper as a function of temperature in degrees kelvin.
To get started, load the thermal expansion data from the file hahn1.mat, which is provided with the toolbox.
load hahn1
The workspace now contains two new variables, temp and thermex:
temp is a vector of temperatures in degrees kelvin.
thermex is a vector of thermal expansion coefficients for copper.
Import these two variables into Curve Fitting Tool and name the data set CuThermEx.
For this data set, you will find the rational equation that produces the best fit. As described in Library Models, rational models are defined as a ratio of polynomials
$$y = \frac{\sum_{i=1}^{n+1} p_i x^{n+1-i}}{x^m + \sum_{i=1}^{m} q_i x^{m-i}}$$
where n is the degree of the numerator polynomial and m is the degree of the denominator polynomial. Note that the rational equations are not associated with physical parameters of the data. Instead, they provide a simple and flexible empirical model that you can use for interpolation and extrapolation.
As you can see by examining the shape of the data, a reasonable initial choice for the rational model is quadratic/quadratic. The Fitting GUI configured for this equation is shown below.

The data, fit, and residuals are shown below.

The fit clearly misses the data for the smallest and largest predictor values. Additionally, the residuals show a strong pattern throughout the entire data set indicating that a better fit is possible.
For the next fit, try a cubic/cubic equation. The data, fit, and residuals are shown below.

The numerical results shown below indicate that the fit did not converge.

Although the message in the Results window indicates that you might improve the fit if you increase the maximum number of iterations, a better choice at this stage of the fitting process is to use a different rational equation because the current fit contains several discontinuities. These discontinuities are due to the function blowing up at predictor values that correspond to the zeros of the denominator.
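One way to anticipate such discontinuities is to compute the zeros of the fitted denominator and check whether any real zero lies inside the range of the predictor data. This base MATLAB sketch uses hypothetical denominator coefficients and an assumed data range; it does not use the hahn1 fit results.

```matlab
% Hypothetical denominator x^3 + q1*x^2 + q2*x + q3 of a fitted rational
% (coefficients chosen so that the zeros are 1, 2, and 3).
q = [-6 11 -6];
xRange = [0 2.5];              % assumed range of the predictor data

% Zeros of the denominator; the leading coefficient is fixed at 1.
poles = roots([1 q]);

% Keep the real zeros that fall inside the data range.
isRealPole = abs(imag(poles)) < 1e-8;
inRange    = real(poles) >= xRange(1) & real(poles) <= xRange(2);
badPoles   = sort(real(poles(isRealPole & inRange)));
disp(badPoles')
```

A nonempty `badPoles` suggests that the fitted curve blows up inside the data range and that a different numerator or denominator degree is worth trying.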
As the next try, fit the data using a cubic/quadratic equation. The data, fit, and residuals are shown below.

The fit is well behaved over the entire data range, and the residuals are randomly scattered about zero. Therefore, you can confidently use this fit for further analysis.
This example fits data that is assumed to contain one outlier. The data consists of the 2000 United States presidential election results for the state of Florida. The fit model is a first degree polynomial and the fit method is robust linear least squares with bisquare weights.
In the 2000 presidential election, many residents of Palm Beach County, Florida, complained that the design of the election ballot was confusing, which they claimed led them to vote for the Reform candidate Pat Buchanan instead of the Democratic candidate Al Gore. The so-called "butterfly ballot" was used only in Palm Beach County and only for the election-day ballots for the presidential race. As you will see, the number of Buchanan votes for Palm Beach is far removed from the bulk of the data, which suggests that the data point should be treated as an outlier.
To get started, load the Florida election result data from the file flvote2k.mat, which is provided with the toolbox.
load flvote2k
The workspace now contains these three new variables:
buchanan is a vector of votes for the Reform Party candidate Pat Buchanan.
bush is a vector of votes for the Republican Party candidate George Bush.
gore is a vector of votes for the Democratic Party candidate Al Gore.
Each variable contains 68 elements, which correspond to the 67 Florida counties plus the absentee ballots. The names of the counties are given in the variable counties. From these variables, create two data sets with the Buchanan votes as the response data: buchanan vs. bush and buchanan vs. gore.
For this example, assume that the relationship between the response and predictor data is linear with an offset of zero.
buchanan votes = (bush votes)(m1)
buchanan votes = (gore votes)(m2)
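m1 is the number of Buchanan votes expected for each Bush vote, and m2 is the number of Buchanan votes expected for each Gore vote. The inverse slopes, 1/m1 and 1/m2, give the number of Bush or Gore votes expected for each Buchanan vote.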
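For a zero-offset model of the form y = m*x, the ordinary least-squares slope has the closed form m = sum(x*y) / sum(x^2). The following Python sketch (standard library only, not toolbox code) computes it. The function name `slope_through_origin` and the numbers are made up for illustration and are not the Florida vote counts.

```python
def slope_through_origin(x, y):
    """Least-squares slope m for the zero-offset model y = m*x."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx


# Made-up data for illustration only.
predictor = [1000.0, 2000.0, 4000.0]
response = [5.0, 10.0, 20.0]

m = slope_through_origin(predictor, response)
print(m, 1.0 / m)   # slope and inverse slope
```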
To create a first-degree polynomial equation with zero offset, you must create a custom linear equation. You create a custom equation using the Fitting GUI by selecting Custom Equations from the Type of fit list, and then clicking the New Equation button.
The Linear Equations pane of the Create Custom Equation GUI is shown below.

Before fitting, you should exclude the data point associated with the absentee ballots from each data set because these voters did not use the butterfly ballot. As described in Marking Outliers, you can exclude individual data points from a fit either graphically or numerically using the Exclude GUI. For this example, you should exclude the data numerically. The index of the absentee ballot data is given by
ind = find(strcmp(counties,'Absentee Ballots'))
ind =
    68
The Exclude GUI is shown below.

The exclusion rule is named AbsenteeVotes. You use the Fitting GUI to associate an exclusion rule with the data set to be fit.
For each data set, perform a robust fit with bisquare weights using the FlaElection equation defined above. For comparison purposes, also perform a regular linear least-squares fit.
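As a rough picture of what a robust fit with bisquare weights does, the sketch below implements iteratively reweighted least squares for a zero-offset line in Python (standard library only). It is a simplified illustration and not the toolbox implementation: it assumes the conventional tuning constant 4.685 and a scale estimate of MAD/0.6745, and it omits any leverage adjustment of the residuals. The function names and the synthetic data are hypothetical.

```python
import statistics


def weighted_slope(x, y, w):
    """Weighted least-squares slope for the zero-offset model y = m*x."""
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    den = sum(wi * xi * xi for wi, xi in zip(w, x))
    return num / den


def bisquare_slope(x, y, tune=4.685, iterations=50, tol=1e-10):
    """Robust slope via iteratively reweighted least squares (bisquare weights)."""
    m = weighted_slope(x, y, [1.0] * len(x))   # start from the ordinary fit
    for _ in range(iterations):
        resid = [yi - m * xi for xi, yi in zip(x, y)]
        # Robust scale estimate: median absolute residual / 0.6745.
        scale = statistics.median(abs(r) for r in resid) / 0.6745
        if scale == 0.0:
            break   # at least half the points are fitted exactly
        weights = []
        for r in resid:
            u = r / (tune * scale)
            # Bisquare weight: smooth down-weighting, zero beyond the cutoff.
            weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
        m_new = weighted_slope(x, y, weights)
        converged = abs(m_new - m) <= tol * abs(m)
        m = m_new
        if converged:
            break
    return m


# Synthetic data (not the election data): true slope 0.005 plus small noise,
# with one gross outlier at x = 500.
x = [100.0 * k for k in range(1, 11)]
noise = [0.1, -0.2, 0.15, -0.1, 0.0, 0.2, -0.15, 0.1, -0.05, 0.05]
y = [0.005 * xi + ni for xi, ni in zip(x, noise)]
y[4] += 30.0

m_ols = weighted_slope(x, y, [1.0] * len(x))
m_robust = bisquare_slope(x, y)
print(m_ols, m_robust)
```

In this synthetic case the single outlier pulls the ordinary least-squares slope well away from 0.005, while the bisquare weights drive the outlier's influence to zero and the robust slope stays close to the true value.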
You can identify the Palm Beach County data in the scatter plot by using the data tips feature, and knowing the index number of the data point.
ind = find(strcmp(counties,'Palm Beach'))
ind =
    50
The Fit Editor and the Fit Options GUI are shown below for a robust fit.

The data, robust and regular least-squares fits, and residuals for the buchanan vs. bush data set are shown below.

The graphical results show that the linear model is reasonable for the majority of data points, and the residuals appear to be randomly scattered around zero. However, two residuals stand out. The largest residual corresponds to Palm Beach County. The other residual is at the largest predictor value, and corresponds to Miami/Dade County.
The numerical results are shown below. The inverse slope of the robust fit indicates that Buchanan should receive one vote for every 197.4 Bush votes.

The data, robust and regular least-squares fits, and residuals for the buchanan vs. gore data set are shown below.

Again, the graphical results show that the linear model is reasonable for the majority of data points, and the residuals appear to be randomly scattered around zero. However, three residuals stand out. The largest residual corresponds to Palm Beach County. The other residuals are at the two largest predictor values, and correspond to Miami/Dade County and Broward County.
The numerical results are shown below. The inverse slope of the robust fit indicates that Buchanan should receive one vote for every 189.3 Gore votes.

Using the fitted slope value, you can determine the expected number of votes that Buchanan should have received for each fit. For the Buchanan versus Bush data, you evaluate the fit at a predictor value of 152,951. For the Buchanan versus Gore data, you evaluate the fit at a predictor value of 269,732. These results are shown below for both data sets and both fits.
Expected Buchanan Votes in Palm Beach County
| Data Set | Fit | Expected Buchanan Votes |
|---|---|---|
| Buchanan vs. Bush | Ordinary least squares | 814 |
| Buchanan vs. Bush | Robust least squares | 775 |
| Buchanan vs. Gore | Ordinary least squares | 1246 |
| Buchanan vs. Gore | Robust least squares | 1425 |
The robust results for the Buchanan versus Bush data suggest that Buchanan received 3411 – 775 = 2636 excess votes, while the robust results for the Buchanan versus Gore data suggest that Buchanan received 3411 – 1425 = 1986 excess votes.
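The arithmetic behind the robust figures can be reproduced with a short Python sketch, used here only as a calculator and not as part of the toolbox workflow. It assumes that the expected vote count is the predictor value divided by the robust inverse slope quoted in the text, rounded to the nearest vote. Because the quoted inverse slopes are rounded to one decimal place, results computed this way could in general differ slightly from values evaluated directly from the fit. The variable names are illustrative.

```python
ACTUAL_BUCHANAN_VOTES = 3411   # Palm Beach County count used in the text

# (predictor value, robust inverse slope) for each data set, as quoted in the text
cases = {
    "Buchanan vs. Bush": (152951, 197.4),
    "Buchanan vs. Gore": (269732, 189.3),
}

results = {}
for name, (predictor, inverse_slope) in cases.items():
    expected = round(predictor / inverse_slope)
    excess = ACTUAL_BUCHANAN_VOTES - expected
    results[name] = (expected, excess)
    print(name, expected, excess)
```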
The margin of victory for George Bush is given by
margin = sum(bush)-sum(gore)
margin =
   537
Therefore, voter intention matters: for both data sets, the estimated number of excess Buchanan votes is greater than the margin of victory.
In conclusion, the analysis of the 2000 United States presidential election results for the state of Florida suggests that the Reform Party candidate received an excess number of votes in Palm Beach County, and that this excess number was a crucial factor in determining the election outcome. However, additional analysis is required before a final conclusion can be made.
In some cases, you are not concerned about extracting or interpreting fitted parameters. Instead, you might simply want to draw a smooth curve through your data. Fitting of this type is called nonparametric fitting. The Curve Fitting Toolbox software supports these nonparametric fitting methods:
Interpolants — Estimate values that lie between known data points.
Smoothing spline — Create a smooth curve through the data. You adjust the level of smoothness by varying a parameter that changes the curve from a least-squares straight-line approximation to a cubic spline interpolant.
For more information about interpolation, refer to Polynomials and the interp1 function in the MATLAB documentation.
This example fits the following data using a cubic spline interpolant and several smoothing splines.
x = (4*pi)*[0 1 rand(1,25)];
y = sin(x) + .2*(rand(size(x))-.5);
As shown below, you can fit the data with a cubic spline interpolant by selecting Interpolant from the Type of fit list.

The results shown below indicate that goodness-of-fit statistics are not defined for interpolants.

A cubic spline interpolant is a piecewise polynomial that is stored as a structure of coefficients. The number of "pieces" in the structure is one less than the number of fitted data points, and the number of coefficients for each piece is four because the polynomial degree is three. The toolbox does not allow you to access the structure of coefficients.
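The piece and coefficient counts can be seen in a small Python sketch (standard library only) that builds a cubic spline through a handful of points. This is an illustration, not the toolbox algorithm: it assumes natural end conditions (zero second derivative at both ends), which may differ from the end conditions the toolbox uses, and the function name `natural_cubic_spline` and the sample points are hypothetical.

```python
import math


def natural_cubic_spline(x, y):
    """Return one (a, b, c, d) tuple per interval for a natural cubic spline.

    On [x[i], x[i+1]] the spline is a + b*t + c*t**2 + d*t**3 with t = x - x[i].
    The x values must be strictly increasing.
    """
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    # Second derivatives M at the knots; natural ends give M[0] = M[n-1] = 0.
    M = [0.0] * n
    if n > 2:
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        rhs = [6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
               for i in range(1, n - 1)]
        # Forward elimination of the symmetric tridiagonal system.
        for k in range(1, n - 2):
            factor = h[k] / diag[k - 1]
            diag[k] -= factor * h[k]
            rhs[k] -= factor * rhs[k - 1]
        # Back substitution.
        M[n - 2] = rhs[-1] / diag[-1]
        for k in range(n - 4, -1, -1):
            M[k + 1] = (rhs[k] - h[k + 1] * M[k + 2]) / diag[k]
    pieces = []
    for i in range(n - 1):
        a = y[i]
        b = (y[i + 1] - y[i]) / h[i] - h[i] * (2.0 * M[i] + M[i + 1]) / 6.0
        c = M[i] / 2.0
        d = (M[i + 1] - M[i]) / (6.0 * h[i])
        pieces.append((a, b, c, d))
    return pieces


# Hypothetical sample points on a sine curve.
xs = [0.0, 1.0, 2.5, 4.0, 6.0]
ys = [math.sin(v) for v in xs]

pieces = natural_cubic_spline(xs, ys)
print(len(xs), len(pieces), len(pieces[0]))   # points, pieces, coefficients per piece
```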
As shown below, you can fit the data with a smoothing spline by selecting Smoothing Spline in the Type of fit list.

The level of smoothness is given by the Smoothing Parameter. The default smoothing parameter value depends on the data set, and is automatically calculated by the toolbox after you click the Apply button.
For this data set, the default smoothing parameter is close to 1, indicating that the smoothing spline is close to the cubic spline interpolant and comes very close to passing through each data point. Create a fit for the default smoothing parameter and name it Smooth1. If you do not like the level of smoothing produced by the default smoothing parameter, you can specify any value between 0 and 1. A value of 0 produces a least-squares straight-line fit, while a value of 1 produces a piecewise cubic polynomial fit that passes through all the data points. For comparison purposes, create another smoothing spline fit using a smoothing parameter of 0.5 and name the fit Smooth2.
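The two limiting cases follow from the criterion that a cubic smoothing spline is conventionally defined to minimize. The form below is the standard textbook formulation, written with a smoothing parameter p and optional weights w_i (taken to be 1 when no weights are given); the notation is an assumption and is not taken from this page.

```latex
\min_{s} \;\; p \sum_{i} w_i \bigl( y_i - s(x_i) \bigr)^2
\;+\; (1 - p) \int \left( \frac{d^2 s}{dx^2} \right)^{2} dx
```

With p = 0 only the roughness penalty remains, which forces the second derivative of s to zero, so the minimizer is the least-squares straight line. With p = 1 only the data term remains, so s passes through every data point. Intermediate values trade closeness to the data against smoothness.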
The numerical results for the smoothing spline fit Smooth1 are shown below.

The data and fits are shown below. The default abscissa scale was increased to show the fit behavior beyond the data limits. You change the axes limits with the Tools > Axes Limit Control menu item.

Note that the default smoothing parameter produces a curve that is smoother than the interpolant, but is a good fit to the data. In this case, decreasing the smoothing parameter from the default value produces a curve that is smoother still, but is not a good fit to the data. As the smoothing parameter increases beyond the default value, the associated curve approaches the cubic spline interpolant.
© 1984-2008 The MathWorks, Inc.