Statistics Toolbox™
Parametric nonlinear models represent the relationship between a continuous response variable and one or more predictor variables (either continuous or categorical) in the form y = f(X, β) + ε, where
y is an n-by-1 vector of observations of the response variable.
X is an n-by-p design matrix determined by the predictors.
β is a p-by-1 vector of unknown parameters to be estimated.
f is any function of X and β.
ε is an n-by-1 vector of independent, identically distributed random disturbances.
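The definitions above can be made concrete with a small simulation. The following is a minimal sketch, not taken from the toolbox: the model f(X, β) = exp(Xβ), the parameter values, and the noise level are all hypothetical choices made only to show the shapes of y, X, β, and ε.

```matlab
% Hypothetical model, for illustration only: f(X,beta) = exp(X*beta)
n = 20;                                 % number of observations
X = [ones(n,1), linspace(0,1,n)'];      % n-by-p design matrix, p = 2
beta = [0.5; -1.2];                     % p-by-1 parameter vector (made-up values)
f = @(X,beta) exp(X*beta);              % nonlinear in the parameters
sigma = 0.05;                           % assumed disturbance standard deviation
epsilon = sigma*randn(n,1);             % n-by-1 i.i.d. normal disturbances
y = f(X,beta) + epsilon;                % n-by-1 vector of simulated responses
```

In a fitting problem, y and X are observed and β is what a function such as nlinfit estimates.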
Nonparametric models do not attempt to characterize the relationship between predictors and response with model parameters. Descriptions are often graphical, as in the case of Regression Trees.
The Hougen-Watson model (Bates and Watts, [2], pp. 271–272) for reaction kinetics is an example of a parametric nonlinear model. The form of the model is
$$\text{rate} = \frac{\beta_1 x_2 - x_3/\beta_5}{1 + \beta_2 x_1 + \beta_3 x_2 + \beta_4 x_3}$$
where rate is the reaction rate, x1, x2, and x3 are concentrations of hydrogen, n-pentane, and isopentane, respectively, and β1, β2, ... , β5 are the unknown parameters.
The file reaction.mat contains simulated reaction data:
load reaction
The variables are:
rate — A 13-by-1 vector of observed reaction rates
reactants — A 13-by-3 matrix of reactant concentrations
beta — A 5-by-1 vector of initial parameter estimates
model — The name of an M-file function for the model
xn — The names of the reactants
yn — The name of the response
The M-file function for the model is hougen, which looks like this:
type hougen

function yhat = hougen(beta,x)
%HOUGEN Hougen-Watson model for reaction kinetics.
%   YHAT = HOUGEN(BETA,X) gives the predicted values of the
%   reaction rate, YHAT, as a function of the vector of
%   parameters, BETA, and the matrix of data, X.
%   BETA must have five elements and X must have three
%   columns.
%
%   The model form is:
%   y = (b1*x2 - x3/b5)./(1+b2*x1+b3*x2+b4*x3)

b1 = beta(1);
b2 = beta(2);
b3 = beta(3);
b4 = beta(4);
b5 = beta(5);

x1 = x(:,1);
x2 = x(:,2);
x3 = x(:,3);

yhat = (b1*x2 - x3/b5)./(1+b2*x1+b3*x2+b4*x3);
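As a quick check of the formula in the listing, the sketch below writes the same expression as an anonymous function and evaluates it at a single point. The name hougenFn, the parameter values, and the concentrations are illustrative assumptions, not values read from reaction.mat.

```matlab
% Same expression as the hougen M-file, written as an anonymous function
% (hougenFn is a hypothetical name used only in this sketch)
hougenFn = @(beta,x) (beta(1)*x(:,2) - x(:,3)/beta(5)) ./ ...
    (1 + beta(2)*x(:,1) + beta(3)*x(:,2) + beta(4)*x(:,3));

beta0 = [1; 0.05; 0.02; 0.1; 2];   % illustrative parameter values
x0 = [470 300 10];                 % one row: hydrogen, n-pentane, isopentane

% By hand: (1*300 - 10/2) / (1 + 0.05*470 + 0.02*300 + 0.1*10) = 295/31.5
yhat0 = hougenFn(beta0,x0)
```

The result is 295/31.5, or about 9.3651. Because the function indexes columns with x(:,k), it accepts any number of rows, which is what nlinfit relies on when it passes in the full predictor matrix.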
The function nlinfit is used to find least-squares parameter estimates for nonlinear models. It uses the Gauss-Newton algorithm with Levenberg-Marquardt modifications for global convergence.
nlinfit requires the predictor data, the responses, and an initial guess of the unknown parameters. It also requires a function handle to a function that takes the predictor data and parameter estimates and returns the responses predicted by the model.
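To give a feel for the kind of iteration involved, here is a minimal sketch of a damped Gauss-Newton (Levenberg-Marquardt style) loop. It is not the nlinfit implementation: the two-parameter decay model, the analytic Jacobian, the starting point, and the simple accept/reject rule for the damping parameter lambda are all assumptions made for illustration.

```matlab
% Noise-free data from a hypothetical two-parameter decay model
x = (0:0.5:4)';
model = @(b,x) b(1)*exp(-b(2)*x);
jac   = @(b,x) [exp(-b(2)*x), -b(1)*x.*exp(-b(2)*x)];   % d(model)/d(b)
btrue = [2; 0.7];
y = model(btrue,x);

b = [1.5; 0.5];                      % initial guess
lambda = 0.01;                       % damping parameter
sse = sum((y - model(b,x)).^2);      % current sum of squared residuals
for iter = 1:100
    r = y - model(b,x);                        % residuals at current estimate
    J = jac(b,x);                              % n-by-2 Jacobian
    step = (J'*J + lambda*eye(2)) \ (J'*r);    % damped Gauss-Newton step
    bnew = b + step;
    ssenew = sum((y - model(bnew,x)).^2);
    if ssenew < sse
        b = bnew;                              % accept the step
        sse = ssenew;
        lambda = lambda/10;                    % relax the damping
    else
        lambda = lambda*10;                    % reject and damp harder
    end
    if norm(step) < 1e-12
        break
    end
end
b
```

With lambda near zero the step is the plain Gauss-Newton step; a large lambda shortens the step and turns it toward the steepest-descent direction, which is what keeps the iteration from diverging when the initial guess is poor.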
To fit the reaction data, call nlinfit using the following syntax:
load reaction
betahat = nlinfit(reactants,rate,@hougen,beta)
betahat =
1.2526
0.0628
0.0400
0.1124
1.1914
The output vector betahat contains the parameter estimates.
The function nlinfit has robust options, similar to those for robustfit, for fitting nonlinear models to data with outliers.
To compute confidence intervals for the parameter estimates, use the function nlparci, together with additional outputs from nlinfit:
[betahat,resid,J] = nlinfit(reactants,rate,@hougen,beta);
betaci = nlparci(betahat,resid,J)
betaci =
   -0.7467    3.2519
   -0.0377    0.1632
   -0.0312    0.1113
   -0.0609    0.2857
   -0.7381    3.1208
The columns of the output betaci contain the lower and upper bounds, respectively, of the (default) 95% confidence intervals for each parameter.
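For orientation, intervals of this kind are commonly built from the residuals r and the Jacobian J using the large-sample normal approximation. The formula below is that textbook form, stated as an assumption about what the outputs of nlinfit are used for; the exact computation inside nlparci may differ in detail.

$$\hat\beta_j \pm t_{1-\alpha/2,\;n-p}\,\sqrt{s^2\left[(J^{\top}J)^{-1}\right]_{jj}}, \qquad s^2 = \frac{r^{\top}r}{n-p}$$

Here n is the number of observations (13 for the reaction data), p is the number of estimated parameters (5 for the Hougen-Watson model), and α = 0.05 gives the default 95% level.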
The function nlpredci is used to compute confidence intervals for predicted responses:
[yhat,delta] = nlpredci(@hougen,reactants,betahat,resid,J);
opd = [rate yhat delta]
opd =
8.5500 8.4179 0.2805
3.7900 3.9542 0.2474
4.8200 4.9109 0.1766
0.0200 -0.0110 0.1875
2.7500 2.6358 0.1578
14.3900 14.3402 0.4236
2.5400 2.5662 0.2425
4.3500 4.0385 0.1638
13.0000 13.0292 0.3426
8.5000 8.3904 0.3281
0.0500 -0.0216 0.3699
11.3200 11.4701 0.3237
3.1300 3.4326 0.1749
The output opd contains the observed rates in the first column and the predicted rates in the second column. The (default) 95% simultaneous confidence intervals on the predictions are the values in the second column ± the values in the third column. These are not intervals for new observations at the predictors, even though most of the confidence intervals do contain the original observations.
Calling nlintool opens a graphical user interface (GUI) for interactive exploration of multidimensional nonlinear functions, and for fitting parametric nonlinear models. The GUI calls nlinfit, and requires the same inputs. The interface is analogous to polytool and rstool for polynomial models.
Open nlintool with the reaction data and the hougen model by typing
load reaction
nlintool(reactants,rate,@hougen,beta,0.01,xn,yn)

You see three plots. The response variable for all plots is the reaction rate, plotted in green. The red lines show confidence intervals on predicted responses. The first plot shows hydrogen as the predictor, the second shows n-pentane, and the third shows isopentane.
Each plot displays the fitted relationship of the reaction rate to one predictor at a fixed value of the other two predictors. The fixed values are in the text boxes below each predictor axis. Change the fixed values by typing in a new value or by dragging the vertical lines in the plots to new positions. When you change the value of a predictor, all plots update to display the model at the new point in predictor space.
While this example uses only three predictors, nlintool can accommodate any number of predictors.
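The idea behind each panel, varying one predictor while holding the others fixed, can be sketched without the GUI. In the sketch below, hougenFn, the parameter values, the fixed concentrations, and the grid range are hypothetical choices for illustration; nlintool itself uses the fitted parameters and also draws confidence bounds.

```matlab
% Hougen-Watson expression as an anonymous function (hypothetical name)
hougenFn = @(beta,x) (beta(1)*x(:,2) - x(:,3)/beta(5)) ./ ...
    (1 + beta(2)*x(:,1) + beta(3)*x(:,2) + beta(4)*x(:,3));

beta0 = [1; 0.05; 0.02; 0.1; 2];       % illustrative parameter values
fixed = [300 200 100];                 % assumed fixed values of the predictors
xgrid = linspace(100,470,50)';         % values of hydrogen to sweep over

Xslice = repmat(fixed,numel(xgrid),1); % every row starts at the fixed point
Xslice(:,1) = xgrid;                   % then hydrogen (column 1) is varied
yslice = hougenFn(beta0,Xslice);       % model response along the slice

plot(xgrid,yslice)
xlabel('Hydrogen')
ylabel('Reaction rate')
```

Changing an entry of fixed and recomputing corresponds to dragging a vertical line in the GUI: the slice for every other predictor moves because it is evaluated at a new point in predictor space.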
Note The Statistics Toolbox™ demonstration function rsmdemo generates simulated data for experimental settings specified either by the user or by a D-optimal design generated by cordexch. It uses the rstool interface to visualize response surface models fit to the data, and it uses the nlintool interface to visualize a nonlinear model fit to the data.
Parametric models specify the form of the relationship between predictors and a response, as in the Hougen-Watson model described in Parametric Models. In many cases, however, the form of the relationship is unknown, and a parametric model requires assumptions and simplifications. Regression trees offer a nonparametric alternative. When response data are categorical, classification trees are a natural modification.
Note This section demonstrates methods for objects of the @classregtree class. These methods supersede the functions treefit, treedisp, treeval, treeprune, and treetest, which are maintained in Statistics Toolbox software only for backwards compatibility.
Algorithm Reference. The algorithms used by Statistics Toolbox classification and regression tree functions are based on those in Breiman, L., et al., Classification and Regression Trees, Chapman & Hall, Boca Raton, 1993.
This example uses the data on cars in carsmall.mat to create a regression tree for predicting mileage using measurements of weight and the number of cylinders as predictors. Note that, in this case, one predictor (weight) is continuous and the other (cylinders) is categorical. The response (mileage) is continuous.
Load the data and use the classregtree constructor of the @classregtree class to create the regression tree:
load carsmall
t = classregtree([Weight, Cylinders],MPG,...
'cat',2,'splitmin',20,...
'names',{'Weight','Cylinders'})
t =
Decision tree for regression
1 if Weight<3085.5 then node 2 else node 3
2 if Weight<2371 then node 4 else node 5
3 if Cylinders=8 then node 6 else node 7
4 if Weight<2162 then node 8 else node 9
5 if Cylinders=6 then node 10 else node 11
6 if Weight<4381 then node 12 else node 13
7 fit = 19.2778
8 fit = 33.3056
9 fit = 29.6111
10 fit = 23.25
11 if Weight<2827.5 then node 14 else node 15
12 if Weight<3533.5 then node 16 else node 17
13 fit = 11
14 fit = 27.6389
15 fit = 24.6667
16 fit = 16.6
17 fit = 14.3889
t is a classregtree object and can be operated on with any of the methods of the class.
Use the type method of the @classregtree class to show the type of the tree:
treetype = type(t)
treetype =
regression
classregtree creates a regression tree because MPG is a numerical vector, and the response is assumed to be continuous.
To view the tree, use the view method of the @classregtree class:
view(t)

The tree predicts the response values at the circular leaf nodes based on a series of questions about the car at the triangular branching nodes. A true answer to any question follows the branch to the left; a false follows the branch to the right.
Use the tree to predict the mileage for a 2000-pound car with either 4, 6, or 8 cylinders:
mileage2K = t([2000 4; 2000 6; 2000 8])
mileage2K =
   33.3056
   33.3056
   33.3056
Note that the object allows for functional evaluation, of the form t(X). This is a shorthand way of calling the eval method of the @classregtree class.
The predicted responses computed above are all the same. This is because they follow a series of splits in the tree that depend only on weight, terminating at the left-most leaf node in the view above. A 4000-pound car, following the right branch from the top of the tree, leads to different predicted responses:
mileage4K = t([4000 4; 4000 6; 4000 8])
mileage4K =
   19.2778
   19.2778
   14.3889
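The path each car takes through the tree can be traced by hand. The sketch below copies the 17 nodes from the tree display into a plain matrix and walks it with a loop. The matrix layout and variable names are assumptions made for this illustration and say nothing about how classregtree stores a tree internally.

```matlab
% Columns: node, splitvar, cut, left child, right child, fit
% splitvar: 0 = leaf
%           1 = Weight     (go left if Weight < cut)
%           2 = Cylinders  (go left if Cylinders == cut)
T = [ 1 1 3085.5  2  3 NaN
      2 1 2371    4  5 NaN
      3 2 8       6  7 NaN
      4 1 2162    8  9 NaN
      5 2 6      10 11 NaN
      6 1 4381   12 13 NaN
      7 0 NaN     0  0 19.2778
      8 0 NaN     0  0 33.3056
      9 0 NaN     0  0 29.6111
     10 0 NaN     0  0 23.25
     11 1 2827.5 14 15 NaN
     12 1 3533.5 16 17 NaN
     13 0 NaN     0  0 11
     14 0 NaN     0  0 27.6389
     15 0 NaN     0  0 24.6667
     16 0 NaN     0  0 16.6
     17 0 NaN     0  0 14.3889 ];

% Rows are [Weight Cylinders] for the six cars evaluated above
X = [2000 4; 2000 6; 2000 8; 4000 4; 4000 6; 4000 8];
yhat = zeros(size(X,1),1);
for i = 1:size(X,1)
    node = 1;                              % start at the root
    while T(node,2) ~= 0                   % until a leaf is reached
        if T(node,2) == 1
            goLeft = X(i,1) < T(node,3);   % continuous split on Weight
        else
            goLeft = X(i,2) == T(node,3);  % categorical split on Cylinders
        end
        if goLeft
            node = T(node,4);
        else
            node = T(node,5);
        end
    end
    yhat(i) = T(node,6);                   % fitted value stored at the leaf
end
yhat
```

The loop returns 33.3056 for all three 2000-pound cars (nodes 1, 2, 4, 8), 19.2778 for the 4000-pound cars with 4 or 6 cylinders (nodes 1, 3, 7), and 14.3889 for the 4000-pound, 8-cylinder car (nodes 1, 3, 6, 12, 17), matching mileage2K and mileage4K.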
You can use a variety of other methods of the @classregtree class, such as cutvar, cuttype, and cutcategories, to get more information about the split at node 3 that distinguishes the 8-cylinder car:
var3 = cutvar(t,3) % What variable determines the split?
var3 =
'Cylinders'
type3 = cuttype(t,3) % What type of split is it?
type3 =
'categorical'
c = cutcategories(t,3) % Which classes are sent to the left
% child node, and which to the right?
c =
[8] [1x2 double]
c{1}
ans =
8
c{2}
ans =
4 6
Regression trees fit the original (training) data well, but may do a poor job of predicting new values. Lower branches, especially, may be strongly affected by outliers. A simpler tree often avoids over-fitting. To find the best regression tree, employing the techniques of resubstitution and cross-validation, use the test method of the @classregtree class.
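A sketch of how that comparison might look follows. It assumes the test method accepts the 'resubstitution' and 'crossvalidate' options and returns a suggested pruning level, and that prune accepts a 'level' argument; confirm the exact argument names against the @classregtree reference for your release. The output variable names are hypothetical.

```matlab
load carsmall
X = [Weight, Cylinders];
t = classregtree(X,MPG,'cat',2,'splitmin',20,...
    'names',{'Weight','Cylinders'});

% Resubstitution: cost of each pruned subtree, measured on the
% same data that was used to grow the tree (optimistic)
[costR,seR,nleafR] = test(t,'resubstitution');

% Cross-validation: cost of each pruned subtree, estimated on
% data held out from fitting, plus a suggested pruning level
[costCV,seCV,nleafCV,bestlevel] = test(t,'crossvalidate',X,MPG);

% Prune the full tree back to the suggested level
tbest = prune(t,'level',bestlevel);
```

Resubstitution cost typically keeps falling as the tree grows, while the cross-validated cost levels off or rises once the tree starts fitting noise, so the two curves plotted against the number of leaf nodes show where extra branches stop helping.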
© 1984-2008 The MathWorks, Inc.