Selecting Models

Select Button

The Model Selection window appears when you click the Select button. This window is intended to help you select a best model by comparing several candidate models.

The Select button is under the list view in the pane at the bottom of the Model Browser display. This pane is the Test Plans list pane at startup and changes title depending on the level in the model tree that is selected. The list box in this pane always contains the child nodes of whichever node in the tree is selected.

The pane also always contains three buttons: New, Delete, and Select.

Select is only available when the lower pane lists local models, response models, or models.

You can select among the following:

However, you cannot select between response models or test plans.

Select might not be available if you are not ready to choose among the child nodes. For example, at the response node, the child nodes must have models assigned as best (using the Select feature at those levels) before you can select among them. Also, if a response feature has child nodes of alternate models, you must select the best, or the Browser cannot tell which to use to calculate that response feature. After calculating MLE, Select compares the MLE model with the previous univariate model, and you can choose the best.

The Model Selection window allows visual comparison of several models. From the response level you can compare several two-stage models. From the local level, if you have added new response features you can compare the different two-stage models (constructed using different combinations of response feature models). If you have added child nodes to response feature models, you can compare them all using the Model Selection window.

When a model is selected as best it is copied up a level in the tree together with the outliers for that model fit.

A tree node is automatically selected as best if it is the only child, except for two-stage models, which are never automatically selected; for these you must use the Model Selection window.

If a best model node is changed, the parent node loses best model status (but the automatic selection process will reselect that best model if it is the only child node).

Model Selection Guide

First it is important to point out that there is no recipe for model selection. It is not possible to cover the entire topic of Model Selection in a few paragraphs. Instead, we outline some general guidelines which should be helpful in using the Model-Based Calibration Toolbox product to choose the best model for a given data set. There are many books you can go to for a fuller account of statistical modeling; see References.

Overfitting and Underfitting

When fitting a model to noisy data, we effectively make the fundamental assumption that the data have been generated from some model (the "truth") by making predictions at given values of the inputs, then adding some amount of noise to each point, where the noise is drawn from a normal distribution with an unknown variance.

Our task is to discover both this model and the width of the noise distribution. In doing so, we aim for a compromise between bias, where our model does not follow the right trend in the data (and so does not match well with the underlying truth), and variance, where our model fits the data points too closely, and so "chases" the noise rather than trying to capture the true trend. These two extremes are known as underfitting and overfitting.

An important concept in this context is the number of parameters in a model. As this number increases, the model can bend in more complicated ways. If the number of parameters in our model is larger than that in the truth, then we risk overfitting, and if our model contains fewer parameters than the truth, we could underfit.

RMSE

Our basic measure of how closely a model fits some data is the Root Mean Squared Error (RMSE), which measures the average mismatch between each data point and the model. This is why you should look at the RMSE values as your first tool to inspect the quality of the fit — high RMSE values can indicate problems. When two-stage modeling, use the RMSE Explorer to quickly investigate the local models with highest RMSE.

The smaller the RMSE, the closer our model follows the data; if a model goes through each data point exactly, then the RMSE is zero. The illustration shows how increasing the number of parameters in the model can result in overfitting. The 9 data points (shown as black circles) are generated from a cubic polynomial which contains 4 parameters (the "truth", shown as the black curve) by adding a known amount of noise. We can see that by selecting candidate models containing more parameters than the truth, we can reduce, and even eliminate, any mismatch between the data points and our model, causing the RMSE to vanish. This latter case occurs when the number of parameters in the model is the same as the number of data points (an 8th order polynomial has 9 parameters).

This does not mean that we have obtained a good fit: the model is overfitting, as we can see from the large difference between the model and the truth in the regions between the data points. By forcing our model to go through all the data points, we have included too much structure in the curve, which reduces the quality of the fit away from the data points.

Similarly, if we use a model with fewer parameters than in the truth, we risk underfitting; our model is not flexible enough to match the truth well. This is shown in the following illustration.
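The effect described above can be reproduced with a small sketch. It is not the toolbox's implementation: the cubic "truth" (x^3 - 2x), the noise values, and the helper names fit_poly, predict, and rmse are all made up for illustration, and RMSE is taken here as sqrt(SSE/N), which may be normalized differently from the toolbox's own statistic. Exact rational arithmetic is used so that the interpolating 8th order fit has an RMSE of exactly zero.

```python
from fractions import Fraction
from math import sqrt

def fit_poly(xs, ys, order):
    """Least-squares polynomial fit; returns coefficients, lowest power first.

    Solves the normal equations by Gauss-Jordan elimination in exact
    rational arithmetic (illustrative helper, not a toolbox function).
    """
    m = order + 1
    # Augmented matrix [X'X | X'y] of the normal equations.
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)]
         + [sum(y * x ** i for x, y in zip(xs, ys))]
         for i in range(m)]
    for col in range(m):
        piv = next(r for r in range(col, m) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(m):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[m] for row in a]

def predict(coeffs, x):
    """Evaluate the polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def rmse(coeffs, xs, ys):
    """Root mean squared error, assumed here to be sqrt(SSE / N)."""
    sse = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sse / len(xs))

def truth(x):
    """Hypothetical cubic 'truth' with 4 parameters."""
    return x ** 3 - 2 * x

# 9 data points: the truth plus a fixed, made-up noise sequence.
xs = [Fraction(k) for k in range(-4, 5)]
noise = [Fraction(n, 4) for n in (3, -2, 1, -3, 2, -1, 3, -2, 1)]
ys = [truth(x) + e for x, e in zip(xs, noise)]

# Underfit (order 1), matching order (3), and interpolating fit (order 8).
models = {order: fit_poly(xs, ys, order) for order in (1, 3, 8)}
fit_rmse = {order: rmse(c, xs, ys) for order, c in models.items()}

# Mismatch between each model and the truth halfway between two data points.
x_between = Fraction(7, 2)
gap = {order: abs(float(predict(c, x_between) - truth(x_between)))
       for order, c in models.items()}

for order in models:
    print(f"order {order}: RMSE = {fit_rmse[order]:.3f}, "
          f"|model - truth| at x = 3.5: {gap[order]:.3f}")
```

In this sketch the order 8 model has the smallest RMSE but the largest mismatch with the truth between the data points, which is the overfitting pattern described above; the order 1 model shows the opposite, underfitting, problem.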

Other Statistics

As illustrated above, relying solely on RMSE can result in overfitting, which leads to poor model performance away from regions containing data points. In general, this problem is tackled by replacing RMSE with some other statistic, which also must be reduced to improve the fit, but which is designed to rise when we start overfitting. This is why you should consider RMSE and another tool such as the PRESS statistic to help you decide on the best model.

PRESS RMSE (PRESS stands for Predicted Error Sum of Squares) is calculated in a similar way to RMSE, except that we remove a data point from the fit and ask the model to predict where that point lies with no knowledge of the data in that area. To calculate PRESS RMSE, this process is repeated for each point in the data set and the results are averaged. If the value of PRESS RMSE is much bigger than the RMSE, then we are overfitting. Weighted PRESS and GCV are also derived from this idea.
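As a rough sketch of the leave-one-out procedure just described (not the toolbox's implementation), the following code refits a polynomial with each point removed in turn and averages the squared prediction errors. The data, the helper names, and the sqrt(sum/N) normalization are assumptions made for illustration.

```python
from fractions import Fraction
from math import sqrt

def fit_poly(xs, ys, order):
    """Exact least-squares polynomial fit via the normal equations
    (illustrative helper; coefficients are returned lowest power first)."""
    m = order + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)]
         + [sum(y * x ** i for x, y in zip(xs, ys))]
         for i in range(m)]
    for col in range(m):
        piv = next(r for r in range(col, m) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(m):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[m] for row in a]

def predict(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))

def rmse(xs, ys, order):
    """Ordinary RMSE of a fit to all points, assumed to be sqrt(SSE / N)."""
    c = fit_poly(xs, ys, order)
    sse = sum((y - predict(c, x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sse / len(xs))

def press_rmse(xs, ys, order):
    """Leave-one-out RMSE: each point is predicted by a model fitted
    to all the other points."""
    sq_errors = []
    for i in range(len(xs)):
        c = fit_poly(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], order)
        sq_errors.append((ys[i] - predict(c, xs[i])) ** 2)
    return sqrt(sum(sq_errors) / len(sq_errors))

# Made-up data: a cubic plus a fixed noise sequence at 9 points.
xs = [Fraction(k) for k in range(-4, 5)]
noise = [Fraction(n, 4) for n in (3, -2, 1, -3, 2, -1, 3, -2, 1)]
ys = [x ** 3 - 2 * x + e for x, e in zip(xs, noise)]

orders = (1, 3, 7)
plain = {k: rmse(xs, ys, k) for k in orders}
press = {k: press_rmse(xs, ys, k) for k in orders}
best_order = min(press, key=press.get)

for k in orders:
    print(f"order {k}: RMSE = {plain[k]:.3f}, PRESS RMSE = {press[k]:.3f}")
print("order with the lowest PRESS RMSE:", best_order)
```

For this data the order 7 polynomial has a small RMSE but a PRESS RMSE that is far larger, the signal of overfitting noted above, while the lowest PRESS RMSE is obtained at the order of the underlying cubic.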

A different approach to solving the problem of overfitting results in statistics known as "Information Criteria", such as AIC and BIC. Here, we combine a term involving RMSE with a term that rises with the number of parameters in the model. This explicitly penalizes a model for an increase in its level of structure. Try to minimize the information criteria statistics. Both AIC and BIC are approximations, which get more accurate as the number of observations increases. In general, we do not recommend using them unless the ratio of the numbers of observations to parameters is greater than 40:1 (see Section 2.4 of Burnham and Anderson, References). AICc, however, can be used with smaller samples and is the most appropriate information criterion for most problems in engine calibration.

The absolute value of AICc for a given model includes an arbitrary constant, and so is of no direct use. However, the difference between the AICc value for two models is meaningful: one rule of thumb says that if this difference is greater than about 10, then the worse model can be neglected in the selection process (see Section 2.6 of Burnham and Anderson, References).
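The following sketch shows how such differences might be computed. It uses the common textbook forms of the criteria for least-squares fits with Gaussian errors, where k counts the regression parameters plus one for the noise variance; these are an assumption and may differ in detail from the formulae used by the toolbox. The candidate models, their SSE values, and the function name information_criteria are hypothetical.

```python
from math import log

def information_criteria(n, sse, p):
    """AIC, AICc and BIC for a least-squares fit with Gaussian errors.

    n: number of observations, sse: sum of squared errors,
    p: number of regression parameters. Constant terms common to all
    models are dropped, so only differences between models matter.
    """
    k = p + 1  # assumption: the noise variance counts as a parameter
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    fit_term = n * log(sse / n)
    aic = fit_term + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = fit_term + k * log(n)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}

# Hypothetical candidates fitted to the same 30 observations:
# name -> (number of regression parameters, SSE)
n_obs = 30
candidates = {"linear": (3, 40.0), "quadratic": (6, 12.0), "cubic": (10, 8.0)}

scores = {name: information_criteria(n_obs, sse, p)
          for name, (p, sse) in candidates.items()}

best_by_aic = min(scores, key=lambda name: scores[name]["AIC"])
best_by_aicc = min(scores, key=lambda name: scores[name]["AICc"])

# Differences from the best AICc, and the "greater than about 10" rule.
lowest = scores[best_by_aicc]["AICc"]
delta_aicc = {name: s["AICc"] - lowest for name, s in scores.items()}
negligible = [name for name, d in delta_aicc.items() if d > 10]

for name, d in delta_aicc.items():
    print(f"{name}: AICc difference = {d:.2f}")
print("can be neglected:", negligible)
```

With only 30 observations the ratio of observations to parameters is well below 40:1, and in this made-up case AIC and AICc disagree about the preferred model, which illustrates why the small-sample correction matters.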

Validation

All of the statistics mentioned above attempt to yield a model which makes good predictions both at the data points and in the regions between them. The simplest way to confirm that this is the case is to collect additional data and test (or "validate") the model against it, by evaluating a new RMSE based on the new data. Comparing a validation RMSE with the RMSE based on the modeling data is a good model selection statistic. You can use the Model Evaluation window to validate models against other data, and you can use validation data throughout a test plan. See Using Validation Data.
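A minimal sketch of this comparison, outside the toolbox, is given below. The straight-line model, both data sets, and the function names are invented for illustration, and RMSE is again assumed to be sqrt(SSE/N).

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def rmse(model, xs, ys):
    """RMSE of model predictions against data, assumed to be sqrt(SSE / N)."""
    intercept, slope = model
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sse / len(xs))

# Made-up modeling data (used for the fit) and validation data (held back).
x_fit, y_fit = [0.0, 1.0, 2.0, 3.0, 4.0], [0.1, 0.9, 2.2, 2.8, 4.1]
x_val, y_val = [0.5, 1.5, 2.5, 3.5], [0.4, 1.6, 2.4, 3.6]

model = fit_line(x_fit, y_fit)
fit_rmse = rmse(model, x_fit, y_fit)
validation_rmse = rmse(model, x_val, y_val)

print(f"RMSE on modeling data:   {fit_rmse:.3f}")
print(f"RMSE on validation data: {validation_rmse:.3f}")
```

A validation RMSE of similar size to the modeling RMSE, as in this example, suggests the model predicts well between the data points; a much larger validation RMSE would point to overfitting.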

How much validation data to collect (or whether it is feasible to collect any at all) are matters governed primarily by practical considerations.

Trends

Throughout this discussion, we have focussed on using statistics for model selection. It is advisable, however, to combine a study of the model statistics with a careful examination of the trends present in the models. It would be a mistake to underestimate the importance of using engineering knowledge as a tool for comparing models. In addition, if two or more models of a different type (e.g. two different RBF kernel functions) follow the same trend, then that lends confidence to those models, because they are likely to be picking up real structure in the data. You can use the cross-section view in the Model Selection window to plot multiple models on the same axes to aid this process.

Where to Find Statistics for Comparing Models

References

Summary Statistics

Use the Summary Statistics dialog box to choose which statistics you want displayed to help you evaluate models in these tools:

The standard summary statistics are PRESS RMSE (for linear models only) and RMSE, and these are always displayed. You can choose additional statistics in the Summary Statistics dialog box by selecting the check boxes.

  1. To open the Summary Statistics dialog box, do one of the following:

    • From any global model node, select Model > Summary Statistics.

    • From the test plan, right-click on the global model block and select Summary Statistics (or use the Test Plan menu after selecting the global model block). Use this option before building models if you want the summary statistics to apply to all the models within the test plan. Summary statistics are inherited from the test plan node or the parent node on creation of a child node.

  2. Choose additional statistics by selecting the check boxes.

  3. Click OK. Changes made from a global model node are applied immediately to the Summary table and the Models list pane (if there are child nodes to compare). Resetting a model to the default test plan model also resets the summary statistics.

Available summary statistics are:

For definitions of any of the terms in the Summary Statistics formulae, see Toolbox Terms and Statistics Definitions. Note that for ordinary least squares cases, 'p' is the number of parameters, but for non-ordinary least squares cases (rols and ridge least squares) 'p' is the effective number of parameters (p = N-df).

Using Information Criteria to Compare Models

There are information criteria available as additional summary statistics for comparison of models. See Summary Statistics for information on how to display these. This section provides some statistical background to help you compare the Akaike Information Criteria (AIC and AICc) and the Bayes Information Criterion (BIC). See also Model Selection Guide for practical guidelines on using AIC and BIC.

AIC-type criteria are based on the difference in Kullback-Leibler information between two models, or their K-L distance. K-L distance is an appealing measure because it essentially compares the information content of two curves, by calculating the entropy in each. Akaike and others found ways to estimate K-L distance based on the results of a maximum likelihood estimate of the parameters of a model, given some data. These estimates are the information criteria, and become more accurate as the sample size increases.

BIC is derived from Bayes' theorem, and essentially just applies the Occam effect to select a preferred model; the idea that if two models provide an equally good fit with some data then the simpler model is the likelier. This can be understood in the following sense: for models with greater complexity (both in terms of the number of parameters and the set of values those parameters can take) it is less remarkable that they are able to fit a given data set well. Conversely, for a simple model, if you happen to encounter a data set for which the model provides an acceptable fit, it would seem a lucky coincidence. Therefore, for data matching both models well, the odds are that the simpler one is closer to the truth.

Quantifying these ideas leads to Bayes factors (evidence ratios) which measure the relative probabilities of two models. In the context of MBC, BIC is an estimate of Bayes factors based on the results of a maximum likelihood estimate, and, like AIC, increases in accuracy in the limit of large sample size. Although priors often spring to mind in the context of Bayes' theorem, all of the above can proceed with uniform priors on everything, and the Occam effect still applies.

There is a degree of controversy over which approach gives the best results. Copious literature exists on the subject of Bayesian model selection, a smaller amount on K-L distance based techniques and a still smaller amount comparing the two approaches. Bayesian authors consistently find that BIC performs better in Monte Carlo simulations (e.g. Leonard and Hsu 1999), whereas Burnham and Anderson 2002 (the main proponents of K-L distance techniques) reach conclusions which favour AIC.

Such tests can be set up to favour either criterion, and there are two main effects relevant to understanding this. Differences arise due to the assumptions made about the truth in each case (relevant to choosing Bayes factors or K-L distance), and due to the number of samples relative to the number of parameters in the candidate models.

Regarding the former effect: Bayes factors always seek the simplest model consistent with the data. K-L distance also has this tendency, although not as strongly as the Bayesian approach. As a result, the simulations in which BIC does well tend to be based on simple models with few parameters (Leonard and Hsu choose a simple quadratic, then consider polynomials of order 1 to 7 in their candidate set). Although both approaches choose the correct model more often than any other, AIC gives slightly more weight to the higher order models than does BIC. For this reason, Bayesians often accuse AIC of overfitting.

Burnham and Anderson, however, are biologists and as such they abandon all hope of actually finding the true model in their candidate set - they simply attempt to find the best approximation to the truth. A typical simulation of theirs considers linear models with up to 13 possible variables in the context of predicting body fat. They are not concerned with the subtle shape of curves, only with which variables they can safely throw away. In this scenario, they find that BIC favours too simple a model and hence underfits the data.

Although for BIC most authors assume that the true model is contained within the candidate set, this is not necessary for model comparison — it just concerns the normalisation of the probabilities, and hence not the ratios that form the Bayes factors.

AIC and BIC both improve as estimators of their respective statistical measures as the sample size increases, with relative errors of O(1/n), where n is the sample size. AIC is obtained from a first order Taylor expansion, and AICc is a second order correction to that for the special case of Gaussian likelihood (there is no general second order correction) and should be used when the ratio of data samples to model parameters (in the largest model for nested sets) is less than about 40:1. For very small sample sizes, even Bayesian authors do not seem to trust BIC, but do consider AICc.

In terms of the complexity of the truth, most problems in MBC probably lie in between the two extremes described above: internal combustion engines are not so simple that we assume that our model set really contains the precise, closed-form solutions to the relevant dynamical equations, but we are dealing with a mechanical system, not trying to predict, for example, characteristics of the human body. In terms of the number of samples per model parameter, AIC is seldom likely to be a reliable statistic; AICc should be used instead. But if you have reason to prefer a more conservative estimate of the complexity of the model, BIC should be considered.

For a discussion of Bayes factors, see:

Kass and Raftery (1995). Bayes factors. Journal of the American Statistical Association 90, 773-795

See also Chapter 28 from the following book: Information Theory, Inference, and Learning Algorithms, available from

Model Selection Window

The Model Selection window comprises several different views depending on the type of models being compared:

You can change to any available view in the Model Selection window using the View menu or by clicking the buttons of the toolbar.

The Assign Best button at the bottom of the window marks the currently selected model as best. Alternatively, you can double-click a model in the list.

Information about each candidate model is displayed in the list at the bottom. The information includes categories such as the number of observations and parameters, and various diagnostic statistics such as RMSE and PRESS RMSE. You can click on column headers in this list to sort models by that category — for example, clicking on the column header for PRESS RMSE sorts the models in order of increasing PRESS RMSE. As this statistic is an indication of the predictive power of the model, it is a useful diagnostic statistic to look at (the lower the better), but remember to also look at other factors.

To print the current view, use the File > Print menu item or its hot key equivalent Ctrl+P. In the Response Surface view you can also use the right-click context menu.

To close the Model Selection window, use the File > Close menu item or its hot key equivalent Ctrl+W. On closing the figure, you are asked to confirm the model you chose as best.

See also Model Evaluation Window, which comprises some of the same views you see in the Model Selection window, and where you can use validation data.

Tests View

For a two-stage model the initial view is as follows:

The tests view shows the data being modeled (blue dots) and the models that have been fitted to this data. The black line shows the local model that has been fitted to each test separately. The green and red lines in this case show an MLE two-stage model and the Univariate two-stage model: you can see the local model curves reconstructed using response feature values taken from the global models, and compare the fits.

This view allows you to compare several models simultaneously. Using standard Windows multiselect behavior (Shift+click and Ctrl+click) in the list view, or by clicking the Select All button, you can view several two-stage models together. A maximum of five models can be selected at once. The legend allows you to identify the different plot lines.

If the local input has more than one factor, a Predicted/Observed View appears instead.

Clicking one of the plots (and holding the mouse button down) displays information about the data for that test. For example:

Here you see the values of the global variables for this test and some diagnostic statistics describing the model fit. Also displayed are the values (for this test) of the response features used to build this two-stage model and the two-stage model's estimation of these response features.

The controls allow navigation between tests.

You can change the size of the confidence intervals; these are displayed using a right-click menu on the plots themselves.

The prediction type allows a choice of Normal or PRESS (Predicted Error Sum of Squares), although this choice is not available if you entered this view through model evaluation (rather than model selection). PRESS predictions give an indication of the model fit if that test was not used in fitting the model. For more on PRESS see PRESS statistic, Summary Table, and Stepwise.

Predicted/Observed View

For a one-stage model, or when you are comparing different models for one Response Feature, the initial view is as follows:

The plot shows the data used to fit this model, against the predicted values found by evaluating the model at these data points. The straight black line is the plot of y=x. If the model fitted the data exactly, all the blue points would lie on this line. The error bars show the 95% confidence interval of the model fit.

For single inputs, the response is plotted directly against the input.

The Predicted/Observed view only allows single selection of models for display. Right-click to toggle test number display, as you can on most plots.

Response Surface View

This view shows the model surface in a variety of ways.

The default view is a 3-D plot of the model surface, as in the example. This model has five dependent factors; you can see these in the controls at the top left (there is a scroll bar as only four can be seen at once at this size of window).

You can choose which input factors to display by using the drop-down menus below the plot. The unselected input factors are held constant and you can change their values using the controls at the top left of the view (either by clicking the arrow buttons or by typing directly in the edit box).

Display using (S - datum) — If a datum model is being displayed, this check box appears. The datum variable here is spark angle, S. When you select this box, the model is displayed in terms of spark angle relative to the datum. The appropriate local variable name appears here. See Datum Models.

Display boundary constraint — If you have boundary models you can display them by selecting the check box. Areas outside the boundary are yellow (or gray in table view), as shown in the example. They are shown on all display types (contour, 2-D, surface, movie and table).

Display Type — Changes the model plot. Display options are available for some of these views and are described under the relevant view. The choices are as follows:

Export model values allows the currently displayed model surface to be saved to a MAT file or to the MATLAB workspace.

Right-click on the plot to reach the context menu and change many display properties (lighting, colormap etc.) and print to figure.

Within a test plan, the Response Surface view remembers the evaluation region, plot type, and resolution (number of points) last displayed.

Likelihood View

The likelihood view shows two plots relating to the log likelihood function evaluated at each test. It is useful for identifying problem tests for maximum likelihood estimation (MLE).

Each plot has a right-click menu that allows test numbers to be displayed on the plots and also offers autoscaling of the plots. You can also Print to Figure.

The likelihood view allows several models to be displayed simultaneously; click the Select All button at the bottom of the window or, in the model list view, Shift+click or Ctrl+click to select the models for display.

The upper plot shows values of the negative log likelihood function for each test. This shows the contribution of each test to the overall negative log likelihood function for the model, as compared with the average, as indicated by the horizontal green line.

The lower plot shows values of the T-squared statistic for each test. This is a weighted sum of squared errors of the response feature models for each test. As above, the purpose of this plot is to show how each test contributes to the overall T-squared statistic for this model. The horizontal line indicates the average.

RMSE View

The Root Mean Square Errors view has three different plots, each showing standard errors in the model fit for each test.

Each plot has a right-click menu that allows test numbers to be displayed on the plots, and you can Print to Figure.

The X variable menu allows you to use different variables as the x-axis of these plots.

The RMSE view allows several models to be displayed simultaneously; click the Select All button at the bottom of the window or, in the model list view, Shift+click or Ctrl+click to select the models for display.

Local RMSE shows the root mean squared error in the local model fit for each test.

Two-Stage RMSE shows the root mean squared error in the two-stage model fit to the data for each test. You should expect this to be higher than the local RMSE.

PRESS RMSE is available when all response feature models are linear. This plot shows the root mean squared error in the PRESS two-stage model fit at each test.

For information on PRESS RMSE see Summary Table and Model Selection Guide.

Residuals View

The residuals view shows the scatter plots of observation number, predicted and observed response, input factors, and residuals.

This view allows several models to be displayed simultaneously; click the Select All button at the bottom of the window or, in the model list view, Shift+click or Ctrl+click to select the models for display.

A right-click menu allows the test number of each point to be displayed when only one model is being displayed, as shown.

The X-axis factor and Y-axis factor menus allow you to display various statistics.

Cross Section View

The cross-section view shows an array of cross sections through the model surface. You can choose the point of cross section in each factor. Data points near cross sections are displayed, and you can alter the tolerances to determine how much data is shown. The only exception is when you evaluate a model without data; in this case no data points are displayed.

You can select individual data points by test number (using the Select Data Point button). You can double-click a data point in a graph to take the display directly to that point. You can choose to use a common Y-axis limit for all graphs using the check box.

If you have boundary models you can choose to display them here using the check box; regions outside the boundary are yellow, as shown in the example.

Within a test plan, the Cross Section view remembers the point last displayed; when you reopen the view you return to the same point.

The number of plots is the same as the number of input factors to the model. The plot in S shows the value of the model for a range of values of S while the other input factors are held constant. Their values are displayed in the controls at the top left, and are indicated on the plots by the vertical orange bars.

On the plots, the dotted lines indicate a confidence interval around the model. You define the confidence associated with these bounding lines using the Display confidence level (%) edit box. You can toggle confidence intervals on and off using the check box on this control.

For each model displayed, the value of the model and the confidence interval around this are recorded in the legend at the lower left. The text colors match the plot colors. In the example shown, two models are selected for display, resulting in blue (PS22 model) and green (POLY2 model) legends on the left to correspond with the blue and green plots. You can select multiple models to display in the list at the bottom using Ctrl+click, or click Select All. The values of the input factors (for which the model is evaluated) can be found in the controls (in the Input factors pane) and seen as the orange lines on the plots.

Data points are displayed when they fall within the tolerance limit near each cross section. You can set the tolerance in the Tol edit boxes.

The following example illustrates how the tolerance level determines which data points are displayed. The tolerance for TP_REL (500) includes all points in the data set (this is an extreme example). The plot for N therefore shows the data points for all the tests. Note that you can see the structure of the data as each test shows as a vertical line of points.

You can see that the orange line on the N plot passes through a test. This orange line shows the value of N for the cross-section plot of TP_REL. You can also read the value in the edit box (N=1753.3). The tolerance for N (200) only includes data points of this test. Data in adjacent tests fall outside this tolerance. Therefore the TP_REL plot shows the data points from one test only.

Increasing the tolerance on N will mean that more data points fall within the tolerance and so would appear on the TP_REL plot.
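The selection logic in this example can be sketched as follows. This is an interpretation rather than the toolbox's code: a point is assumed to appear on the plot for one factor when every other factor lies within its tolerance of the cross-section value. The values N = 1753.3 and the tolerances 200 and 500 come from the example above; the data points, the TP_REL cross-section value, and the function name are invented.

```python
def points_near_section(points, plot_factor, section, tol):
    """Return the data points shown on the cross-section plot of plot_factor.

    Assumption: a point is shown when each of the other input factors is
    within its tolerance of the current cross-section value.
    """
    shown = []
    for p in points:
        others = (f for f in section if f != plot_factor)
        if all(abs(p[f] - section[f]) <= tol[f] for f in others):
            shown.append(p)
    return shown

# Hypothetical data: three tests, each a sweep of TP_REL at a fixed N.
points = [{"test": t, "N": n, "TP_REL": tp}
          for t, n in ((1, 1000.0), (2, 1753.3), (3, 2500.0))
          for tp in (100.0, 200.0, 300.0, 400.0)]

section = {"N": 1753.3, "TP_REL": 250.0}  # current cross-section point
tol = {"N": 200.0, "TP_REL": 500.0}       # values in the Tol edit boxes

# The TP_REL plot is filtered by the tolerance on N: one test only.
on_tp_rel_plot = points_near_section(points, "TP_REL", section, tol)
# The N plot is filtered by the (very wide) tolerance on TP_REL: all points.
on_n_plot = points_near_section(points, "N", section, tol)
# Increasing the tolerance on N brings in the adjacent tests.
wider = points_near_section(points, "TP_REL", section, {**tol, "N": 800.0})

print("tests on TP_REL plot:", sorted({p["test"] for p in on_tp_rel_plot}))
print("points on N plot:", len(on_n_plot))
print("tests on TP_REL plot with Tol(N) = 800:",
      sorted({p["test"] for p in wider}))
```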

  


© 1984-2009 The MathWorks, Inc.