Model Building and Assessment

Feature selection, hyperparameter optimization, cross-validation, residual diagnostics, plots

When building a high-quality regression model, it is important to select the right features (or predictors), tune hyperparameters (model parameters not fit to the data), and assess model assumptions through residual diagnostics.

You can tune hyperparameters by iterating between choosing values for them and cross-validating a model that uses those values. This process yields multiple models, and a natural choice for the best model is the one that minimizes the estimated generalization error. For example, to tune an SVM model, choose a set of box constraints and kernel scales, cross-validate a model for each pair of values, and then compare the 10-fold cross-validated mean squared error estimates.
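The tuning loop described above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the toolbox API: it swaps the SVM for a simple Gaussian kernel smoother with a single hyperparameter (the kernel scale), and the data, the candidate scales, and the helper names make_folds, kernel_predict, and cv_mse are all assumptions made for the example.

```python
import math
import random

def make_folds(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def kernel_predict(x_train, y_train, x_new, scale):
    """Gaussian-kernel weighted average of the training responses."""
    w = [math.exp(-0.5 * ((x_new - xt) / scale) ** 2) for xt in x_train]
    total = sum(w)
    if total == 0.0:  # every weight underflowed: fall back to the mean
        return sum(y_train) / len(y_train)
    return sum(wi * yi for wi, yi in zip(w, y_train)) / total

def cv_mse(x, y, scale, folds):
    """Mean squared error over all held-out observations."""
    sq_err = 0.0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        x_tr = [x[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for i in test_idx:
            sq_err += (y[i] - kernel_predict(x_tr, y_tr, x[i], scale)) ** 2
    return sq_err / len(x)

# Synthetic data (an assumption for illustration): noisy sine curve.
rng = random.Random(1)
n = 80
x = [i / (n - 1) for i in range(n)]
y = [math.sin(2 * math.pi * xi) + rng.gauss(0, 0.2) for xi in x]

# Cross-validate one model per candidate value, then keep the minimizer.
folds = make_folds(n, k=10)
scales = [0.01, 0.03, 0.1, 0.3, 1.0]
cv_errors = {s: cv_mse(x, y, s, folds) for s in scales}
best_scale = min(cv_errors, key=cv_errors.get)
print(best_scale, cv_errors[best_scale])
```

Reusing the same folds for every candidate keeps the comparison fair, because each value is scored on identical train/test splits. With two hyperparameters, the dictionary would be keyed by pairs of values instead of a single scale.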

Certain nonparametric regression functions in Statistics and Machine Learning Toolbox™ also offer automatic hyperparameter tuning through Bayesian optimization, grid search, or random search. In addition, bayesopt, the main function for Bayesian optimization, can be called directly and is flexible enough for many other applications. For more details, see Bayesian Optimization Workflow.
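To make the difference between two of these strategies concrete, the following Python sketch contrasts grid search with random search. It does not implement Bayesian optimization and does not call any toolbox function. The function cv_loss is a hypothetical stand-in for a cross-validated loss, and the search range of 1e-3 to 1e3 for both hyperparameters is an assumption chosen for the example.

```python
import math
import random

def cv_loss(box_constraint, kernel_scale):
    """Hypothetical loss surface standing in for a cross-validated error."""
    return ((math.log10(box_constraint) - 1.0) ** 2
            + (math.log10(kernel_scale) + 0.5) ** 2 + 0.1)

def grid_search(values_c, values_s):
    """Evaluate every pair on the grid and return (loss, c, s) of the best."""
    return min((cv_loss(c, s), c, s) for c in values_c for s in values_s)

def random_search(n_trials, seed=0):
    """Sample pairs log-uniformly from [1e-3, 1e3] and keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        c = 10 ** rng.uniform(-3, 3)
        s = 10 ** rng.uniform(-3, 3)
        trial = (cv_loss(c, s), c, s)
        if best is None or trial < best:
            best = trial
    return best

grid = [10.0 ** e for e in range(-3, 4)]   # 7 x 7 = 49 evaluations
grid_best = grid_search(grid, grid)
random_best = random_search(49)            # same evaluation budget
print(grid_best, random_best)
```

Grid search can only land on the listed values, while random search explores the whole range with the same budget. Bayesian optimization goes further by using earlier evaluations to decide where to evaluate next.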

Functions

sequentialfs Sequential feature selection
relieff Importance of attributes (predictors) using ReliefF algorithm
stepwiselm Create linear regression model using stepwise regression
stepwiseglm Create generalized linear regression model by stepwise regression
bayesopt Find global minimum of function using Bayesian optimization
hyperparameters Variable descriptions for optimizing a fit function
crossval Loss estimate using cross-validation
cvpartition Data partitions for cross-validation
repartition Repartition data for cross-validation
test Test indices for cross-validation
training Training indices for cross-validation
coefCI Confidence intervals of coefficient estimates of linear model
coefTest Linear hypothesis test on linear regression model coefficients
dwtest Durbin-Watson test of linear model
plot Scatter plot or added variable plot of linear model
plotAdded Added variable plot or leverage plot for linear model
plotAdjustedResponse Adjusted response plot for linear regression model
plotDiagnostics Plot diagnostics of linear regression model
plotEffects Plot main effects of each predictor in linear regression model
plotInteraction Plot interaction effects of two predictors in linear regression model
plotResiduals Plot residuals of linear regression model
plotSlice Plot of slices through fitted linear regression surface
coefCI Confidence intervals of coefficient estimates of generalized linear model
coefTest Linear hypothesis test on generalized linear regression model coefficients
devianceTest Analysis of deviance
plotDiagnostics Plot diagnostics of generalized linear regression model
plotResiduals Plot residuals of generalized linear regression model
plotSlice Plot of slices through fitted generalized linear regression surface
coefCI Confidence intervals of coefficient estimates of nonlinear regression model
coefTest Linear hypothesis test on nonlinear regression model coefficients
plotDiagnostics Plot diagnostics of nonlinear regression model
plotResiduals Plot residuals of nonlinear regression model
plotSlice Plot of slices through fitted nonlinear regression surface
linhyptest Linear hypothesis test

Using Objects

BayesianOptimization Bayesian optimization results
optimizableVariable Variable description for bayesopt or other optimizers
cvpartition Data partitions for cross-validation

Topics

Feature Selection

Feature Selection

Learn about feature selection algorithms, such as sequential feature selection.
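As a rough illustration of the idea behind sequential (forward) feature selection, the Python sketch below adds one predictor at a time and stops when the error on held-out data no longer improves. It is a simplified sketch under stated assumptions, not the sequentialfs implementation: it scores candidates with a single holdout split rather than cross-validation, fits ordinary least squares through the normal equations, and uses synthetic data in which only predictors 0 and 2 affect the response. All helper names are hypothetical.

```python
import random

def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def fit(rows, y, cols):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + [row[j] for j in cols] for row in rows]
    p = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

def holdout_mse(rows, y, cols, train, test):
    """Fit on the training rows, return mean squared error on the test rows."""
    beta = fit([rows[i] for i in train], [y[i] for i in train], cols)
    err = 0.0
    for i in test:
        pred = beta[0] + sum(b * rows[i][j] for b, j in zip(beta[1:], cols))
        err += (y[i] - pred) ** 2
    return err / len(test)

def forward_select(rows, y, train, test):
    """Greedily add the predictor that most reduces the holdout error."""
    selected, remaining = [], list(range(len(rows[0])))
    best = holdout_mse(rows, y, selected, train, test)  # intercept-only model
    while remaining:
        score, j = min((holdout_mse(rows, y, selected + [j], train, test), j)
                       for j in remaining)
        if score >= best:  # no candidate improves the criterion: stop
            break
        best = score
        selected.append(j)
        remaining.remove(j)
    return selected, best

# Synthetic data (an assumption): only predictors 0 and 2 drive the response.
rng = random.Random(0)
rows = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(60)]
y = [3 * r[0] - 2 * r[2] + rng.gauss(0, 0.1) for r in rows]
train, test = list(range(40)), list(range(40, 60))
selected, score = forward_select(rows, y, train, test)
print(selected, score)
```

A noise predictor can still be selected if it happens to lower the holdout error by chance, which is one reason a cross-validated criterion is usually preferred over a single split.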

Hyperparameter Optimization

Bayesian Optimization Workflow

Perform Bayesian optimization using a fit function or by calling bayesopt directly.

Variables for a Bayesian Optimization

Create variables for Bayesian optimization.

Bayesian Optimization Objective Functions

Create the objective function for Bayesian optimization.

Constraints in Bayesian Optimization

Set different types of constraints for Bayesian optimization.

Optimize a Boosted Regression Ensemble

Minimize cross-validation loss of a regression ensemble.

Bayesian Optimization Plot Functions

Visually monitor a Bayesian optimization.

Bayesian Optimization Output Functions

Monitor a Bayesian optimization.

Bayesian Optimization Algorithm

Understand underlying algorithms for Bayesian optimization.

Cross-Validation

Implement Cross-Validation Using Parallel Computing

Speed up cross-validation using parallel computing.

Linear Model Diagnostics

Interpret Linear Regression Results

Display and interpret linear regression output statistics.

Examine Quality and Adjust the Fitted Model

After fitting a model, examine the result and make adjustments.

Linear Regression with Interaction Effects

Construct and analyze a linear regression model with interaction effects and interpret the results.

Summary of Output and Diagnostic Statistics

F-statistic and t-statistic

In linear regression, the F-statistic is the test statistic for the analysis of variance (ANOVA) approach to test the significance of the model or the components in the model. The t-statistic is useful for making inferences about the regression coefficients.
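For reference, the standard textbook forms of the two statistics are sketched below. The notation is an assumption introduced here, not taken from this page: n observations, p estimated coefficients including the intercept, SSR and SSE for the regression and error sums of squares, and SE for the standard error of a coefficient estimate. The F-statistic shown is the overall test of the model against a constant-only model.

```latex
t_j = \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)} \sim t_{n-p}
\qquad
F = \frac{\mathrm{SSR}/(p-1)}{\mathrm{SSE}/(n-p)} \sim F_{p-1,\,n-p}
```

Both reference distributions hold under the null hypothesis being tested (the coefficient is zero for t, all non-intercept coefficients are zero for F) and the usual assumption of independent normal errors with constant variance.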

Coefficient of Determination (R-Squared)

The coefficient of determination (R-squared) indicates the proportion of variation in the response variable y that is explained by the independent variables X in the linear regression model.
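A small worked example may help. The Python sketch below computes R-squared for a straight-line fit as one minus the ratio of the error sum of squares to the total sum of squares. The data values and the helper names fit_line and r_squared are assumptions made for illustration, and the code covers only the single-predictor case.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def r_squared(x, y):
    """R-squared = 1 - SSE / SST for the fitted line."""
    intercept, slope = fit_line(x, y)
    y_bar = sum(y) / len(y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
r2 = r_squared(x, y)
print(r2)  # close to 1 because the points lie nearly on a line
```

A value near 1 means the line accounts for almost all of the variation in y, while a value near 0 means it explains little more than the mean of y does.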

Coefficient Standard Errors and Confidence Intervals

Estimated coefficient variances and covariances capture the precision of regression coefficient estimates.

Residuals

Residuals are useful for detecting outlying y values and checking the linear regression assumptions with respect to the error term in the regression model.

Durbin-Watson Test

The Durbin-Watson test assesses whether the residuals are autocorrelated.
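The test statistic itself is simple to compute from the residuals in their time order. The Python sketch below shows only the statistic, as an illustration; it does not compute the p-value that a full test such as dwtest reports, and the residual vectors are made-up examples.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Residuals that flip sign every step suggest negative autocorrelation.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
# Residuals that never change suggest positive autocorrelation.
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0
```

The statistic ranges from 0 to 4. Values near 2 are consistent with no first-order autocorrelation, values toward 0 point to positive autocorrelation, and values toward 4 point to negative autocorrelation.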

Cook's Distance

Cook's distance is useful for identifying outliers in the X values (observations for predictor variables).
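For reference, the standard definition of Cook's distance for observation i is sketched below. The notation is an assumption introduced here: p is the number of coefficients, MSE is the mean squared error of the full fit, e_i is the i-th residual, h_ii is the i-th leverage, and the subscript (i) marks fitted values from the model refit without observation i.

```latex
D_i = \frac{\sum_{j=1}^{n} \left( \hat{y}_j - \hat{y}_{j(i)} \right)^2}{p \,\mathrm{MSE}}
    = \frac{e_i^2}{p \,\mathrm{MSE}} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}
```

The second form shows that an observation has a large Cook's distance when it combines a large residual with high leverage, so the statistic summarizes how much all fitted values move when that observation is removed.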

Hat Matrix and Leverage

The hat matrix provides a measure of leverage: its diagonal elements indicate how strongly each observation influences its own fitted value.
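The Python sketch below computes leverage values for a straight-line fit, where the diagonal of the hat matrix has a closed form. It is an illustration under stated assumptions: a single predictor with an intercept, made-up x values, and the common rule of thumb that flags leverages above 2p/n, with p the number of coefficients.

```python
def leverages(x):
    """Diagonal of the hat matrix for a line fit: 1/n + (x_i - mean)^2 / Sxx."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

x = [1.0, 2.0, 3.0, 4.0, 10.0]   # the last point sits far from the others
h = leverages(x)

p = 2                            # intercept and slope
threshold = 2 * p / len(x)       # rule-of-thumb cutoff (an assumption here)
flagged = [i for i, hi in enumerate(h) if hi > threshold]
print(h, flagged)                # only the last observation is flagged
```

The leverages always sum to the number of coefficients, so an observation far from the mean of x takes a disproportionate share. Leverage depends only on the predictor values, not on the response.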

Delete-1 Statistics

Delete-1 change in covariance (covratio) identifies the observations that are influential in the regression fit.

Generalized Linear Model Diagnostics

Examine Quality and Adjust the Fitted Model

After fitting a model, examine the result.

Nonlinear Model Diagnostics

Examine Quality and Adjust the Fitted Nonlinear Model

Diagnostic plots can help you examine the quality of a model.