In this example, use a database of 1985 car imports with 205 observations, 25 predictors, and 1 response, which is the insurance risk rating, or "symboling." The first 15 variables are numeric and the last 10 are categorical.
Test for the significance of the regression coefficients using the t-statistic.
Display R-squared (coefficient of determination) and adjusted R-squared. Load the sample data and define the response and independent variables.
Fit a linear regression model. A typical workflow involves the following: import data, fit a regression, test its quality, modify it to improve the quality, and share it.
Compute the covariance matrix and standard errors of the coefficients.
Use the CovRatio statistics to determine the influential points in data. Load the sample data and define the response and predictor variables.
Use a bagged ensemble, which supports all three methods of evaluating ensemble quality.
Identify and remove redundant predictors from a generalized linear model.
Use data for predicting the insurance risk of a car based on its many attributes.
View a classification or regression tree. There are two ways to view a tree: view(tree) returns a text description and view(tree,'mode','graph') returns a graphic description of the tree.
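As a minimal sketch of the two viewing modes, here is a regression tree fit to the carsmall sample data that ships with the toolbox (the choice of Weight and Horsepower as predictors is illustrative, not prescribed by the example):

```matlab
% Fit a small regression tree and view it two ways.
load carsmall                              % sample data shipped with the toolbox
tree = fitrtree([Weight Horsepower],MPG,...
    'PredictorNames',{'Weight','Horsepower'});
view(tree)                 % text description printed at the command line
view(tree,'mode','graph')  % opens a figure with a graphic description of the tree
```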
Determine the observations that are influential on the fitted response values using Dffits values. Load the sample data and define the response and independent variables.
Test for autocorrelation among the residuals of a linear regression model.
Fit a generalized linear model and analyze the results. A typical workflow involves the following: import data, fit a generalized linear model, test its quality, modify it to improve the quality, and make predictions based on the model.
Determine the observations that have large influence on coefficients using Dfbetas. Load the sample data and define the response and independent variables.
Regularize binomial regression. The default (canonical) link function for binomial regression is the logistic function.
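A hedged sketch of regularized binomial regression with lassoglm, using synthetic data (the predictors and true coefficients here are assumptions for illustration, not the example's data set):

```matlab
% Synthetic binary-response data from a logistic model.
rng default
X = randn(100,5);                            % illustrative predictors
p = 1./(1 + exp(-(X(:,1) - 2*X(:,3))));      % true probabilities (logistic link)
y = binornd(1,p);                            % binary responses

% Cross-validated lasso-regularized binomial regression.
% The logit (canonical) link is the default for the binomial distribution.
[B,FitInfo] = lassoglm(X,y,'binomial','CV',5);
idx  = FitInfo.Index1SE;                     % sparsest fit within 1 SE of min deviance
coef = [FitInfo.Intercept(idx); B(:,idx)]    % intercept plus coefficients
```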
Compute Leverage values and assess high leverage observations. Load the sample data and define the response and independent variables.
Assess the model assumptions by examining the residuals of a fitted linear regression model.
Assess the fit of the model and the significance of the regression coefficients using the F-statistic.
There are diagnostic plots to help you examine the quality of a model. plotDiagnostics(mdl) gives a variety of plots, including leverage and Cook's distance plots. plotResiduals(mdl) plots the model residuals in several ways.
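The diagnostic functions named above can be sketched on the carsmall sample data (the predictor choice is an assumption for illustration):

```matlab
load carsmall
mdl = fitlm([Weight Horsepower],MPG);

plotDiagnostics(mdl,'cookd')       % Cook's distance for each observation
figure
plotDiagnostics(mdl,'leverage')    % leverage values
figure
plotResiduals(mdl,'probability')   % normal probability plot of the residuals
```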
Use the methods predict , feval , and random to predict and simulate responses to new data.
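A minimal sketch contrasting the three prediction methods on a fitted linear model (the new predictor values are made up for illustration):

```matlab
load carsmall
mdl  = fitlm([Weight Horsepower],MPG);
Xnew = [3000 130; 4000 200];       % hypothetical new observations

ypred = predict(mdl,Xnew)          % point predictions for new data
yfe   = feval(mdl,3000,130)        % same prediction; predictors passed separately
ysim  = random(mdl,Xnew)           % simulated responses: prediction plus random noise
```

predict and feval give the same fitted values; random adds noise drawn from the estimated error distribution, so its output varies from call to call.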
Compute and plot S2_i values to examine the change in the mean squared error when an observation is removed from the data. Load the sample data and define the response and independent variables.
Predict out-of-sample responses of regression trees, and then plot the results.
Create a regression ensemble to predict mileage of cars based on their horsepower and weight, trained on the carsmall data.
Fit a nonlinear logistic regression model in two ways: the first method uses maximum likelihood (ML), and the second method uses generalized least squares (GLS) via the function fitnlm.
Make Bayesian inferences for a logistic regression model using slicesample.
Predict class labels or responses using trained classification and regression trees.
Apply Partial Least Squares Regression (PLSR) and Principal Components Regression (PCR), and discuss the effectiveness of the two methods. PLSR and PCR are both methods to model a response variable when there is a large number of highly correlated, or even collinear, predictor variables.
Analyze time series data using Statistics and Machine Learning Toolbox™ features.
Fit a nonlinear regression model for data with nonconstant error variance.
Explore pitfalls that can occur when fitting a nonlinear model by transforming to linearity. Imagine that we have collected measurements on two variables, x and y, and we want to model y as a function of x.
Fit a Gaussian Process Regression (GPR) model to data with a large number of observations, using the Block Coordinate Descent (BCD) approximation.
Fit and evaluate generalized linear models using glmfit and glmval. Ordinary linear regression can be used to fit a straight line, or any function that is linear in its parameters, to data
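As an illustrative sketch of the glmfit/glmval pair, here is a Poisson fit on synthetic data (the data-generating model is an assumption, not the example's data):

```matlab
% Synthetic count data from a log-linear Poisson model.
rng default
x  = rand(100,1)*2;
mu = exp(1 + 0.5*x);               % true mean under a log link
y  = poissrnd(mu);

b    = glmfit(x,y,'poisson');      % Poisson regression; log link by default
xfit = linspace(0,2)';
yfit = glmval(b,xfit,'log');       % evaluate the fitted model on a grid
plot(x,y,'o',xfit,yfit,'-')
```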
Choose the appropriate split predictor selection technique for your data set when growing a random forest of regression trees. This example also shows how to decide which predictors are most important to include in the training data.
Create a custom plot function for bayesopt. This example also shows how to use information in the UserData property of a BayesianOptimization object.
Optimize hyperparameters of a boosted regression ensemble. The optimization minimizes the cross-validation loss of the model.
Use a custom output function with Bayesian optimization. The output function halts the optimization when the objective function, which is the cross-validation error rate, drops below a specified value.
Detect outliers using quantile random forest. Quantile random forest can detect outliers with respect to the conditional distribution of the response given the predictor data. However, this method cannot detect outliers in the predictor data itself.
Estimate conditional quantiles of a response given predictor data using quantile random forest and by estimating the conditional distribution function of the response using kernel smoothing.
Implement Bayesian optimization to tune the hyperparameters of a random forest of regression trees using quantile error. Tuning a model using quantile error, rather than mean squared error, is appropriate if you plan to use the model to predict conditional quantiles rather than conditional means.
Examine the resubstitution and cross-validation accuracy of a regression tree for predicting mileage based on the carsmall data.
Compare models that stepwiselm returns starting from a constant model and starting from a full interaction model.
Use lasso along with cross validation to identify important predictors.
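A hedged sketch of this workflow on the carsmall sample data (the predictor set and the one-standard-error selection rule are illustrative choices):

```matlab
load carsmall
X  = [Displacement Horsepower Weight Acceleration];
ok = ~any(isnan([X MPG]),2);       % drop rows with missing values

% 10-fold cross-validated lasso.
[B,FitInfo] = lasso(X(ok,:),MPG(ok),'CV',10);

idx = FitInfo.Index1SE;            % sparsest fit within 1 SE of minimum CV MSE
importantPredictors = find(B(:,idx) ~= 0)  % indices of retained predictors
```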
Construct and analyze a linear regression model with interaction effects and interpret the results.
Create a classification tree for the ionosphere data, and prune it to a good level.
Use robust regression. It compares the results of a robust fit to a standard least-squares fit.
Do a typical nonlinear regression workflow: import data, fit a nonlinear regression, test its quality, modify it to improve the quality, and make predictions based on the model.
Consider a data set with 100 observations of 10 predictors. Generate the random data from a logistic model, with a binomial distribution of responses at each set of values for the predictors.
Regularize a model with many more predictors than observations. Wide data is data with more predictors than observations. Typically, with wide data you want to identify important predictors.
Control the depth of a decision tree, and how to choose an appropriate depth.
Predict the mileage (MPG) of a car based on its weight, displacement, horsepower, and acceleration, using the lasso and elastic net methods.