Remove terms from linear regression model
Remove Terms from Linear Regression Model
Create a linear regression model using the hald data set. Remove terms that have high p-values.
Load the data set.
load hald
X = ingredients;  % predictor variables
y = heat;         % response variable
Fit a linear regression model to the data.
mdl = fitlm(X,y)
mdl = 
  Linear regression model:
      y ~ 1 + x1 + x2 + x3 + x4

  Estimated Coefficients:
                     Estimate      SE        tStat       pValue 
                     ________    _______    ________    ________

      (Intercept)      62.405     70.071      0.8906     0.39913
      x1               1.5511    0.74477      2.0827    0.070822
      x2              0.51017    0.72379     0.70486      0.5009
      x3              0.10191    0.75471     0.13503     0.89592
      x4             -0.14406    0.70905    -0.20317     0.84407

  Number of observations: 13, Error degrees of freedom: 8
  Root Mean Squared Error: 2.45
  R-squared: 0.982,  Adjusted R-Squared: 0.974
  F-statistic vs. constant model: 111, p-value = 4.76e-07
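The p-values in this display can also be screened programmatically by reading the Coefficients table of mdl. The following is a minimal sketch; the cutoff of 0.8 is a hypothetical value chosen only so that the result matches the x3 and x4 choice in this example, not a recommended threshold.

```matlab
% Requires Statistics and Machine Learning Toolbox (hald data, fitlm)
load hald
mdl = fitlm(ingredients, heat);

% Hypothetical cutoff, picked only to reproduce the x3/x4 choice of this example
cutoff = 0.8;

names = mdl.Coefficients.Properties.RowNames;  % {'(Intercept)'; 'x1'; ...; 'x4'}
pvals = mdl.Coefficients.pValue;               % p-values in the same order

% Flag non-intercept terms whose p-value exceeds the cutoff
isHigh = pvals > cutoff & ~strcmp(names, '(Intercept)');

% Join the flagged names into a Wilkinson-notation formula for removeTerms
toRemove = strjoin(names(isHigh)', ' + ')
```

Because x2 has a p-value of 0.5009, a cutoff of 0.5 would flag x2 as well, so the cutoff is a modeling judgment rather than a fixed rule.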
Remove the x3 and x4 terms because their p-values are high.
terms = 'x3 + x4';  % terms to remove
NewMdl = removeTerms(mdl,terms)
NewMdl = 
  Linear regression model:
      y ~ 1 + x1 + x2

  Estimated Coefficients:
                     Estimate       SE        tStat       pValue  
                     ________    ________    ______    __________

      (Intercept)     52.577       2.2862    22.998    5.4566e-10
      x1              1.4683       0.1213    12.105    2.6922e-07
      x2             0.66225     0.045855    14.442     5.029e-08

  Number of observations: 13, Error degrees of freedom: 10
  Root Mean Squared Error: 2.41
  R-squared: 0.979,  Adjusted R-Squared: 0.974
  F-statistic vs. constant model: 230, p-value = 4.41e-09
NewMdl has the same adjusted R-squared value (0.974) as the previous model, so dropping x3 and x4 does not worsen the fit once the number of terms is taken into account. All the terms in the new model have extremely low p-values.
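To compare the two fits without reading the displays, you can query the Rsquared and Coefficients properties of each LinearModel object. A minimal sketch that refits both models from the hald data:

```matlab
% Requires Statistics and Machine Learning Toolbox (hald data, fitlm, removeTerms)
load hald
mdl = fitlm(ingredients, heat);
NewMdl = removeTerms(mdl, 'x3 + x4');

% Adjusted R-squared penalizes extra terms, so it is the fairer comparison here
fprintf('Adjusted R^2: full = %.4f, reduced = %.4f\n', ...
    mdl.Rsquared.Adjusted, NewMdl.Rsquared.Adjusted)

% Largest p-value among the coefficients of the reduced model
fprintf('Largest p-value in NewMdl: %.3g\n', max(NewMdl.Coefficients.pValue))
```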
terms — Terms to remove from regression model
character vector or string scalar formula in Wilkinson notation | t-by-p terms matrix
Terms to remove from the regression model, specified as one of the following:
Character vector or string scalar formula in Wilkinson Notation representing one or more terms. The variable names in the formula must be valid MATLAB® identifiers.
Terms matrix T of size t-by-p, where t is the number of terms and p is the number of predictor variables in mdl. The value of T(i,j) is the exponent of variable j in term i.
For example, suppose mdl has three variables A, B, and C in that order. Each row of T represents one term:
[0 0 0]: Constant term or intercept
[0 1 0]: B; equivalently, A^0 * B^1 * C^0
[1 0 1]: A*C, the product of A and C (the interaction term A:C in Wilkinson notation)
[2 0 0]: A^2
[0 1 2]: B*(C^2)
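To make the row-to-term mapping concrete, the following script converts each row of the example terms matrix into a term label, joining variables with ':' and writing exponents with '^'. It is a hypothetical illustration written for this page (the variables names, T, and labels are assumptions, and no toolbox is needed), not part of the removeTerms API; the label style is only meant to resemble how fitlm names coefficients.

```matlab
% Hypothetical illustration: decode a terms matrix for variables A, B, C
names = {'A', 'B', 'C'};
T = [0 0 0;    % intercept
     0 1 0;    % B
     1 0 1;    % A:C
     2 0 0;    % A^2
     0 1 2];   % B:C^2

labels = cell(size(T, 1), 1);
for i = 1:size(T, 1)
    parts = {};
    for j = 1:size(T, 2)
        e = T(i, j);                 % exponent of variable j in term i
        if e == 1
            parts{end+1} = names{j};
        elseif e > 1
            parts{end+1} = sprintf('%s^%d', names{j}, e);
        end
    end
    if isempty(parts)
        labels{i} = '1';             % all exponents zero: the intercept
    else
        labels{i} = strjoin(parts, ':');
    end
end
disp(labels)
```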
removeTerms treats a group of indicator variables for a categorical predictor as a single variable. Therefore, you cannot specify an individual indicator variable to remove from the model. If you specify a categorical predictor to remove from the model, removeTerms removes the group of indicator variables for the predictor in one step. See Modify Linear Regression Model Using step for an example that describes how to create indicator variables manually and treat each one as a separate variable.
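A minimal sketch of removing a categorical predictor, assuming the carsmall sample data set that ships with Statistics and Machine Learning Toolbox. Model_Year takes three values (70, 76, and 82), so the categorical predictor Year contributes two indicator variables, and naming Year in removeTerms drops both at once.

```matlab
% Requires Statistics and Machine Learning Toolbox (carsmall data, fitlm, removeTerms)
load carsmall
Year = categorical(Model_Year);          % three levels: '70', '76', '82'
tbl = table(Weight, Year, MPG);          % rows with missing MPG are ignored by fitlm
mdl = fitlm(tbl, 'MPG ~ Weight + Year');
disp(mdl.CoefficientNames)               % one indicator per non-reference level of Year

% Specify the predictor name, not an individual indicator variable
NewMdl = removeTerms(mdl, 'Year');
disp(NewMdl.CoefficientNames)            % both Year indicators are gone
```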
NewMdl — Linear regression model with fewer terms
Linear regression model with fewer terms, returned as a LinearModel object. NewMdl is a newly fitted model that uses the input data and settings in mdl with the terms specified in terms removed from mdl.
To overwrite the input argument mdl, assign the newly fitted model to mdl:
mdl = removeTerms(mdl,terms);
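Because removeTerms returns a new model, the input model keeps all of its terms unless you reassign it. A short check on the hald data, using the NumCoefficients property of LinearModel:

```matlab
% Requires Statistics and Machine Learning Toolbox (hald data, fitlm, removeTerms)
load hald
mdl = fitlm(ingredients, heat);
NewMdl = removeTerms(mdl, 'x3 + x4');

nBefore = mdl.NumCoefficients;         % mdl is unchanged by the call above
mdl = removeTerms(mdl, 'x3 + x4');     % overwrite mdl with the reduced model
nAfter = mdl.NumCoefficients;

fprintf('%d coefficients before, %d after overwriting\n', nBefore, nAfter)
```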
removeTerms treats a categorical predictor as follows:
A model with a categorical predictor that has L levels (categories) includes L – 1 indicator variables. The model uses the first category as a reference level, so it does not include the indicator variable for the reference level. If the data type of the categorical predictor is categorical, then you can check the order of categories by using categories and reorder the categories by using reordercats to customize the reference level. For more details about creating indicator variables, see Automatic Creation of Dummy Variables.
removeTerms treats the group of L – 1 indicator variables as a single variable. If you want to treat the indicator variables as distinct predictor variables, create indicator variables manually by using dummyvar. Then use the indicator variables, except the one corresponding to the reference level of the categorical variable, when you fit a model. For the categorical predictor X, if you specify all columns of dummyvar(X) and an intercept term as predictors, then the design matrix becomes rank deficient.
Interaction terms between a continuous predictor and a categorical predictor with L levels consist of the element-wise product of the L – 1 indicator variables with the continuous predictor.
Interaction terms between two categorical predictors with L and M levels consist of the (L – 1)*(M – 1) indicator variables to include all possible combinations of the two categorical predictor levels.
You cannot specify higher-order terms for a categorical predictor because the square of an indicator is equal to itself.
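The following sketch illustrates these rules on the carsmall sample data (an assumption; any table with a categorical predictor would do): it changes the reference level with reordercats, counts the columns that a continuous-by-categorical interaction adds, and shows that all dummyvar columns plus an intercept are rank deficient.

```matlab
% Requires Statistics and Machine Learning Toolbox (carsmall data, fitlm, dummyvar)
load carsmall
Year = categorical(Model_Year);
disp(categories(Year))             % the first category is the reference level

% Make '82' the reference level instead of '70'
Year = reordercats(Year, {'82', '70', '76'});
tbl = table(Weight, Year, MPG);
mdl = fitlm(tbl, 'MPG ~ Weight + Year');
disp(mdl.CoefficientNames)         % indicators for '70' and '76' only (L - 1 = 2)

% A Weight-by-Year interaction adds another L - 1 columns:
% intercept, Weight, 2 Year indicators, 2 Weight:Year products
mdlInt = fitlm(tbl, 'MPG ~ Weight*Year');
disp(mdlInt.NumCoefficients)

% Each row of D sums to 1, so D and an intercept column are linearly dependent
D = dummyvar(Year);
Z = [ones(size(D, 1), 1), D];
fprintf('rank(Z) = %d with %d columns\n', rank(Z), size(Z, 2))
```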
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).