Multiple linear regression
b = regress(y,X)
[b,bint] = regress(y,X)
[b,bint,r] = regress(y,X)
[b,bint,r,rint] = regress(y,X)
[b,bint,r,rint,stats] = regress(y,X)
[...] = regress(y,X,alpha)
b = regress(y,X) returns a p-by-1 vector b of coefficient estimates for a multiple linear regression of the responses in y on the predictors in X. X is an n-by-p matrix of p predictors at each of n observations. y is an n-by-1 vector of observed responses.
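As a minimal sketch of the basic call, the following fits a model with a constant term and two predictors to synthetic data. The data, the variable names x1 and x2, and the "true" coefficients are assumptions made up for illustration.

```matlab
% Minimal sketch: fit y = b1 + b2*x1 + b3*x2 on synthetic data.
rng(0);                                % fix the seed so the run is repeatable
n  = 100;                              % number of observations
x1 = randn(n,1);                       % first predictor (synthetic)
x2 = randn(n,1);                       % second predictor (synthetic)
y  = 2 + 3*x1 - x2 + 0.1*randn(n,1);   % assumed true coefficients: [2; 3; -1]

X = [ones(n,1) x1 x2];                 % n-by-p design matrix; column of ones = constant term
b = regress(y,X);                      % p-by-1 vector of estimates, here 3-by-1
disp(b')
```

With this little noise, the estimates in b should land close to the assumed coefficients, in the same order as the columns of X.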
regress treats NaNs in X or y as missing values, and ignores them.
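One way to see the effect of missing values is to compare a fit on data containing NaNs with a fit on the same data after dropping the affected rows by hand. This is a sketch on synthetic data; the positions of the NaNs are arbitrary choices for illustration.

```matlab
% Sketch: NaNs in y or X are treated as missing, so those observations drop out.
rng(0);
n = 50;
x = randn(n,1);
y = 1 + 2*x + 0.1*randn(n,1);
X = [ones(n,1) x];

yMissing = y;
yMissing([3 17]) = NaN;            % two missing responses
XMissing = X;
XMissing(8,2) = NaN;               % one missing predictor value

bNaN = regress(yMissing,XMissing); % regress ignores rows 3, 8, and 17

% Manual equivalent: keep only rows with no NaN in y or X.
keep   = ~(isnan(yMissing) | any(isnan(XMissing),2));
bClean = regress(y(keep),X(keep,:));

disp(max(abs(bNaN - bClean)))      % the two fits should agree
```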
If the columns of X are linearly dependent, regress obtains a basic solution by setting as many elements of b as possible to zero.
[b,bint] = regress(y,X) returns a p-by-2 matrix bint of 95% confidence intervals for the coefficient estimates. The first column of bint contains lower confidence bounds for each of the p coefficient estimates; the second column contains upper confidence bounds.
If the columns of X are linearly dependent, regress returns zeros in elements of bint corresponding to the zero elements of b.
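The rank-deficient case can be sketched by duplicating a predictor up to a scale factor. The design below is an assumption chosen for illustration; which coefficient regress sets to zero is not specified here, so the code locates it rather than assuming a position. regress may also issue a warning about the rank deficiency.

```matlab
% Sketch: linearly dependent columns give a basic solution with zeros in b and bint.
rng(0);
n = 30;
x = randn(n,1);
y = 1 + 2*x + 0.1*randn(n,1);
X = [ones(n,1) x 2*x];             % third column is exactly twice the second

[b,bint] = regress(y,X);           % may warn that X is rank deficient

zeroed = (b == 0);                 % coefficients dropped from the basic solution
disp(b')                           % at least one element is exactly zero
disp(bint(zeroed,:))               % the matching rows of bint are zeros too
```

The fitted values X*b are still well defined, even though the individual coefficients of the dependent columns are not.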
[b,bint,r] = regress(y,X) returns an n-by-1 vector r of residuals.
[b,bint,r,rint] = regress(y,X) returns an n-by-2 matrix rint of intervals that can be used to diagnose outliers. If the interval rint(i,:) for observation i does not contain zero, the corresponding residual is larger than would be expected for 95% of new observations, which suggests an outlier.
In a linear model, observed values of y are random variables, and so are their residuals. Residuals have normal distributions with zero mean but with different variances at different values of the predictors. To put residuals on a comparable scale, they are "Studentized," that is, they are divided by an estimate of their standard deviation that is independent of their value. Studentized residuals have t distributions with known degrees of freedom. The intervals returned in rint are shifts of the 95% confidence intervals of these t distributions, centered at the residuals.
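A minimal sketch of the outlier check described above: fit a line to synthetic data with one deliberately corrupted response, then list the observations whose interval in rint excludes zero. The data and the injected outlier at index 25 are assumptions for illustration.

```matlab
% Sketch: flag observations whose residual interval does not contain zero.
rng(0);
n = 50;
x = linspace(0,10,n)';
y = 1 + 0.5*x + 0.5*randn(n,1);
y(25) = y(25) + 10;                % inject one gross outlier (illustrative)

X = [ones(n,1) x];
[b,bint,r,rint] = regress(y,X);

% An interval lying entirely above or entirely below zero excludes zero.
flagged = find(rint(:,1) > 0 | rint(:,2) < 0);
disp(flagged')                     % should include observation 25
```

Because the intervals are at the 95% level by default, a few ordinary observations can be flagged by chance in larger samples. rcoplot(r,rint) displays the same intervals graphically.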
[b,bint,r,rint,stats] = regress(y,X) returns a 1-by-4 vector stats that contains, in order, the R² statistic, the F statistic and its p value, and an estimate of the error variance.
Note: When computing statistics, X should include a column of 1s so that the model contains a constant term. The F statistic is the test statistic of the F-test on the regression model, which tests for a significant linear relationship between the response variable and the predictor variables. The F statistic and its p value are computed under the assumption that the model contains a constant term, and they are not correct for models without one. The R² statistic can be negative for models without a constant, indicating that the model is not appropriate for the data.
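The sketch below unpacks stats and recomputes three of its entries from the residuals, assuming the usual textbook definitions for a model with a constant term. The synthetic data and the names R2check, s2check, and Fcheck are illustrative.

```matlab
% Sketch: unpack stats and cross-check it against the residuals.
rng(0);
n = 40;
x = randn(n,1);
y = 3 - 2*x + 0.5*randn(n,1);
X = [ones(n,1) x];                 % includes the column of ones
p = size(X,2);

[b,~,r,~,stats] = regress(y,X);
R2   = stats(1);                   % R-squared statistic
F    = stats(2);                   % F statistic
pval = stats(3);                   % p value of the F statistic
s2   = stats(4);                   % estimate of the error variance

% Textbook definitions, valid when the model has a constant term.
R2check = 1 - sum(r.^2)/sum((y - mean(y)).^2);
s2check = sum(r.^2)/(n - p);
Fcheck  = (R2check/(p - 1))/((1 - R2check)/(n - p));

fprintf('R2 = %.4f, F = %.2f, p = %.3g, s2 = %.4f\n', R2, F, pval, s2);
```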
[...] = regress(y,X,alpha) uses a 100*(1-alpha)% confidence level to compute bint and rint.
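To illustrate the effect of alpha, the following sketch compares the default 95% intervals with 99% intervals obtained from alpha = 0.01 on the same synthetic data. The coefficient estimates do not depend on alpha; only the interval widths change.

```matlab
% Sketch: a smaller alpha gives a higher confidence level and wider intervals.
rng(0);
n = 60;
x = randn(n,1);
y = 1 + 2*x + 0.5*randn(n,1);
X = [ones(n,1) x];

[b95,bint95] = regress(y,X);        % default: 95% confidence intervals
[b99,bint99] = regress(y,X,0.01);   % 100*(1-0.01)% = 99% confidence intervals

width95 = bint95(:,2) - bint95(:,1);
width99 = bint99(:,2) - bint99(:,1);
disp([width95 width99])             % second column is larger in every row
```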
fitlm | LinearModel | mvregress | rcoplot | stepwiselm