Regression models describe the relationship between a dependent variable, y, and an independent variable or variables, X. The dependent variable is also called the response variable. Independent variables are also called explanatory or predictor variables. Continuous predictor variables might be called covariates, whereas categorical predictor variables might also be referred to as factors. The matrix, X, of observations on predictor variables is usually called the design matrix.
A multiple linear regression model is
$${y}_{i}={\beta}_{0}+{\beta}_{1}{X}_{i1}+{\beta}_{2}{X}_{i2}+\cdots +{\beta}_{p}{X}_{ip}+{\epsilon}_{i},\text{\hspace{1em}}i=1,\cdots ,n,$$
where
y_{i} is the ith response.
β_{k} is the kth coefficient, where β_{0} is the constant term in the model. A design matrix sometimes includes a column for the constant term; however, fitlm and stepwiselm include a constant term in the model by default, so you must not enter a column of 1s into your design matrix X.
X_{ij} is the ith observation on the jth predictor variable, j = 1, ..., p.
ε_{i} is the ith noise term, that is, random error.
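As a concrete illustration of the model above, the following sketch fits y_i = β_0 + β_1 X_i1 + β_2 X_i2 + ε_i by least squares on synthetic data. This is a hedged NumPy illustration of the underlying math, not the fitlm interface; the true coefficients and data ranges are made up, and the column of 1s is added explicitly here because, unlike fitlm, np.linalg.lstsq does not add a constant term for you.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X1 = rng.uniform(0, 10, n)
X2 = rng.uniform(0, 10, n)
eps = rng.normal(0, 0.5, n)        # i.i.d. noise with mean zero, constant variance

beta = np.array([1.8, -2.35, 1.0])  # illustrative "true" coefficients, chosen arbitrarily
y = beta[0] + beta[1] * X1 + beta[2] * X2 + eps

# Design matrix with an explicit column of 1s for the constant term beta_0
X = np.column_stack([np.ones(n), X1, X2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)   # estimates close to [1.8, -2.35, 1.0]
```

With enough observations and modest noise, the estimated coefficients land near the true values used to generate the data.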
In general, a linear regression model can be a model of the form
$${y}_{i}={\beta}_{0}+{\displaystyle \sum _{k=1}^{K}{\beta}_{k}{f}_{k}\left({X}_{i1},{X}_{i2},\cdots ,{X}_{ip}\right)}+{\epsilon}_{i},\text{\hspace{1em}}i=1,\cdots ,n,$$
where f_{k}(.) is a scalar-valued function of the independent variables, X_{ij}. The functions, f_{k}(X), can take any form, including nonlinear functions and polynomials. The linearity, in the linear regression models, refers to the linearity of the coefficients β_{k}. That is, the response variable, y, is a linear function of the coefficients, β_{k}.
Some examples of linear models are:
$$\begin{array}{l}{y}_{i}={\beta}_{0}+{\beta}_{1}{X}_{1i}+{\beta}_{2}{X}_{2i}+{\beta}_{3}{X}_{3i}+{\epsilon}_{i}\\ {y}_{i}={\beta}_{0}+{\beta}_{1}{X}_{1i}+{\beta}_{2}{X}_{2i}+{\beta}_{3}{X}_{1i}^{3}+{\beta}_{4}{X}_{2i}^{2}+{\epsilon}_{i}\\ {y}_{i}={\beta}_{0}+{\beta}_{1}{X}_{1i}+{\beta}_{2}{X}_{2i}+{\beta}_{3}{X}_{1i}{X}_{2i}+{\beta}_{4}\mathrm{log}{X}_{3i}+{\epsilon}_{i}\end{array}$$
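Because linearity refers to the coefficients, models like the second example above can still be fit with ordinary least squares: the nonlinear terms simply become columns f_k(X) of the design matrix. The following is a hedged NumPy sketch with arbitrary illustrative coefficients and data ranges (fitlm handles this via its model specification instead):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
X1 = rng.uniform(1, 5, n)
X2 = rng.uniform(1, 5, n)
# Illustrative data generated from the second example model:
# y = b0 + b1*X1 + b2*X2 + b3*X1^3 + b4*X2^2 + noise
y = 2.0 + 0.5 * X1 - 1.0 * X2 + 0.3 * X1**3 + 0.7 * X2**2 + rng.normal(0, 0.2, n)

# Each column is one f_k(X): nonlinear in X, but the model stays linear in the betas
D = np.column_stack([np.ones(n), X1, X2, X1**3, X2**2])
b, *_ = np.linalg.lstsq(D, y, rcond=None)
```

The same linear least-squares machinery recovers the coefficients of the cubic and quadratic terms, because those terms are fixed functions of the data, not of the unknown β_k.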
The following, however, are not linear models since they are not linear in the unknown coefficients, β_{k}.
$$\begin{array}{l}\mathrm{log}{y}_{i}={\beta}_{0}+{\beta}_{1}{X}_{1i}+{\beta}_{2}{X}_{2i}+{\epsilon}_{i}\\ {y}_{i}={\beta}_{0}+{\beta}_{1}{X}_{1i}+\frac{1}{{\beta}_{2}{X}_{2i}}+{e}^{{\beta}_{3}{X}_{1i}{X}_{2i}}+{\epsilon}_{i}\end{array}$$
The usual assumptions for linear regression models are:
The noise terms, ε_{i}, are uncorrelated.
The noise terms, ε_{i}, are independent and identically distributed normal random variables with mean zero and constant variance, σ^{2}. Thus
$$\begin{array}{l}E\left({y}_{i}\right)=E\left({\displaystyle \sum _{k=0}^{K}{\beta}_{k}{f}_{k}\left({X}_{i1},{X}_{i2},\cdots ,{X}_{ip}\right)}+{\epsilon}_{i}\right)\\ \text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}={\displaystyle \sum _{k=0}^{K}{\beta}_{k}{f}_{k}\left({X}_{i1},{X}_{i2},\cdots ,{X}_{ip}\right)}+E\left({\epsilon}_{i}\right)\\ \text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}={\displaystyle \sum _{k=0}^{K}{\beta}_{k}{f}_{k}\left({X}_{i1},{X}_{i2},\cdots ,{X}_{ip}\right)}\end{array}$$
and
$$V\left({y}_{i}\right)=V\left({\displaystyle \sum _{k=0}^{K}{\beta}_{k}{f}_{k}\left({X}_{i1},{X}_{i2},\cdots ,{X}_{ip}\right)}+{\epsilon}_{i}\right)=V\left({\epsilon}_{i}\right)={\sigma}^{2}$$
So the variance of y_{i} is the same for all levels of X_{ij}.
The responses y_{i} are uncorrelated.
The fitted linear function is
$${\widehat{y}}_{i}={b}_{0}+{\displaystyle \sum _{k=1}^{K}{b}_{k}{f}_{k}\left({X}_{i1},{X}_{i2},\cdots ,{X}_{ip}\right)},\text{\hspace{1em}}i=1,\cdots ,n,$$
where $${\widehat{y}}_{i}$$ is the estimated response and the b_{k}s are the fitted coefficients. The coefficients are estimated so as to minimize the sum of squared differences between the prediction vector bf(X) and the true response vector y, that is, the squared length of the residual vector $$\widehat{y}-y$$. This method is called the method of least squares. Under the normality assumption on the noise terms, these coefficients also maximize the likelihood of the prediction vector.
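The least-squares property can be checked numerically: the fitted coefficients solve the normal equations, and perturbing them in any direction can only increase the sum of squared residuals. A hedged NumPy sketch on synthetic data (np.linalg.lstsq stands in for the fitting that fitlm performs):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])  # constant term plus one predictor
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.1, n)     # illustrative true model

b, *_ = np.linalg.lstsq(X, y, rcond=None)

def rss(coef):
    """Sum of squared residuals ||y - X coef||^2."""
    r = y - X @ coef
    return r @ r

# The least-squares solution also satisfies the normal equations (X'X) b = X'y
b_normal = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(b, b_normal)

# Nudging the fitted coefficients in any direction increases the RSS
assert rss(b) <= rss(b + np.array([0.05, 0.0]))
assert rss(b) <= rss(b + np.array([0.0, -0.05]))
```

The assertions pass because the least-squares solution is the global minimizer of the residual sum of squares.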
In a linear regression model of the form y = β_{1}X_{1} + β_{2}X_{2} + ... + β_{p}X_{p}, the coefficient β_{j} expresses the impact of a one-unit change in the predictor variable, X_{j}, on the mean of the response, E(y), provided that all other variables are held constant. The sign of the coefficient gives the direction of the effect. For example, if the linear model is E(y) = 1.8 – 2.35X_{1} + X_{2}, then –2.35 indicates a 2.35 unit decrease in the mean response with a one-unit increase in X_{1}, given X_{2} is held constant. If the model is E(y) = 1.1 + 1.5X_{1}^{2} + X_{2}, the coefficient of X_{1}^{2} indicates a 1.5 unit increase in the mean of y with a one-unit increase in X_{1}^{2}, given all else is held constant. However, in the case of E(y) = 1.1 + 2.1X_{1} + 1.5X_{1}^{2}, it is difficult to interpret the coefficients similarly, since it is not possible to hold X_{1} constant when X_{1}^{2} changes, or vice versa.
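The interpretation above can be verified by direct arithmetic on the example mean function E(y) = 1.8 – 2.35X_{1} + X_{2} from the text (the evaluation points below are arbitrary, chosen only for illustration):

```python
def mean_response(x1, x2):
    """Example mean function E(y) = 1.8 - 2.35*x1 + x2 from the text."""
    return 1.8 - 2.35 * x1 + x2

# A one-unit increase in X1, holding X2 fixed, changes E(y) by the
# coefficient of X1, i.e. by -2.35, regardless of the starting point
change = mean_response(3.0 + 1.0, 5.0) - mean_response(3.0, 5.0)
print(change)   # approximately -2.35
```

Repeating the calculation from any other starting point gives the same change, which is what "the coefficient is the effect of a one-unit change, all else held constant" means for a model that is linear in each predictor.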
See Also: fitlm | LinearModel | stepwiselm