version 1.0 (7.74 KB) by
Bill Whiten

Select terms in linear regression (equation fitting) using eigenvectors and term significance

linfitregsel selects the relevant terms in a linear regression and calculates the regression equation. The value of x is calculated to minimise the sum of squares of A*x-b. The method first rejects small eigenvalues together with their associated eigenvectors and then removes insignificant terms, repeating until both the eigenvalue and significance criteria are satisfied. This may give a different result from stepwise or all-subsets term selection (which usually also differ from each other). In particular, correlated terms are handled very differently, giving an equation that balances correlated terms. Because the coefficient calculation differs from normal regression, the variance matrix of the result is also different.
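The eigenvalue-rejection stage can be illustrated with built-in functions only. The following is a minimal sketch under stated assumptions, not the linfitregsel source: it assumes the eigenvalues in question are those of A'*A, uses made-up data with two nearly identical columns, and omits the significance-based removal of terms entirely.

```matlab
% Sketch only: eigenvalue rejection for least squares (not the linfitregsel code).
t = linspace(-1, 1, 20)';
A = [ones(20,1), t, t + 1e-3*sin(5*t)];   % columns 2 and 3 are almost identical
b = 1 + 2*t + 0.05*cos(7*t);              % hypothetical data

reg = 0.01;                    % plays the role of optn.reg
M = A'*A;
M = (M + M')/2;                % enforce exact symmetry before eig
[V, D] = eig(M);               % assumption: eigenvalues of A'*A are the ones tested
lam = diag(D);
keep = lam >= reg*max(lam);    % reject eigenvalues small relative to the largest
Vk = V(:, keep);
x = Vk * ((Vk'*(A'*b)) ./ lam(keep));   % least squares within the retained eigenvectors
nreg = sum(keep);              % number of eigenvectors used
disp(x');
```

With these data the near-duplicate columns share the slope almost equally instead of receiving large coefficients of opposite sign, which is the balancing of correlated terms described above.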

The function is used as:

[x,vm,info]=linfitregsel(A,b,optn)

Where A is the matrix of predictor (independent) variables (including a unit column if a constant term is required), and b is a column of values to be predicted (dependent values) using A. The optional argument (optn) can be used to change default values:

optn.reg is the threshold, expressed as a fraction of the largest eigenvalue, below which eigenvalues are rejected (default 0.01).

optn.treg is the ratio of a coefficient value to its standard error below which the term is excluded from the regression. This can be given as a vector of increasing values to allow for changes in term significance as the equation is refined. These values can be chosen using Student-t probabilities (default [0.5,1.0,2.0]).

optn.sel allows the forcing of terms out of, or into, the regression (default all columns available for selection).
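A call with non-default options might look like the sketch below. The data are hypothetical, and it is an assumption that a partially filled optn structure is accepted (fields left out taking their defaults). optn.sel is not set because its format is not described here. The call is guarded so the script still runs when linfitregsel is not on the path.

```matlab
% Hypothetical data; only the call signature and the optn fields come from the description.
t = linspace(-1, 1, 30)';
A = [ones(30,1), t, t.^2, t + 1e-3*sin(5*t)];   % unit column supplies the constant term
b = 1 + 2*t - 0.5*t.^2 + 0.05*cos(7*t);

optn = struct();
optn.reg  = 0.001;             % stricter than the default 0.01
optn.treg = [0.5, 1.0, 2.0];   % increasing significance thresholds (the stated default)

if exist('linfitregsel', 'file') == 2            % run only if the function is available
    [x, vm, info] = linfitregsel(A, b, optn);
    se = sqrt(diag(vm));                         % standard errors of the coefficients
    disp([x, se]);
    fprintf('eigenvectors used: %d, residual sd: %g\n', info.nreg, info.sdr);
end
```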

Outputs are:

x is the result of the regression. With no rejection of terms it is A\b; coefficients of rejected terms are set to zero.

vm is the variance matrix of x; in particular sqrt(diag(vm)) gives the standard errors of x, and sqrt(a'*vm*a) is the standard error of a'*x. Entries corresponding to rejected terms are zero. If no terms are rejected vm is inv(A'*A)*info.sdr^2.
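For the case with no rejected terms, these quantities can be reproduced with ordinary least squares. The sketch below uses hypothetical data and the usual least-squares form, in which the variance matrix is inv(A'*A) times the residual variance. The divisor n-p for the residual variance is an assumption, as the description does not say which divisor info.sdr uses.

```matlab
% Sketch for the no-rejection case, using built-in functions only.
t = linspace(-1, 1, 25)';
A = [ones(25,1), t, t.^2];
b = 1 + 2*t - 0.5*t.^2 + 0.1*sin(9*t);   % hypothetical data

x = A\b;                                  % ordinary least squares
r = A*x - b;                              % residuals
[n, p] = size(A);
sdr = sqrt(sum(r.^2)/(n - p));            % assumption: n-p degrees of freedom
vm  = inv(A'*A) * sdr^2;                  % variance matrix of x
se  = sqrt(diag(vm));                     % standard errors of the coefficients

a   = [1; 0.5; 0.25];                     % predictor values for t = 0.5
sep = sqrt(a'*vm*a);                      % standard error of the prediction a'*x
tratio = abs(x)./se;                      % the kind of ratio compared with optn.treg
disp([x, se, tratio]);
```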

info.nreg is the number of eigenvectors used in the regression.

info.sdr is the standard deviation of the residuals.

The accuracy and usefulness of the regression equation depend on the experimental design, the range covered by the predictor matrix, and the accuracy of the data. The experimental design should cover the range of interest, and the ratio of the extreme eigenvalues of A should not be small (possibly after normalisation of the data).
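One way to check this before fitting is to compare the smallest and largest eigenvalues before and after normalisation. This is a sketch with assumptions: the eigenvalues are taken to be those of A'*A, and the normalisation shown (centring and scaling each non-constant column) is only one possible choice.

```matlab
% Sketch: eigenvalue ratio of a hypothetical design, raw and normalised.
t = (1:20)';
A = [ones(20,1), t, t.^2];
lam = eig(A'*A);                          % assumption: eigenvalues of A'*A
ratio_raw = min(lam)/max(lam);

Z  = A(:, 2:end);                         % non-constant columns
mu = repmat(mean(Z), size(Z,1), 1);
sd = repmat(std(Z),  size(Z,1), 1);
An = [ones(20,1), (Z - mu)./sd];          % centred and scaled predictors
lamn = eig(An'*An);
ratio_norm = min(lamn)/max(lamn);

fprintf('raw ratio %g, normalised ratio %g\n', ratio_raw, ratio_norm);
```

Coefficients fitted to the normalised matrix refer to the scaled columns, so they need converting back before being applied to raw data.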

Withholding part of the data for validation is strongly recommended.
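A simple way to withhold data is to fit on one subset of rows and compare the errors on the remainder. In this sketch the data and the every-fourth-row split are hypothetical, and A\b stands in for the call to linfitregsel so that the example runs without it.

```matlab
% Sketch: hold-out validation with a deterministic split.
t = linspace(-1, 1, 40)';
A = [ones(40,1), t, t.^2];
b = 1 + 2*t - 0.5*t.^2 + 0.1*sin(9*t);   % hypothetical data

isval = mod((1:40)', 4) == 0;             % every fourth row withheld for validation
isfit = ~isval;

x = A(isfit,:) \ b(isfit);                % stand-in for linfitregsel(A(isfit,:), b(isfit))
rms_fit = sqrt(mean((A(isfit,:)*x - b(isfit)).^2));
rms_val = sqrt(mean((A(isval,:)*x - b(isval)).^2));
fprintf('rms error: fitted rows %g, withheld rows %g\n', rms_fit, rms_val);
```

A withheld error much larger than the fitted error suggests that the equation does not generalise over the range covered by the data.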

As this regression method allows for correlated predictors by calculating an equation that balances the different predictors, it may give more robust predictions than ordinary linear regression using the same predictors.

MATLAB 8.0 (R2012b)

**Inspired by:**
Optional function arguments, Restore project status for selected project

**Inspired:**
Greyboxbuild: complete a greybox model