
Center Selection Algorithms

Rols

This is the basic algorithm as described in Chen, Chng, and Alkadhimi [See References]. In Rols (Regularized Orthogonal Least Squares), the centers are chosen one at a time from a candidate set consisting of all the data points or a subset of them. The algorithm picks new centers in a forward selection procedure: starting from zero centers, at each step it selects the center that reduces the regularized error the most. At each step the regression matrix X is decomposed using the Gram-Schmidt algorithm into a product X = WB, where W has orthogonal columns and B is upper triangular with ones on the diagonal. This is similar in nature to a QR decomposition. The regularized error is given by $e^T e + \lambda g^T g$, where $\lambda$ is the regularization parameter, $g = Bw$ (w being the vector of weights), and e is the residual, given by $e = y - Wg$, with y the vector of observed responses. Minimizing the regularized error makes the sum of squared errors $e^T e$ small while not letting $g^T g$ get too large. Because g is related to the weights by g = Bw, this keeps the weights under control and reduces overfitting. The term $g^T g$ is used rather than the sum of the squares of the weights, $w^T w$, to improve efficiency.

The algorithm terminates either when the maximum number of centers is reached or when adding new centers no longer decreases the regularized error ratio significantly (as controlled by a user-defined tolerance).
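The forward selection described above can be sketched in a few lines of Python. This is a minimal illustration and not the toolbox implementation. It assumes one-dimensional inputs, Gaussian basis functions of a fixed width, a regularized error of the form e'e + lam*g'g, and a stopping rule that halts once the remaining regularized error ratio falls below the tolerance. The names `rols_select`, `gaussian_column`, `lam`, `tol`, and `width` are hypothetical.

```python
import math

def gaussian_column(xs, center, width):
    """Values of a Gaussian basis function centered at `center` at every x (assumed kernel)."""
    return [math.exp(-((x - center) / width) ** 2) for x in xs]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def rols_select(xs, y, max_centers, lam=1e-4, tol=1e-4, width=0.3):
    """Forward selection of centers by regularized orthogonal least squares.

    Returns the indices of the chosen data points, in selection order.
    """
    yy = dot(y, y)
    if yy == 0:
        return []  # nothing to explain
    # Every data point is a candidate center (100% of the data).
    candidates = {i: gaussian_column(xs, xs[i], width) for i in range(len(xs))}
    selected, w_cols = [], []
    remaining_ratio = 1.0  # (e'e + lam*g'g) / y'y for the current model
    while len(selected) < max_centers and candidates:
        best = None
        for i, col in candidates.items():
            # Gram-Schmidt: remove the components along already selected columns.
            w = col[:]
            for q in w_cols:
                b = dot(q, w) / dot(q, q)
                w = [wi - b * qi for wi, qi in zip(w, q)]
            ww = dot(w, w)
            if ww < 1e-12:  # numerically dependent on the chosen columns
                continue
            g = dot(w, y) / (ww + lam)
            reduction = (ww + lam) * g * g / yy  # regularized error reduction ratio
            if best is None or reduction > best[0]:
                best = (reduction, i, w)
        if best is None:
            break
        reduction, i, w = best
        selected.append(i)
        w_cols.append(w)
        del candidates[i]
        remaining_ratio -= reduction
        if remaining_ratio < tol:  # assumed form of the tolerance test
            break
    return selected

xs = [i / 19 for i in range(20)]
y = [math.sin(2 * math.pi * x) for x in xs]
print(rols_select(xs, y, max_centers=5))
```

Because the columns of W are orthogonal, each candidate's contribution to the regularized error can be evaluated independently of the others, which is why the inner loop needs no linear solve.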

Fit Parameters

Maximum number of centers — The maximum number of centers that the algorithm can select. The default is the smaller of 25 centers or one quarter of the number of data points, written as the formula min(nObs/4, 25). You can enter a value (for example, entering 10 produces ten centers) or edit the existing formula (for example, min(nObs/2, 25) produces half the number of data points or 25, whichever is smaller).

Percentage of data to be candidate centers — The percentage of the data points that should be used as candidate centers. This determines the subset of the data points that form the pool to select the centers from. The default is 100%, that is, to consider all the data points as possible new centers. This can be reduced to speed up the execution time.

Regularized error tolerance — Controls how many centers are selected before the algorithm stops. See Chen, Chng, and Alkadhimi [References] for details. This parameter should be a positive number between 0 and 1. Larger tolerances mean that fewer centers are selected. The default is 0.0001. If fewer than the maximum number of centers are being chosen and you want to force the selection of the maximum number, reduce the tolerance to epsilon (eps).
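As a rough illustration of the first two parameters, the following Python sketch evaluates the default formula min(nObs/4, 25) and draws a candidate pool from a given percentage of the data. The rounding (down for the formula, up for the pool size) and the use of a random subset are assumptions; the documentation does not state how the toolbox rounds or which points it keeps. The function names are hypothetical.

```python
import math
import random

def default_max_centers(n_obs):
    """Default limit from min(nObs/4, 25); rounding down is an assumption."""
    return int(min(n_obs // 4, 25))

def candidate_indices(n_obs, percent, seed=0):
    """Indices of the candidate centers; a random subset is an assumption."""
    k = max(1, math.ceil(n_obs * percent / 100))
    return sorted(random.Random(seed).sample(range(n_obs), k))

print(default_max_centers(40))            # 10
print(default_max_centers(1000))          # 25
print(len(candidate_indices(200, 50)))    # 100
```

Halving the candidate percentage halves the number of columns examined at each selection step, which is where the speedup comes from.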

RedErr

RedErr stands for Reduced Error. This algorithm also starts from zero centers, and selects centers in a forward selection procedure. The algorithm finds (among the data points not yet selected) the data point with the largest residual, and chooses that data point as the next center. This process is repeated until the maximum number of centers is reached.
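The selection loop can be sketched as follows. This is a minimal illustration under stated assumptions: "largest residual" is taken to mean largest absolute residual, and the model refit is hidden behind a hypothetical `predict(centers, x)` callback. The `nearest_center_predict` function is only a stand-in model used to make the example runnable; it is not a radial basis function fit.

```python
def rederr_select(xs, y, n_centers, predict):
    """Repeatedly add the unselected data point with the largest |residual|.

    `predict(centers, x)` is a hypothetical callback that returns the current
    model's prediction at x, given the indices of the selected centers.
    """
    selected = []
    while len(selected) < min(n_centers, len(xs)):
        chosen = set(selected)
        residuals = {i: abs(y[i] - predict(selected, xs[i]))
                     for i in range(len(xs)) if i not in chosen}
        selected.append(max(residuals, key=residuals.get))
    return selected

def nearest_center_predict(xs, y):
    """Stand-in model (not an RBF): predict the response of the nearest center."""
    def predict(centers, x):
        if not centers:
            return 0.0
        nearest = min(centers, key=lambda c: abs(xs[c] - x))
        return y[nearest]
    return predict

xs = [0, 1, 2, 3, 4]
y = [0, 1, 5, 1, 0]
print(rederr_select(xs, y, 3, nearest_center_predict(xs, y)))  # [2, 0, 4]
```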

Fit Parameters

This algorithm has only one fit parameter, Number of centers.

WiggleCenters

This algorithm is based on the heuristic that you should put more centers in regions where the residual varies more. For each data point, a set of neighbors is identified as the data points within a distance of sqrt(nf) divided by the maximum number of centers, where nf is the number of factors. The average residual within the set of neighbors is computed, and the amount of wiggle of the residual in the region of that data point is defined as the sum of the squares of the differences between the residual at each neighbor and the average residual of the neighbors. The data point with the most wiggle is selected as the next center.
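The wiggle measure can be sketched directly from this description. The Python below is a minimal illustration, not the toolbox code. It assumes the factors are already scaled to a common range, uses Euclidean distance, and counts each data point as one of its own neighbors. The function names are hypothetical.

```python
import math

def wiggle_scores(points, residuals, max_centers):
    """Wiggle of the residual around each data point.

    `points` are tuples of factor values, assumed already scaled to a common range.
    """
    nf = len(points[0])                      # number of factors
    radius = math.sqrt(nf) / max_centers     # neighborhood size from the text
    scores = []
    for p in points:
        # Residuals of all points within the radius (the point itself included).
        neigh = [r for q, r in zip(points, residuals) if math.dist(p, q) <= radius]
        mean = sum(neigh) / len(neigh)
        scores.append(sum((r - mean) ** 2 for r in neigh))
    return scores

def next_wiggle_center(points, residuals, max_centers, already_selected=()):
    """Index of the not yet selected data point with the most wiggle."""
    scores = wiggle_scores(points, residuals, max_centers)
    taken = set(already_selected)
    free = [i for i in range(len(points)) if i not in taken]
    return max(free, key=scores.__getitem__)

points = [(0.0,), (0.1,), (0.2,), (0.8,), (0.9,), (1.0,)]
residuals = [0.0, 0.0, 0.0, 1.0, -1.0, 1.0]
print(next_wiggle_center(points, residuals, max_centers=4))  # 3
```

In this example the residual is flat near 0 and oscillates near 1, so the first center is placed in the oscillating region.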

Fit Parameters

The same as for the Rols algorithm, except that there is no Regularized error tolerance parameter.

CenterExchange

This algorithm takes a concept from optimal Design of Experiments and applies it to the center selection problem in radial basis functions. A candidate set of centers is generated by a Latin hypercube, a method that provides a quasi-uniform distribution of points. From this candidate set, n centers are chosen at random. This set is augmented by p new centers, then this set of n+p centers is reduced to n by iteratively removing the center that yields the best PRESS statistic (as in stepwise). This process is repeated the number of times specified in Number of augment/reduce cycles.
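The augment/reduce cycle can be sketched as follows. This is a minimal illustration under stated assumptions: the p new centers are drawn at random from the unused part of the candidate set, and the PRESS computation is replaced by a hypothetical `score(centers)` callback where lower is better. The `coverage_score` function is a stand-in used only to make the example runnable; it is not the PRESS statistic.

```python
import math
import random

def latin_hypercube(n_points, n_factors, rng):
    """Candidates in [0, 1]^n_factors with one point per stratum in each factor."""
    columns = []
    for _ in range(n_factors):
        strata = list(range(n_points))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_points for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_points)]

def center_exchange(candidates, n, p, cycles, score, rng):
    """Augment by p centers, then reduce back to n, repeated `cycles` times.

    `score(centers)` is a hypothetical PRESS-like callback (lower is better).
    """
    centers = rng.sample(candidates, n)
    for _ in range(cycles):
        pool = [c for c in candidates if c not in centers]
        centers = centers + rng.sample(pool, min(p, len(pool)))
        while len(centers) > n:
            # Drop the center whose removal leaves the best (lowest) score.
            drop = min(range(len(centers)),
                       key=lambda k: score(centers[:k] + centers[k + 1:]))
            del centers[drop]
    return centers

def coverage_score(data):
    """Stand-in score (not PRESS): squared distance from each data point to its nearest center."""
    def score(centers):
        return sum(min(math.dist(d, c) for c in centers) ** 2 for d in data)
    return score

rng = random.Random(0)
data = [(rng.random(), rng.random()) for _ in range(40)]
candidates = latin_hypercube(30, 2, rng)
print(center_exchange(candidates, n=5, p=3, cycles=4, score=coverage_score(data), rng=rng))
```

Each reduce step evaluates the score once per remaining center, so the cost grows with both n + p and the number of cycles.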

CenterExchange and Tree Regression (see Tree Regression) are the only algorithms that permit centers that are not located at the data points. This means that you do not see centers on model plots. The CenterExchange algorithm has the potential to be more flexible than the other center selection algorithms, which choose the centers to be a subset of the data points; however, it is significantly more time consuming and is not recommended for larger problems.

Fit Parameters

Number of centers — The number of centers (n) that will be chosen.

Number of augment/reduce cycles — The number of times that the center set is augmented, then reduced.

Number of centers to augment by — The number of new centers (p) added to the set in each cycle.

© 1984-2012 The MathWorks, Inc.