Sequential feature selection
inmodel = sequentialfs(fun,X,y)
inmodel = sequentialfs(fun,X,Y,Z,...)
[inmodel,history] = sequentialfs(fun,X,...)
[inmodel,history] = sequentialfs(...,param1,val1,param2,val2,...)
inmodel = sequentialfs(fun,X,y) selects
a subset of features from the data matrix X that
best predict the data in
y by sequentially selecting
features until there is no improvement in prediction. Rows of X correspond
to observations; columns correspond to variables or features. y is
a column vector of response values or class labels for each observation
in X. X and y must have the same number of rows. fun is a function
handle to a function that defines the criterion used to select features
and to determine when to stop. The output inmodel is
a logical vector indicating which features are finally chosen.
Starting from an empty feature set, sequentialfs creates
candidate feature subsets by sequentially adding each of the features
not yet selected. For each candidate feature subset, sequentialfs performs
10-fold cross-validation by repeatedly calling fun with
different training subsets of X and y, XTRAIN and ytrain,
and test subsets of X and y, XTEST and ytest, as follows:

criterion = fun(XTRAIN,ytrain,XTEST,ytest)

XTRAIN and ytrain contain the same subset of rows of X and y,
while XTEST and ytest contain the complementary subset of rows. XTRAIN and XTEST contain
the data taken from the columns of X that correspond
to the current candidate feature set.
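For regression, a criterion function might look like the following sketch. The name critfun and the plain least-squares fit are illustrative choices, not part of sequentialfs:

```matlab
function crit = critfun(XTRAIN,ytrain,XTEST,ytest)
% Sketch of a regression criterion function (save as critfun.m).
% Fits ordinary least squares on the training fold, predicts the
% test fold, and returns the SUM of squared errors -- not the mean,
% because sequentialfs itself divides by the number of test
% observations.
b    = [ones(size(XTRAIN,1),1) XTRAIN] \ ytrain;   % OLS fit with intercept
yfit = [ones(size(XTEST,1),1) XTEST] * b;          % predict on the test fold
crit = sum((ytest - yfit).^2);                     % sum of squared errors
end
```

You would then pass it as a function handle: inmodel = sequentialfs(@critfun,X,y).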
Each time it is called,
fun must return a scalar value criterion. Typically, fun uses XTRAIN and ytrain to
train or fit a model, then predicts values for XTEST using
that model, and finally returns some measure of distance, or loss,
of those predicted values from
ytest. In the cross-validation
calculation for a given candidate feature set, sequentialfs sums
the values returned by
fun and divides that sum
by the total number of test observations. It then uses that mean value
to evaluate each candidate feature subset.
Typical loss measures include sum of squared errors for regression
models (sequentialfs computes the mean-squared
error in this case), and the number of misclassified observations
for classification models (sequentialfs computes
the misclassification rate in this case).
Note:   sequentialfs divides the sum of the values
returned by fun across all test sets by the total
number of test observations. Accordingly, fun should
not divide its output value by the number of test observations.
After computing the mean criterion values
for each candidate feature subset, sequentialfs chooses
the candidate feature subset that minimizes the mean criterion value.
This process continues until adding more features does not decrease the criterion.
inmodel = sequentialfs(fun,X,Y,Z,...) allows
any number of input variables X, Y, Z,
... . sequentialfs chooses features (columns)
only from X, but otherwise imposes no interpretation on X, Y, Z,
... . All data inputs, whether column vectors or matrices, must have
the same number of rows. sequentialfs calls fun with
training and test subsets of X, Y, Z,
... as follows:

criterion = fun(XTRAIN,YTRAIN,ZTRAIN,..., XTEST,YTEST,ZTEST,...)

sequentialfs creates XTRAIN, YTRAIN, ZTRAIN, ..., XTEST, YTEST, ZTEST,
... by selecting subsets of the rows of X, Y, Z, ... .
fun must return a scalar value criterion,
but may compute that value in any way. Elements of the logical vector inmodel correspond
to columns of
X and indicate which features are finally chosen.
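As an illustration of extra data inputs, a hypothetical per-observation weight vector w can ride along with X and y; sequentialfs partitions all three by rows and passes the matched subsets to fun. The weights, the toy data, and the weighted criterion below are assumptions for this sketch, not part of the sequentialfs API:

```matlab
X = randn(150,5);                   % toy predictor matrix for the sketch
y = X(:,2) + 0.1*randn(150,1);      % toy response
w = rand(150,1);                    % hypothetical observation weights

% fun receives row-matched training and test subsets of X, y, and w,
% in that order. It fits least squares on the training fold and
% returns a weighted sum of squared errors on the test fold.
fun = @(XT,yT,wT,Xt,yt,wt) sum(wt .* (yt - Xt*(XT\yT)).^2);

inmodel = sequentialfs(fun,X,y,w);
```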
[inmodel,history] = sequentialfs(fun,X,...) returns
information on which feature is chosen at each step. history is
a scalar structure with the following fields:

Crit — A vector containing
the criterion values computed at each step.
In — A logical matrix in which row
i indicates the features selected at step i.
[inmodel,history] = sequentialfs(...,param1,val1,param2,val2,...) specifies
optional parameter name/value pairs from the following table.
'cv' — The validation method used to compute the criterion for each candidate feature subset.
The default value is 10, that is, 10-fold cross-validation.
So-called wrapper methods use a function fun that implements a learning algorithm; these methods usually rely on cross-validation of that learner.
'mcreps' — A positive integer indicating the number of Monte-Carlo
repetitions for cross-validation. The default value is 1.
'direction' — The direction of the sequential search. The default is 'forward'; 'backward' starts from a candidate set containing all features and removes features sequentially.
'keepin' — A logical vector or a vector of column numbers specifying features that must be included. The default is empty.
'keepout' — A logical vector or a vector of column numbers specifying features that must be excluded. The default is empty.
'nfeatures' — The number of features at which sequentialfs should stop; inmodel then includes exactly that many features. The default is empty, meaning sequentialfs stops when a local minimum of the criterion is found.
'nullmodel' — A logical value, indicating whether or not the null model
(containing no features from X) should be included in feature selection and in the history output. The default is false.
'options' — Options structure for the iterative sequential search
algorithm, as created by statset.
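Several of these options can be combined in one call. The following is a sketch under toy data; the data, the criterion, and the particular option values are illustrative:

```matlab
X = randn(100,6);                                 % toy predictors
y = X(:,1) - X(:,4) + 0.1*randn(100,1);           % toy response
fun = @(XT,yT,Xt,yt) sum((yt - Xt*(XT\yT)).^2);   % sum of squared errors

% Backward elimination that stops at exactly 3 features and
% always keeps column 1 in the model, with iteration display.
opts = statset('Display','iter');
inmodel = sequentialfs(fun,X,y, ...
    'direction','backward', ...
    'nfeatures',3, ...
    'keepin',1, ...
    'options',opts);
```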
Perform sequential feature selection for classification of noisy features:
load fisheriris
X = randn(150,10);
X(:,[1 3 5 7]) = meas;
y = species;

c = cvpartition(y,'k',10);
opts = statset('display','iter');
fun = @(XT,yT,Xt,yt)...
      (sum(~strcmp(yt,classify(Xt,XT,yT,'quadratic'))));

[fs,history] = sequentialfs(fun,X,y,'cv',c,'options',opts)

Start forward sequential feature selection:
Initial columns included:  none
Columns that cannot be included:  none
Step 1, added column 7, criterion value 0.04
Step 2, added column 5, criterion value 0.0266667
Final columns included:  5 7

fs =
     0     0     0     0     1     0     1     0     0     0

history =
      In: [2x10 logical]
    Crit: [0.0400 0.0267]

history.In
ans =
     0     0     0     0     0     0     1     0     0     0
     0     0     0     0     1     0     1     0     0     0