Generate randomized subset of features
[IDX, Z] = randfeatures(X, Group, 'PropertyName', PropertyValue, ...)
randfeatures(..., 'Classifier', C)
randfeatures(..., 'ClassOptions', CO)
randfeatures(..., 'PerformanceThreshold', PT)
randfeatures(..., 'ConfidenceThreshold', CT)
randfeatures(..., 'SubsetSize', SS)
randfeatures(..., 'PoolSize', PS)
randfeatures(..., 'NumberOfIndices', N)
randfeatures(..., 'CrossNorm', CN)
randfeatures(..., 'Verbose', VerboseValue)
[IDX, Z] = randfeatures(X, Group, 'PropertyName', PropertyValue, ...) performs
a randomized subset feature search reinforced by classification. randfeatures
randomly generates subsets of features and uses each subset to classify the samples. Every
subset is evaluated by its apparent error, that is, the classification error measured on the
same samples that were used to train the classifier. Only the best subsets
are kept, and they are joined into a single final pool. The number of times each feature
appears in the pool measures its significance.
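The search described above can be sketched in a few lines of MATLAB. This is a minimal illustration under stated assumptions, not the randfeatures source code: a nearest-centroid classifier stands in for the 'da' and 'knn' classifiers so that the sketch needs no toolbox, subsets are drawn uniformly with randperm, and the toy data and the variables SS, PS, PT, and maxTries are hypothetical. Implicit expansion (R2016b or later) is assumed.

```matlab
% Sketch of a randomized subset feature search; NOT the randfeatures source.
% Assumption: a nearest-centroid classifier replaces 'da'/'knn'.
rng(0);                                        % reproducible toy run
nFeat = 20; nObs = 40;
Group = [ones(1, nObs/2), 2*ones(1, nObs/2)];  % two classes, one label per column
X = randn(nFeat, nObs);                        % features-by-observations layout
X(1:3, Group == 2) = X(1:3, Group == 2) + 3;   % make features 1-3 informative

SS = 4;          % features per subset (cf. 'SubsetSize')
PS = 200;        % accepted subsets wanted (cf. 'PoolSize')
PT = 0.8;        % minimum apparent correct rate (cf. 'PerformanceThreshold')
maxTries = 5000; % hypothetical safety limit on the number of draws

classes = unique(Group);
nClass = numel(classes);
Z = zeros(nFeat, 1);     % number of accepted subsets containing each feature
accepted = 0;
for t = 1:maxTries
    subset = randperm(nFeat, SS);              % random subset of feature rows
    Xs = X(subset, :);
    % Apparent error: train and test on the same observations.
    centroids = zeros(SS, nClass);
    for c = 1:nClass
        centroids(:, c) = mean(Xs(:, Group == classes(c)), 2);
    end
    d = zeros(nClass, nObs);
    for c = 1:nClass
        d(c, :) = sum((Xs - centroids(:, c)).^2, 1);  % squared distances
    end
    [~, pred] = min(d, [], 1);                 % nearest centroid per observation
    correctRate = mean(classes(pred) == Group);
    if correctRate >= PT                       % keep only good subsets
        Z(subset) = Z(subset) + 1;
        accepted = accepted + 1;
        if accepted == PS
            break
        end
    end
end
[~, IDX] = sort(Z, 'descend');                 % most significant feature first
```

Because almost every accepted subset contains at least one of the informative features, their counts in Z grow much faster than those of the noise features, so features 1 to 3 should end up at the top of IDX.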
X contains the training samples. Every column
of X is an observed vector.
Group contains the class labels.
Group can be a numeric vector
or a cell array of character vectors; the number of elements in Group must
be the same as the number of columns in X, and Group must contain
at least two distinct classes.
Z contains the classification significance for every feature.
IDX contains the indices after sorting
Z; that is, the first one
points to the most significant feature.
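A small illustration of the input requirements and of how IDX relates to Z. The toy data and the scores in Z are hypothetical values chosen by hand, not output of randfeatures; the sketch only assumes that a larger value in Z means greater significance, so IDX corresponds to sorting Z in descending order.

```matlab
% Hypothetical toy data: 5 features (rows) by 6 observations (columns).
X = rand(5, 6);
Group = {'a', 'a', 'a', 'b', 'b', 'b'};   % cell array of character vectors

% Input requirements for randfeatures, checked explicitly.
assert(numel(Group) == size(X, 2), 'Group needs one label per column of X.');
assert(numel(unique(Group)) >= 2, 'At least two classes are required.');

% Hand-picked significance scores, one per feature (not computed here).
Z = [3; 10; 0; 7; 5];
[~, IDX] = sort(Z, 'descend');   % IDX(1) points to the most significant feature
best = X(IDX(1), :);             % observations of the top-ranked feature
```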
randfeatures(..., 'Classifier', C) sets
the classifier. Options are
'da' (default)  Discriminant analysis
'knn'           K nearest neighbors
randfeatures(..., 'ClassOptions', CO) specifies CO, a cell array
of extra options for the selected classifier. When you specify
the discriminant analysis model ('da') as the classifier,
randfeatures uses the classify function with its
default parameters. For the KNN classifier,
randfeatures uses fitcknn with the following default options.
randfeatures(..., 'PerformanceThreshold', PT) sets the threshold on the correct
classification rate used to pick the subsets included in the final pool.
For the 'da' classifier, the default is 0.8. For the 'knn' classifier,
the default is
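randfeatures(..., 'ConfidenceThreshold', CT) sets the minimum confidence that a
classified subvector needs in order to be kept; subvectors classified with lower
confidence are invalidated. When using the 'da' model, the confidence is the posterior
probability of the discriminant analysis, and the default depends on the number
of classes. When using the 'knn' model,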
the default is 1, meaning any classified subvector must have all k neighbors
classified to the same class in order to be kept in the pool.
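The k-NN confidence rule can be illustrated as follows. This is one assumed reading of CT = 1, not randfeatures code: for each observation, the sketch finds its k nearest other observations in a one-feature subvector and keeps the observation only if the fraction of neighbors sharing the most common label reaches CT. The data, k, and CT are hypothetical. For the 'da' model, the analogous quantity would be the posterior probability, which classify returns as its third output.

```matlab
% Hypothetical one-feature subvector: 7 observations (columns) and labels.
Xs = [0 1 2 10 11 12 6];
Group = [1 1 1 2 2 2 1];
k = 2;            % number of neighbors (assumed)
CT = 1;           % all k neighbors must agree

n = numel(Group);
D = abs(Xs' - Xs);                   % n-by-n pairwise distances (1-D case)
D(1:n+1:end) = Inf;                  % an observation is not its own neighbor
[~, order] = sort(D, 2);             % neighbors of each row, nearest first
nbrLabels = Group(order(:, 1:k));    % n-by-k labels of the k nearest neighbors

classes = unique(Group);
frac = zeros(n, numel(classes));
for c = 1:numel(classes)
    frac(:, c) = mean(nbrLabels == classes(c), 2);  % share of neighbors per class
end
[confidence, best] = max(frac, [], 2);   % majority share and its class
predicted = classes(best);
keep = confidence >= CT;                 % invalidate low-confidence subvectors
```

Observation 7 sits between the two clusters, so its two nearest neighbors carry different labels, its confidence is 0.5, and it is discarded; the other six observations are kept.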
randfeatures(..., 'SubsetSize', SS) sets
the number of features considered in every subset. Default is
randfeatures(..., 'PoolSize', PS) sets
the targeted number of accepted subsets for the final pool. Default
randfeatures(..., 'NumberOfIndices', N) sets the number of output indices in IDX.
Default is the same as the number of features.
randfeatures(..., 'CrossNorm', CN) applies
independent normalization across the observations for every feature.
Cross-normalization ensures comparability among different features,
although it is not always necessary because the selected classifier
properties might already account for this. Options are
'none' (default)  Intensities are not cross-normalized.
'meanvar'         x_new = (x - mean(x))/std(x)
'softmax'         x_new = (1+exp((mean(x)-x)/std(x)))^-1
'minmax'          x_new = (x - min(x))/(max(x)-min(x))
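The three normalization formulas can be applied per feature as sketched below. Because every row of X is one feature, mean, std, min, and max are taken along dimension 2, that is, across the observations. This reproduces the listed formulas on a hypothetical toy matrix and is not randfeatures' own implementation; implicit expansion (R2016b or later) is assumed.

```matlab
% Hypothetical toy data: 2 features (rows) by 4 observations (columns).
X = [1  2  3  4;
     10 20 30 50];

mu = mean(X, 2);            % per-feature mean across observations
sd = std(X, 0, 2);          % per-feature standard deviation (same as std(x))
mn = min(X, [], 2);         % per-feature minimum
mx = max(X, [], 2);         % per-feature maximum

Xmeanvar = (X - mu) ./ sd;                   % 'meanvar'
Xsoftmax = 1 ./ (1 + exp((mu - X) ./ sd));   % 'softmax'
Xminmax  = (X - mn) ./ (mx - mn);            % 'minmax'
```

Note that the 'softmax' option is the logistic function applied to the 'meanvar' z-score, so it maps every feature into the open interval (0, 1) while preserving the order of the observations.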
randfeatures(..., 'Verbose', VerboseValue), when VerboseValue is false, turns
off verbosity. Default is true.
Find a reduced set of genes that is sufficient for classification of all the cancer types in the t-matrix NCI60 data set. Load sample data.
I = randfeatures(X,GROUP,'SubsetSize',15,'Classifier','da');
Test features with a linear discriminant classifier.
C = classify(X(I(1:25),:)',X(I(1:25),:)',GROUP);
cp = classperf(GROUP,C);
cp.CorrectRate
ans = 1