## randfeatures

Generate randomized subset of features

`[IDX, Z] = randfeatures(X, Group, 'PropertyName', PropertyValue...)`

`randfeatures(..., 'Classifier', C)`

`randfeatures(..., 'ClassOptions', CO)`

`randfeatures(..., 'PerformanceThreshold', PT)`

`randfeatures(..., 'ConfidenceThreshold', CT)`

`randfeatures(..., 'SubsetSize', SS)`

`randfeatures(..., 'PoolSize', PS)`

`randfeatures(..., 'NumberOfIndices', N)`

`randfeatures(..., 'CrossNorm', CN)`

`randfeatures(..., 'Verbose', VerboseValue)`

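`[IDX, Z] = randfeatures(X, Group, 'PropertyName', PropertyValue...)` performs
a randomized subset feature search reinforced by classification: it randomly
generates subsets of features, uses each subset to classify the samples, and
keeps only the subsets that classify well enough. The number of times a feature
appears in this final pool of accepted subsets measures its significance.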
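The following is a minimal sketch of this kind of randomized subset search in base MATLAB; it is not the `randfeatures` implementation. Several choices are assumptions made for illustration: a simple nearest-centroid classifier stands in for `'da'` or `'knn'`, each subset is scored by its apparent (resubstitution) correct rate on the training samples, and a feature's significance `Z` is the number of accepted subsets that contain it. The variables `SS`, `PS`, and `PT` only mirror the `SubsetSize`, `PoolSize`, and `PerformanceThreshold` properties, `maxTries` is a hypothetical safeguard against an endless loop, and the data are synthetic. The code uses implicit expansion (MATLAB R2016b or later).

```matlab
% Minimal sketch of a randomized subset feature search (illustration only).
% Assumption: a nearest-centroid classifier replaces 'da'/'knn', and each
% subset is scored by its apparent (resubstitution) correct rate.
rng(0);                                   % reproducible synthetic data
nFeatures = 50;
nPerClass = 10;
group = [ones(1, nPerClass), 2*ones(1, nPerClass)];
X = randn(nFeatures, 2*nPerClass);        % rows = features, columns = samples
X(1:5, group == 2) = X(1:5, group == 2) + 3;   % features 1-5 separate the classes

SS = 5;            % features per subset (mirrors 'SubsetSize')
PS = 200;          % accepted subsets to collect (mirrors 'PoolSize')
PT = 0.8;          % minimum correct rate (mirrors 'PerformanceThreshold')
maxTries = 20000;  % hypothetical cap on the number of random draws

labels = unique(group);
K = numel(labels);
Z = zeros(nFeatures, 1);                  % significance = count in the pool
accepted = 0;
tries = 0;
while accepted < PS && tries < maxTries
    tries = tries + 1;
    subset = randperm(nFeatures, SS);     % random subset of feature indices
    Xs = X(subset, :);

    % Nearest-centroid classification of every sample with this subset
    C = zeros(SS, K);
    for k = 1:K
        C(:, k) = mean(Xs(:, group == labels(k)), 2);
    end
    D = sum(C.^2, 1)' + sum(Xs.^2, 1) - 2*(C'*Xs);  % K-by-samples squared distances
    [~, kmin] = min(D, [], 1);
    rate = mean(labels(kmin) == group);

    % Keep the subset only if it classifies well enough
    if rate >= PT
        Z(subset) = Z(subset) + 1;
        accepted = accepted + 1;
    end
end

[~, IDX] = sort(Z, 'descend');            % IDX(1) = most significant feature
disp(IDX(1:5)')
```

Because only subsets that reach `PT` are counted, features that repeatedly help classification (here, the first five rows) tend to accumulate higher counts than pure noise features.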

`X` contains the training samples. Every column of `X` is an observed
vector, so each row corresponds to a feature. `Group` contains the class
labels. `Group` can be a numeric vector or a cell array of strings;
`numel(Group)` must equal the number of columns in `X`, and
`numel(unique(Group))` must be greater than or equal to `2`. `Z` is the
classification significance of every feature. `IDX` contains the feature
indices sorted by `Z` in descending order; i.e., `IDX(1)` points to the most
significant feature.
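The input requirements above can be checked before calling `randfeatures`, as in this short sketch. The data and labels are hypothetical, and the checks mirror only the stated requirements, not any validation `randfeatures` performs internally.

```matlab
% Hypothetical data laid out as described: 100 features (rows) x 6 samples (columns)
X = rand(100, 6);
Group = {'tumor','normal','tumor','normal','tumor','normal'};  % cell array of strings

% One label per column of X
assert(numel(Group) == size(X, 2), 'numel(Group) must equal the number of columns of X.');
% At least two distinct classes
assert(numel(unique(Group)) >= 2, 'Group must contain at least two classes.');
```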

`randfeatures(..., 'Classifier', C)` sets
the classifier. Options are:

| `C` | Classifier |
| --- | --- |
| `'da'` (default) | Discriminant analysis |
| `'knn'` | K-nearest neighbors |

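`randfeatures(..., 'ClassOptions', CO)` passes extra options to the
selected classifier. `CO` is a cell array; defaults are
`{5,'correlation','consensus'}` for `'knn'` and `{'linear'}` for `'da'`.
These correspond to the `k`, distance, and rule arguments of `knnclassify`
and to the discriminant type argument of `classify`, respectively.
See `knnclassify` and `classify` for more information.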

`randfeatures(..., 'PerformanceThreshold', PT)` sets the minimum correct
classification rate a subset must reach to be included in the final pool.
Default is `0.8` (`80%`).

`randfeatures(..., 'ConfidenceThreshold', CT)` uses the posterior
probability of the discriminant analysis to invalidate classified
subvectors whose confidence is below `CT`. This option is only valid when
`Classifier` is `'da'`. Using it has the same effect as using `'consensus'`
with `'knn'`; i.e., it makes the selection of approved subsets very
stringent. Default is `0.95.^(number of classes)`.
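The default threshold decreases as the number of classes grows, as the first lines below show. The second part illustrates one plausible reading of how a posterior-probability threshold invalidates low-confidence classifications: it assumes a sample is kept only when its largest posterior probability reaches `CT`. That rule and the posterior values are assumptions for illustration, not documented `randfeatures` behavior.

```matlab
% Default ConfidenceThreshold as stated: 0.95.^(number of classes)
numClasses = 2:5;
CTdefault = 0.95.^numClasses        % approx. 0.9025 0.8574 0.8145 0.7738

% Assumed rule: a classified sample counts only if its largest posterior
% probability reaches CT. The posterior matrix is made up; classify can
% return one (samples-by-classes) as its third output.
posterior = [0.97 0.02 0.01
             0.60 0.30 0.10
             0.05 0.90 0.05];
CT = 0.95^3;                        % default for three classes
confident = max(posterior, [], 2) >= CT
```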

`randfeatures(..., 'SubsetSize', SS)` sets
the number of features considered in every subset. Default is `20`.

`randfeatures(..., 'PoolSize', PS)` sets
the targeted number of accepted subsets for the final pool. Default
is `1000`.

`randfeatures(..., 'NumberOfIndices', N)` sets the number of output indices
in `IDX`. Default is the same as the number of features.
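The relationship between `Z`, `IDX`, and `NumberOfIndices` can be illustrated with made-up significance values. This sketch assumes that the `N` returned indices are the first `N` entries of the full descending ranking.

```matlab
% Made-up significance values for features 1..5
Z = [3 10 0 7 5];
[~, IDX] = sort(Z, 'descend');    % full ranking: [2 4 5 1 3]
N = 3;                            % hypothetical 'NumberOfIndices' value
IDX = IDX(1:N)                    % [2 4 5]: the three most significant features
```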

`randfeatures(..., 'CrossNorm', CN)` applies
independent normalization across the observations for every feature.
Cross-normalization ensures comparability among different features,
although it is not always necessary, because the selected classifier
might already account for differences in feature scale. Options are:

| `CN` | Normalization |
| --- | --- |
| `'none'` (default) | Intensities are not cross-normalized. |
| `'meanvar'` | `x_new = (x - mean(x))/std(x)` |
| `'softmax'` | `x_new = (1+exp((mean(x)-x)/std(x)))^-1` |
| `'minmax'` | `x_new = (x - min(x))/(max(x)-min(x))` |
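The cross-normalization formulas can be applied to each feature (row) of a data matrix with base MATLAB, as sketched below. This illustrates the stated formulas on made-up data and is not the code `randfeatures` runs internally; it assumes rows are features and columns are observations, as described for `X`, uses `std` with its default `N-1` normalization, and relies on implicit expansion (MATLAB R2016b or later).

```matlab
% Hypothetical data: 3 features (rows) x 4 observations (columns)
X = [1 2 3 4; 10 20 30 40; 5 5 6 8];

% Per-feature statistics across the observations (dimension 2)
mu   = mean(X, 2);
sd   = std(X, 0, 2);
xMin = min(X, [], 2);
xMax = max(X, [], 2);

% Cross-normalized versions of X
Xmeanvar = (X - mu) ./ sd;                  % 'meanvar'
Xsoftmax = 1 ./ (1 + exp((mu - X) ./ sd));  % 'softmax'
Xminmax  = (X - xMin) ./ (xMax - xMin);     % 'minmax'
```

Note that `'softmax'` is a logistic function of the `'meanvar'` value, so it maps each feature smoothly into the interval (0, 1).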

`randfeatures(..., 'Verbose', VerboseValue)`,
when `VerboseValue` is `false`, turns
off verbosity. Default is `true`.

Find a reduced set of genes that is sufficient for classification of all the cancer types in the t-matrix NCI60 data set. Load sample data.

```matlab
load NCI60tmatrix
```

Select features.

```matlab
% Rank features using random subsets of 15 features and discriminant analysis
I = randfeatures(X,GROUP,'SubsetSize',15,'Classifier','da');
```

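Test the 25 most significant features with a linear discriminant classifier. Because the same samples are used for both training and classification, `cp.CorrectRate` is an apparent (resubstitution) rate and is likely optimistic.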

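```matlab
% Classify the samples using only the 25 top-ranked features
C = classify(X(I(1:25),:)', X(I(1:25),:)', GROUP);
% Compare the predicted labels with the known labels
cp = classperf(GROUP, C);
cp.CorrectRate
```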

See also: `classify` | `classperf` | `crossvalind` | `knnclassify` | `rankfeatures` | `sequentialfs` | `svmclassify`
