Rank key features by class separability criteria

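`[IDX, Z] = rankfeatures(X, Group)`

`[IDX, Z] = rankfeatures(X, Group, ...'Criterion', CriterionValue, ...)`

`[IDX, Z] = rankfeatures(X, Group, ...'CCWeighting', ALPHA, ...)`

`[IDX, Z] = rankfeatures(X, Group, ...'NWeighting', BETA, ...)`

`[IDX, Z] = rankfeatures(X, Group, ...'NumberOfIndices', N, ...)`

`[IDX, Z] = rankfeatures(X, Group, ...'CrossNorm', CN, ...)`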

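`[IDX, Z] = rankfeatures(X, Group)` ranks the features in `X` using an independent evaluation criterion for binary classification. `X` is a matrix in which every column is an observed vector and the number of rows corresponds to the original number of features. `Group` contains the class labels.

`IDX` is the list of indices to the rows in `X` with the most significant features. `Z` is the absolute value of the criterion used.

`Group` can be a numeric vector, a cell array of character vectors, or a string vector. `numel(Group)` is the same as the number of columns in `X`, and `Group` must contain exactly two distinct class labels.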
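As a minimal sketch of the basic call, the following builds a small synthetic data set and ranks its features with the default criterion. The data, the random seed, and the choice of shifting rows 3 and 7 are assumptions made purely for illustration; running it requires the toolbox that provides `rankfeatures`.

```matlab
% Synthetic data (an assumption for illustration): 50 features (rows),
% 40 observations (columns), two classes of 20 observations each.
rng(0);                                  % reproducible random numbers
X = randn(50, 40);
Group = [zeros(1, 20), ones(1, 20)];     % numel(Group) equals size(X, 2)

% Shift features 3 and 7 in the second class so they separate the groups.
X([3 7], Group == 1) = X([3 7], Group == 1) + 3;

% Rank all features with the default criterion ('ttest').
[IDX, Z] = rankfeatures(X, Group);

% Show the five top-ranked row indices of X.
disp(IDX(1:5)');
```

With a shift this large relative to the noise, rows 3 and 7 should appear at the top of `IDX`.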

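`[IDX, Z] = rankfeatures(X, Group, ...'PropertyName', PropertyValue, ...)` calls `rankfeatures` with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each `PropertyName` must be enclosed in single quotation marks and is case insensitive. The property name/property value pairs are as follows: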

`[IDX, Z] = rankfeatures(X, Group, ...'Criterion', CriterionValue, ...)` sets the criterion used to assess the significance of every feature for separating two labeled groups. Choices are:

* `'ttest'` (default): Absolute value of the two-sample t-statistic with pooled variance estimate.
* `'entropy'`: Relative entropy, also known as Kullback-Leibler distance or divergence.
* `'bhattacharyya'`: Minimum attainable classification error or Chernoff bound.
* `'roc'`: Area between the empirical receiver operating characteristic (ROC) curve and the random classifier slope.
* `'wilcoxon'`: Absolute value of the standardized u-statistic of a two-sample unpaired Wilcoxon test, also known as Mann-Whitney.

`'ttest'`, `'entropy'`, and `'bhattacharyya'` assume normally distributed classes, while `'roc'` and `'wilcoxon'` are nonparametric tests. All tests are feature independent: each feature is evaluated on its own.
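To make the default criterion concrete, here is a sketch in base MATLAB of an absolute two-sample t-statistic with pooled variance, computed independently for every row. It illustrates the textbook formula only; it is not the toolbox's internal code, and the small data matrix is made up for the example.

```matlab
% Illustrative data: 4 features (rows), 6 observations (columns).
X = [1 2 3 7 8 9;
     5 5 6 5 6 5;
     0 1 0 4 5 4;
     2 4 6 2 4 6];
Group = [0 0 0 1 1 1];

A = X(:, Group == 0);            % observations of the first class
B = X(:, Group == 1);            % observations of the second class
nA = size(A, 2);
nB = size(B, 2);

% Pooled variance estimate for every feature (row).
sp2 = ((nA - 1) * var(A, 0, 2) + (nB - 1) * var(B, 0, 2)) / (nA + nB - 2);

% Absolute value of the two-sample t-statistic for every feature.
t = abs(mean(A, 2) - mean(B, 2)) ./ sqrt(sp2 * (1/nA + 1/nB));

% Sort features from most to least separating.
[~, order] = sort(t, 'descend');
```

Rows 1 and 3 differ clearly between the two classes, so they sort ahead of rows 2 and 4, whose class means coincide. A different criterion would be selected in `rankfeatures` with, for example, `'Criterion', 'wilcoxon'`.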

`[IDX, Z] = rankfeatures(X, Group, ...'CCWeighting', ALPHA, ...)` uses correlation information to outweigh the `Z` value of potential features using `Z * (1-ALPHA*(RHO))`, where `RHO` is the average of the absolute values of the cross-correlation coefficient between the candidate feature and all previously selected features. `ALPHA` sets the weighting factor. It is a scalar value between `0` and `1`. When `ALPHA` is `0` (default), potential features are not weighted. A large value of `RHO` (close to `1`) outweighs the significance statistic; this means that features that are highly correlated with the features already picked are less likely to be included in the output list.
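The correlation weighting can be sketched in base MATLAB as follows. The candidate criterion value `Zc`, the feature vectors, and the use of `corrcoef` for the correlation coefficient are assumptions chosen for illustration, not the toolbox's implementation.

```matlab
ALPHA = 0.7;                     % weighting factor, between 0 and 1
Zc    = 5.0;                     % hypothetical criterion value of the candidate

% Hypothetical data: one candidate feature and two already selected
% features, each observed 6 times.
candidate = [1 2 3 4 5 6];
selected  = [2 4 6 8 10 12;      % perfectly correlated with the candidate
             1 -1 1 -1 1 -1];    % weakly correlated with the candidate

% RHO: mean absolute correlation with all previously selected features.
rhoEach = zeros(size(selected, 1), 1);
for k = 1:size(selected, 1)
    R = corrcoef(candidate, selected(k, :));
    rhoEach(k) = abs(R(1, 2));
end
RHO = mean(rhoEach);

% Weighted criterion value of the candidate.
Zweighted = Zc * (1 - ALPHA * RHO);
```

The larger `RHO` is, the more the candidate's value shrinks. In a real call this behavior would be requested with something like `rankfeatures(X, Group, 'CCWeighting', 0.7, 'NumberOfIndices', 10)`.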

`[IDX, Z] = rankfeatures(X, Group, ...'NWeighting', BETA, ...)` uses regional information to outweigh the `Z` value of potential features using `Z * (1-exp(-(DIST/BETA).^2))`, where `DIST` is the distance (in rows) between the candidate feature and previously selected features. `BETA` sets the weighting factor. It is greater than or equal to `0`. When `BETA` is `0` (default), potential features are not weighted. A small `DIST` (close to `0`) outweighs the significance statistics of only close features. This means that features that are close to already picked features are less likely to be included in the output list. This option is useful for extracting features from time series with temporal correlation.

`BETA` can also be a function of the feature location, specified using `@` or an anonymous function. In both cases `rankfeatures` passes the row position of the feature to `BETA()` and expects back a value greater than or equal to `0`.

You can use `'CCWeighting'` and `'NWeighting'` together.
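A short base-MATLAB sketch of the regional weight shows how it behaves with distance. The values of `BETA` and `DIST` and the anonymous function `betaFcn` are hypothetical choices for illustration.

```matlab
BETA = 10;                       % weighting factor, greater than or equal to 0
DIST = [0 1 5 10 20 50];         % distance in rows to a selected feature

% Weight applied to Z: 0 at the selected row, approaching 1 far from it.
w = 1 - exp(-(DIST / BETA).^2);

% BETA as a function of feature location (hypothetical choice): the
% suppressed neighborhood widens for features at higher row positions.
betaFcn = @(row) 5 + row / 100;
```

A candidate in the same row as a selected feature gets weight 0, while one many rows away keeps almost all of its `Z` value. The function handle would be passed as `rankfeatures(X, Group, 'NWeighting', betaFcn)`.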

`[IDX, Z] = rankfeatures(X, Group, ...'NumberOfIndices', N, ...)` sets the number of output indices in `IDX`. Default is the same as the number of features when `ALPHA` and `BETA` are `0`, or `20` otherwise.

`[IDX, Z] = rankfeatures(X, Group, ...'CrossNorm', CN, ...)` applies independent normalization across the observations for every feature. Cross-normalization ensures comparability among different features, although it is not always necessary because the selected criterion might already account for this. Choices are:

* `'none'` (default): Intensities are not cross-normalized.
* `'meanvar'`: `x_new = (x - mean(x))/std(x)`
* `'softmax'`: `x_new = (1+exp((mean(x)-x)/std(x)))^-1`
* `'minmax'`: `x_new = (x - min(x))/(max(x)-min(x))`
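The three normalization formulas can be tried on a single feature in base MATLAB. The sample vector is made up, and using MATLAB's default `std` (normalized by N-1) is an assumption; the source does not say which normalization `rankfeatures` applies internally.

```matlab
x = [2 4 4 4 5 5 7 9];           % one feature across 8 observations

% 'meanvar': zero mean and unit standard deviation.
x_meanvar = (x - mean(x)) / std(x);

% 'softmax': squashes values into the open interval (0, 1).
x_softmax = 1 ./ (1 + exp((mean(x) - x) / std(x)));

% 'minmax': rescales values to the closed interval [0, 1].
x_minmax = (x - min(x)) / (max(x) - min(x));
```

Each formula is applied per feature (row) across its observations, so features on very different scales become comparable before ranking.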


See also: `classify` | `classperf` | `crossvalind` | `randfeatures` | `sequentialfs`