relieff

Importance of attributes (predictors) using ReliefF algorithm

Syntax

[RANKED,WEIGHT] = relieff(X,Y,K)
[RANKED,WEIGHT] = relieff(X,Y,K,'PARAM1',val1,'PARAM2',val2,...)

Description

[RANKED,WEIGHT] = relieff(X,Y,K) computes ranks and weights of attributes (predictors) for input data matrix X and response vector Y using the ReliefF algorithm for classification or RReliefF for regression with K nearest neighbors. For classification, relieff uses K nearest neighbors per class. RANKED are indices of columns in X ordered by attribute importance, meaning RANKED(1) is the index of the most important predictor. WEIGHT are attribute weights ranging from -1 to 1 with large positive weights assigned to important attributes.

If Y is numeric, relieff by default performs RReliefF analysis for regression. If Y is categorical, logical, a character array, or a cell array of strings, relieff by default performs ReliefF analysis for classification.

Attribute ranks and weights computed by relieff usually depend on K. If you set K to 1, the estimates computed by relieff can be unreliable for noisy data. If you set K to a value comparable with the number of observations (rows) in X, relieff can fail to find important attributes. You can start with K = 10 and investigate the stability and reliability of relieff ranks and weights for various values of K.
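As a quick stability check (a minimal sketch; the fisheriris data and the candidate values of K below are illustrative), you can recompute the weights for several values of K and compare the resulting rows:

load fisheriris
Ks = [1 5 10 20 40];                      % illustrative candidate neighbor counts
W = zeros(numel(Ks), size(meas,2));       % one row of attribute weights per K
for i = 1:numel(Ks)
    [~, W(i,:)] = relieff(meas, species, Ks(i));
end
disp(W)                                   % weights that change little between adjacent K values suggest a stable choice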

[RANKED,WEIGHT] = relieff(X,Y,K,'PARAM1',val1,'PARAM2',val2,...) specifies optional parameter name/value pairs.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'method'

Either 'regression' (default if Y is numeric) or 'classification' (default if Y is not numeric).
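For example, if class labels are stored as integers, Y is numeric and relieff defaults to RReliefF regression; passing 'method','classification' forces ReliefF instead (a sketch; the integer labels below are derived from the fisheriris species purely for illustration):

load fisheriris
Ynum = grp2idx(species);                                     % numeric class labels 1, 2, 3
[ranked, weight] = relieff(meas, Ynum, 10, 'method', 'classification');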

'prior'

Prior probabilities for each class, specified as a string ('empirical' or 'uniform'), as a vector (one value for each distinct group name), or as a structure S with two fields:

  • S.group containing the group names as a categorical variable, character array, or cell array of strings

  • S.prob containing a vector of corresponding probabilities

If the input value is 'empirical' (default), class probabilities are determined from class frequencies in Y. If the input value is 'uniform', all class probabilities are set equal.
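For example, both the string and the structure forms can be passed directly (a sketch using the fisheriris data; the prior probabilities in S.prob are illustrative):

load fisheriris
[rankedU, weightU] = relieff(meas, species, 10, 'prior', 'uniform');   % equal class priors

S.group = {'setosa'; 'versicolor'; 'virginica'};   % class names appearing in species
S.prob  = [0.5 0.25 0.25];                         % assumed prior probabilities
[rankedS, weightS] = relieff(meas, species, 10, 'prior', S);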

'updates'

Number of observations to select at random for computing the weight of every attribute. By default all observations are used.
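For large data sets you can trade a little accuracy for speed by sampling a subset of observations (a sketch using the ionosphere data; the subset size of 100 is illustrative):

load ionosphere
[ranked, weight] = relieff(X, Y, 10, 'updates', 100);   % use 100 randomly selected observations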

'categoricalx'

'on' or 'off', 'off' by default. If 'on', treat all predictors in X as categorical. If 'off', treat all predictors in X as numerical. You cannot mix numerical and categorical predictors.
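For example, when every predictor is categorical, code the categories as numbers and set 'categoricalx' to 'on' (a sketch with synthetic, integer-coded data; the construction of Xc and Yc is purely illustrative):

rng default                                   % for reproducibility
Xc = randi(3, 200, 4);                        % four categorical predictors coded 1, 2, 3
Yc = categorical(Xc(:,1));                    % class label depends only on the first predictor
[ranked, weight] = relieff(Xc, Yc, 10, 'categoricalx', 'on');
% In this construction the first predictor should tend to receive the largest weight.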

'sigma'

Distance scaling factor. For observation i, the influence on the attribute weight from its nearest neighbor j is multiplied by exp(-(rank(i,j)/sigma)^2), where rank(i,j) is the position of j in the list of nearest neighbors of i sorted by distance in ascending order. Default is Inf (all nearest neighbors have the same influence) for classification and 50 for regression.
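For example, to make the closest neighbors count more heavily in a regression analysis, pass a smaller value of 'sigma' (a sketch using the carsmall data; the choice of predictors, the removal of rows with missing values, and the value 20 are illustrative):

load carsmall                                             % sample regression data
Xcar = [Acceleration Cylinders Displacement Horsepower Weight];
ok = all(~isnan([Xcar MPG]), 2);                          % keep rows without missing values
[ranked, weight] = relieff(Xcar(ok,:), MPG(ok), 10, 'sigma', 20);   % smaller sigma emphasizes the closest neighbors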

Examples


Rank Predictors by Importance

Load the sample data.

load ionosphere;

Rank the predictors based on importance.

[ranked,weights] = relieff(X,Y,10);

Create a bar plot of predictor importance weights.

bar(weights(ranked));
xlabel('Predictor rank');
ylabel('Predictor importance weight');

Determine the Important Predictors

Load the sample data.

load fisheriris

Find the important predictors.

[ranked,weight] = relieff(meas,species,10)
ranked =

     4     3     1     2


weight =

    0.1399    0.1226    0.3590    0.3754

The fourth predictor is the most important, and the second predictor is the least important.

References

[1] Kononenko, I., Simec, E., & Robnik-Sikonja, M. (1997). Overcoming the myopia of inductive learning algorithms with RELIEFF. Retrieved from CiteSeerX: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.4740

[2] Robnik-Sikonja, M., & Kononenko, I. (1997). An adaptation of Relief for attribute estimation in regression. Retrieved from CiteSeerX: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.8381

[3] Robnik-Sikonja, M., & Kononenko, I. (2003). Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning, 53, 23–69.
