Rank importance of predictors using ReliefF or RReliefF algorithm
[idx,weights] = relieff(X,y,k) ranks predictors using either the ReliefF or RReliefF algorithm with k nearest neighbors. The input matrix X contains predictor variables, and the vector y contains a response vector. The function returns idx, which contains the indices of the most important predictors, and weights, which contains the weights of the predictors.
If y is numeric, relieff performs RReliefF analysis for regression by default. Otherwise, relieff performs ReliefF analysis for classification using k nearest neighbors per class. For more information on ReliefF and RReliefF, see Algorithms.
[idx,weights] = relieff(X,y,k,Name,Value) specifies additional options using one or more name-value arguments.
Determine Important Predictors
Load the sample data.
load fisheriris
Find the important predictors using 10 nearest neighbors.
[idx,weights] = relieff(meas,species,10)
idx = 1×4
     4     3     1     2
weights = 1×4
    0.1399    0.1226    0.3590    0.3754
idx shows the predictor numbers listed according to their ranking. The fourth predictor is the most important, and the second predictor is the least important.
weights gives the weight values in the same order as the predictors. The first predictor has a weight of 0.1399, and the fourth predictor has a weight of 0.3754.
Rank Predictors by Importance
Load the sample data.
load ionosphere
Rank the predictors based on importance using 10 nearest neighbors.
[idx,weights] = relieff(X,Y,10);
Create a bar plot of predictor importance weights.
bar(weights(idx))
xlabel('Predictor rank')
ylabel('Predictor importance weight')
Select the top 5 most important predictors. Find the columns of these predictors in X.
idx(1:5)
ans = 1×5
    24     3     8     5    14
The 24th column of X is the most important predictor of Y.
Determine Important Categorical Predictors
Rank categorical predictors using relieff.
Load the sample data.
load carbig
Convert the categorical predictor variables Mfg, Model, and Origin to numerical values, and combine them into an input matrix. Specify the response variable MPG.
X = [grp2idx(Mfg) grp2idx(Model) grp2idx(Origin)];
y = MPG;
Find the ranks and weights of the predictor variables using 10 nearest neighbors and treating the data in
X as categorical.
[idx,weights] = relieff(X,y,10,'categoricalx','on')
idx = 1×3
     2     3     1
weights = 1×3
   -0.0019    0.0501    0.0114
The Model predictor is the most important in predicting MPG. The Mfg variable has a negative weight, indicating it is not a good predictor of MPG.
X — Predictor data
Predictor data, specified as a numeric matrix. Each row of
X corresponds to one observation, and each column
corresponds to one variable.
y — Response data
numeric vector | categorical vector | logical vector | character array | string array | cell array of character vectors
Response data, specified as a numeric vector, categorical vector, logical vector, character array, string array, or cell array of character vectors.
k — Number of nearest neighbors
positive integer scalar
Number of nearest neighbors, specified as a positive integer scalar.
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: relieff(X,y,5,'method','classification','categoricalx','on') specifies 5 nearest neighbors and treats the response variable and predictor data as categorical.
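The two ways of passing name-value arguments can be compared side by side. The following is a minimal sketch, assuming the fisheriris sample data set and a release that supports the Name=Value syntax (R2021a or later); the output variable names are illustrative only.

```matlab
load fisheriris   % provides meas (predictors) and species (response)

% Quoted names separated by commas (accepted in all releases)
[idx1,w1] = relieff(meas,species,5,'method','classification','prior','uniform');

% Name=Value syntax (R2021a and later)
[idx2,w2] = relieff(meas,species,5,method='classification',prior='uniform');

disp(idx1)
disp(idx2)
```

Both calls request the same options, so either form can be used depending on the release in use.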
method — Method for computing weights
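'regression' | 'classification'
Method for computing weights, specified as the comma-separated pair consisting of 'method' and either 'regression' or 'classification'. If y is numeric, 'regression' is the default method. Otherwise, 'classification' is the default.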
prior — Prior probabilities for each class
'empirical' (default) | 'uniform' | numeric vector | structure
Prior probabilities for each class, specified as the comma-separated pair consisting of 'prior' and a value in this table.
|Value|Description|
|---|---|
|'empirical'|The class probabilities are determined from class frequencies in y.|
|'uniform'|All class probabilities are equal.|
|numeric vector|One value exists for each distinct group name.|
updates — Number of observations for computing weights
'all' (default) | positive integer scalar
Number of observations to select at random for computing weights, specified as the comma-separated pair consisting of 'updates' and either 'all' or a positive integer scalar. By default, relieff uses all observations.
categoricalx — Categorical predictors flag
'off' (default) | 'on'
Categorical predictors flag, specified as the comma-separated pair consisting of 'categoricalx' and either 'on' or 'off'. If you specify 'on', then relieff treats all predictors in X as categorical. Otherwise, it treats all predictors in X as numeric. You cannot mix numeric and categorical predictors.
sigma — Distance scaling factor
numeric positive scalar
Distance scaling factor, specified as the comma-separated pair consisting of 'sigma' and a numeric positive scalar. For observation i, influence on the predictor weight from its nearest neighbor j is multiplied by $e^{-(\mathrm{rank}(i,j)/\mathrm{sigma})^2}$. rank(i,j) is the position of the jth observation among the nearest neighbors of the ith observation, sorted by distance. The default is Inf for classification (all nearest neighbors have the same influence) and 50 for regression.
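The effect of the scaling factor can be sketched numerically. The snippet below is an illustration only: it assumes the influence factor has the form exp(-(rank/sigma)^2) and uses k = 10 neighbors; the variable names are hypothetical and are not part of relieff.

```matlab
% Influence factors for k ranked neighbors,
% assuming the factor exp(-(rank/sigma)^2).
k = 10;              % number of nearest neighbors (assumed)
ranks = 1:k;         % rank 1 is the closest neighbor

scaleRegression = exp(-(ranks/50).^2);        % sigma = 50
scaleClassification = exp(-(ranks/Inf).^2);   % sigma = Inf

disp(scaleRegression)       % decreases slowly with rank
disp(scaleClassification)   % all ones: every neighbor has equal influence
```

With sigma = 50 and only 10 neighbors, the factors stay close to 1, so the down-weighting of distant neighbors becomes noticeable mainly when k is comparable with sigma.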
idx — Indices of predictors ordered by predictor importance
Indices of predictors in X ordered by predictor importance, returned as a numeric vector. For example, if idx(3) is 5, then the third most important predictor is the fifth column in X.
weights — Weights of predictors
Weights of the predictors, returned as a numeric vector. The values in weights have the same order as the predictors in X. weights range from –1 to 1, with large positive weights assigned to important predictors.
Predictor ranks and weights usually depend on k. If you set k to 1, then the estimates can be unreliable for noisy data. If you set k to a value comparable with the number of observations (rows) in X, relieff can fail to find important predictors. You can start with k = 10 and investigate the stability and reliability of relieff ranks and weights for various values of k.
relieff removes observations with NaN values.
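One way to investigate how the ranking depends on the number of neighbors is to call relieff in a loop. This is a minimal sketch, assuming the fisheriris sample data; the list of k values is an arbitrary choice for illustration.

```matlab
% Compare predictor rankings across several values of k.
load fisheriris                   % provides meas (150-by-4) and species
kValues = [1 5 10 20 40];         % candidate neighbor counts (arbitrary)
ranks = zeros(numel(kValues), size(meas,2));

for i = 1:numel(kValues)
    % Each row holds the predictor indices ordered by importance
    ranks(i,:) = relieff(meas, species, kValues(i));
end

% First column: k. Remaining columns: ranked predictor indices.
disp([kValues' ranks])
```

Rows that agree across a range of k values suggest a stable ranking; rows that change substantially indicate that the result is sensitive to k.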
ReliefF finds the weights of predictors in the case where y is a multiclass categorical variable. The algorithm penalizes the predictors that
give different values to neighbors of the same class, and rewards predictors that
give different values to neighbors of different classes.
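ReliefF first sets all predictor weights $W_j$ to 0. Then, the algorithm iteratively selects a random observation $x_r$, finds the k-nearest observations to $x_r$ for each class, and updates, for each nearest neighbor $x_q$, all the weights for the predictors $F_j$ as follows:
If $x_r$ and $x_q$ are in the same class,
$$W_j^i = W_j^{i-1} - \frac{\Delta_j(x_r,x_q)}{m} \cdot d_{rq}.$$
If $x_r$ and $x_q$ are in different classes,
$$W_j^i = W_j^{i-1} + \frac{p_{y_q}}{1-p_{y_r}} \cdot \frac{\Delta_j(x_r,x_q)}{m} \cdot d_{rq}.$$
$W_j^i$ is the weight of the predictor $F_j$ at the ith iteration step.
$p_{y_r}$ is the prior probability of the class to which $x_r$ belongs, and $p_{y_q}$ is the prior probability of the class to which $x_q$ belongs.
m is the number of iterations specified by 'updates'.
$\Delta_j(x_r,x_q)$ is the difference in the value of the predictor $F_j$ between observations $x_r$ and $x_q$. Let $x_{rj}$ denote the value of the jth predictor for observation $x_r$, and let $x_{qj}$ denote the value of the jth predictor for observation $x_q$.
For discrete $F_j$,
$$\Delta_j(x_r,x_q) = \begin{cases} 0, & x_{rj} = x_{qj} \\ 1, & x_{rj} \neq x_{qj} \end{cases}$$
For continuous $F_j$,
$$\Delta_j(x_r,x_q) = \frac{|x_{rj} - x_{qj}|}{\max(F_j) - \min(F_j)}.$$
$d_{rq}$ is a distance function of the form
$$d_{rq} = \frac{\tilde{d}_{rq}}{\sum_{l=1}^{k} \tilde{d}_{rl}}.$$
The distance is subject to the scaling
$$\tilde{d}_{rq} = e^{-(\mathrm{rank}(r,q)/\mathrm{sigma})^2},$$
where rank(r,q) is the position of the qth observation among the nearest neighbors of the rth observation, sorted by distance. k is the number of nearest neighbors, specified by k. You can change the scaling by specifying 'sigma'.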
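A single ReliefF update can be traced by hand on a small data set. The following is a minimal sketch under stated assumptions, not the relieff implementation: the toy data, uniform priors, one iteration (m = 1), one neighbor per class (k = 1), equal neighbor influence, and continuous predictors normalized by their range are all choices made for illustration. The nearest same-class and different-class neighbors of the first observation are picked by inspection.

```matlab
% One ReliefF update step for continuous predictors (toy example).
X = [1.0 10; 1.2 50; 3.0 20; 3.1 60];   % 4 observations, 2 predictors
y = [1; 1; 2; 2];                        % class labels
m = 1;                                   % number of iterations
k = 1;                                   % nearest neighbors per class
prior = [0.5 0.5];                       % class priors (uniform, assumed)

span = max(X) - min(X);                  % range of each predictor
W = zeros(1, size(X,2));                 % all weights start at 0

r = 1;                                   % selected observation
neighbors = [2 3];                       % nearest same-class (2) and other-class (3) observation
d = 1/k;                                 % equal influence for every neighbor

for q = neighbors
    delta = abs(X(r,:) - X(q,:)) ./ span;    % per-predictor difference
    if y(q) == y(r)
        W = W - delta/m * d;                 % same class: penalize differences
    else
        W = W + prior(y(q))/(1 - prior(y(r))) * delta/m * d;   % different class: reward
    end
end
disp(W)
```

In this toy case the first predictor separates the two classes and ends with a positive weight, while the second varies mostly within a class and ends with a negative weight.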
RReliefF works with continuous
y. Similar to ReliefF,
RReliefF also penalizes the predictors that give different values to neighbors with
the same response values, and rewards predictors that give different values to
neighbors with different response values. However, RReliefF uses intermediate
weights to compute the final predictor weights.
Given two nearest neighbors, assume the following:
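$W_{dy}$ is the weight of having different values for the response y.
$W_{dj}$ is the weight of having different values for the predictor $F_j$.
$W_{dy \wedge dj}$ is the weight of having different response values and different values for the predictor $F_j$.
RReliefF first sets the weights $W_{dy}$, $W_{dj}$, $W_{dy \wedge dj}$, and $W_j$ equal to 0. Then, the algorithm iteratively selects a random observation $x_r$, finds the k-nearest observations to $x_r$, and updates, for each nearest neighbor $x_q$, all the intermediate weights as follows:
$$W_{dy}^i = W_{dy}^{i-1} + \Delta_y(x_r,x_q) \cdot d_{rq}.$$
$$W_{dj}^i = W_{dj}^{i-1} + \Delta_j(x_r,x_q) \cdot d_{rq}.$$
$$W_{dy \wedge dj}^i = W_{dy \wedge dj}^{i-1} + \Delta_y(x_r,x_q) \cdot \Delta_j(x_r,x_q) \cdot d_{rq}.$$
The i and i-1 superscripts denote the iteration step number. m is the number of iterations specified by 'updates'.
$\Delta_y(x_r,x_q)$ is the difference in the value of the continuous response y between observations $x_r$ and $x_q$. Let $y_r$ denote the value of the response for observation $x_r$, and let $y_q$ denote the value of the response for observation $x_q$.
$$\Delta_y(x_r,x_q) = \frac{|y_r - y_q|}{\max(y) - \min(y)}.$$
The $\Delta_j$ and $d_{rq}$ functions are the same as for ReliefF.
RReliefF calculates the predictor weights $W_j$ after fully updating all the intermediate weights.
$$W_j = \frac{W_{dy \wedge dj}}{W_{dy}} - \frac{W_{dj} - W_{dy \wedge dj}}{m - W_{dy}}.$$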
For more information, see Robnik-Sikonja and Kononenko (1997).
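The accumulation of intermediate weights and the final combination can be sketched on a toy regression problem. This is an illustration under several assumptions, not the relieff implementation: the data are hypothetical, every observation is used once (m equals the number of rows), neighbors are found with Euclidean distance, all k neighbors have equal influence, and the final weight is computed as W_dy&dj/W_dy - (W_dj - W_dy&dj)/(m - W_dy).

```matlab
% RReliefF sketch on toy data (predictor 1 tracks y, predictor 2 does not).
X = [0.1 0.9; 0.2 0.1; 0.8 0.8; 0.9 0.2];   % hypothetical predictors
y = [1.0; 1.1; 3.0; 3.2];                     % continuous response
m = size(X,1);                                % iterations: use each observation once
k = 2;                                        % nearest neighbors (assumed)
spanX = max(X) - min(X);                      % range of each predictor
spanY = max(y) - min(y);                      % range of the response

Wdy = 0;  Wdj = zeros(1,2);  Wdydj = zeros(1,2);
for r = 1:m
    dist = sqrt(sum((X - X(r,:)).^2, 2));     % Euclidean distance (an assumption)
    dist(r) = Inf;                            % exclude the observation itself
    [~, order] = sort(dist);
    for q = order(1:k)'
        d = 1/k;                              % equal influence; rank scaling omitted
        dy = abs(y(r) - y(q)) / spanY;        % response difference
        dj = abs(X(r,:) - X(q,:)) ./ spanX;   % predictor differences
        Wdy = Wdy + dy*d;
        Wdj = Wdj + dj*d;
        Wdydj = Wdydj + dy*dj*d;
    end
end

% Combine the intermediate weights into the final predictor weights
W = Wdydj./Wdy - (Wdj - Wdydj)./(m - Wdy);
disp(W)
```

For this toy data, the first predictor receives a positive weight and the second a negative weight, which matches the intuition that only the first predictor changes together with the response.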
[1] Kononenko, I., E. Simec, and M. Robnik-Sikonja. (1997). “Overcoming the myopia of inductive learning algorithms with RELIEFF.” Retrieved from CiteSeerX.
[2] Robnik-Sikonja, M., and I. Kononenko. (1997). “An adaptation of Relief for attribute estimation in regression.” Retrieved from CiteSeerX.
[3] Robnik-Sikonja, M., and I. Kononenko. (2003). “Theoretical and empirical analysis of ReliefF and RReliefF.” Machine Learning, 53, 23–69.