Neighborhood Component Analysis (NCA) Feature Selection
Neighborhood component analysis (NCA) is a non-parametric method for selecting features with the goal of maximizing the prediction accuracy of regression and classification algorithms. The Statistics and Machine Learning Toolbox™ functions fscnca and fsrnca perform NCA feature selection with regularization: they learn feature weights by minimizing an objective function that measures the average leave-one-out classification or regression loss over the training data.
NCA Feature Selection for Classification
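Consider a multi-class classification problem with a training set containing $n$ observations:
$$S = \{(x_i, y_i),\ i = 1, 2, \ldots, n\},$$
where $x_i \in \mathbb{R}^p$ are the feature vectors, $y_i \in \{1, 2, \ldots, c\}$ are the class labels, and $c$ is the number of classes. The aim is to learn a classifier $f : \mathbb{R}^p \to \{1, 2, \ldots, c\}$ that accepts a feature vector $x$ and makes a prediction $f(x)$ for the true label $y$ of $x$.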
Consider a randomized classifier that:
Randomly picks a point, $\mathrm{Ref}(x)$, from $S$ as the ‘reference point’ for $x$.
Labels $x$ using the label of the reference point $\mathrm{Ref}(x)$.
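This scheme is similar to that of a 1-NN classifier, where the reference point is chosen to be the nearest neighbor of the new point $x$. In NCA, the reference point is chosen randomly, and every point in $S$ has some probability of being selected as the reference point. The probability $P(\mathrm{Ref}(x) = x_j \mid S)$ that point $x_j$ is picked from $S$ as the reference point for $x$ is higher if $x_j$ is closer to $x$ as measured by the distance function $d_w$, where
$$d_w(x_i, x_j) = \sum_{r=1}^{p} w_r^2 \, |x_{ir} - x_{jr}|,$$
and $w_r$ are the feature weights. Assume that
$$P(\mathrm{Ref}(x) = x_j \mid S) \propto k(d_w(x, x_j)),$$
where $k$ is some kernel or similarity function that assumes large values when $d_w(x, x_j)$ is small. Suppose it is
$$k(z) = \exp\left(-\frac{z}{\sigma}\right),$$
as suggested in [1]. The reference point for $x$ is chosen from $S$, so the sum of $P(\mathrm{Ref}(x) = x_j \mid S)$ over all $j$ must equal 1. Therefore, it is possible to write
$$P(\mathrm{Ref}(x) = x_j \mid S) = \frac{k(d_w(x, x_j))}{\sum_{m=1}^{n} k(d_w(x, x_m))}.$$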
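The reference-point probabilities can be sketched in a few lines of plain Python. This is only an illustration, assuming the weighted distance $d_w(x_i, x_j) = \sum_r w_r^2 |x_{ir} - x_{jr}|$ and the kernel $k(z) = \exp(-z/\sigma)$ from [1]; the function names and the toy data are invented here and are not part of the toolbox.

```python
import math

def weighted_distance(w, x_i, x_j):
    """d_w(x_i, x_j) = sum_r w_r^2 * |x_ir - x_jr| (assumed form of the distance)."""
    return sum(w_r ** 2 * abs(a - b) for w_r, a, b in zip(w, x_i, x_j))

def reference_probabilities(w, x, points, sigma=1.0):
    """P(Ref(x) = x_j | S) for every x_j in points, using k(z) = exp(-z / sigma)."""
    kernel_values = [math.exp(-weighted_distance(w, x, x_j) / sigma) for x_j in points]
    total = sum(kernel_values)  # normalizing constant so the probabilities sum to 1
    return [k / total for k in kernel_values]

# Toy data (invented for illustration): three points with two features each.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 5.0]]
weights = [1.0, 0.0]  # a zero weight removes the second feature from the distance
probs = reference_probabilities(weights, [0.1, 3.0], points)
```

Because the second weight is zero, the first and third points are equally distant from the query point and receive the same probability, even though they differ strongly in the second feature.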
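Now consider the leave-one-out application of this randomized classifier, that is, predicting the label of $x_i$ using the data in $S^{-i}$, the training set $S$ excluding the point $(x_i, y_i)$. The probability that point $x_j$ is picked as the reference point for $x_i$ is
$$p_{ij} = P(\mathrm{Ref}(x_i) = x_j \mid S^{-i}) = \frac{k(d_w(x_i, x_j))}{\sum_{m=1, m \neq i}^{n} k(d_w(x_i, x_m))}.$$
The leave-one-out probability of correct classification for observation $i$ is the probability $p_i$ that the randomized classifier correctly classifies observation $i$ using $S^{-i}$:
$$p_i = \sum_{j=1, j \neq i}^{n} P(\mathrm{Ref}(x_i) = x_j \mid S^{-i}) \, I(y_i = y_j) = \sum_{j=1, j \neq i}^{n} p_{ij} \, y_{ij},$$
where
$$y_{ij} = I(y_i = y_j) = \begin{cases} 1 & \text{if } y_i = y_j, \\ 0 & \text{otherwise.} \end{cases}$$
The average leave-one-out probability of correct classification using the randomized classifier can be written as
$$F(w) = \frac{1}{n} \sum_{i=1}^{n} p_i.$$
The right-hand side depends on the weight vector $w$, and the goal of NCA is to maximize $F(w)$ with respect to $w$. With the regularization term introduced in [1], the objective function becomes
$$F(w) = \frac{1}{n} \sum_{i=1}^{n} p_i - \lambda \sum_{r=1}^{p} w_r^2 = \frac{1}{n} \sum_{i=1}^{n} \left[ \sum_{j=1, j \neq i}^{n} p_{ij} \, y_{ij} - \lambda \sum_{r=1}^{p} w_r^2 \right] = \frac{1}{n} \sum_{i=1}^{n} F_i(w),$$
where $\lambda$ is the regularization parameter. The regularization term drives many of the weights in $w$ to 0.
After choosing the kernel parameter $\sigma$ in $p_{ij}$ as 1, finding the weight vector $w$ can be expressed as the following minimization problem for a given $\lambda$:
$$\hat{w} = \underset{w}{\arg\min}\, f(w) = \underset{w}{\arg\min}\, \frac{1}{n} \sum_{i=1}^{n} f_i(w),$$
where $f(w) = -F(w)$ and $f_i(w) = -F_i(w)$.
Note that
$$\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} p_{ij} = 1,$$
and the argument of the minimum does not change if you add a constant to an objective function. Therefore, you can rewrite the objective function by adding the constant 1:
$$\hat{w} = \underset{w}{\arg\min}\, \{1 + f(w)\} = \underset{w}{\arg\min} \left\{ \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} p_{ij} - \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} p_{ij} \, y_{ij} + \lambda \sum_{r=1}^{p} w_r^2 \right\} = \underset{w}{\arg\min} \left\{ \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} p_{ij} \, l(y_i, y_j) + \lambda \sum_{r=1}^{p} w_r^2 \right\},$$
where the loss function is defined as
$$l(y_i, y_j) = \begin{cases} 1 & \text{if } y_i \neq y_j, \\ 0 & \text{otherwise.} \end{cases}$$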
The argument of the minimum is the weight vector that minimizes the classification error. You can specify a custom loss function using the LossFunction name-value pair argument in the call to fscnca.
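As a rough illustration of the regularized classification objective, the following plain-Python sketch evaluates $\frac{1}{n} \sum_i \sum_{j \neq i} p_{ij} \, l(y_i, y_j) + \lambda \sum_r w_r^2$ for a fixed weight vector. It assumes the kernel $\exp(-z/\sigma)$ with $\sigma = 1$, the weighted distance $\sum_r w_r^2 |x_{ir} - x_{jr}|$, and the 0-1 loss. The function names and toy data are invented for this example; the sketch only evaluates the objective and is not how fscnca minimizes it.

```python
import math

def loo_probabilities(w, X, sigma=1.0):
    """Return P with P[i][j] = p_ij, the leave-one-out reference probability (P[i][i] = 0)."""
    n = len(X)
    P = []
    for i in range(n):
        row = []
        for j in range(n):
            if j == i:
                row.append(0.0)  # x_i cannot be its own reference point
            else:
                d = sum(wr ** 2 * abs(a - b) for wr, a, b in zip(w, X[i], X[j]))
                row.append(math.exp(-d / sigma))
        total = sum(row)
        P.append([v / total for v in row])
    return P

def classification_objective(w, X, y, lam, sigma=1.0):
    """(1/n) * sum_i sum_{j != i} p_ij * l(y_i, y_j) + lam * sum_r w_r^2, with 0-1 loss l."""
    n = len(X)
    P = loo_probabilities(w, X, sigma)
    # The 0-1 loss keeps only pairs with different labels, which also excludes j == i.
    misclassification = sum(P[i][j] for i in range(n) for j in range(n) if y[i] != y[j])
    return misclassification / n + lam * sum(wr ** 2 for wr in w)

# Toy data (invented): feature 1 separates the two classes, feature 2 does not.
X = [[0.0, 3.0], [0.2, -1.0], [5.0, 2.5], [5.1, -0.5]]
y = [1, 1, 2, 2]
obj_feature1 = classification_objective([1.0, 0.0], X, y, lam=0.0)
obj_feature2 = classification_objective([0.0, 1.0], X, y, lam=0.0)
```

On this toy set, putting all the weight on the informative feature gives a much lower objective value than putting it on the uninformative one, which is the behavior the minimization exploits.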
NCA Feature Selection for Regression
The fsrnca function performs NCA feature selection modified for regression. Given $n$ observations
$$S = \{(x_i, y_i),\ i = 1, 2, \ldots, n\},$$
the only difference from the classification problem is that the response values $y_i \in \mathbb{R}$ are continuous. In this case, the aim is to predict the response $y$ given the training set $S$.
Consider a randomized regression model that:
Randomly picks a point, $\mathrm{Ref}(x)$, from $S$ as the ‘reference point’ for $x$.
Sets the response value at $x$ equal to the response value of the reference point $\mathrm{Ref}(x)$.
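Again, the probability $P(\mathrm{Ref}(x) = x_j \mid S)$ that point $x_j$ is picked from $S$ as the reference point for $x$ is
$$P(\mathrm{Ref}(x) = x_j \mid S) = \frac{k(d_w(x, x_j))}{\sum_{m=1}^{n} k(d_w(x, x_m))}.$$
Now consider the leave-one-out application of this randomized regression model, that is, predicting the response for $x_i$ using the data in $S^{-i}$, the training set $S$ excluding the point $(x_i, y_i)$. The probability that point $x_j$ is picked as the reference point for $x_i$ is
$$p_{ij} = P(\mathrm{Ref}(x_i) = x_j \mid S^{-i}) = \frac{k(d_w(x_i, x_j))}{\sum_{m=1, m \neq i}^{n} k(d_w(x_i, x_m))}.$$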
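Let $\hat{y}_i$ be the response value that the randomized regression model predicts and $y_i$ be the actual response for $x_i$. Let $l : \mathbb{R}^2 \to \mathbb{R}$ be a loss function that measures the disagreement between $\hat{y}_i$ and $y_i$. Then, the average value of $l(y_i, \hat{y}_i)$ is
$$l_i = E\left( l(y_i, \hat{y}_i) \mid S^{-i} \right) = \sum_{j=1, j \neq i}^{n} p_{ij} \, l(y_i, y_j).$$
After adding the regularization term, the objective function for minimization is:
$$f(w) = \frac{1}{n} \sum_{i=1}^{n} l_i + \lambda \sum_{r=1}^{p} w_r^2.$$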
The default loss function $l(y_i, y_j)$ for NCA for regression is mean absolute deviation, but you can specify other loss functions, including a custom one, using the LossFunction name-value pair argument in the call to fsrnca.
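The regression objective can be evaluated in the same way. The sketch below is a plain-Python illustration, not the fsrnca implementation: it assumes the kernel $\exp(-z/\sigma)$, the weighted distance $\sum_r w_r^2 |x_{ir} - x_{jr}|$, and the absolute deviation $|y_i - y_j|$ as the loss. The function names, the `loss` parameter, and the toy data are invented for this example.

```python
import math

def absolute_deviation(y_i, y_j):
    """Assumed form of the default loss: |y_i - y_j|."""
    return abs(y_i - y_j)

def regression_objective(w, X, y, lam, sigma=1.0, loss=absolute_deviation):
    """f(w) = (1/n) * sum_i l_i + lam * sum_r w_r^2, with l_i = sum_{j != i} p_ij * loss(y_i, y_j)."""
    n = len(X)
    total_loss = 0.0
    for i in range(n):
        # Unnormalized kernel values for every candidate reference point j != i.
        kernel = {}
        for j in range(n):
            if j != i:
                d = sum(wr ** 2 * abs(a - b) for wr, a, b in zip(w, X[i], X[j]))
                kernel[j] = math.exp(-d / sigma)
        norm = sum(kernel.values())
        # l_i: expected loss when the response of a random reference point predicts y_i.
        total_loss += sum(k / norm * loss(y[i], y[j]) for j, k in kernel.items())
    return total_loss / n + lam * sum(wr ** 2 for wr in w)

# Toy data (invented): the response follows feature 1 and ignores feature 2.
X = [[0.0, 3.0], [0.2, -1.0], [5.0, 2.5], [5.1, -0.5]]
y = [1.0, 1.1, 4.0, 4.2]
obj_feature1 = regression_objective([1.0, 0.0], X, y, lam=0.0)
obj_feature2 = regression_objective([0.0, 1.0], X, y, lam=0.0)

# A different loss can be passed in, for example squared error.
obj_squared = regression_objective([1.0, 0.0], X, y, lam=0.0, loss=lambda a, b: (a - b) ** 2)
```

As in the classification case, the weight vector that relies on the informative feature yields the smaller objective value on this toy set.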
Impact of Standardization
The regularization term drives the weights of irrelevant predictors to zero. In the objective functions for NCA for classification or regression, there is only one regularization parameter $\lambda$ for all weights. This requires the magnitudes of the weights to be comparable to each other. When the feature vectors $x_i$ in $S$ are on different scales, the resulting weights can also be on different scales and not meaningful. To avoid this situation, standardize the predictors to have zero mean and unit standard deviation before applying NCA. You can standardize the predictors using the 'Standardize',true name-value pair argument in the call to fscnca or fsrnca.
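For readers who want to see what standardization does to the predictors, here is a minimal plain-Python sketch. It assumes the sample standard deviation (denominator $n - 1$) and leaves constant columns unscaled; both choices, the function name, and the toy data are assumptions made for this example rather than a description of the toolbox internals.

```python
import math

def standardize(X):
    """Return a copy of X with each column shifted to zero mean and scaled to unit standard deviation."""
    n, p = len(X), len(X[0])
    means = [sum(row[r] for row in X) / n for r in range(p)]
    stds = []
    for r in range(p):
        # Sample variance with an n - 1 denominator (an assumption of this sketch).
        variance = sum((row[r] - means[r]) ** 2 for row in X) / (n - 1)
        # A constant column has zero spread; use 1.0 so it is centered but not rescaled.
        stds.append(math.sqrt(variance) or 1.0)
    return [[(row[r] - means[r]) / stds[r] for r in range(p)] for row in X]

# Toy data (invented): the second predictor is on a scale 1000 times larger than the first.
X = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]]
Z = standardize(X)
```

After this transformation both columns have comparable spread, so a single regularization parameter penalizes their weights on an equal footing.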
Choosing the Regularization Parameter Value
It is usually necessary to select a value of the regularization parameter by calculating the accuracy of the randomized NCA classifier or regression model on an independent test set. If you use cross-validation instead of a single test set, select the value that minimizes the average loss across the cross-validation folds. For examples, see Tune Regularization Parameter to Detect Features Using NCA for Classification and Tune Regularization Parameter in NCA for Regression.
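The cross-validation procedure can be outlined as follows. This plain-Python sketch only shows the selection loop: `fit_and_score` is a hypothetical callback that would fit the NCA weights on the training indices for a given regularization value and return the loss on the held-out indices. The contiguous, unshuffled folds and the placeholder scorer with its numbers are stand-ins invented for the example, not results.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds; return a list of (train, test) index lists."""
    splits = []
    for f in range(k):
        test = list(range(f * n // k, (f + 1) * n // k))
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        splits.append((train, test))
    return splits

def select_lambda(lambdas, n, k, fit_and_score):
    """Return (best_lambda, mean_losses), where mean_losses[lam] is the average loss over k folds."""
    splits = kfold_indices(n, k)
    mean_losses = {}
    for lam in lambdas:
        fold_losses = [fit_and_score(lam, train, test) for train, test in splits]
        mean_losses[lam] = sum(fold_losses) / len(fold_losses)
    # Pick the regularization value with the smallest average held-out loss.
    best_lambda = min(mean_losses, key=mean_losses.get)
    return best_lambda, mean_losses

def placeholder_score(lam, train_idx, test_idx):
    """Stand-in for 'fit on train_idx, evaluate on test_idx'; the formula is arbitrary."""
    return (lam - 0.1) ** 2 + 0.01 * len(test_idx)

best_lambda, mean_losses = select_lambda([0.01, 0.1, 1.0], n=10, k=5,
                                         fit_and_score=placeholder_score)
```

In practice the callback would train on the training fold only, so that the held-out loss remains an honest estimate for each candidate value.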
[1] Yang, W., K. Wang, and W. Zuo. "Neighborhood Component Feature Selection for High-Dimensional Data." Journal of Computers. Vol. 7, Number 1, January 2012.