Feature selection is a dimensionality reduction technique that selects only a subset of measured features (predictor variables) that provide the best predictive power in modeling the data. It is particularly useful when dealing with very high-dimensional data or when modeling with all features is undesirable.
Feature selection can be used to:
- Improve the accuracy of a machine learning algorithm
- Boost the performance on very high-dimensional data
- Improve model interpretability
- Prevent overfitting
There are several common approaches to feature selection:
- Stepwise regression sequentially adds or removes features until there is no improvement in prediction; it is used with linear regression or generalized linear regression algorithms. Similarly, sequential feature selection works with any supervised learning algorithm, building up a feature set one feature at a time until accuracy (or a custom performance measure) stops improving.
- Automated feature selection methods such as neighborhood component analysis (NCA) identify a subset of features that maximizes classification performance based on their predictive power.
- Boosted and bagged decision trees are ensemble methods that compute variable importance, for example from out-of-bag estimates in bagged ensembles.
- Regularization (lasso and elastic nets) uses shrinkage estimators to remove redundant features by reducing their weights (coefficients) to zero.
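As an illustration of the sequential approach in the list above, the following is a minimal sketch of forward feature selection, not a reference implementation. It assumes a linear least-squares model, a single holdout split, and validation mean squared error as the performance measure. The function names (`holdout_mse`, `forward_select`), the stopping tolerance `tol`, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np

def holdout_mse(X_train, y_train, X_val, y_val, features):
    """Fit least squares on the chosen columns and return the validation MSE."""
    # Prepend an intercept column to the selected features.
    A_train = np.column_stack([np.ones(len(X_train)), X_train[:, features]])
    A_val = np.column_stack([np.ones(len(X_val)), X_val[:, features]])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    residuals = y_val - A_val @ coef
    return float(np.mean(residuals ** 2))

def forward_select(X_train, y_train, X_val, y_val, tol=0.01):
    """Greedily add the feature that lowers validation MSE the most.

    Stops when the best remaining candidate improves the MSE by less than tol.
    """
    selected = []
    remaining = list(range(X_train.shape[1]))
    # Baseline score: intercept-only model.
    best_score = holdout_mse(X_train, y_train, X_val, y_val, selected)
    while remaining:
        # Score every candidate feature added to the current set.
        scores = {
            j: holdout_mse(X_train, y_train, X_val, y_val, selected + [j])
            for j in remaining
        }
        best_j = min(scores, key=scores.get)
        if best_score - scores[best_j] < tol:
            break  # no meaningful improvement, so stop adding features
        selected.append(best_j)
        remaining.remove(best_j)
        best_score = scores[best_j]
    return selected, best_score

# Synthetic data (assumption for illustration): only features 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)

selected, score = forward_select(X[:150], y[:150], X[150:], y[150:])
print("selected features:", selected)
print("validation MSE:", score)
```

In practice, cross-validation is usually preferable to a single holdout split, because the stopping decision is otherwise sensitive to how the data happen to be divided.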
Another dimensionality reduction approach is to use feature extraction or feature transformation techniques, which transform existing features into new features (predictor variables) with the less descriptive features dropped.
Approaches to feature transformation include:
- Principal component analysis (PCA), used to summarize data in fewer dimensions by projection onto a unique orthogonal basis
- Factor analysis, used to build explanatory models of data correlations
- Nonnegative matrix factorization, used when model terms must represent nonnegative quantities, such as physical quantities
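To make the idea of projecting onto an orthogonal basis concrete, here is a minimal sketch of PCA computed from the singular value decomposition of the centered data. It is one common way to compute principal components, shown under stated assumptions: the function name `pca`, its return values, and the synthetic data are hypothetical and chosen only for this example.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its leading principal components via the SVD."""
    # Center each feature so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are orthonormal directions, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Coordinates of each observation in the reduced space.
    scores = X_centered @ components.T
    # Variance along each direction, and the share kept by the chosen components.
    explained_variance = S ** 2 / (len(X) - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return scores, components, ratio

# Synthetic data (assumption for illustration): three measured features
# driven mostly by a single latent factor, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 1))
X = latent @ np.array([[2.0, 1.0, -1.0]]) + 0.1 * rng.normal(size=(300, 3))

scores, components, ratio = pca(X, n_components=2)
print("reduced shape:", scores.shape)
print("explained variance ratio:", ratio)
```

Because the three features here are nearly collinear, the first component should capture almost all of the variance, which is the situation in which dropping the remaining components loses little information.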