Classification and feature selection techniques are among the most commonly used mathematical approaches for the analysis and interpretation of biological data. The ultimate goal of classification is to design, from the available data, a classifier that correctly assigns class membership to new cases. Depending on the nature of the data, the classifier may fail to assign the correct membership to new objects, resulting in classification error. The most popular error estimation techniques (resubstitution, bootstrap, cross-validation) vary strikingly in performance, defined by the accuracy of the error estimate and the computing speed. Recently, a bolstered error estimation technique has been proposed that combines speed and accuracy near-optimally. In the general case it relies on a Monte Carlo sampling algorithm, but for linear classifiers an analytic solution may be applied [1,2].
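The Monte Carlo variant of bolstered resubstitution mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Gaussian bolstering kernels of a fixed width `sigma` placed at each training point and a linear decision rule sign(w·x + b); the function and parameter names are hypothetical.

```python
import numpy as np

def mc_bolstered_resub(X, y, w, b, sigma=1.0, n_mc=1000, seed=None):
    """Monte Carlo bolstered resubstitution error for a linear classifier.

    X : (n, d) training points; y : labels in {0, 1}.
    The classifier predicts 1 when x @ w + b > 0.
    A Gaussian kernel of width sigma is centered at each training point;
    the bolstered error is the average kernel mass falling on the wrong
    side of the decision hyperplane, estimated by sampling.
    """
    rng = np.random.default_rng(seed)
    point_errors = []
    for xi, yi in zip(X, y):
        # Draw n_mc samples from the bolstering kernel around this point.
        samples = xi + sigma * rng.standard_normal((n_mc, len(xi)))
        pred = (samples @ w + b > 0).astype(int)
        # Fraction of kernel mass misclassified for this training point.
        point_errors.append(np.mean(pred != yi))
    return float(np.mean(point_errors))
```

For a point far from the hyperplane relative to `sigma` the contribution is near 0 (or near 1 if misclassified), while a point sitting exactly on the boundary contributes about 0.5, which is what makes the bolstered estimate smoother and lower-variance than plain resubstitution.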
This is a geometric approach to bolstered error estimation based on the calculation of the volume of an N-dimensional hyperspherical cap [the formula is given in 3]. Geometric bolstered error estimation algorithms are very fast and achieve accuracy comparable with leave-one-out (LOO) cross-validation while having lower variance. These algorithms are suitable for analyzing extremely large numbers of features and may find applications across a wide range of -omics data analyses.
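The hyperspherical cap volume of [3] can be sketched numerically via the regularized incomplete beta function. This is an illustrative implementation under the convention that the cap is parameterized by its height h, 0 ≤ h ≤ 2r; the function names are hypothetical.

```python
import numpy as np
from scipy.special import betainc, gamma

def ball_volume(n, r=1.0):
    # Volume of the n-dimensional ball of radius r.
    return np.pi ** (n / 2) / gamma(n / 2 + 1) * r ** n

def cap_volume(n, r, h):
    """Volume of the n-dimensional hyperspherical cap of height h,
    using the concise formula of Li (2011):
    V_cap = (1/2) V_n(r) I_{sin^2(phi)}((n+1)/2, 1/2),
    where phi is the colatitude angle, cos(phi) = (r - h)/r, phi <= pi/2."""
    if h <= 0:
        return 0.0
    if h >= 2 * r:
        return ball_volume(n, r)
    if h > r:
        # Cap larger than a half-ball: complement of the opposite cap.
        return ball_volume(n, r) - cap_volume(n, r, 2 * r - h)
    x = 1.0 - ((r - h) / r) ** 2  # sin^2(phi)
    return 0.5 * ball_volume(n, r) * betainc((n + 1) / 2, 0.5, x)
```

In the geometric setting, the bolstered error contribution of a training point with a uniform spherical bolstering kernel is the volume of the cap cut off by the separating hyperplane divided by the full ball volume, which is why a fast cap-volume formula removes the need for Monte Carlo sampling in the linear case.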
1. Braga-Neto U, Dougherty E (2004) Bolstered error estimation. Pattern Recognition 37: 1267–1281. doi:10.1016/j.patcog.2003.08.017.
2. Okun O (2011) Feature selection and ensemble methods for bioinformatics: algorithmic classification and implementations. Hershey PA: Medical Information Science Reference. 445 p.
3. Li S (2011) Concise Formulas for the Area and Volume of a Hyperspherical Cap. Asian Journal of Mathematics & Statistics 4: 66–70. doi:10.3923/ajms.2011.66.70.