[1] Agresti, A. *Categorical Data
Analysis*, 2nd Ed. John Wiley & Sons, Inc.: Hoboken,
NJ, 2002.

[2] Allwein, E., R. Schapire, and Y. Singer.
"Reducing multiclass to binary: A unifying approach for margin
classifiers." *Journal of Machine Learning
Research*. Vol. 1, 2000, pp. 113–141.

[3] Alpaydin, E. "Combined 5 x 2 CV F
Test for Comparing Supervised Classification Learning Algorithms." *Neural
Computation*, Vol. 11, No. 8, pp. 1885–1892, 1999.

[4] Blackard, J. A. and D. J. Dean. *Comparative
accuracies of artificial neural networks and discriminant analysis
in predicting forest cover types from cartographic variables.* Computers
and Electronics in Agriculture 24, pp. 131–151, 1999.

[5] Bottou, L., and Chih-Jen Lin. *Support
Vector Machine Solvers*. Available at `http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.64.4209&rep=rep1&type=pdf`.

[6] Bouckaert, R. "Choosing Between Two
Learning Algorithms Based on Calibrated Tests." *International
Conference on Machine Learning*, pp. 51–58, 2003.

[7] Bouckaert, R. and E. Frank. "Evaluating
the Replicability of Significance Tests for Comparing Learning Algorithms." In *Advances
in Knowledge Discovery and Data Mining, 8th Pacific-Asia
Conference*, pp. 3–12, 2004.

[8] Breiman, L. *Bagging Predictors.* Machine
Learning 26, pp. 123–140, 1996.

[9] Breiman, L. *Random Forests.* Machine
Learning 45, pp. 5–32, 2001.

[10] Breiman, L. `http://www.stat.berkeley.edu/~breiman/RandomForests/`

[11] Breiman, L., et al. *Classification
and Regression Trees.* Chapman & Hall, Boca Raton,
1993.

[12] Cristianini, N., and J. Shawe-Taylor. *An
Introduction to Support Vector Machines and Other Kernel-Based Learning
Methods*. Cambridge University Press, Cambridge, UK, 2000.

[13] Dietterich, T. "Approximate statistical
tests for comparing supervised classification learning algorithms." *Neural
Computation*, Vol. 10, No. 7: pp. 1895–1923, 1998.

[14] Dietterich, T., and G. Bakiri. "Solving
Multiclass Learning Problems Via Error-Correcting Output Codes." *Journal
of Artificial Intelligence Research*. Vol. 2, 1995, pp.
263–286.

[15] Escalera, S., O. Pujol, and P. Radeva.
"On the decoding process in ternary error-correcting output
codes." *IEEE Transactions on Pattern Analysis and
Machine Intelligence*. Vol. 32, Issue 1, 2010, pp. 120–134.

[16] Escalera, S., O. Pujol, and P. Radeva.
"Separability of ternary codes for sparse designs of error-correcting
output codes." *Pattern Recognition Letters*. Vol.
30, Issue 3, 2009, pp. 285–297.

[17] Fan, R.-E., P.-H. Chen, and C.-J. Lin. "Working
set selection using second order information for training support
vector machines." *Journal of Machine Learning Research*,
Vol. 6, 2005, pp. 1889–1918.

[18] Fagerland, M. W., S. Lydersen, and P. Laake. "The
McNemar Test for Binary Matched-Pairs Data: Mid-p and Asymptotic Are
Better Than Exact Conditional." *BMC Medical Research
Methodology*. Vol. 13, 2013, pp. 1–8.

[19] Freund, Y. *A more robust boosting
algorithm.* arXiv:0905.2138v1, 2009.

[20] Freund, Y. and R. E. Schapire. *A
Decision-Theoretic Generalization of On-Line Learning and an Application
to Boosting.* J. of Computer and System Sciences, Vol.
55, pp. 119–139, 1997.

[21] Friedman, J. *Greedy function
approximation: A gradient boosting machine.* Annals of
Statistics, Vol. 29, No. 5, pp. 1189–1232, 2001.

[22] Friedman, J., T. Hastie, and R. Tibshirani. *Additive
logistic regression: A statistical view of boosting.* Annals
of Statistics, Vol. 28, No. 2, pp. 337–407, 2000.

[23] Hastie, T., and R. Tibshirani. "Classification
by Pairwise Coupling." *Annals of Statistics*.
Vol. 26, Issue 2, 1998, pp. 451–471.

[24] Hastie, T., R. Tibshirani, and J. Friedman. *The
Elements of Statistical Learning*, second edition. Springer,
New York, 2008.

[25] Ho, C. H. and C. J. Lin. "Large-Scale
Linear Support Vector Regression." *Journal of Machine
Learning Research*, Vol. 13, 2012, pp. 3323–3348.

[26] Ho, T. K. *The random subspace
method for constructing decision forests.* IEEE Transactions
on Pattern Analysis and Machine Intelligence, Vol. 20, No. 8, pp.
832–844, 1998.

[27] Hsieh, C. J., K. W. Chang, C. J. Lin,
S. S. Keerthi, and S. Sundararajan. "A Dual Coordinate Descent
Method for Large-Scale Linear SVM." *Proceedings
of the 25th International Conference on Machine Learning, ICML '08*,
2008, pp. 408–415.

[28] Hsu, Chih-Wei, Chih-Chung Chang, and Chih-Jen
Lin. *A Practical Guide to Support Vector Classification*.
Available at `http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf`.

[29] Hu, Q., X. Che, L. Zhang, and D. Yu. "Feature
Evaluation and Selection Based on Neighborhood Soft Margin." *Neurocomputing*.
Vol. 73, 2010, pp. 2114–2124.

[30] Kecman, V., T.-M. Huang, and M. Vogt. "Iterative
Single Data Algorithm for Training Kernel Machines from Huge Data
Sets: Theory and Performance." In *Support Vector
Machines: Theory and Applications*. Edited by Lipo Wang,
255–274. Berlin: Springer-Verlag, 2005.

[31] Kohavi, R. "Scaling Up the Accuracy of Naive-Bayes
Classifiers: a Decision-Tree Hybrid." *Proceedings
of the Second International Conference on Knowledge Discovery and
Data Mining*, 1996.

[32] Lancaster, H.O. "Significance Tests
in Discrete Distributions." *JASA*, Vol.
56, Number 294, 1961, pp. 223–234.

[33] Langford, J., L. Li, and T. Zhang. "Sparse
Online Learning Via Truncated Gradient." *J. Mach.
Learn. Res.*, Vol. 10, 2009, pp. 777–801.

[34] McNemar, Q. "Note on the Sampling
Error of the Difference Between Correlated Proportions or Percentages." *Psychometrika*,
Vol. 12, Number 2, 1947, pp. 153–157.

[35] Mosteller, F. "Some Statistical Problems
in Measuring the Subjective Response to Drugs." *Biometrics*,
Vol. 8, Number 3, 1952, pp. 220–226.

[36] Nocedal, J. and S. J. Wright. *Numerical
Optimization*, 2nd ed., New York: Springer, 2006.

[37] Schapire, R. E. et al. *Boosting
the margin: A new explanation for the effectiveness of voting methods.* Annals
of Statistics, Vol. 26, No. 5, pp. 1651–1686, 1998.

[38] Schapire, R., and Y. Singer. *Improved
boosting algorithms using confidence-rated predictions.* Machine
Learning, Vol. 37, No. 3, pp. 297–336, 1999.

[39] Shalev-Shwartz, S., Y. Singer, and N.
Srebro. "Pegasos: Primal Estimated Sub-Gradient Solver for
SVM." *Proceedings of the 24th International Conference
on Machine Learning, ICML '07*, 2007, pp. 807–814.

[40] Seiffert, C., T. Khoshgoftaar, J. Hulse,
and A. Napolitano. *RUSBoost: Improving classification performance
when training data is skewed.* 19th International Conference
on Pattern Recognition, pp. 1–4, 2008.

[41] Warmuth, M., J. Liao, and G. Ratsch. *Totally
corrective boosting algorithms that maximize the margin.* Proc.
23rd Int'l. Conf. on Machine Learning, ACM, New York, pp. 1001–1008,
2006.

[42] Wu, T. F., C. J. Lin, and R. Weng. "Probability
Estimates for Multi-Class Classification by Pairwise Coupling." *Journal
of Machine Learning Research*. Vol. 5, 2004, pp. 975–1005.

[43] Wright, S. J., R. D. Nowak, and M. A. T. Figueiredo.
"Sparse Reconstruction by Separable Approximation." *Trans.
Sig. Proc.*, Vol. 57, No. 7, 2009, pp. 2479–2493.

[44] Xiao, Lin. "Dual Averaging Methods
for Regularized Stochastic Learning and Online Optimization." *J.
Mach. Learn. Res.*, Vol. 11, 2010, pp. 2543–2596.

[45] Xu, Wei. "Towards Optimal One Pass
Large Scale Learning with Averaged Stochastic Gradient Descent." *CoRR*,
abs/1107.2490, 2011.

[46] Zadrozny, B. "Reducing Multiclass
to Binary by Coupling Probability Estimates." *NIPS
2001: Proceedings of Advances in Neural Information Processing Systems
14*, 2001, pp. 1041–1048.

[47] Zadrozny, B., J. Langford, and N. Abe. *Cost-Sensitive
Learning by Cost-Proportionate Example Weighting.* CiteSeerX.
[Online] 2003. `http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.5.9780`

[48] Zhou, Z.-H. and X.-Y. Liu. *On
Multi-Class Cost-Sensitive Learning.* CiteSeerX. [Online]
2006. `http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.92.9999`
