MATLAB Examples

Tutorial for adaboost Package (AdaBoost_samme with two_level_decision_tree)

By Jarek Tuszynski

Adaboost package consists of two multi-class adaboost classifiers:

  • AdaBoost_samme - a multi-class extension of the classic AdaBoost.M1 algorithm (which handles only two-class problems), first described by Ji Zhu, Saharon Rosset, Hui Zou and Trevor Hastie in “Multi-class AdaBoost”, January 12, 2006. https://web.stanford.edu/~hastie/Papers/samme.pdf
  • AdaBoost_mult - solves the same problem using a bank of two-class AdaBoost classifiers. A three-class problem uses three 2-class classifiers solving the class 1 vs. 2 & 3, class 2 vs. 1 & 3, and class 3 vs. 1 & 2 problems; each sample is then tested with all three classifiers and assigned to the class with the maximum score (see the sketch after this list).
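A minimal sketch of the one-vs-rest voting used by AdaBoost_mult is shown below. The score functions here are hypothetical stand-ins so that the snippet runs on its own; they are not part of the package's API.

scoreFun = { @(X)  X(:,1), ...        % hypothetical score for class 1 vs. rest
             @(X)  X(:,2), ...        % hypothetical score for class 2 vs. rest
             @(X) -X(:,1)-X(:,2) };   % hypothetical score for class 3 vs. rest
Xtest  = randn(5,2);                  % a few random test samples
scores = zeros(size(Xtest,1), numel(scoreFun));
for k = 1:numel(scoreFun)
  scores(:,k) = scoreFun{k}(Xtest);   % score of each two-class classifier
end
[~, yPred] = max(scores, [], 2);      % assign the class with the highest score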

Boosting classifiers work by combining multiple "weak-learner" classifiers. This package provides two weak learners:

  • decision_stump - a single-node decision "tree". For two-class problems the decision_stump class calls the "train_stump_2" function, which performs an exhaustive search over all possible thresholds for all features and picks the optimal one (see the sketch after this list). For multi-class problems the "train_stump_N" function is called, which calls "train_stump_2" for all pairs of classes and picks the optimal one.
  • two_level_decision_tree - a decision "tree" class with three nodes arranged in two layers.
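For reference, the exhaustive threshold search can be sketched as follows. This is a simplified, unweighted version on toy data; the package's train_stump_2 additionally weights each sample as required by boosting.

Xtoy = [randn(50,3); randn(50,3)+1];          % toy data: two classes, three features
ytoy = [ones(50,1); 2*ones(50,1)];
best.err = inf;
for f = 1:size(Xtoy,2)                        % try every feature
  for t = unique(Xtoy(:,f))'                  % and every observed value as a threshold
    yHat = 1 + (Xtoy(:,f) > t);               % predict class 2 above the threshold
    err  = mean(yHat ~= ytoy);
    flip = err > 0.5;                         % the reversed stump may do better
    if flip, err = 1 - err; end
    if err < best.err
      best = struct('err',err, 'feature',f, 'threshold',t, 'flipped',flip);
    end
  end
end
fprintf('best stump: X(%d) > %.3f, error %.2f\n', best.feature, best.threshold, best.err);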

This demo concentrates on the AdaBoost_samme class with the two_level_decision_tree weak learner.

Contents

Change History

Licence

The package is distributed under the BSD License.

format compact; % viewing preference
clear variables;
close all
type('license.txt')
Copyright (C) 2011 by Kenneth Morton and contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

Read in test file

The Cardiotocography (CTG) Data Set contains measurements of fetal heart rate (FHR) and uterine contraction (UC) features on cardiotocograms, classified by expert obstetricians as normal, suspect, or pathologic. Paper: Ayres de Campos et al. (2000) SisPorto 2.0 A Program for Automated Analysis of Cardiotocograms. J Matern Fetal Med 5:311-318. URL: https://archive.ics.uci.edu/ml/datasets/Cardiotocography

[~, ~, ctg] = xlsread('CTG-1.xlsx', 'Raw Data'); % read the raw CTG spreadsheet
X = cell2mat(ctg(3:end, 4:end-2));               % feature columns
y = cell2mat(ctg(3:end, end));                   % class labels: 1=normal, 2=suspect, 3=pathologic
colLabel = ctg(1, 4:end-2);                      % feature names

verbose = true;
classifier = AdaBoost_samme(two_level_decision_tree, verbose); % blank classifier
nTree = 20;                            % number of boosting iterations (trees)
C = classifier.train(X, y, [], nTree); % train; verbose mode prints one line per tree
  1) X( 35)>  0.500 ? (X( 10)> 68.500 ? 2 : 3) : (X( 33)>  0.500 ? 1 : 3): weight(  1)=3.155; stump accuracy(  1)=  92.1%; overall accuracy=  92.1% 167 misclassified
  2) X(  9)>  0.550 ? (X( 30)>  0.500 ? 1 : 2) : (X( 34)>  0.500 ? 2 : 3): weight(  2)=2.352; stump accuracy(  2)=  87.5%; overall accuracy=  92.1% 167 misclassified
  3) X( 34)>  0.500 ? (X(  2)>359.000 ? 2 : 3) : (X( 33)>  0.500 ? 2 : 3): weight(  3)=2.206; stump accuracy(  3)=  22.1%; overall accuracy=  93.6% 137 misclassified
  4) X( 32)>  0.500 ? (X( 18)>172.500 ? 1 : 2) : (X( 34)>  0.500 ? 1 : 3): weight(  4)=2.498; stump accuracy(  4)=  80.1%; overall accuracy=  95.2% 101 misclassified
  5) X( 33)>  0.500 ? (X(  1)>  0.000 ? 3 : 3) : (X( 10)> 68.500 ? 2 : 3): weight(  5)=1.485; stump accuracy(  5)=  20.9%; overall accuracy=  92.9% 152 misclassified
  6) X( 33)>  0.500 ? (X(  1)>  0.000 ? 3 : 3) : (X( 34)>  0.500 ? 1 : 3): weight(  6)=1.757; stump accuracy(  6)=  86.1%; overall accuracy=  95.3% 99 misclassified
  7) X( 10)> 68.500 ? (X( 26)>  0.500 ? 3 : 1) : (X( 33)>  0.500 ? 2 : 3): weight(  7)=1.564; stump accuracy(  7)=  21.1%; overall accuracy=  92.9% 152 misclassified
  8) X( 34)>  0.500 ? (X( 24)> 45.500 ? 3 : 2) : (X( 33)>  0.500 ? 1 : 3): weight(  8)=1.559; stump accuracy(  8)=  86.1%; overall accuracy=  95.4% 98 misclassified
  9) X( 10)> 68.500 ? (X( 26)>  0.500 ? 3 : 1) : (X( 33)>  0.500 ? 2 : 3): weight(  9)=1.467; stump accuracy(  9)=  21.1%; overall accuracy=  92.9% 151 misclassified
 10) X( 35)>  0.500 ? (X( 10)> 68.500 ? 2 : 3) : (X( 34)>  0.500 ? 1 : 3): weight( 10)=1.587; stump accuracy( 10)=  90.3%; overall accuracy=  95.4% 98 misclassified
 11) X(  8)> 79.500 ? (X( 34)>  0.500 ? 1 : 3) : (X( 33)>  0.500 ? 2 : 3): weight( 11)=1.500; stump accuracy( 11)=  20.4%; overall accuracy=  93.0% 148 misclassified
 12) X( 14)>  1.500 ? (X( 33)>  0.500 ? 2 : 3) : (X( 35)>  0.500 ? 1 : 3): weight( 12)=1.516; stump accuracy( 12)=  81.1%; overall accuracy=  95.5% 96 misclassified
 13) X(  8)> 79.500 ? (X( 34)>  0.500 ? 1 : 3) : (X( 10)> 68.500 ? 2 : 3): weight( 13)=1.429; stump accuracy( 13)=  17.1%; overall accuracy=  94.4% 119 misclassified
 14) X( 30)>  0.500 ? (X(  2)>294.000 ? 1 : 2) : (X( 34)>  0.500 ? 1 : 3): weight( 14)=2.195; stump accuracy( 14)=  84.1%; overall accuracy=  98.6% 30 misclassified
 15) X( 33)>  0.500 ? (X(  1)>  0.000 ? 3 : 3) : (X( 10)> 68.500 ? 2 : 3): weight( 15)=1.584; stump accuracy( 15)=  20.9%; overall accuracy=  98.6% 30 misclassified
 16) X( 22)>107.500 ? (X( 34)>  0.500 ? 1 : 3) : (X(  8)> 24.500 ? 2 : 3): weight( 16)=1.636; stump accuracy( 16)=  85.0%; overall accuracy=  98.6% 29 misclassified
 17) X( 10)> 68.500 ? (X( 26)>  0.500 ? 3 : 1) : (X( 33)>  0.500 ? 2 : 3): weight( 17)=1.443; stump accuracy( 17)=  21.1%; overall accuracy=  98.5% 31 misclassified
 18) X( 35)>  0.500 ? (X( 10)> 68.500 ? 2 : 3) : (X( 34)>  0.500 ? 1 : 3): weight( 18)=1.481; stump accuracy( 18)=  90.3%; overall accuracy=  98.6% 29 misclassified
 19) X( 22)>105.500 ? (X( 33)>  0.500 ? 2 : 3) : (X( 33)>  0.500 ? 1 : 3): weight( 19)=1.481; stump accuracy( 19)=  19.2%; overall accuracy=  98.6% 30 misclassified
 20) X( 34)>  0.500 ? (X(  2)>359.000 ? 2 : 3) : (X( 33)>  0.500 ? 1 : 3): weight( 20)=1.459; stump accuracy( 20)=  86.1%; overall accuracy=  98.6% 30 misclassified

Track performance per iteration using the calc_accuracy method

[accuracy, tpr] = C.calc_accuracy( X, y, nTree);
hold on
plot(tpr)
plot(accuracy, 'LineWidth',2)
xlabel('Number of iterations')
legend({'true positive rate of normal patients', 'true positive rate of suspect patients', ...
  'true positive rate of Pathologic patients', 'overall accuracy'},'Location','southeast')

Check which features are being used

[hx, hy] = C.feature_hist();
hy = 100*hy/sum(hy);  % normalize and convert to percent
hx(hy==0) = [];       % delete unused features
hy(hy==0) = [];
[hy, idx] = sort(hy); % sort
hx = hx(idx);
clf
barh(hy);
axis tight
xlabel('Percent used')
ylabel('Feature name')
ax = gca;
ax.YTick = 1:length(hx);
ax.YTickLabel = colLabel(hx);

Test export_model and import_model functions

[strList, labels, header] = C.export_model();
CC = classifier.import_model(strList, labels); % initialize new model
Y  = C .predict(X);
YY = CC.predict(X);
fprintf('Number of mismatches between models: %i\n', nnz(Y~=YY));
Number of mismatches between models: 0

Test save_adaboost_model and load_adaboost_model functions

save_adaboost_model(C, 'classifier.csv');
CC = load_adaboost_model(classifier, 'classifier.csv');
Y  = C .predict(X);
YY = CC.predict(X);
fprintf('Number of mismatches between models: %i\n', nnz(Y~=YY));
type('classifier.csv')
Number of mismatches between models: 0

Classifier#,nIter,label #1,label #2,label #3
1,20,1,2,3
dimension1, dimension2, dimension3, threshold1, threshold2, threshold3,  label 1, label 2, label 3, label 4, weights
35,10,33,5.000000e-01,6.850000e+01,5.000000e-01,2,3,1,3,3.155343e+00
9,30,34,5.500000e-01,5.000000e-01,5.000000e-01,1,2,2,3,2.351993e+00
34,2,33,5.000000e-01,3.590000e+02,5.000000e-01,2,3,2,3,2.206240e+00
32,18,34,5.000000e-01,1.725000e+02,5.000000e-01,1,2,1,3,2.497649e+00
33,1,10,5.000000e-01,0.000000e+00,6.850000e+01,3,3,2,3,1.484926e+00
33,1,34,5.000000e-01,0.000000e+00,5.000000e-01,3,3,1,3,1.757134e+00
10,26,33,6.850000e+01,5.000000e-01,5.000000e-01,3,1,2,3,1.564343e+00
34,24,33,5.000000e-01,4.550000e+01,5.000000e-01,3,2,1,3,1.558576e+00
10,26,33,6.850000e+01,5.000000e-01,5.000000e-01,3,1,2,3,1.466524e+00
35,10,34,5.000000e-01,6.850000e+01,5.000000e-01,2,3,1,3,1.587135e+00
8,34,33,7.950000e+01,5.000000e-01,5.000000e-01,1,3,2,3,1.500161e+00
14,33,35,1.500000e+00,5.000000e-01,5.000000e-01,2,3,1,3,1.516331e+00
8,34,10,7.950000e+01,5.000000e-01,6.850000e+01,1,3,2,3,1.429478e+00
30,2,34,5.000000e-01,2.940000e+02,5.000000e-01,1,2,1,3,2.195269e+00
33,1,10,5.000000e-01,0.000000e+00,6.850000e+01,3,3,2,3,1.583904e+00
22,34,8,1.075000e+02,5.000000e-01,2.450000e+01,1,3,2,3,1.635855e+00
10,26,33,6.850000e+01,5.000000e-01,5.000000e-01,3,1,2,3,1.443462e+00
35,10,34,5.000000e-01,6.850000e+01,5.000000e-01,2,3,1,3,1.481294e+00
22,33,33,1.055000e+02,5.000000e-01,5.000000e-01,2,3,1,3,1.480922e+00
34,2,33,5.000000e-01,3.590000e+02,5.000000e-01,2,3,1,3,1.459293e+00
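Each row of the CSV file stores one two-level tree: the three split dimensions and thresholds, the four leaf labels, and the tree weight. The first row appears to correspond to tree #1 in the training log above. A minimal sketch of evaluating that row on one sample is shown below; the names dim, thr, lbl, and yTree are only illustrative.

dim = [35 10 33];  thr = [0.5 68.5 0.5];  lbl = [2 3 1 3];   % first CSV row (weight 3.155)
x   = X(1,:);                                  % one sample from the CTG data
if x(dim(1)) > thr(1)                          % root split
  yTree = lbl(1 + (x(dim(2)) <= thr(2)));      % "true" branch uses labels 1 and 2
else
  yTree = lbl(3 + (x(dim(3)) <= thr(3)));      % "false" branch uses labels 3 and 4
end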

Use cross-validation to avoid training and testing on the same data

fprintf('Classification is %i%% accurate when training and testing on the same data.\n', ...
  round(100*mean(y==Y)));
classifier = AdaBoost_samme(two_level_decision_tree);
nFold = 10; % ten-fold validation
Y = cross_validation( classifier, X, y, nTree, nFold);
fprintf('Classification is %i%% accurate when using 10-fold cross validation\n', ...
  round(100*mean(y==Y)));
Classification is 99% accurate when training and testing on the same data.
Classification is 97% accurate when using 10-fold cross validation
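The fold handling inside cross_validation can be pictured roughly as follows. This is a minimal sketch assuming a simple randomized k-fold split (the package's function may partition the data differently); Yxv, fold, and Ck are illustrative names.

nFold = 10;
n     = size(X,1);
fold  = mod(randperm(n), nFold) + 1;            % random fold label (1..nFold) per sample
Yxv   = zeros(n,1);
for k = 1:nFold
  test = (fold == k);                           % hold out one fold for testing
  Ck   = classifier.train(X(~test,:), y(~test), [], nTree);  % train on the remaining folds
  Yxv(test) = Ck.predict(X(test,:));            % predict only the held-out samples
end
fprintf('Sketch of 10-fold accuracy: %.0f%%\n', 100*mean(Yxv==y));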

Test classifier on 2-class "donut" problem

nSamp = 1000;
[Xb,Yb] = pol2cart((1:nSamp)*2*pi/nSamp,3);
X = 2*[randn(nSamp,2); randn(nSamp,2)+ [Xb' ,Yb'] ];
y = [1+zeros(nSamp,1); 2+zeros(nSamp,1)];
nTree   = 10;   % number of trees
C = classifier.train(X, y, [], nTree);
AdaBoost_demo_plot(C, X, y);

Test classifier on 3-class "donut" problem

nSamp = 1000;
[Xb,Yb] = pol2cart((1:nSamp)*2*pi/nSamp,3);
[Xc,Yc] = pol2cart((1:nSamp)*2*pi/nSamp,6);
X= 1.25*[randn(nSamp,2); randn(nSamp,2)+[Xb',Yb']; randn(nSamp,2)+[Xc',Yc'] ];
y = [1+zeros(nSamp,1); 2+zeros(nSamp,1); 3+zeros(nSamp,1)];
nTree = 20;   % number of trees
C = classifier.train(X, y, [], nTree);
Y = C.predict(X);
AdaBoost_demo_plot(C, X, y);

Test classifier on 2-class "diagonals" problem

nSamp = 1000;
Xa = (1:nSamp)*10/nSamp;
d = 6;
s = sign(randn(nSamp,1));
X = [randn(nSamp,2)+[Xa',Xa']; randn(nSamp,2)+[Xa',Xa'+s.*d]]-5;
y = [1+zeros(nSamp,1); 2+zeros(nSamp,1)];
nTree   = 20;   % number of trees
C = classifier.train(X, y, [], nTree);
AdaBoost_demo_plot(C, X, y);

Test classifier on 3-class "diagonals" problem

nSamp = 1000;
Xa = (1:nSamp)*10/nSamp;
d = 4;
X = [randn(nSamp,2)+[Xa',Xa']; randn(nSamp,2)+[Xa',Xa'+d]-1; randn(nSamp,2)+[Xa',Xa'-d]+1]-5;
y = [1+zeros(nSamp,1); 2+zeros(nSamp,1); 3+zeros(nSamp,1)];
nTree = 20;   % number of trees
C = classifier.train(X, y, [], nTree);
AdaBoost_demo_plot(C, X, y);