MATLAB Examples

Test Ensemble Quality

This example uses a bagged ensemble so it can use all three methods of evaluating ensemble quality: an independent test set, cross-validation, and out-of-bag estimation.


Generate an artificial dataset with 20 predictors. Each entry is a random number from 0 to 1. The initial classification is Y = 1 if x1 + x2 + x3 + x4 + x5 > 2.5, and Y = 0 otherwise.

```
rng(1,'twister') % for reproducibility
X = rand(2000,20);
Y = sum(X(:,1:5),2) > 2.5;
```

In addition, to add noise to the results, randomly switch 10% of the classifications:

```
idx = randsample(2000,200);
Y(idx) = ~Y(idx);
```

Independent Test Set

Create independent training and test sets of data. Use 70% of the data for training by calling cvpartition with the holdout option to reserve the remaining 30% as a test set:

```
cvpart = cvpartition(Y,'holdout',0.3);
Xtrain = X(training(cvpart),:);
Ytrain = Y(training(cvpart),:);
Xtest = X(test(cvpart),:);
Ytest = Y(test(cvpart),:);
```

Create a bagged classification ensemble of 200 trees from the training data:

```
bag = fitcensemble(Xtrain,Ytrain,'Method','Bag','NumLearningCycles',200)
```
```
bag = 

  classreg.learning.classif.ClassificationBaggedEnsemble
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: [0 1]
           ScoreTransform: 'none'
          NumObservations: 1400
               NumTrained: 200
                   Method: 'Bag'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                  FitInfo: []
       FitInfoDescription: 'None'
                FResample: 1
                  Replace: 1
         UseObsForLearner: [1400x200 logical]
```

Plot the loss (misclassification) of the test data as a function of the number of trained trees in the ensemble:

```
figure;
plot(loss(bag,Xtest,Ytest,'mode','cumulative'));
xlabel('Number of trees');
ylabel('Test classification error');
```

Cross Validation

Generate a five-fold cross-validated bagged ensemble:

```
cv = fitcensemble(X,Y,'Method','Bag','NumLearningCycles',200,'Kfold',5)
```
```
cv = 

  classreg.learning.partition.ClassificationPartitionedEnsemble
    CrossValidatedModel: 'Bag'
         PredictorNames: {1x20 cell}
           ResponseName: 'Y'
        NumObservations: 2000
                  KFold: 5
              Partition: [1x1 cvpartition]
      NumTrainedPerFold: [200 200 200 200 200]
             ClassNames: [0 1]
         ScoreTransform: 'none'
```

Examine the cross-validation loss as a function of the number of trees in the ensemble:

```
figure;
plot(loss(bag,Xtest,Ytest,'mode','cumulative'));
hold on;
plot(kfoldLoss(cv,'mode','cumulative'),'r.');
hold off;
xlabel('Number of trees');
ylabel('Classification error');
legend('Test','Cross-validation','Location','NE');
```

Cross-validation gives loss estimates comparable to those from the independent test set.

Out-of-Bag Estimates

Generate the loss curve for out-of-bag estimates, and plot it along with the other curves:

```
figure;
plot(loss(bag,Xtest,Ytest,'mode','cumulative'));
hold on;
plot(kfoldLoss(cv,'mode','cumulative'),'r.');
plot(oobLoss(bag,'mode','cumulative'),'k--');
hold off;
xlabel('Number of trees');
ylabel('Classification error');
legend('Test','Cross-validation','Out of bag','Location','NE');
```

The out-of-bag estimates are again comparable to those of the other methods.
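Beyond the cumulative curves, the three estimates can be compared numerically at the full ensemble size. The sketch below assumes the `bag` and `cv` objects and the test set from the steps above are still in the workspace; the exact values depend on the RNG seed:

```
testErr = loss(bag,Xtest,Ytest);  % independent test-set misclassification error
cvErr   = kfoldLoss(cv);          % five-fold cross-validation error
oobErr  = oobLoss(bag);           % out-of-bag error
fprintf('Test: %.4f   CV: %.4f   OOB: %.4f\n',testErr,cvErr,oobErr)
```

With no additional arguments, each of these functions returns the error of the complete 200-tree ensemble, so the three printed values correspond to the right-hand endpoints of the plotted curves.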