This example shows how to train multiple models in Classification Learner and determine the best-performing models based on their validation accuracy. You then check the test accuracy for the best-performing models trained on the full data set, including training and validation data.
In the MATLAB® Command Window, load the ionosphere data set, and create a table containing the data. Separate the table into training and test sets.
    load ionosphere
    tbl = array2table(X);
    tbl.Y = Y;

    rng('default') % For reproducibility of the data split
    partition = cvpartition(Y,'Holdout',0.15);
    idxTrain = training(partition); % Indices for the training set
    tblTrain = tbl(idxTrain,:);
    tblTest = tbl(~idxTrain,:);
Open Classification Learner. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Classification Learner.
On the Classification Learner tab, in the File section, click New Session and select From Workspace.
In the New Session from Workspace dialog box, select the tblTrain table from the Data Set Variable list.
As shown in the dialog box, the app selects the response and predictor variables. The default response variable is Y. To protect against overfitting, the default validation option is 5-fold cross-validation. For this example, do not change the default settings.
To accept the default options and continue, click Start Session.
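For readers who want a command-line counterpart to the 5-fold cross-validation setting, the following is a minimal sketch, not the app's internal procedure. The decision tree is an arbitrary model choice for illustration, and the variable names cvMdl and valAccuracy are assumptions. Because the fold assignment is random, the value will not necessarily match what the app reports.

```matlab
% Recreate the training table from the first step
load ionosphere
tbl = array2table(X);
tbl.Y = Y;
rng('default') % For reproducibility of the data split
partition = cvpartition(Y,'Holdout',0.15);
tblTrain = tbl(training(partition),:);

% Train a 5-fold cross-validated decision tree (illustrative model choice)
cvMdl = fitctree(tblTrain,'Y','KFold',5);

% Validation accuracy is 1 minus the cross-validated classification error
valAccuracy = 1 - kfoldLoss(cvMdl,'LossFun','classiferror');
fprintf('5-fold validation accuracy: %.1f%%\n',100*valAccuracy)
```

Each observation is used for validation exactly once, by a model that did not see it during training, which is why this estimate is less optimistic than accuracy measured on the training data itself.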
Train all preset models. On the Classification Learner tab, in the Model Type section, click the arrow to open the gallery. In the Get Started group, click All. In the Training section, click Train. The app trains one of each preset model type and displays the models in the Models pane.
If you have Parallel Computing Toolbox™, you can train all the models (All) simultaneously by selecting the Use Parallel button in the Training section before clicking Train. After you click Train, the Opening Parallel Pool dialog box opens and remains open while the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, the app trains the models simultaneously.
Sort the trained models based on the validation accuracy. In the Models pane, open the Sort by list and select Accuracy (Validation).
In the Models pane, click the star icons next to the three models with the highest validation accuracy. The app highlights the highest validation accuracy by outlining it in a box. In this example, the trained SVM Kernel model has the highest validation accuracy.
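Training several model types and ranking them by validation accuracy can also be sketched at the command line. The three models and their hyperparameters below are assumptions chosen for illustration and do not reproduce the app's presets. All three share one 5-fold partition (cvp, a hypothetical name) so that they are validated on the same folds.

```matlab
% Recreate the training table from the first step
load ionosphere
tbl = array2table(X);
tbl.Y = Y;
rng('default') % For reproducibility
partition = cvpartition(Y,'Holdout',0.15);
tblTrain = tbl(training(partition),:);

% One shared 5-fold partition for all models
cvp = cvpartition(tblTrain.Y,'KFold',5);

% Cross-validate three illustrative model types
cvTree = fitctree(tblTrain,'Y','CVPartition',cvp);
cvKnn = fitcknn(tblTrain,'Y','NumNeighbors',10,'Standardize',true, ...
    'CVPartition',cvp);
cvSvm = fitcsvm(tblTrain,'Y','KernelFunction','gaussian', ...
    'KernelScale','auto','Standardize',true,'CVPartition',cvp);

% Collect validation accuracies and sort from best to worst
Model = ["Tree";"KNN";"Gaussian SVM"];
ValidationAccuracy = 1 - [ ...
    kfoldLoss(cvTree,'LossFun','classiferror'); ...
    kfoldLoss(cvKnn,'LossFun','classiferror'); ...
    kfoldLoss(cvSvm,'LossFun','classiferror')];
results = sortrows(table(Model,ValidationAccuracy), ...
    'ValidationAccuracy','descend');
disp(results)
```

The first row of results plays the role of the model that the app outlines in the Models pane.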
The app displays a validation confusion matrix for the first model (model 1.1). Blue values indicate correct classifications, and red values indicate incorrect classifications. The Models pane on the left shows the validation accuracy for each model.
Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example.
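A validation confusion matrix like the one the app plots can be approximated from out-of-fold predictions. This is a sketch under the assumption that a cross-validated decision tree stands in for model 1.1. The names predicted, C, and order are illustrative.

```matlab
% Recreate the training table from the first step
load ionosphere
tbl = array2table(X);
tbl.Y = Y;
rng('default') % For reproducibility
partition = cvpartition(Y,'Holdout',0.15);
tblTrain = tbl(training(partition),:);

% Cross-validated model (illustrative stand-in for a model trained in the app)
cvMdl = fitctree(tblTrain,'Y','KFold',5);

% Each row is predicted by the fold model that did not train on it
predicted = kfoldPredict(cvMdl);

% Rows of C are true classes and columns are predicted classes
[C,order] = confusionmat(tblTrain.Y,predicted);
disp(array2table(C,'RowNames',strcat('true_',order), ...
    'VariableNames',strcat('pred_',order)))

% Diagonal entries are the correct classifications
valAccuracy = sum(diag(C))/sum(C(:));
fprintf('Validation accuracy from confusion matrix: %.1f%%\n',100*valAccuracy)
```

Changing the seed passed to rng changes the fold assignment, so the off-diagonal counts can shift slightly from run to run, which mirrors the randomness noted above.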
Check the test set performance of the best-performing models. Begin by importing test data into the app.
On the Classification Learner tab, in the Testing section, click Test Data and select From Workspace.
In the Import Test Data dialog box, select the tblTest table from the Test Data Set Variable list.
As shown in the dialog box, the app identifies the response and predictor variables.
Compute the accuracy of the best preset models on the
tblTest data. For convenience, compute the test set
accuracy for all models at once. On the Classification
Learner tab, in the Testing section, click
Test All and select Test All.
The app computes the test set performance of the model trained on the full data
set, including training and validation data.
Sort the models based on the test set accuracy. In the Models pane, open the Sort by list and select Accuracy (Test). The app still outlines the metric for the model with the highest validation accuracy, despite displaying the test accuracy.
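The distinction between validation and test models can be illustrated at the command line: the test metric comes from a model fitted on all of tblTrain, with no folds held out, and evaluated on tblTest. The sketch below assumes three illustrative model types rather than the app's presets. The helper testAcc is a hypothetical name.

```matlab
% Recreate the training and test tables from the first step
load ionosphere
tbl = array2table(X);
tbl.Y = Y;
rng('default') % For reproducibility of the data split
partition = cvpartition(Y,'Holdout',0.15);
idxTrain = training(partition);
tblTrain = tbl(idxTrain,:);
tblTest = tbl(~idxTrain,:);

% Fit each model on the full training table (no cross-validation)
mdlTree = fitctree(tblTrain,'Y');
mdlKnn = fitcknn(tblTrain,'Y','NumNeighbors',10,'Standardize',true);
mdlSvm = fitcsvm(tblTrain,'Y','KernelFunction','gaussian', ...
    'KernelScale','auto','Standardize',true);

% Fraction of test observations whose predicted label matches the true label
testAcc = @(mdl) mean(strcmp(predict(mdl,tblTest),tblTest.Y));

% Collect test accuracies and sort from best to worst
Model = ["Tree";"KNN";"Gaussian SVM"];
TestAccuracy = [testAcc(mdlTree); testAcc(mdlKnn); testAcc(mdlSvm)];
results = sortrows(table(Model,TestAccuracy),'TestAccuracy','descend');
disp(results)
```

The test table is never used during fitting, so these numbers are an independent check on the ranking obtained from validation.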
Visually check the test set performance of the models. For each of the starred models, select the model in the Models pane. On the Classification Learner tab, in the Plots section, click the arrow to open the gallery, and then click Confusion Matrix (Test) in the Test Results group.
Rearrange the layout of the plots to better compare them. First, close the
validation confusion matrix for model 1.1. Then, click the
Document Actions arrow located to the far right of the model plot tabs. Select the
Tile All option and specify a 1-by-3 layout.
Click the Hide plot options button in the top right of the plots to make more
room for the plots.
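A similar side-by-side view of test confusion matrices can be sketched outside the app with a 1-by-3 tiled layout. The three models here are assumptions standing in for the starred models, and the code requires a release that supports tiledlayout.

```matlab
% Recreate the training and test tables from the first step
load ionosphere
tbl = array2table(X);
tbl.Y = Y;
rng('default') % For reproducibility of the data split
partition = cvpartition(Y,'Holdout',0.15);
idxTrain = training(partition);
tblTrain = tbl(idxTrain,:);
tblTest = tbl(~idxTrain,:);

% Illustrative stand-ins for the three starred models
mdls = {fitctree(tblTrain,'Y'), ...
    fitcknn(tblTrain,'Y','NumNeighbors',10,'Standardize',true), ...
    fitcsvm(tblTrain,'Y','KernelFunction','gaussian', ...
        'KernelScale','auto','Standardize',true)};
titles = {'Tree','KNN','Gaussian SVM'};

% One test confusion chart per tile
t = tiledlayout(1,3);
for k = 1:numel(mdls)
    cc = confusionchart(t,tblTest.Y,predict(mdls{k},tblTest));
    cc.Layout.Tile = k;
    cc.Title = titles{k};
end
```

Placing the charts in one row makes it easier to compare the off-diagonal (misclassified) counts across models.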
In this example, the trained SVM Kernel model remains one of the best-performing models on the test set data.
To return to the original layout, you can click the Layout button in the Plots section and select Single model (Default).
Compare the validation and test accuracy for the trained SVM Kernel model. In the Current Model Summary pane, compare the Accuracy (Validation) value under Training Results to the Accuracy (Test) value under Test Results. In this example, the two values are close, which indicates that the validation accuracy is a good estimate of the test accuracy for this model.
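The same comparison can be sketched at the command line. The code below uses fitckernel with an SVM learner as an approximate stand-in for the app's SVM Kernel preset. The app's exact settings are not reproduced here, so the numbers are not expected to match the app.

```matlab
% Recreate the training and test split from the first step, as matrices
load ionosphere
rng('default') % For reproducibility
partition = cvpartition(Y,'Holdout',0.15);
idxTrain = training(partition);
XTrain = X(idxTrain,:);
YTrain = Y(idxTrain);
XTest = X(~idxTrain,:);
YTest = Y(~idxTrain);

% Validation accuracy from 5-fold cross-validation on the training data
cvMdl = fitckernel(XTrain,YTrain,'Learner','svm','KFold',5);
valAccuracy = 1 - kfoldLoss(cvMdl);

% Test accuracy from a model fitted on all of the training data
mdl = fitckernel(XTrain,YTrain,'Learner','svm');
testAccuracy = 1 - loss(mdl,XTest,YTest);

fprintf('Accuracy (Validation): %.1f%%\n',100*valAccuracy)
fprintf('Accuracy (Test):       %.1f%%\n',100*testAccuracy)
fprintf('Difference:            %.1f points\n', ...
    100*abs(valAccuracy - testAccuracy))
```

A small difference suggests that the validation estimate generalizes to unseen data. A large gap would be a reason to suspect overfitting to the validation folds or an unrepresentative test split.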