How many times does Holdout validation in Classification Learner App run on a data-set

Hi, I wanted to ask: if I choose the holdout validation scheme in the Classification Learner app, how many times does it run on a data set? Can I change a parameter to decide how many times it runs? Also, does it partition the data randomly or in a fixed manner?
  1 Comment
Anoop Somashekar on 30 Mar 2017
I presume you are asking about the number of times training is done under the holdout validation scheme. When you create a model with holdout validation in the Classification Learner app, the classifier is trained twice.
a) First, the entire dataset is used to train the model. This is the model that is saved and returned.
b) Second, the model is trained again with the holdout portion set aside. This second model is evaluated on the held-out data to validate it and report the accuracy.
The only purpose of the classifier trained without the holdout data is to estimate the out-of-sample accuracy of the model that is trained on the full dataset.
The model trained on the full dataset is the one that is expected to perform better on out-of-sample data, because it is fit to more data. The key point here is that the algorithm tries to optimize the expected accuracy on out-of-sample data.
Both classifiers are trained using the same hyper-parameters (i.e., options like “number of learners”). Because they use the same hyper-parameters, fitting to more data should only increase out-of-sample accuracy.
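The two-fit scheme described in this comment can be sketched roughly as follows. This is an illustration in plain Python, not the app's actual implementation: the toy nearest-centroid classifier, the function names (`fit_centroids`, `holdout_fit`), and the synthetic data are all made up for the example, and it assumes the validation model is fit on the rows that remain after the holdout portion is set aside. In MATLAB itself, `cvpartition` with the `'HoldOut'` option is the usual way to create such a split by hand.

```python
import random

def fit_centroids(rows):
    """Toy stand-in for a classifier: store the mean feature value per label."""
    sums, counts = {}, {}
    for x, label in rows:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, x):
    """Predict the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))

def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(model, x) == label for x, label in rows) / len(rows)

def holdout_fit(rows, holdout_fraction, seed=0):
    """Return (final_model, validation_accuracy) using exactly two fits."""
    # Fit 1: the model that is kept, trained on every row.
    final_model = fit_centroids(rows)

    # Randomly set aside the holdout portion (seed is a hypothetical knob).
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    held_out, training = shuffled[:n_holdout], shuffled[n_holdout:]

    # Fit 2: same settings, trained without the held-out rows.
    # It exists only to estimate out-of-sample accuracy.
    validation_model = fit_centroids(training)
    validation_accuracy = accuracy(validation_model, held_out)
    return final_model, validation_accuracy

# Synthetic, well-separated two-class data (made up for this sketch).
data = [(float(i), "low") for i in range(0, 20)] + \
       [(float(i), "high") for i in range(30, 50)]

final_model, val_acc = holdout_fit(data, holdout_fraction=0.25)
print(final_model, val_acc)
```

Note that the reported accuracy comes from the second fit, while the returned model comes from the first; the holdout split is made once, so there is no repetition count to tune in this sketch.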


Answers (1)

Drew on 31 Jan 2023
Let's say that, in the new session dialog, you choose to use 10% of the data for holdout validation. In newer releases of the Learner apps (for example, R2022b), it is also possible to set aside some data for testing. So, let's assume that you also set aside 10% of the data for testing. Then the Learner apps will build two models:
(1) A final model which is built with 90% of the data (80% train plus 10% validation data). This is the model that will be exported, and this is the model that will be used for any prediction results on the test data, that is, "Test Results".
(2) A validation model which is built with 80% of the data (80% train). This is the model that will be used for any prediction results on the validation data, that is, "Validation results".
The hold-out partition is randomly generated, based on the percentage of data requested.
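As a worked example of the percentages above, here is a small Python sketch of the bookkeeping for a 1000-row dataset. It is not the app's code: `split_indices` and its `seed` parameter are hypothetical, and it assumes both 10% fractions are taken from the full dataset, which is what the 80/10/10 numbers in this answer imply.

```python
import random

def split_indices(n_rows, validation_fraction, test_fraction, seed=None):
    """Randomly partition row indices into train, validation, and test sets."""
    rng = random.Random(seed)  # a fixed seed makes the random split repeatable
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    n_val = round(n_rows * validation_fraction)
    test = indices[:n_test]
    validation = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, validation, test

train, validation, test = split_indices(1000, 0.10, 0.10, seed=1)

# (2) Validation model: built from the training rows only (80%).
validation_model_rows = train
# (1) Final model: built from training plus validation rows (90%).
final_model_rows = train + validation

print(len(validation_model_rows), len(final_model_rows), len(test))
# prints: 800 900 100
```

Because the partition is random, two sessions will generally hold out different rows unless the random number generator is seeded the same way; in MATLAB code that would be done with `rng`, though whether the app exposes such a setting is not stated in this thread.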
