Cross-validate Gaussian process regression model
cvMdl = crossval(gprMdl)
cvMdl = crossval(gprMdl,Name,Value)
cvMdl = crossval(gprMdl) returns the partitioned model, cvMdl, built from the Gaussian process regression (GPR) model, gprMdl, using 10-fold cross-validation.
cvMdl is a partitioned (cross-validated) regression model object, and gprMdl is a RegressionGP object.
cvMdl = crossval(gprMdl,Name,Value) returns the partitioned model, cvMdl, with additional options specified by one or more Name,Value pair arguments. For example, you can specify the number of folds or the fraction of the data to use for testing.
gprMdl— Gaussian process regression model
Gaussian process regression model, specified as a RegressionGP object. You cannot call crossval on a compact regression object (CompactRegressionGP).
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
'CVPartition'— Random partition for a k-fold cross validation
'Holdout'— Fraction of data to use for testing
Fraction of the data to use for testing in holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range from 0 to 1. If you specify 'Holdout',p, then crossval:
1. Randomly reserves p*100% of the data as validation data, and trains the model using the rest of the data.
2. Stores the compact, trained model in the Trained property of the cross-validated model.
Example: 'Holdout',0.3 uses 30% of the data for testing and 70% of the data for training.
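The holdout split described above can be sketched with cvpartition. This is only an illustration of the idea, not a statement of how crossval builds the split internally; the sample size n is borrowed from the housing example and the variable names are chosen here for illustration.

```matlab
% Illustrative holdout split (assumption: mirrors the idea, not crossval internals).
rng('default')                      % For reproducibility
n = 506;                            % Number of observations (housing example)
p = 0.3;                            % Fraction of the data reserved for testing
c = cvpartition(n,'HoldOut',p);     % Random holdout partition
trainIdx = training(c);             % Logical index of training observations
testIdx  = test(c);                 % Logical index of test observations
fprintf('%d training, %d test observations\n',nnz(trainIdx),nnz(testIdx));
```

Every observation lands in exactly one of the two sets, and the test set holds roughly p*100% of the data.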
'KFold'— Number of folds
Number of folds to use in the cross-validated GPR model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value. The value must be greater than 1. If you specify 'KFold',k, then crossval:
1. Randomly partitions the data into k sets.
2. For each set, reserves the set as test data, and trains the model using the other k – 1 sets.
3. Stores the k compact, trained models in the cells of a k-by-1 cell array in the Trained property of the cross-validated model.
Example: 'KFold',5 uses 5 folds in cross-validation. That is, for each fold, it uses that fold as test data, and trains the model on the remaining 4 folds.
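As a rough sketch of the three steps above, the fold assignment can be written out by hand in base MATLAB. The variable names (foldId, testIdx, trainIdx) are hypothetical, and the assignment scheme is one simple choice, not necessarily the one crossval uses.

```matlab
% Hand-rolled k-fold assignment (assumption: a simple scheme for illustration only).
rng('default')                      % For reproducibility
n = 506;                            % Number of observations (housing example)
k = 5;                              % Number of folds
foldId = mod(randperm(n),k) + 1;    % Random fold label in 1..k for each observation
for j = 1:k
    testIdx  = (foldId == j);       % Fold j is reserved as test data
    trainIdx = ~testIdx;            % The other k-1 folds are used for training
    fprintf('Fold %d: %d test, %d training observations\n', ...
        j,nnz(testIdx),nnz(trainIdx));
end
```

Each observation is used as test data exactly once, which is why one model per fold is needed.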
'Leaveout'— Indicator for leave-one-out cross-validation
Indicator for leave-one-out cross-validation, specified as the comma-separated pair consisting of 'Leaveout' and either 'on' or 'off'. If you specify 'Leaveout','on', then, for each of the n observations, crossval:
1. Reserves the observation as test data, and trains the model using the other n – 1 observations.
2. Stores the n compact, trained models in the cells of an n-by-1 cell array in the Trained property of the cross-validated model.
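Leave-one-out is the limiting case of k-fold cross-validation with k equal to n. A small sketch using cvpartition makes the set sizes concrete; the sample size here is a hypothetical value picked for illustration.

```matlab
% Leave-one-out partition sizes (assumption: n = 20 is an arbitrary example size).
n = 20;                             % Number of observations
c = cvpartition(n,'LeaveOut');      % Leave-one-out partition
numSets   = c.NumTestSets;          % One test set per observation
testSize  = c.TestSize;             % Each test set holds a single observation
trainSize = c.TrainSize;            % Each training set holds the other n-1
```

Because n models are trained, this option can be expensive for large data sets.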
cvgprMdl— Partitioned Gaussian process regression model
Partitioned Gaussian process regression model, returned as a cross-validated regression model object.
The dataset has 506 observations. The first 13 columns contain the predictor values and the last column contains the response values. The goal is to predict the median value of owner-occupied homes in suburban Boston as a function of 13 predictors.
Load the data and define the response vector and the predictor matrix.
load('housing.data');
X = housing(:,1:13);
y = housing(:,end);
Fit a GPR model using the squared exponential kernel function with separate length scale for each predictor. Standardize the predictor variables.
gprMdl = fitrgp(X,y,'KernelFunction','ardsquaredexponential','Standardize',1);
Create a cross-validation partition for data using predictor 4 as a grouping variable.
rng('default') % For reproducibility
cvp = cvpartition(X(:,4),'kfold',10);
Create a 10-fold cross-validated model using the partitioned data in cvp.
cvgprMdl = crossval(gprMdl,'CVPartition',cvp);
Compute the regression loss for in-fold observations using models trained on out-of-fold observations.
L = kfoldLoss(cvgprMdl)
L = 9.5299
Predict the response for in-fold observations, i.e. observations not used for training.
ypred = kfoldPredict(cvgprMdl);
For every fold, kfoldPredict predicts responses for observations in that fold using the models trained on out-of-fold observations.
Plot the actual responses and prediction data.
plot(y,'r.');
hold on;
plot(ypred,'b--.');
axis([0 510 -15 65]);
legend('True response','GPR prediction','Location','Best');
hold off;
Read the data into a table.
tbl = readtable('abalone.data','Filetype','text','ReadVariableNames',false);
The dataset has 4177 observations. The goal is to predict the age of abalone from 8 physical measurements.
Fit a GPR model using the subset of regressors (sr) method for parameter estimation and the fully independent conditional (fic) method for prediction. Standardize the predictors and use a squared exponential kernel function with a separate length scale for each predictor.
gprMdl = fitrgp(tbl,tbl(:,end),'KernelFunction','ardsquaredexponential',...
    'FitMethod','sr','PredictMethod','fic','Standardize',1);
Cross-validate the model using 4-fold cross validation.
This partitions the data into 4 sets. For each set, the software uses that set (25% of the data) as the test data, and trains the model on the remaining 3 sets (75% of the data).
rng('default') % For reproducibility
cvgprMdl = crossval(gprMdl,'KFold',4);
Compute the loss over individual folds.
L = kfoldLoss(cvgprMdl,'mode','individual')
L = 4.3669 4.6896 4.0565 4.3162
Compute the average cross-validated loss over all folds. The default loss is the mean squared error.
L2 = kfoldLoss(cvgprMdl)
L2 = 4.3573
This is equal to the mean loss over individual folds.
mse = mean(L)
mse = 4.3573
You can only use one of the name-value pair arguments at a time.
You cannot compute the prediction intervals for a cross-validated model.
Alternatively, you can train a cross-validated model using the related name-value pair arguments in fitrgp.
If you supply a custom 'ActiveSet' in the call to fitrgp, then you cannot cross-validate the GPR model.
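The alternative mentioned above, passing a cross-validation option directly to fitrgp, might look like the following sketch. The synthetic data and variable names are assumptions made for illustration and are not part of the examples on this page.

```matlab
% Sketch with synthetic data (assumption: not taken from the examples above).
rng('default')                      % For reproducibility
X = linspace(0,10,60)';             % One predictor, 60 observations
y = sin(X) + 0.1*randn(60,1);       % Noisy response

% Train and cross-validate in a single call to fitrgp.
cvMdl = fitrgp(X,y,'KFold',5);

% cvMdl is already a partitioned model, so kfoldLoss applies directly.
L = kfoldLoss(cvMdl);
numFolds = numel(cvMdl.Trained);    % One compact model per fold
```

This avoids fitting the full model first when only the cross-validated result is needed.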
 Harrison, D., and D. L. Rubinfeld. "Hedonic prices and the demand for clean air." J. Environ. Economics & Management. Vol. 5, 1978, pp. 81-102.
 Warwick J. N., T. L. Sellers, S. R. Talbot, A. J. Cawthorn, and W. B. Ford. "The Population Biology of Abalone (_Haliotis_ species) in Tasmania. I. Blacklip Abalone (_H. rubra_) from the North Coast and Islands of Bass Strait." Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288), 1994.
 S. Waugh. "Extending and Benchmarking Cascade-Correlation", PhD Thesis. Computer Science Department, University of Tasmania, 1995.
 Lichman, M. UCI Machine Learning Repository, Irvine, CA: University of California, School of Information and Computer Science, 2013. http://archive.ics.uci.edu/ml.