Is it possible to self-define training and validation sets in fitrgp?

fitrgp supports cross-validation during hyperparameter optimization. Is it also possible to define the training and validation sets myself?
I am working on a time series and I am more interested in extrapolation than interpolation, so I would like to use the 'future' data as the validation set during model selection. Can this be done while still applying fitrgp and without changing much of the code structure?

Accepted Answer

Sai Pavan on 17 Apr 2024
Hello,
I understand that you want to extrapolate a time series by fitting a Gaussian process regression model on past data and then predicting future data points.
The built-in hyperparameter optimization of "fitrgp" supports cross-validation options, but it does not let you designate a specific block of future observations as the validation set. To use future data for validation, you will need to manage the training and validation process manually while still leveraging fitrgp for model fitting.
Please refer to the workflow below to achieve this:
  1. Manually split the time series data into training and validation sets, ensuring that the validation set consists of data points that occur after all the points in your training set.
  2. Use "fitrgp" to fit a Gaussian process regression model to your training data. You can still specify fitting options, but since validation is handled manually, do not use fitrgp's built-in cross-validation features here.
  3. After fitting the model on the training set, use the "predict" function of the GPR model to make predictions on the validation set. Then assess the model's performance using an appropriate metric, such as mean squared error (MSE).
  4. For hyperparameter tuning, loop over different hyperparameter configurations manually, fit a model with each configuration to the training set, evaluate it on the validation set, and select the configuration that performs best according to your chosen metric (a sketch of this loop follows the code snippet below).
The code snippet below illustrates steps 1-3 of this workflow:
% Assuming 'data' is the time series and 'responses' are the corresponding values
N = 20; % Number of points to use for validation
trainData = data(1:end-N, :);
trainResponses = responses(1:end-N, :);
validationData = data(end-N+1:end, :);
validationResponses = responses(end-N+1:end, :);
% Fit a GPR model to the training data
gprMdl = fitrgp(trainData, trainResponses, 'BasisFunction', 'constant', 'KernelFunction', 'squaredexponential', ...
'FitMethod', 'exact', 'PredictMethod', 'exact');
% Make predictions on the validation set
validationPredictions = predict(gprMdl, validationData);
% Calculate the mean squared error on the validation set
mseValidation = mean((validationPredictions - validationResponses).^2);
disp(['Validation MSE: ', num2str(mseValidation)]);
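Building on the split above, here is a minimal sketch of the manual tuning loop described in step 4. The kernel functions and initial noise standard deviations ('Sigma' values) listed here are arbitrary example choices, not recommendations; replace them with whichever configurations you want to compare:
% Candidate hyperparameter configurations (example values only)
kernels = {'squaredexponential', 'matern52', 'exponential'};
sigmas = [0.1, 0.5, 1]; % initial noise standard deviations to try
bestMSE = Inf;
bestMdl = [];
for i = 1:numel(kernels)
    for j = 1:numel(sigmas)
        % Fit on the training block only
        mdl = fitrgp(trainData, trainResponses, 'BasisFunction', 'constant', ...
            'KernelFunction', kernels{i}, 'Sigma', sigmas(j), ...
            'FitMethod', 'exact', 'PredictMethod', 'exact');
        % Evaluate on the future (validation) block
        pred = predict(mdl, validationData);
        mse = mean((pred - validationResponses).^2);
        % Keep the model with the lowest validation MSE
        if mse < bestMSE
            bestMSE = mse;
            bestMdl = mdl;
        end
    end
end
disp(['Best validation MSE: ', num2str(bestMSE)]);
If desired, you can then refit the selected configuration on the combined training and validation data before forecasting further ahead.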
Please refer to the documentation below to learn more about the "fitrgp" function: https://www.mathworks.com/help/stats/fitrgp.html
Hope it helps!
