Subset of Data Approximation for GPR Models
Training a GPR model with the exact method (when FitMethod is 'Exact') requires the inversion of an n-by-n matrix. Therefore, the computational complexity is O(kn^3), where k is the number of function evaluations required for estimating β, θ, and σ^2, and n is the number of observations. For large n, estimating the parameters or computing predictions can be very expensive.
One simple way to reduce the computational cost for large data sets is to select m < n observations out of n and then apply the exact GPR model to these m points to estimate β, θ, and σ^2 while ignoring the other (n – m) points. This smaller subset is known as the active set or inducing input set, and this approximation method is called the Subset of Data (SD) method.
The computational complexity when using the SD method is O(km^3), where k is the number of function evaluations and m is the active set size. The storage requirements are O(m^2), since only a part of the full kernel matrix needs to be stored in memory.
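As a rough worked example of these orders of growth, take the illustrative values n = 10,000 and m = 1,000 (both assumed here, not taken from the text), and assume that k is about the same for the exact and SD fits and that constant factors can be ignored. The full kernel matrix is n-by-n, so its storage is taken as O(n^2):

```latex
\[
\frac{k n^{3}}{k m^{3}} = \left(\frac{n}{m}\right)^{3}
  = \left(\frac{10^{4}}{10^{3}}\right)^{3} = 10^{3},
\qquad
\frac{n^{2}}{m^{2}} = \left(\frac{10^{4}}{10^{3}}\right)^{2} = 10^{2}.
\]
```

Under these assumptions, SD needs on the order of 1,000 times fewer operations and 100 times less kernel storage than the exact method. The actual savings depend on the implementation and on how k differs between the two fits.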
You can specify the SD method for parameter estimation by using the 'FitMethod','sd' name-value pair argument in the call to fitrgp. To specify the SD method for prediction, use the 'PredictMethod','sd' name-value pair argument.
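The two name-value pairs above can be combined in a single call, as in the minimal sketch below. The data are synthetic and purely illustrative. The 'ActiveSetSize' and 'ActiveSetMethod' arguments are fitrgp options that this section does not discuss, and the values chosen for them here (m = 200, random selection) are assumptions rather than recommendations.

```matlab
% Hypothetical synthetic data: n noisy samples of a smooth 1-D function.
rng(0);                          % fix the seed so the random active set is reproducible
n = 5000;
X = linspace(0, 10, n)';         % n-by-1 predictor
y = sin(X) + 0.1*randn(n, 1);    % n-by-1 noisy response

m = 200;                         % active set size, m < n (illustrative value)

% Use the SD approximation for both parameter estimation and prediction.
gprMdl = fitrgp(X, y, ...
    'FitMethod', 'sd', ...
    'PredictMethod', 'sd', ...
    'ActiveSetSize', m, ...          % assumed option: number of points in the active set
    'ActiveSetMethod', 'random');    % assumed option: pick the active set at random

Xnew = (0:2.5:10)';              % five query points
ypred = predict(gprMdl, Xnew);   % predictions based on the m active-set points
```

Increasing m generally brings the fit closer to the exact method at a higher computational cost, so m is usually chosen as the largest value that the time and memory budget allows.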