An instance of response y from a Gaussian process regression (GPR) model can be modeled as
Hence, making predictions for new data from a GPR model requires:
Knowledge of the coefficient vector, , of fixed basis functions
Ability to evaluate the covariance function for arbitrary and , given the kernel parameters or hyperparameters, .
Knowledge of the noise variance that appears in the density
That is, one needs first to estimate , , and from the data .
One approach for estimating the parameters , , and of a GPR model is by maximizing the likelihood as a function of , , and . That is, if , , and are the estimates of , , and , respectively, then:
the marginal log likelihood function is as follows:
where is the vector of explicit basis functions, and is the covariance function matrix (for more information, see Gaussian Process Regression Models).
To estimate the parameters, the software first computes , which maximizes the log likelihood function with respect to for given and . It then uses this estimate to compute the -profiled likelihood:
The estimate of for given , and is
Then, the -profiled log likelihood is given by
The software then maximizes the -profiled log-likelihood over , and to find their estimates.
Making probabilistic predictions from a GPR model with known parameters requires the density . Using the definition of conditional probabilities, one can write:
To find the joint density in the numerator, it is necessary to introduce the latent variables and corresponding to , and , respectively. Then, it is possible to use the joint distribution for , , , and to compute :
Gaussian process models assume that each response only depends on the corresponding latent variable and the feature vector . Writing as a product of conditional densities and based on this assumption produces:
After integrating with respect to , the result only depends on and :
Again using the definition of conditional probabilities,
it is possible to write as follows:
Using the facts that
one can rewrite as follows:
It is also possible to show that
Hence, the required density is:
It can be shown that
After the integration and required algebra, the density of the new response at a new point , given , is found as
The expected value of prediction at a new point given , , and parameters , , and is
Training a GPR model with the exact method (when
requires the inversion of an n-by-n kernel
The memory requirement for this step scales as O(n^2)
be stored in memory. One evaluation of scales
as O(n^3). Therefore, the computational complexity
is O(k*n^3), where k is
the number of function evaluations needed for maximization and n is
the number of observations.
Making predictions on new data involves the computation of . If prediction intervals are desired, this step could also involve the computation and storage of the Cholesky factor of for later use. The computational complexity of this step using the direct computation of is O(n^3) and the memory requirement is O(n^2).
Hence, for large n, estimation of parameters or computing predictions can be very expensive. The approximation methods usually involve rearranging the computation so as to avoid the inversion of an n-by-n matrix. For the available approximation methods, please see the related links at the bottom of the page.
 Rasmussen, C. E. and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press. Cambridge, Massachusetts, 2006.