
Kernel (Covariance) Function Options

In supervised learning, it is expected that points with similar predictor values xi naturally have close response (target) values yi. In Gaussian processes, the covariance function expresses this similarity [1]. It specifies the covariance between the two latent variables f(xi) and f(xj), where both xi and xj are d-by-1 vectors. In other words, it determines how the response at one point xi is affected by the responses at other points xj, i ≠ j, i = 1, 2, ..., n. The covariance function k(xi,xj) can be defined by various kernel functions, and it can be parameterized in terms of the kernel parameters in a vector θ. Hence, it is possible to express the covariance function as k(xi,xj|θ).

For many standard kernel functions, the kernel parameters are based on the signal standard deviation σf and the characteristic length scale σl. The characteristic length scale roughly defines how far apart the input values xi can be for the response values to become uncorrelated. Both σl and σf must be greater than 0, and this can be enforced by the unconstrained parametrization vector θ, such that

\theta_1 = \log \sigma_l, \qquad \theta_2 = \log \sigma_f .
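
As a minimal sketch of this parametrization (Python/NumPy here purely for illustration; fitrgp handles this mapping internally, and the function name is invented):

```python
import numpy as np

# theta = [log(sigma_l), log(sigma_f)] is unconstrained: any real theta
# maps back to strictly positive sigma_l and sigma_f through exp.
def to_constrained(theta):
    sigma_l, sigma_f = np.exp(theta)
    return sigma_l, sigma_f

sigma_l, sigma_f = to_constrained(np.array([0.0, np.log(2.0)]))
# sigma_l = 1, sigma_f ≈ 2, both guaranteed positive for any real theta
```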

The built-in kernel (covariance) functions with the same length scale for each predictor are:

  • Squared Exponential Kernel

    This is one of the most commonly used covariance functions and is the default option for fitrgp. The squared exponential kernel function is defined as

    k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\left[ -\frac{1}{2} \frac{(x_i - x_j)^T (x_i - x_j)}{\sigma_l^2} \right],

    where σl is the characteristic length scale, and σf is the signal standard deviation.

  • Exponential Kernel

    You can specify the exponential kernel function using the 'KernelFunction','exponential' name-value pair argument. This covariance function is defined by

    k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\left( -\frac{r}{\sigma_l} \right),

    where σl is the characteristic length scale and

    r = \sqrt{(x_i - x_j)^T (x_i - x_j)}

    is the Euclidean distance between xi and xj.

  • Matern 3/2

    You can specify the Matern 3/2 kernel function using the 'KernelFunction','matern32' name-value pair argument. This covariance function is defined by

    k(x_i, x_j \mid \theta) = \sigma_f^2 \left( 1 + \frac{\sqrt{3}\,r}{\sigma_l} \right) \exp\left( -\frac{\sqrt{3}\,r}{\sigma_l} \right),

    where

    r = \sqrt{(x_i - x_j)^T (x_i - x_j)}

    is the Euclidean distance between xi and xj.

  • Matern 5/2

    You can specify the Matern 5/2 kernel function using the 'KernelFunction','matern52' name-value pair argument. The Matern 5/2 covariance function is defined as

    k(x_i, x_j \mid \theta) = \sigma_f^2 \left( 1 + \frac{\sqrt{5}\,r}{\sigma_l} + \frac{5 r^2}{3 \sigma_l^2} \right) \exp\left( -\frac{\sqrt{5}\,r}{\sigma_l} \right),

    where

    r = \sqrt{(x_i - x_j)^T (x_i - x_j)}

    is the Euclidean distance between xi and xj.

  • Rational Quadratic Kernel

    You can specify the rational quadratic kernel function using the 'KernelFunction','rationalquadratic' name-value pair argument. This covariance function is defined by

    k(x_i, x_j \mid \theta) = \sigma_f^2 \left( 1 + \frac{r^2}{2 \alpha \sigma_l^2} \right)^{-\alpha},

    where σl is the characteristic length scale, α is a positive-valued scale-mixture parameter, and

    r = \sqrt{(x_i - x_j)^T (x_i - x_j)}

    is the Euclidean distance between xi and xj.
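
For reference, the five isotropic kernels above translate directly into code. The following is an illustrative NumPy sketch of the formulas, not the fitrgp implementation, and the function names are invented here:

```python
import numpy as np

def sq_exp(xi, xj, sigma_l, sigma_f):
    """Squared exponential: sigma_f^2 * exp(-0.5 * d'd / sigma_l^2)."""
    d = xi - xj
    return sigma_f**2 * np.exp(-0.5 * (d @ d) / sigma_l**2)

def exponential(xi, xj, sigma_l, sigma_f):
    """Exponential: sigma_f^2 * exp(-r / sigma_l)."""
    r = np.linalg.norm(xi - xj)
    return sigma_f**2 * np.exp(-r / sigma_l)

def matern32(xi, xj, sigma_l, sigma_f):
    """Matern 3/2: sigma_f^2 * (1 + s) * exp(-s), s = sqrt(3) r / sigma_l."""
    s = np.sqrt(3) * np.linalg.norm(xi - xj) / sigma_l
    return sigma_f**2 * (1 + s) * np.exp(-s)

def matern52(xi, xj, sigma_l, sigma_f):
    """Matern 5/2: note s^2 / 3 = 5 r^2 / (3 sigma_l^2)."""
    s = np.sqrt(5) * np.linalg.norm(xi - xj) / sigma_l
    return sigma_f**2 * (1 + s + s**2 / 3) * np.exp(-s)

def rational_quadratic(xi, xj, sigma_l, sigma_f, alpha):
    """Rational quadratic: sigma_f^2 * (1 + r^2 / (2 alpha sigma_l^2))^(-alpha)."""
    r2 = np.sum((xi - xj)**2)
    return sigma_f**2 * (1 + r2 / (2 * alpha * sigma_l**2))**(-alpha)
```

A quick sanity check follows from the formulas: at xi = xj every kernel reduces to σf², the signal variance.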

It is possible to use a separate length scale σm for each predictor m, m = 1, 2, ...,d. The built-in kernel (covariance) functions with a separate length scale for each predictor implement automatic relevance determination (ARD) [2]. The unconstrained parametrization θ in this case is

\theta_m = \log \sigma_m \quad \text{for } m = 1, 2, \ldots, d, \qquad \theta_{d+1} = \log \sigma_f .

The built-in kernel (covariance) functions with a separate length scale for each predictor are:

  • ARD Squared Exponential Kernel

    You can specify this kernel function using the 'KernelFunction','ardsquaredexponential' name-value pair argument. This covariance function is the squared exponential kernel function, with a separate length scale for each predictor. It is defined as

    k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\left[ -\frac{1}{2} \sum_{m=1}^{d} \frac{(x_{im} - x_{jm})^2}{\sigma_m^2} \right].

  • ARD Exponential Kernel

    You can specify this kernel function using the 'KernelFunction','ardexponential' name-value pair argument. This covariance function is the exponential kernel function, with a separate length scale for each predictor. It is defined as

    k(x_i, x_j \mid \theta) = \sigma_f^2 \exp(-r),

    where

    r = \sqrt{ \sum_{m=1}^{d} \frac{(x_{im} - x_{jm})^2}{\sigma_m^2} }.

  • ARD Matern 3/2

    You can specify this kernel function using the 'KernelFunction','ardmatern32' name-value pair argument. This covariance function is the Matern 3/2 kernel function, with a different length scale for each predictor. It is defined as

    k(x_i, x_j \mid \theta) = \sigma_f^2 \left( 1 + \sqrt{3}\,r \right) \exp\left( -\sqrt{3}\,r \right),

    where

    r = \sqrt{ \sum_{m=1}^{d} \frac{(x_{im} - x_{jm})^2}{\sigma_m^2} }.

  • ARD Matern 5/2

    You can specify this kernel function using the 'KernelFunction','ardmatern52' name-value pair argument. This covariance function is the Matern 5/2 kernel function, with a different length scale for each predictor. It is defined as

    k(x_i, x_j \mid \theta) = \sigma_f^2 \left( 1 + \sqrt{5}\,r + \frac{5}{3} r^2 \right) \exp\left( -\sqrt{5}\,r \right),

    where

    r = \sqrt{ \sum_{m=1}^{d} \frac{(x_{im} - x_{jm})^2}{\sigma_m^2} }.

  • ARD Rational Quadratic Kernel

    You can specify this kernel function using the 'KernelFunction','ardrationalquadratic' name-value pair argument. This covariance function is the rational quadratic kernel function, with a separate length scale for each predictor. It is defined as

    k(x_i, x_j \mid \theta) = \sigma_f^2 \left( 1 + \frac{1}{2\alpha} \sum_{m=1}^{d} \frac{(x_{im} - x_{jm})^2}{\sigma_m^2} \right)^{-\alpha}.
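
The ARD variants differ from the isotropic kernels only in the weighted distance. The following NumPy sketch (illustrative names, not the fitrgp implementation) shows two of them; note how a very large σm effectively removes predictor m from the covariance, which is what makes ARD useful for relevance determination:

```python
import numpy as np

def ard_r(xi, xj, sigma):
    """Weighted distance: r = sqrt(sum_m (x_im - x_jm)^2 / sigma_m^2)."""
    return np.sqrt(np.sum(((xi - xj) / sigma)**2))

def ard_sq_exp(xi, xj, sigma, sigma_f):
    """ARD squared exponential: one length scale sigma[m] per predictor."""
    return sigma_f**2 * np.exp(-0.5 * np.sum(((xi - xj) / sigma)**2))

def ard_matern32(xi, xj, sigma, sigma_f):
    """ARD Matern 3/2 with the weighted distance r."""
    r = ard_r(xi, xj, sigma)
    return sigma_f**2 * (1 + np.sqrt(3) * r) * np.exp(-np.sqrt(3) * r)
```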

You can specify the kernel function using the KernelFunction name-value pair argument in a call to fitrgp. You can either specify one of the built-in kernel function options, or specify a custom function. When providing the initial kernel parameter values for a built-in kernel function, input the initial values for the signal standard deviation and the characteristic length scale(s) as a numeric vector. When providing the initial kernel parameter values for a custom kernel function, input the initial values for the unconstrained parametrization vector θ. fitrgp uses analytical derivatives to estimate parameters when using a built-in kernel function, whereas it uses numerical derivatives when using a custom kernel function.
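
To illustrate what a numerical derivative of a kernel with respect to θ looks like, here is a central-difference sketch for the squared exponential kernel in NumPy (illustrative only; it does not show fitrgp's internals). For this kernel the analytical derivatives are ∂k/∂θ1 = k·r²/σl² and ∂k/∂θ2 = 2k, which the finite difference should match closely:

```python
import numpy as np

def sq_exp_theta(xi, xj, theta):
    """Squared exponential kernel parameterized by theta = [log sl, log sf]."""
    sigma_l, sigma_f = np.exp(theta)
    d = xi - xj
    return sigma_f**2 * np.exp(-0.5 * (d @ d) / sigma_l**2)

def num_grad(xi, xj, theta, eps=1e-6):
    """Central-difference gradient of the kernel with respect to theta."""
    g = np.zeros_like(theta)
    for m in range(len(theta)):
        e = np.zeros_like(theta)
        e[m] = eps
        g[m] = (sq_exp_theta(xi, xj, theta + e)
                - sq_exp_theta(xi, xj, theta - e)) / (2 * eps)
    return g
```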

References

[1] Rasmussen, C. E. and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press. Cambridge, Massachusetts, 2006.

[2] Neal, R. M. Bayesian Learning for Neural Networks. Springer, New York. Lecture Notes in Statistics, 118, 1996.
