ksdensity

Kernel smoothing function estimate

Syntax

  • [f,xi] = ksdensity(x)
  • [f,xi] = ksdensity(x,pts)
  • [f,xi] = ksdensity(x,pts,Name,Value)
  • [f,xi,bw] = ksdensity(___)
  • ksdensity(___)
  • ksdensity(ax,___)

Description

[f,xi] = ksdensity(x) returns a probability density estimate, f, for the sample in the vector x. The estimate is based on a normal kernel function, and is evaluated at 100 equally spaced points, xi, that cover the range of the data in x.

ksdensity works best with continuously distributed samples.
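
To make the estimate concrete, the following sketch computes a normal-kernel density estimate by hand, without calling ksdensity. The rule-of-thumb bandwidth h and the 3*h padding of the evaluation grid are assumptions chosen for illustration. ksdensity may choose different values, so its output need not match this sketch exactly.

```matlab
rng('default')                          % for reproducibility
x = [randn(30,1); 5+randn(30,1)];       % mixture sample, as in the examples
n = numel(x);

% Assumed rule-of-thumb bandwidth for a normal kernel (illustration only).
h = std(x) * (4/(3*n))^(1/5);

% 100 equally spaced evaluation points; the 3*h padding is an assumption.
xi = linspace(min(x) - 3*h, max(x) + 3*h, 100);

% Scaled distances between every evaluation point and every observation.
u = bsxfun(@minus, xi, x) / h;          % n-by-100 matrix

% Average a normal kernel centered on each observation.
f = sum(exp(-0.5*u.^2), 1) / (n*h*sqrt(2*pi));

% A density estimate should integrate to approximately 1.
totalArea = trapz(xi, f);
```

Each observation contributes one bell-shaped bump of width h, and the estimate is the average of those bumps. This is why a larger bandwidth yields a smoother curve.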

[f,xi] = ksdensity(x,pts) returns a probability density estimate, f, for the sample in the vector x, evaluated at the specified values in vector pts. Here, the xi and pts vectors contain identical values.

[f,xi] = ksdensity(x,pts,Name,Value) returns a probability density estimate, f, for the sample in the vector x, with additional options specified by one or more Name,Value pair arguments.

For example, you can specify which function ksdensity estimates, such as the probability density, cumulative probability, or survivor function. You can also specify the bandwidth of the smoothing window.

[f,xi,bw] = ksdensity(___) also returns the bandwidth of the kernel smoothing window, bw. The default bandwidth is optimal for normal densities.

ksdensity(___) plots the kernel smoothing function estimate.

ksdensity(ax,___) plots the results using axes with the handle, ax, instead of the current axes returned by gca.

Examples

Estimate Density

Generate a sample data set from a mixture of two normal distributions.

rng('default') % For reproducibility
x = [randn(30,1); 5+randn(30,1)];

Plot the estimated density.

[f,xi] = ksdensity(x);
figure()
plot(xi,f);

The density estimate shows the bimodality of the sample.

Estimate Cumulative Distribution Function at Specified Values

Load the sample data.

load hospital

Compute and plot the estimated cdf evaluated at a specified set of values.

pts = (min(hospital.Weight):2:max(hospital.Weight));
figure()
ecdf(hospital.Weight)
hold on
[f,xi,bw] = ksdensity(hospital.Weight,pts,'support','positive',...
	'function','cdf');
plot(xi,f,'-g','LineWidth',2)
legend('empirical cdf','kernel-bw:default','Location','NorthWest')
xlabel('Patient weights')
ylabel('Estimated cdf')

ksdensity seems to smooth the cumulative distribution function estimate too much. An estimate with a smaller bandwidth might produce a closer estimate to the empirical cumulative distribution function.

Return the bandwidth of the smoothing window.

bw
bw =

    0.1070

Plot the cumulative distribution function estimate using a smaller bandwidth.

[f,xi] = ksdensity(hospital.Weight,pts,'support','positive',...
	'function','cdf','bandwidth',0.05);
plot(xi,f,'--r','LineWidth',2)
legend('empirical cdf','kernel-bw:default','kernel-bw:0.05',...
	'Location','NorthWest')
hold off

The ksdensity estimate with a smaller bandwidth matches the empirical cumulative distribution function better.

Plot Estimated Cumulative Distribution Function for Given Number of Points

Load the sample data.

load hospital

Plot the estimated cdf evaluated at 50 equally spaced points.

figure()
ksdensity(hospital.Weight,'support','positive','function','cdf',...
'npoints',50)
xlabel('Patient weights')
ylabel('Estimated cdf')

Estimate Survivor and Cumulative Hazard for Censored Failure Data

Generate sample data from an exponential distribution with mean 3.

rng('default') % For reproducibility
x = random('exp',3,100,1);

Create a logical vector that indicates censoring. Here, observations with lifetimes longer than 10 are censored.

T = 10;          % censoring time
cens = (x>T);    % true for observations with lifetimes longer than T

Compute and plot the estimated density function.

figure()
ksdensity(x,'support','positive','censoring',cens);

Compute and plot the survivor function.

figure()
ksdensity(x,'support','positive','censoring',cens,...
'function','survivor');

Compute and plot the cumulative hazard function.

figure()
ksdensity(x,'support','positive','censoring',cens,...
'function','cumhazard');

Estimate Inverse Cumulative Distribution Function for Specified Probability Values

Generate a mixture of two normal distributions, and plot the estimated inverse cumulative distribution function at a specified set of probability values.

rng('default') % For reproducibility
x = [randn(30,1); 5+randn(30,1)];
p = linspace(.01,.99,99); % probability values; named p to avoid shadowing the built-in pi
figure()
ksdensity(x,p,'function','icdf');

Return Bandwidth of Smoothing Window

Generate a mixture of two normal distributions.

rng('default') % For reproducibility
x = [randn(30,1); 5+randn(30,1)];

Return the bandwidth of the smoothing window for the probability density estimate.

[f,xi,bw] = ksdensity(x);
bw
bw =

    1.5141

The default bandwidth is optimal for normal densities.

Plot the estimated density.

figure()
plot(xi,f);
xlabel('xi')
ylabel('f')
hold on

Plot the density using an increased bandwidth value.

[f,xi] = ksdensity(x,'bandwidth',1.8);
plot(xi,f,'--r','LineWidth',1.5)

A higher bandwidth further smooths the density estimate, which might mask some characteristics of the distribution.

Now, plot the density using a decreased bandwidth value.

[f,xi] = ksdensity(x,'bandwidth',0.8);
plot(xi,f,'-.k','LineWidth',1.5)
legend('bw = default','bw = 1.8','bw = 0.8')
hold off

A smaller bandwidth smooths the density estimate less, which exaggerates some characteristics of the sample.

Input Arguments

x — Sample data
column vector

Sample data, for which ksdensity returns f values, specified as a column vector.

Example: [f,xi] = ksdensity(x)

Data Types: single | double

pts — Points at which to evaluate f
vector

Points at which to evaluate f, specified as a vector. pts can be a row or column vector. f has the same dimensions as pts.
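
As a brief illustration of the dimension rule, the following sketch evaluates the same estimate at a row vector and at a column vector of points. The variable names ptsRow and ptsCol are chosen here for illustration only.

```matlab
rng('default')                          % for reproducibility
x = [randn(30,1); 5+randn(30,1)];

ptsRow = -2:0.5:7;                      % 1-by-19 row vector
[fRow,xiRow] = ksdensity(x,ptsRow);

ptsCol = ptsRow';                       % 19-by-1 column vector
fCol = ksdensity(x,ptsCol);

size(fRow)                              % same size as ptsRow
size(fCol)                              % same size as ptsCol
```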

Example: pts = (0:1:25); ksdensity(x,pts);

Data Types: single | double

ax — Axes handle
handle

Axes handle for the axes that ksdensity plots to, specified as a handle.

For example, if h is a handle to an axes object, then ksdensity can plot to those axes as follows.

Example: ksdensity(h,x)
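
One case where an axes handle is useful is a figure with several subplots. The following sketch, with the illustrative handle names axTop and axBottom, plots the estimated pdf and cdf of the same sample in two separate axes.

```matlab
rng('default')                          % for reproducibility
x = [randn(30,1); 5+randn(30,1)];

figure()
axTop    = subplot(2,1,1);              % handle to the upper axes
axBottom = subplot(2,1,2);              % handle to the lower axes

ksdensity(axTop,x)                      % pdf estimate in the upper axes
title(axTop,'Estimated pdf')

ksdensity(axBottom,x,'function','cdf')  % cdf estimate in the lower axes
title(axBottom,'Estimated cdf')
```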

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'censoring',cens,'kernel','triangle','npoints',20,'function','cdf' specifies that ksdensity estimates the cdf by evaluating at 20 equally spaced points that cover the range of the data, using the triangle kernel smoothing function and accounting for the censored data information in vector cens.

'censoring' — Logical vector
vector of 0s (default) | vector of 0s and 1s

Logical vector indicating which entries are censored, specified as the comma-separated pair consisting of 'censoring' and a vector of binary values. A value of 0 indicates that the observation is not censored, and a value of 1 indicates that it is censored. By default, no observations are censored.

Example: 'censoring',censdata

Data Types: logical

'kernel' — Type of kernel smoother
'normal' (default) | 'box' | 'triangle' | 'epanechnikov' | function handle | string

Type of kernel smoother, specified as the comma-separated pair consisting of 'kernel' and one of the following.

  • 'normal' (default)

  • 'box'

  • 'triangle'

  • 'epanechnikov'

  • You can also specify a custom kernel function, as a function handle or as a string, e.g., @normpdf or 'normpdf'. ksdensity calls the function with one argument that is an array of distances between data values and locations where the density is evaluated. The function must return an array of the same size containing corresponding values of the kernel function.

    When 'function' is 'pdf', this kernel function returns density values. Otherwise, it returns cumulative probability values.

    Specifying a custom kernel when 'function' is 'icdf' returns an error.

If 'support' is 'positive', then ksdensity transforms x using a log function, estimates the density of the transformed values, and transforms back to the original scale. If 'support' is a vector [L U], then ksdensity uses the transformation log((X-L)/(U-X)). The 'bandwidth' value and the bw output are on the scale of the transformed values.
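
The log-transform approach for positive support can be sketched by hand as follows. This is an illustration of the idea, not the code that ksdensity runs. The lognormal sample, the rule-of-thumb bandwidth h, and the evaluation grid are all assumptions made for the example. Note that h is computed from log(x), so it is on the log scale rather than the scale of the data.

```matlab
rng('default')                          % for reproducibility
x = exp(randn(200,1));                  % strictly positive sample (illustration)
y = log(x);                             % transform to the unbounded scale
n = numel(y);

% Assumed rule-of-thumb bandwidth, computed on the log scale.
h = std(y) * (4/(3*n))^(1/5);

xi = linspace(0.05,10,200);             % evaluation points on the original scale

% Normal-kernel estimate of the density of y, evaluated at log(xi).
u  = bsxfun(@minus, log(xi), y) / h;    % n-by-200 matrix of scaled distances
fy = sum(exp(-0.5*u.^2), 1) / (n*h*sqrt(2*pi));

% Change of variables back to the original scale: f_X(x) = f_Y(log(x)) / x.
fx = fy ./ xi;

% Most of the probability mass should lie inside the grid.
massOnGrid = trapz(xi, fx);
```

Because the estimate is built on log(x), it places no probability mass on negative values.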

Example: 'kernel','box'

Data Types: char | function_handle
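
The following sketch passes a custom kernel as a function handle. It uses @normpdf, as mentioned above, and an equivalent anonymous function named myKernel (a name chosen here for illustration). Because both are the standard normal density, the results are expected to agree closely with the default 'normal' kernel. Small differences can arise from how the built-in kernel is evaluated.

```matlab
rng('default')                          % for reproducibility
x = [randn(30,1); 5+randn(30,1)];
pts = linspace(-3,8,50);

fDefault = ksdensity(x,pts);                     % built-in 'normal' kernel
fHandle  = ksdensity(x,pts,'kernel',@normpdf);   % same kernel as a function handle

% Custom kernel written as an anonymous function (standard normal density).
% It accepts an array and returns an array of the same size.
myKernel = @(u) exp(-0.5*u.^2) / sqrt(2*pi);
fCustom  = ksdensity(x,pts,'kernel',myKernel);

max(abs(fHandle - fCustom))             % difference between the two handles
max(abs(fDefault - fHandle))            % difference from the built-in kernel
```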

'npoints' — Number of equally spaced points
100 (default) | scalar value

Number of equally spaced points in xi, specified as the comma-separated pair consisting of 'npoints' and a scalar value.

For instance, to estimate the specified function at 80 equally spaced points within the range of the sample data, specify:

Example: 'npoints',80

Data Types: single | double

'support' — Support for the density
'unbounded' (default) | 'positive' | two-element vector, [L U]

Support for the density, specified as the comma-separated pair consisting of 'support' and one of the following.

  • 'unbounded': Default. Allow the density to extend over the whole real line.
  • 'positive': Restrict the density to positive values.
  • Two-element vector, [L U]: Give the finite lower and upper bounds for the support of the density.

Example: 'support','positive'

Example: 'support',[0 10]

Data Types: single | double | char
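
As an illustration of bounded support, the following sketch estimates the density of a uniform sample on the interval (0,1) twice, once with 'support',[0 1] and once with the default unbounded support. The sample and variable names are assumptions chosen for the example.

```matlab
rng('default')                          % for reproducibility
x = rand(100,1);                        % values strictly inside (0,1)

[fBounded,xiBounded] = ksdensity(x,'support',[0 1]);
[fFree,xiFree]       = ksdensity(x);    % default: unbounded support

figure()
plot(xiBounded,fBounded,'-b',xiFree,fFree,'--r')
legend('support = [0 1]','unbounded')
xlabel('x')
ylabel('Estimated pdf')
```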

'weights' — Weights for each x value
vector

Weights for each x value, specified as the comma-separated pair consisting of 'weights' and a vector of the same length as x.

For instance, if the weights for the data values are in vector xw, then you can specify the weights as follows.

Example: 'weights',xw

Data Types: single | double
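
The following sketch shows the effect of weights on a small, made-up sample. The bandwidth is fixed at 1 so that the weighted and unweighted estimates are directly comparable. The data values and weights are assumptions chosen for illustration.

```matlab
x   = [1; 2; 3; 4; 10];
xw  = [1; 1; 1; 1; 4];                  % the observation at 10 gets four times the weight
pts = 0:0.5:12;

fEqual    = ksdensity(x,pts,'bandwidth',1);
fWeighted = ksdensity(x,pts,'weights',xw,'bandwidth',1);

figure()
plot(pts,fEqual,'-b',pts,fWeighted,'--r')
legend('equal weights','weighted')
```

With these weights, the estimate shifts mass toward the observation at 10, so the weighted curve is higher there than the equally weighted curve.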

'bandwidth' — Bandwidth of the kernel smoothing window
optimal value for normal densities (default) | scalar value

Bandwidth of the kernel smoothing window, specified as the comma-separated pair consisting of 'bandwidth' and a scalar value. The default value depends on the number of points in x and is optimal for estimating normal densities, but you might want to choose a larger or smaller value to smooth more or less.

Example: 'bandwidth',0.8

Data Types: single | double

'function' — Function to estimate
'pdf' (default) | 'cdf' | 'icdf' | 'survivor' | 'cumhazard'

Function to estimate, specified as the comma-separated pair consisting of 'function' and one of the following.

  • 'pdf': Default. Probability density function.
  • 'cdf': Cumulative distribution function.
  • 'icdf': Inverse cumulative distribution function. For 'icdf', f = ksdensity(x,p,'function','icdf') computes the estimated inverse cdf of the values in x, and evaluates it at the probability values specified in p.
  • 'survivor': Survivor function.
  • 'cumhazard': Cumulative hazard function.

Example: 'function','icdf'

Data Types: char
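
The function types are related by the standard definitions: the survivor function is 1 minus the cdf, and the cumulative hazard is the negative log of the survivor function. The following sketch checks these relationships numerically at points where the survivor function is positive. It does not describe how ksdensity computes them internally.

```matlab
rng('default')                          % for reproducibility
x = [randn(30,1); 5+randn(30,1)];
pts = linspace(-2,7,40);

F = ksdensity(x,pts,'function','cdf');
S = ksdensity(x,pts,'function','survivor');
H = ksdensity(x,pts,'function','cumhazard');

max(abs(S - (1 - F)))                   % survivor function versus 1 - cdf
max(abs(H + log(S)))                    % cumulative hazard versus -log(survivor)
```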

Output Arguments

f — Estimated function values
vector

Estimated function values, returned as a vector of the same dimension as xi or pts.

xi — Evaluation points
100 equally spaced points (default) | vector

Evaluation points at which ksdensity calculates f, returned as a vector. By default, xi contains 100 equally spaced points that cover the range of the data in x.

bw — Bandwidth of smoothing window
scalar value

Bandwidth of smoothing window, returned as a scalar value.
