**Class:** `gmdistribution`

Construct clusters from Gaussian mixture distribution

`idx = cluster(obj,X)`

`[idx,nlogl] = cluster(obj,X)`

`[idx,nlogl,P] = cluster(obj,X)`

`[idx,nlogl,P,logpdf] = cluster(obj,X)`

`[idx,nlogl,P,logpdf,M] = cluster(obj,X)`

`idx = cluster(obj,X)` partitions data in the *n*-by-*d* matrix `X`, where *n* is the number of observations and *d* is the dimension of the data, into *k* clusters determined by the *k* components of the Gaussian mixture distribution defined by `obj`. `obj` is an object created by `gmdistribution` or `fitgmdist`. `idx` is an *n*-by-1 vector, where `idx(I)` is the cluster index of observation `I`. The cluster index gives the component with the largest posterior probability for the observation, weighted by the component probability.
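As a minimal sketch of this syntax, the following builds a two-component mixture from explicit parameters and assigns three points to clusters. The means, covariance, mixing proportions, and data values are illustrative assumptions, not values from this page.

```matlab
% Illustrative parameters (assumed for this sketch).
mu    = [0 0; 10 10];   % k-by-d matrix of component means (k = 2, d = 2)
sigma = eye(2);         % d-by-d covariance shared by both components
p     = [0.5 0.5];      % mixing proportions
obj   = gmdistribution(mu, sigma, p);

% n-by-d data matrix with n = 3 observations.
X = [0.1 -0.2; 9.8 10.3; 0.5 0.4];

% idx is n-by-1; idx(I) is the component assigned to observation I.
idx = cluster(obj, X);
```

With components this far apart, the first and third observations fall in component 1 and the second in component 2.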

The data in `X` is typically the same as the data used to create the Gaussian mixture distribution defined by `obj`. Clustering with `cluster` is treated as a separate step, apart from density estimation. For `cluster` to provide meaningful clustering with new data, `X` should come from the same population as the data used to create `obj`.
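The two-step workflow can be sketched as follows: estimate the mixture with `fitgmdist`, then call `cluster` on the training data and on new data drawn from the same population. The synthetic data, the seed, and the variable names are assumptions made for illustration. The numbering of fitted components is arbitrary, so only group membership is meaningful.

```matlab
rng(1);  % fix the seed so the sketch is repeatable

% Synthetic training data from two well-separated groups (assumed).
Xtrain = [randn(100, 2); randn(100, 2) + 8];

% Step 1: density estimation.
obj = fitgmdist(Xtrain, 2);

% Step 2: clustering, on the training data and on new data
% generated the same way as the training data.
idxTrain = cluster(obj, Xtrain);
Xnew     = [randn(5, 2); randn(5, 2) + 8];
idxNew   = cluster(obj, Xnew);
```

If `Xnew` came from a different population, `cluster` would still return indices, but they would not correspond to meaningful groups.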

`cluster` treats `NaN` values as missing data. Rows of `X` with `NaN` values are excluded from the partition.
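One way to make the exclusion explicit is to filter incomplete rows before calling `cluster`, as in this sketch. The mixture parameters and data are illustrative assumptions. This page does not state how excluded rows are reported in `idx`, so the sketch avoids relying on that behavior.

```matlab
% Illustrative mixture (assumed for this sketch).
mu    = [0 0; 10 10];
sigma = eye(2);
p     = [0.5 0.5];
obj   = gmdistribution(mu, sigma, p);

% Rows 2 and 4 each contain a missing value.
X = [0.2 0.1; NaN 3.0; 9.7 10.4; 1.0 NaN];

% Logical n-by-1 mask of rows with no NaN in any column.
complete = ~any(isnan(X), 2);

% Cluster only the complete rows; idxComplete has one entry per kept row.
idxComplete = cluster(obj, X(complete, :));
```

The mask `complete` can then be used to map `idxComplete` back to the original row positions.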

`[idx,nlogl] = cluster(obj,X)` also returns `nlogl`, the negative log-likelihood of the data.
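As a quick consistency check, `nlogl` can be compared with the negated sum of the log of the mixture density, evaluated here with the `pdf` method of `gmdistribution`. The parameters and data are illustrative assumptions.

```matlab
% Illustrative mixture and data (assumed for this sketch).
mu    = [0 0; 3 3];
sigma = eye(2);
p     = [0.7 0.3];
obj   = gmdistribution(mu, sigma, p);
X     = [0 0; 1.5 1.5; 3 3; 2 2.5];

[idx, nlogl] = cluster(obj, X);

% Negative log-likelihood computed directly from the mixture density.
nloglCheck = -sum(log(pdf(obj, X)));

% The two values should agree up to floating-point rounding.
gap = abs(nlogl - nloglCheck);
```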

`[idx,nlogl,P] = cluster(obj,X)` also returns the posterior probabilities of each component for each observation in the *n*-by-*k* matrix `P`. `P(I,J)` is the probability of component `J` given observation `I`.
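The relationship between `P` and `idx` can be sketched as follows: each row of `P` sums to 1, and `idx` picks the column with the largest entry. The parameters and data are illustrative assumptions. The second observation is equidistant from both means, so its posterior equals the mixing proportions.

```matlab
% Illustrative mixture and data (assumed for this sketch).
mu    = [0 0; 3 3];
sigma = eye(2);
p     = [0.7 0.3];
obj   = gmdistribution(mu, sigma, p);
X     = [0 0; 1.5 1.5; 3 3; 2 2.5];

[idx, nlogl, P] = cluster(obj, X);

% Each row of P is a probability distribution over the k components.
rowSums = sum(P, 2);

% The cluster index is the column of the largest posterior in each row.
[~, best] = max(P, [], 2);
sameAsIdx = isequal(best, idx);
```

Rows of `P` with entries close to `1/k` mark observations whose assignment is uncertain, which a hard index in `idx` does not show.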

`[idx,nlogl,P,logpdf] = cluster(obj,X)` also returns the *n*-by-1 vector `logpdf` containing the logarithm of the estimated probability density function for each observation. The density estimate for observation `I` is a sum over all components of the component density at `I` times the component probability.
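The weighted sum described above can be reproduced by hand with `mvnpdf`, as in this sketch. The parameters and data are illustrative assumptions, and the loop uses the local variables `mu`, `sigma`, and `p` rather than properties of `obj`.

```matlab
% Illustrative mixture with a shared, non-diagonal covariance (assumed).
mu    = [0 0; 3 3];
sigma = [2 0.5; 0.5 1];
p     = [0.7 0.3];
obj   = gmdistribution(mu, sigma, p);
X     = [0 0; 1.5 1.5; 3 3];

[~, nlogl, ~, logpdf] = cluster(obj, X);

% Mixture density: sum over components of proportion times component density.
dens = zeros(size(X, 1), 1);
for j = 1:numel(p)
    dens = dens + p(j) * mvnpdf(X, mu(j, :), sigma);
end

% logpdf should match log(dens), and nlogl should match -sum(logpdf).
maxDiff  = max(abs(logpdf - log(dens)));
nloglGap = abs(nlogl + sum(logpdf));
```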

`[idx,nlogl,P,logpdf,M] = cluster(obj,X)` also returns an *n*-by-*k* matrix `M` containing Mahalanobis distances in squared units. `M(I,J)` is the squared Mahalanobis distance of observation `I` from the mean of component `J`.
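The squared distances in `M` can be checked against the definition, (x - mu_J) * inv(Sigma) * (x - mu_J)', as in this sketch. The parameters and data are illustrative assumptions, and `Mcheck` is a name introduced here for the hand-computed matrix.

```matlab
% Illustrative mixture with a shared, non-diagonal covariance (assumed).
mu    = [0 0; 3 3];
sigma = [2 0.5; 0.5 1];
p     = [0.7 0.3];
obj   = gmdistribution(mu, sigma, p);
X     = [0 0; 1.5 1.5; 3 3];

[~, ~, ~, ~, M] = cluster(obj, X);

% Hand-computed squared Mahalanobis distances, one column per component.
Mcheck = zeros(size(X, 1), size(mu, 1));
for j = 1:size(mu, 1)
    d = bsxfun(@minus, X, mu(j, :));        % deviations from mean j
    Mcheck(:, j) = sum((d / sigma) .* d, 2); % d * inv(sigma) * d', row by row
end

maxDiff = max(abs(M(:) - Mcheck(:)));
```

An observation that coincides with a component mean has distance 0 to that component, so `M(1,1)` and `M(3,2)` are 0 in this sketch.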
