idx = cluster(obj,X)
[idx,nlogl] = cluster(obj,X)
[idx,nlogl,P] = cluster(obj,X)
[idx,nlogl,P,logpdf] = cluster(obj,X)
[idx,nlogl,P,logpdf,M] = cluster(obj,X)
idx = cluster(obj,X) partitions the data in the n-by-d matrix X, where n is the number of observations and d is the dimension of the data, into k clusters determined by the k components of the Gaussian mixture distribution defined by obj. obj is an object created by gmdistribution or fitgmdist. idx is an n-by-1 vector, where idx(I) is the cluster index of observation I. The cluster index is the index of the component with the largest posterior probability for the observation. That posterior probability is proportional to the component density at the observation weighted by the component probability.
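The assignment rule can be sketched in plain Python. This sketch reimplements the math and does not call MATLAB. It makes these assumptions:

- Each component is given by a mixing weight, a mean vector, and a full covariance matrix, all passed as nested lists.
- The helper names (cholesky, log_gauss, hard_assign) are hypothetical.
- The returned indices are 1-based to mirror idx.

Every component's posterior shares the same normalizing constant. So picking the largest weighted log density gives the same index as picking the largest posterior.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def log_gauss(x: Sequence[float], mu: Sequence[float], sigma: Matrix) -> float:
    """Log of the multivariate normal density N(x | mu, sigma)."""
    L = cholesky(sigma)
    d = len(x)
    z: List[float] = []
    for i in range(d):  # forward substitution: solve L z = x - mu
        z.append((x[i] - mu[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    sq_dist = sum(v * v for v in z)  # squared Mahalanobis distance
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + sq_dist)


def hard_assign(X: Sequence[Sequence[float]], weights: Sequence[float],
                means: Sequence[Sequence[float]], covs: Sequence[Matrix]) -> List[int]:
    """1-based index of the component with the largest weighted density, per row."""
    idx = []
    for x in X:
        scores = [math.log(w) + log_gauss(x, mu, s)
                  for w, mu, s in zip(weights, means, covs)]
        idx.append(1 + max(range(len(scores)), key=scores.__getitem__))
    return idx


# Two 1-D components with equal weights, centered at 0 and 10.
print(hard_assign([[0.1], [9.8], [4.0]], [0.5, 0.5], [[0.0], [10.0]], [[[1.0]], [[1.0]]]))
# -> [1, 2, 1]
```

With unequal weights, a point equidistant from both means goes to the component with the larger weight.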
Note: The data in X is typically the same as the data used to create the Gaussian mixture distribution defined by obj. Clustering with cluster is treated as a separate step from density estimation. For cluster to provide meaningful clustering with new data, X should come from the same population as the data used to create obj.
cluster treats NaN values as missing data. Rows of X with NaN values are excluded from the partition.
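A minimal sketch of excluding incomplete rows follows. It assumes rows are plain Python lists. It uses None as a hypothetical marker for rows left out of the partition, because the text above does not say what value cluster stores in idx for such rows. The per-row assignment rule is passed in as a callable, so any rule, such as the largest-posterior rule, can be plugged in.

```python
import math
from typing import Callable, List, Optional, Sequence


def assign_complete_rows(
    X: Sequence[Sequence[float]],
    assign: Callable[[Sequence[float]], int],
) -> List[Optional[int]]:
    """Cluster only rows with no NaN entries; rows containing NaN get None."""
    labels: List[Optional[int]] = []
    for row in X:
        if any(math.isnan(v) for v in row):
            labels.append(None)  # missing data: excluded from the partition
        else:
            labels.append(assign(row))
    return labels


nan = float("nan")
X = [[1.0, 2.0], [nan, 0.0], [7.0, 1.0]]
# Hypothetical stand-in for a real component assignment rule.
print(assign_complete_rows(X, lambda row: 1 if row[0] < 5 else 2))  # -> [1, None, 2]
```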
[idx,nlogl] = cluster(obj,X) also returns nlogl, the negative log-likelihood of the data in X under the Gaussian mixture distribution defined by obj.
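Assuming the standard likelihood for independent observations, nlogl can be written as below. Here π_J, μ_J, and Σ_J are notation introduced for this sketch, not property names of obj. They denote the probability, mean, and covariance of component J, and 𝒩 denotes the multivariate normal density.

```latex
\mathrm{nlogl} = -\sum_{I=1}^{n} \log\left( \sum_{J=1}^{k} \pi_J \, \mathcal{N}\!\left(x_I \mid \mu_J, \Sigma_J\right) \right)
```

The inner sum is the per-observation density estimate that logpdf reports in log form. Under this definition, nlogl equals the negated sum of logpdf over the observations.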
[idx,nlogl,P] = cluster(obj,X) also returns the posterior probabilities of each component for each observation in the n-by-k matrix P. P(I,J) is the probability of component J given observation I.
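By Bayes' rule, P(I,J) is the weighted density of component J at observation I divided by the sum of the weighted densities over all k components. The plain-Python sketch below computes this matrix and makes these assumptions:

- The helper names (cholesky, log_gauss, posteriors) are hypothetical.
- Each covariance is a full matrix given as nested lists.
- Indices are 0-based.
- This is not MATLAB's implementation.

The sketch subtracts the largest log term before exponentiating. Without that shift, an observation far from every component would underflow to 0/0.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def log_gauss(x: Sequence[float], mu: Sequence[float], sigma: Matrix) -> float:
    """Log of the multivariate normal density N(x | mu, sigma)."""
    L = cholesky(sigma)
    d = len(x)
    z: List[float] = []
    for i in range(d):  # forward substitution: solve L z = x - mu
        z.append((x[i] - mu[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))


def posteriors(X: Sequence[Sequence[float]], weights: Sequence[float],
               means: Sequence[Sequence[float]], covs: Sequence[Matrix]) -> Matrix:
    """n-by-k matrix with P[i][j] = Pr(component j | observation i)."""
    P = []
    for x in X:
        logs = [math.log(w) + log_gauss(x, mu, s)
                for w, mu, s in zip(weights, means, covs)]
        top = max(logs)  # shift so the largest term becomes exp(0) = 1
        unnorm = [math.exp(v - top) for v in logs]
        total = sum(unnorm)
        P.append([u / total for u in unnorm])
    return P


P = posteriors([[5.0], [1000.0]], [0.3, 0.7], [[0.0], [10.0]], [[[1.0]], [[1.0]]])
print(P[1])  # -> [0.0, 1.0]
```

For the point at 5, equidistant from both means, the posteriors come out as the component weights, 0.3 and 0.7.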
[idx,nlogl,P,logpdf] = cluster(obj,X) also returns the n-by-1 vector logpdf containing the logarithm of the estimated probability density function for each observation. The density estimate for observation I is a sum over all components of the component density at I times the component probability.
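The density estimate is usually computed in log space with the log-sum-exp identity: log(sum_J exp(a_J)) = a* + log(sum_J exp(a_J - a*)), where a* = max_J a_J. The plain-Python sketch below applies that identity and makes these assumptions:

- The helper names (cholesky, log_gauss, log_mixture_pdf) are hypothetical.
- Covariances are full matrices given as nested lists.
- This is not MATLAB's implementation.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def log_gauss(x: Sequence[float], mu: Sequence[float], sigma: Matrix) -> float:
    """Log of the multivariate normal density N(x | mu, sigma)."""
    L = cholesky(sigma)
    d = len(x)
    z: List[float] = []
    for i in range(d):  # forward substitution: solve L z = x - mu
        z.append((x[i] - mu[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))


def log_mixture_pdf(X: Sequence[Sequence[float]], weights: Sequence[float],
                    means: Sequence[Sequence[float]], covs: Sequence[Matrix]) -> List[float]:
    """logpdf[i] = log(sum_j weights[j] * N(X[i] | means[j], covs[j])), via log-sum-exp."""
    out = []
    for x in X:
        logs = [math.log(w) + log_gauss(x, mu, s)
                for w, mu, s in zip(weights, means, covs)]
        top = max(logs)
        out.append(top + math.log(sum(math.exp(v - top) for v in logs)))
    return out


logpdf = log_mixture_pdf([[0.0], [3.0]], [0.5, 0.5], [[0.0], [3.0]], [[[1.0]], [[1.0]]])
nlogl = -sum(logpdf)  # negative log-likelihood of the same rows (standard definition)
```

Summing the shifted exponentials keeps the largest term equal to 1, so the logarithm stays finite even when every component density underflows on its own.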
[idx,nlogl,P,logpdf,M] = cluster(obj,X) also returns an n-by-k matrix M of squared Mahalanobis distances. M(I,J) is the squared Mahalanobis distance of observation I from the mean of component J, measured with the covariance of component J.
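A sketch of how M can be computed in plain Python follows. It assumes a full covariance matrix for each component, uses hypothetical helper names, and takes inputs as nested lists. The steps are:

1. Factor Sigma_J = L L^T by Cholesky.
2. Solve L z = x_I - mu_J by forward substitution.
3. Take the squared distance as z^T z.

The result equals (x_I - mu_J)^T inv(Sigma_J) (x_I - mu_J) without ever forming the inverse.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sq_mahalanobis(x: Sequence[float], mu: Sequence[float], sigma: Matrix) -> float:
    """(x - mu)^T inv(sigma) (x - mu), computed as z^T z with L z = x - mu."""
    L = cholesky(sigma)
    z: List[float] = []
    for i in range(len(x)):  # forward substitution
        z.append((x[i] - mu[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    return sum(v * v for v in z)


def mahalanobis_matrix(X: Sequence[Sequence[float]], means: Sequence[Sequence[float]],
                       covs: Sequence[Matrix]) -> Matrix:
    """n-by-k matrix: entry [i][j] is the squared distance of X[i] from component j."""
    return [[sq_mahalanobis(x, mu, s) for mu, s in zip(means, covs)] for x in X]


# With identity covariances the result reduces to squared Euclidean distances.
I2 = [[1.0, 0.0], [0.0, 1.0]]
print(mahalanobis_matrix([[1.0, 2.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 1.0]], [I2, I2]))
# -> [[5.0, 1.0], [0.0, 2.0]]
```

With a correlated covariance such as [[2, 1], [1, 2]], the offset (1, 1) gives 2/3 rather than the Euclidean value 2. The distance shrinks along directions in which the component varies more.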