Flexible mixture models for automatic clustering

Version 0.81 (119 KB) by Statovic
MATLAB implementation of clustering (i.e., finite mixture models, unsupervised classification).
792 Downloads
Updated 2 Mar 2023


Snob is a MATLAB implementation of finite mixture models of univariate and multivariate distributions. Snob uses the minimum message length (MML) criterion to estimate the structure of the mixture model (i.e., the number of sub-populations and which sample belongs to which sub-population) and to estimate all mixture model parameters. For larger sample sizes, the MML criterion is equivalent to the popular Bayesian information criterion (BIC) and therefore possesses the favourable properties of BIC, such as model selection consistency. For smaller sample sizes, however, MML takes into account the parametric complexity of the model rather than simply counting the number of free parameters, which generally results in improved performance over BIC.
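To make the criterion-based search concrete, the sketch below (an illustration only, using fitgmdist from the Statistics and Machine Learning Toolbox rather than Snob, on synthetic data) selects the number of Gaussian components by minimising BIC over candidate values; Snob automates the analogous search but scores candidate models with the MML criterion instead.

  % Illustration only: choose the number of Gaussian components by minimising BIC.
  rng(1);
  X = [randn(200,2); randn(150,2) + 4];        % synthetic data with two groups
  bic = zeros(1,5);
  for k = 1:5
      gm = fitgmdist(X, k, 'Replicates', 5, 'RegularizationValue', 1e-6);
      bic(k) = gm.BIC;                         % 2*negative log-likelihood + numParams*log(n)
  end
  [~, kBest] = min(bic);                       % the smallest BIC wins
  fprintf('BIC selects k = %d components\n', kBest);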
Snob allows the user to specify the desired number of sub-populations; if this is not specified, Snob automatically attempts to discover it using the MML criterion. Currently, Snob supports mixtures of the following distributions:
-Beta distribution
-Dirichlet distribution
-Exponential distribution
-Exponential distribution with Type I censoring
-Gamma distribution
-Geometric distribution
-Inverse Gaussian distribution
-Laplace distribution
-Gaussian linear regression
-Logistic regression
-Lognormal distribution
-Multinomial distribution
-Multivariate Gaussian distribution (general covariance structure)
-Multivariate Gaussian distribution (single factor analysis; principal component analysis)
-Negative binomial distribution
-Normal distribution
-Pareto distribution (Type II)
-Poisson distribution
-von Mises-Fisher distribution
-Weibull distribution
-Weibull distribution with Type I censoring
The program is easy to use and allows missing data, which should be coded as NaN. Examples of how to use the program are provided; see data/mm_example?.m.
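For orientation, a minimal call might look like the sketch below. The function name snob(), the summary helper mm_Summary(), and the {'norm', column} model specification are assumptions here, so treat data/mm_example?.m as the authoritative reference for the calling syntax.

  % Minimal sketch (assumed API; see data/mm_example?.m for authoritative usage).
  x = [randn(100,1); 5 + 2*randn(100,1)];    % univariate data from two sub-populations
  x(3) = NaN;                                % missing values are coded as NaN
  mm = snob(x, {'norm', 1});                 % fit a Gaussian mixture; MML chooses the number of classes
  mm_Summary(mm);                            % print the estimated mixture model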
UPDATE (VERSION 0.8.1, 02/03/2023):
-added function minmis() to compute the minimum number of misclassifications by label rotation (see the sketch after this list)
-updated examples
-fixed a minor bug in the 'skip' model
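Because mixture-model class labels are arbitrary, raw disagreement with reference labels overstates the error. The sketch below shows the label-rotation idea behind such a count (an illustration only, not Snob's minmis() implementation), by brute-forcing all permutations of a small number of class labels:

  % Illustration of label rotation (not Snob's minmis): count misclassifications
  % under the best permutation of the estimated labels. Feasible for small K only.
  function m = min_misclass_sketch(trueLabels, estLabels)
      K = max([trueLabels(:); estLabels(:)]);
      P = perms(1:K);                            % all K! relabellings
      m = numel(trueLabels);
      for i = 1:size(P, 1)
          relabelled = P(i, estLabels);          % apply the i-th permutation
          m = min(m, sum(relabelled(:) ~= trueLabels(:)));
      end
  end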

Cite As

Wallace, C. S. & Dowe, D. L. MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. Statistics and Computing, 2000, 10, pp. 73-83

Wallace, C. S. Intrinsic Classification of Spatially Correlated Data. The Computer Journal, 1998, 41, pp. 602-611

Wallace, C. S. Statistical and Inductive Inference by Minimum Message Length. Springer, 2005

Schmidt, D. F. & Makalic, E. Minimum Message Length Inference and Mixture Modelling of Inverse Gaussian Distributions. AI 2012: Advances in Artificial Intelligence, Springer Berlin Heidelberg, 2012, 7691, pp. 672-682

Edwards, R. T. & Dowe, D. L. Single factor analysis in MML mixture modelling. Research and Development in Knowledge Discovery and Data Mining, Second Pacific-Asia Conference (PAKDD-98), 1998, 1394

MATLAB Release Compatibility
Created with R2021b
Compatible with any release
Platform Compatibility
Windows, macOS, Linux
Categories
Statistics and Machine Learning Toolbox

Version History and Release Notes
0.81

-added function minmis() to compute the minimum number of misclassifications by label rotation
-updated examples
-fixed a minor bug in the 'skip' model

0.80

-added mixtures of principal component analyzers and a new example

0.75

-added Pareto (Type II) mixture models
-added an example of fitting Pareto mixtures

0.70

-added mixtures of Dirichlet distributions and a new example

0.65

-added mixtures of exponential and Weibull models with type I (right) censoring
-added more examples
-fixed a numerical issue with fitting mixtures of linear regressions

0.60

-added the lognormal distribution
-fixed typos in some of the documentation

0.50

-added mixture models for censored exponential and Weibull distributions
-added new function to compute Kullback-Leibler divergences for a mixture model
-improved documentation
-added new examples of usage
-added ability to name attributes

0.40

-added beta and von Mises-Fisher distributions
-improvements to numerical accuracy
-updated documentation and examples

0.30

-significant speed improvements in gamma and Laplace mixture models
-added logistic regression and the negative binomial distribution
-added new examples

0.2.2

-added mixtures of Laplace distributions
-improved documentation
-improved output of the summary function
-added more examples

0.2.1

-minor title change
-fixed some typos in the description

0.2.0