The purpose of the development of this toolbox was to compile a continuously extensible, standard tool, which is useful for any MATLAB user for one's aim. In Chapter 1 of the downloadable related documentation one can find a theoretical introduction containing the theory of the algorithms, the definition of the validity measures and the tools of visualization, which help to understand the programmed MATLAB files.
Chapter 2 deals with the exposition of the
files and the description of the particular algorithms, and they are illustrated with simple examples, while in Chapter 3 the whole
Toolbox is tested on real data sets during the solution of three clustering problems: comparison and selection of algorithms; estimating the optimal number of clusters; and examining
multidimensional data sets.
About the Toolbox
The Fuzzy Clustering and Data Analysis Toolbox is a collection of MATLAB functions. The toolbox provides five categories of functions:
- Clustering algorithms. These functions group the given data set into clusters by different approaches: functions Kmeans and Kmedoid
are hard partitioning methods, FCMclust, GKclust, GGclust are fuzzy partitioning methods with different distance norms.
- Evaluation with cluster prototypes. On the score of the clustering results of a data set there is a possibility to calculate membership for "unseen" data sets with these set of functions. In 2-dimensional case the functions draw a contour-map in the data space to visualize
the results.
- Validation. The validity function provides cluster validity measures for each partition. It is useful when the number of cluster is unknown a priori. The optimal partition can be determined by the point of the extrema of the validation indexes in dependence of the number of clusters. The indexes calculated are: Partition Coefficient (PC), Classification Entropy (CE), Partition Index (SC), Separation Index (S), Xie and Beni's Index (XB), Dunn's Index (DI) and Alternative Dunn Index (DII).
- Visualization. The Visualization part of this toolbox provides the modified Sammon mapping of the data. This mapping method is a
multidimensional scaling method described by Sammon.
- Examples. An example based on industrial data set to present the usefulness of these toolbox and algorithms. |