subclust

Find cluster centers with subtractive clustering

Syntax

[C,S] = subclust(X,radii,xBounds,options) 

Description

[C,S] = subclust(X,radii,xBounds,options) estimates the cluster centers in a set of data by using the subtractive clustering method.

The function returns the cluster centers in the matrix C. Each row of C contains the position of a cluster center. The returned S vector contains the sigma values that specify the range of influence of a cluster center in each of the data dimensions. All cluster centers share the same set of sigma values.

The subtractive clustering method assumes each data point is a potential cluster center and calculates a measure of the likelihood that each data point would define the cluster center, based on the density of surrounding data points. The algorithm does the following:

  • Selects the data point with the highest potential to be the first cluster center

  • Removes all data points in the vicinity of the first cluster center (as determined by radii), in order to determine the next data cluster and its center location

  • Iterates on this process until all of the data is within radii of a cluster center
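
The steps above can be sketched in a few lines of plain MATLAB. This is an illustration only, not the toolbox source. It assumes the potential formulas from Chiu's paper (see References), in which "removing" nearby points is done by subtracting potential from them. It also assumes the data is already scaled into the unit hyperbox, uses a single scalar radius, and stops with a simplified rule. The variable names ra, rb, D2, and P are chosen for this sketch.

```matlab
% Sketch of the potential-based selection loop (assumptions as stated above)
X  = [0.10 0.10; 0.12 0.11; 0.11 0.13; 0.90 0.90; 0.88 0.91; 0.91 0.87];
ra = 0.5;            % range of influence (the radii argument)
rb = 1.25 * ra;      % wider neighborhood used when subtracting potential
n  = size(X,1);

% Squared Euclidean distance between every pair of points
D2 = zeros(n);
for i = 1:n
    for j = 1:n
        D2(i,j) = sum((X(i,:) - X(j,:)).^2);
    end
end

% Initial potential: points with many close neighbors score highest
P = sum(exp(-4 * D2 / ra^2), 2);

centers = [];
[Pmax, idx] = max(P);
Pfirst = Pmax;
while Pmax > 0.15 * Pfirst           % simplified stopping rule (assumption)
    centers = [centers; X(idx,:)];   % accept the point with the highest potential
    % Lower the potential near the new center so its neighbors are not picked next
    P = P - Pmax * exp(-4 * D2(:,idx) / rb^2);
    [Pmax, idx] = max(P);
end
disp(centers)
```

With this toy data the loop picks one center from each of the two tight groups of points.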

The subtractive clustering method is an extension of the mountain clustering method proposed by R. Yager and D. Filev.

The matrix X contains the data to be clustered; each row of X is a data point. The variable radii is a vector of entries between 0 and 1 that specifies a cluster center's range of influence in each of the data dimensions, assuming the data falls within a unit hyperbox. Small radii values generally result in many small clusters, while large values result in a few large clusters. The best values for radii are usually between 0.2 and 0.5.

For example, if the data dimension is two (X has two columns), radii=[0.5 0.25] specifies that the range of influence in the first data dimension is half the width of the data space and the range of influence in the second data dimension is one quarter the width of the data space. If radii is a scalar, then the scalar value is applied to all data dimensions, i.e., each cluster center has a spherical neighborhood of influence with the given radius.
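
The following sketch shows the scalar and vector forms of radii side by side. It requires the Fuzzy Logic Toolbox, and the data in X is synthetic and purely illustrative; the number of centers found will vary with the random data.

```matlab
% Synthetic two-column data (hypothetical; substitute your own X)
X = rand(100,2);

% Scalar radii: the same range of influence in both dimensions
[C1,S1] = subclust(X,0.5);

% Vector radii: one entry per column of X
[C2,S2] = subclust(X,[0.5 0.25]);

% Each row of C is a cluster center; S has one sigma per data dimension
disp(size(C1)); disp(size(S1));
disp(size(C2)); disp(size(S2));
```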

xBounds is a 2-by-N matrix that specifies how to map the data in X into a unit hyperbox, where N is the data dimension. This argument is optional if X is already normalized. The first row contains the minimum axis range values and the second row contains the maximum axis range values for scaling the data in each dimension.

For example, xBounds = [-10 -5; 10 5] specifies that data values in the first data dimension are to be scaled from the range [-10 +10] into values in the range [0 1]; data values in the second data dimension are to be scaled from the range [-5 +5] into values in the range [0 1]. If xBounds is an empty matrix or not provided, then xBounds defaults to the minimum and maximum data values found in each data dimension.
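
The scaling described above can be sketched in plain MATLAB as follows. This only illustrates the mapping into the unit hyperbox and is not the toolbox code; the names lo, hi, and Xn are chosen for this sketch.

```matlab
% Four sample points in two dimensions (illustrative data)
X = [-10 -5; 0 0; 5 2.5; 10 5];

% First row: minimum of each dimension; second row: maximum
xBounds = [-10 -5; 10 5];
lo = xBounds(1,:);
hi = xBounds(2,:);

% Map each column from [lo, hi] into [0, 1]
Xn = zeros(size(X));
for k = 1:size(X,2)
    Xn(:,k) = (X(:,k) - lo(k)) / (hi(k) - lo(k));
end
disp(Xn)

% Bounds used when xBounds is empty or omitted, per the description above
defaultBounds = [min(X); max(X)];
```

For this data the default bounds coincide with the explicit ones, because the sample points span the full range in each dimension.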

The options vector specifies clustering algorithm parameters that override the default values. The components of the options vector are as follows:

  • options(1) = squashFactor: This factor is used to multiply the radii values that determine the neighborhood of a cluster center, so as to squash the potential for outlying points to be considered as part of that cluster. (default: 1.25)

  • options(2) = acceptRatio: This factor sets the potential, as a fraction of the potential of the first cluster center, above which another data point is accepted as a cluster center. (default: 0.5)

  • options(3) = rejectRatio: This factor sets the potential, as a fraction of the potential of the first cluster center, below which a data point is rejected as a cluster center. (default: 0.15)

  • options(4) = verbose: If this term is not zero, then progress information is printed as the clustering process proceeds. (default: 0)
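
How acceptRatio and rejectRatio interact can be sketched as below. The handling of candidates that fall between the two thresholds follows the criterion in Chiu's paper (see References) and is an assumption here; the toolbox implementation may differ in detail. The variables ratio, dmin, and decision are hypothetical names used only for this illustration.

```matlab
acceptRatio = 0.5;
rejectRatio = 0.15;

% Four hypothetical candidates
ratio = [0.9 0.3 0.3 0.05];   % candidate potential / potential of first center
dmin  = [0.2 0.8 0.4 0.9];    % distance to nearest existing center, in units of radii

decision = cell(1,numel(ratio));
for k = 1:numel(ratio)
    if ratio(k) > acceptRatio
        decision{k} = 'accept';      % clearly strong enough
    elseif ratio(k) < rejectRatio
        decision{k} = 'reject';      % clearly too weak; clustering ends
    elseif dmin(k) + ratio(k) >= 1
        decision{k} = 'accept';      % moderate potential, but far from existing centers
    else
        decision{k} = 'skip';        % discard this point and try the next candidate
    end
end
disp(decision)
```

Raising acceptRatio or rejectRatio therefore makes the algorithm more selective and tends to produce fewer cluster centers.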

Examples

[C,S] = subclust(X,0.5)

This command uses the minimum number of arguments needed to call this function. A range of influence of 0.5 is specified for all data dimensions.

[C,S] = subclust(X,[0.5 0.25 0.3],[],[2.0 0.8 0.7])

This command assumes the data dimension is 3 (X has 3 columns) and uses a range of influence of 0.5, 0.25, and 0.3 for the first, second, and third data dimension, respectively. The scaling factors for mapping the data into a unit hyperbox are obtained from the minimum and maximum data values. The squashFactor is set to 2.0, indicating that you only want to find clusters that are far from each other. The acceptRatio is set to 0.8, indicating that only data points that have a very strong potential for being cluster centers are accepted. The rejectRatio is set to 0.7, indicating that you want to reject all data points without a strong potential.
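
A runnable version of this call might look like the sketch below. It requires the Fuzzy Logic Toolbox and assumes the positional syntax documented on this page. The data in X is synthetic, so the number of centers returned will vary from run to run.

```matlab
% Synthetic three-column data (hypothetical; substitute your own X)
X = rand(200,3);

% squashFactor, acceptRatio, rejectRatio; verbose is left at its default
options = [2.0 0.8 0.7];

% Empty xBounds: scale using the minimum and maximum found in each column
[C,S] = subclust(X,[0.5 0.25 0.3],[],options);

fprintf('%d cluster centers found\n', size(C,1));
```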

References

Chiu, S., "Fuzzy Model Identification Based on Cluster Estimation," Journal of Intelligent & Fuzzy Systems, Vol. 2, No. 3, Sept. 1994.

Yager, R. and D. Filev, "Generation of Fuzzy Rules by Mountain Clustering," Journal of Intelligent & Fuzzy Systems, Vol. 2, No. 3, pp. 209-219, 1994.

