Find cluster centers with subtractive clustering

```
[C,S] = subclust(X,radii,xBounds,options)
```

`[C,S] = subclust(X,radii,xBounds,options)` estimates the cluster centers in a set of data by using the subtractive clustering method.

The function returns the cluster centers in the matrix `C`. Each row of `C` contains the position of a cluster center. The returned vector `S` contains the sigma values that specify the range of influence of a cluster center in each of the data dimensions. All cluster centers share the same set of sigma values.

The subtractive clustering method assumes each data point is a potential cluster center and calculates a measure of the likelihood that each data point would define the cluster center, based on the density of surrounding data points. The algorithm does the following:

1. Selects the data point with the highest potential to be the first cluster center.
2. Removes all data points in the vicinity of the first cluster center (as determined by `radii`), in order to determine the next data cluster and its center location.
3. Iterates on this process until all of the data is within `radii` of a cluster center.
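The steps above can be sketched in a few lines of MATLAB. This is a simplified illustration, not the toolbox implementation: following Chiu's formulation, it subtracts potential near each chosen center instead of literally deleting points, and it stops as soon as the best remaining potential drops below `rejectRatio` times the first potential. The sample data and the variable names (`Xn`, `P`, `P1`, `squash`) are assumptions made for this sketch, and the code relies on implicit expansion (MATLAB R2016b or later).

```matlab
% Simplified subtractive clustering sketch (not the toolbox implementation).
X = [0.10 0.10; 0.12 0.09; 0.09 0.11; 0.11 0.12; ...
     0.90 0.90; 0.88 0.91; 0.91 0.89; 0.89 0.92];   % two tight groups
radii       = [0.5 0.5];   % range of influence per dimension
squash      = 1.25;        % widens the neighborhood used for subtraction
rejectRatio = 0.15;        % stop below this fraction of the first potential

% Map the data into a unit hyperbox using its own minimum and maximum.
minX = min(X, [], 1);
maxX = max(X, [], 1);
Xn   = (X - minX) ./ (maxX - minX);

% Potential of each point: sum of Gaussian-like contributions from all points.
n = size(Xn, 1);
P = zeros(n, 1);
for i = 1:n
    d2   = sum(((Xn - Xn(i, :)) ./ radii).^2, 2);   % scaled squared distances
    P(i) = sum(exp(-4 * d2));
end

C = zeros(0, size(X, 2));
[Pmax, idx] = max(P);
P1 = Pmax;                                % potential of the first center
while Pmax >= rejectRatio * P1
    center = Xn(idx, :);
    C(end + 1, :) = center .* (maxX - minX) + minX;   % back to data units
    % Subtract potential near the new center so nearby points are suppressed.
    d2 = sum(((Xn - center) ./ (squash * radii)).^2, 2);
    P  = P - Pmax * exp(-4 * d2);
    [Pmax, idx] = max(P);
end
disp(C)   % one row per cluster center found
```

With the sample data, each of the two groups contributes one center, because subtracting the potential around the first center leaves the second group's potential almost untouched.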

The subtractive clustering method is an extension of the mountain clustering method proposed by R. Yager.

The matrix `X` contains the data to be clustered; each row of `X` is a data point. The variable `radii` is a vector of entries between 0 and 1 that specifies a cluster center's range of influence in each of the data dimensions, assuming the data falls within a unit hyperbox. Small `radii` values generally result in finding many small clusters, whereas large values result in a few large clusters. The best values for `radii` are usually between 0.2 and 0.5.

For example, if the data dimension is two (`X` has two columns), `radii = [0.5 0.25]` specifies that the range of influence in the first data dimension is half the width of the data space and the range of influence in the second data dimension is one quarter the width of the data space. If `radii` is a scalar, then the scalar value is applied to all data dimensions, i.e., each cluster center has a spherical neighborhood of influence with the given radius.
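As a small illustration of the scalar case, the following lines expand a scalar `radii` into one entry per data dimension. This mirrors the behavior described above and is not taken from the toolbox source; the data in `X` is a placeholder.

```matlab
X = rand(20, 3);          % placeholder data with three dimensions
radii = 0.5;              % scalar range of influence
if isscalar(radii)
    radii = radii * ones(1, size(X, 2));   % same radius in every dimension
end
disp(radii)
```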

`xBounds` is a 2-by-N matrix that specifies how to map the data in `X` into a unit hyperbox, where N is the data dimension. This argument is optional if `X` is already normalized. The first row contains the minimum axis range values and the second row contains the maximum axis range values for scaling the data in each dimension.

For example, `xBounds = [-10 -5; 10 5]` specifies that data values in the first data dimension are to be scaled from the range [-10 +10] into values in the range [0 1]; data values in the second data dimension are to be scaled from the range [-5 +5] into values in the range [0 1]. If `xBounds` is an empty matrix or not provided, then `xBounds` defaults to the minimum and maximum data values found in each data dimension.
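The mapping into the unit hyperbox can be written out directly. The sketch below assumes a plain linear rescaling per dimension, which matches the description above but is not copied from the toolbox source; the sample data and the names `lo`, `hi`, and `Xn` are illustrative. It relies on implicit expansion (MATLAB R2016b or later).

```matlab
X = [-10 -5; 0 0; 10 5; 5 -2.5];   % sample data, one point per row
xBounds = [-10 -5; 10 5];          % row 1: minima, row 2: maxima
if isempty(xBounds)
    xBounds = [min(X, [], 1); max(X, [], 1)];   % default: data extremes
end
lo = xBounds(1, :);
hi = xBounds(2, :);
Xn = (X - lo) ./ (hi - lo);        % each column now lies in [0, 1]
disp(Xn)
```

Here the point `[5 -2.5]` maps to `[0.75 0.25]`, since 5 lies three quarters of the way from -10 to 10 and -2.5 lies one quarter of the way from -5 to 5.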

The `options` vector can be used to specify clustering algorithm parameters that override the default values. The components of `options` are specified as follows:

- `options(1) = squashFactor`: This factor is used to multiply the radii values that determine the neighborhood of a cluster center, so as to squash the potential for outlying points to be considered as part of that cluster. (default: 1.25)
- `options(2) = acceptRatio`: This factor sets the potential, as a fraction of the potential of the first cluster center, above which another data point is accepted as a cluster center. (default: 0.5)
- `options(3) = rejectRatio`: This factor sets the potential, as a fraction of the potential of the first cluster center, below which a data point is rejected as a cluster center. (default: 0.15)
- `options(4) = verbose`: If this term is not zero, then progress information is printed as the clustering process proceeds. (default: 0)
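To make the two ratio thresholds concrete, the sketch below classifies a few candidate potentials against the documented defaults. The potentials are made-up numbers, and the handling of candidates that fall between the two thresholds is not described here, so the sketch only labels them as undecided (Chiu's paper applies an additional distance-based test in that case).

```matlab
options     = [1.25 0.5 0.15 0];   % documented defaults
acceptRatio = options(2);
rejectRatio = options(3);

firstPotential = 4.0;              % hypothetical potential of the first center
candidates     = [3.0 1.0 0.4];    % hypothetical potentials of later candidates

verdicts = {};
for p = candidates
    ratio = p / firstPotential;
    if ratio > acceptRatio
        verdict = 'accept';
    elseif ratio < rejectRatio
        verdict = 'reject';
    else
        verdict = 'undecided';     % between the thresholds; not covered above
    end
    verdicts{end + 1} = verdict;
    fprintf('potential %.2f (ratio %.3f): %s\n', p, ratio, verdict);
end
```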

`[C,S] = subclust(X,0.5)`

This command uses the minimum number of arguments needed to call this function. A range of influence of 0.5 is specified for all data dimensions.

`[C,S] = subclust(X,[0.5 0.25 0.3],[],[2.0 0.8 0.7])`

This command assumes the data dimension is 3 (`X` has 3 columns) and uses a range of influence of 0.5, 0.25, and 0.3 for the first, second, and third data dimension, respectively. The scaling factors for mapping the data into a unit hyperbox are obtained from the minimum and maximum data values. The `squashFactor` is set to 2.0, indicating that you only want to find clusters that are far from each other. The `acceptRatio` is set to 0.8, indicating that only data points that have a very strong potential for being cluster centers are accepted. The `rejectRatio` is set to 0.7, indicating that you want to reject all data points without a strong potential.
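To try the minimal call on concrete data, a script along the following lines could be used. It needs Fuzzy Logic Toolbox for `subclust`; the synthetic two-group data set and the use of `rng(0)` for reproducibility are assumptions made for this sketch, and the number of centers returned depends on the data.

```matlab
% Requires Fuzzy Logic Toolbox for subclust; the data is synthetic.
rng(0);                                   % reproducible random numbers
X = [0.05 * randn(50, 2) + 0.2; ...
     0.05 * randn(50, 2) + 0.8];          % two well-separated groups
[C, S] = subclust(X, 0.5);                % same range of influence in both dimensions
disp(C)                                   % one row per cluster center
disp(S)                                   % sigma values, one per data dimension
```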

Chiu, S., "Fuzzy Model Identification Based on Cluster Estimation," *Journal
of Intelligent & Fuzzy Systems*, Vol. 2, No. 3, Sept.
1994.

Yager, R. and D. Filev, "Generation of Fuzzy Rules by Mountain
Clustering," *Journal of Intelligent & Fuzzy Systems*,
Vol. 2, No. 3, pp. 209-219, 1994.
