subclust (Fuzzy Logic Toolbox)

Find cluster centers with subtractive clustering.

Syntax

[C,S] = subclust(X,radii,xBounds,options)

Description

This function estimates the cluster centers in a set of data by using the subtractive clustering method. The subtractive clustering method assumes each data point is a potential cluster center and calculates a measure of the likelihood that each data point would define the cluster center, based on the density of surrounding data points. The algorithm:

1. Selects the data point with the highest potential to be the first cluster center.
2. Removes all data points in the vicinity of the first cluster center (as determined by radii), in order to determine the next data cluster and its center location.
3. Iterates on this process until all of the data is within radii of a cluster center.

The subtractive clustering method is an extension of the mountain clustering method proposed by R. Yager.
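The selection loop described above can be sketched in a few lines of MATLAB. This is a simplified illustration under stated assumptions, not the toolbox source: the exponential potential with the factor 4 is assumed from the formulation in Chiu's paper, the stopping rule uses only a reject threshold, potentials are reduced near each chosen center rather than points being deleted, and the toy data and variable names are hypothetical. It relies on implicit expansion (R2016b or later).

```matlab
% Simplified sketch of subtractive clustering (assumed formulation).
% Two tight, well-separated groups of five points each (hypothetical data).
X = [0 0; 0.1 0; 0 0.1; 0.1 0.1; 0.05 0.05; ...
     0.9 0.9; 1 0.9; 0.9 1; 1 1; 0.95 0.95];

radii        = 0.5;    % range of influence as a fraction of the data range
squashFactor = 1.25;   % widens the neighborhood used for suppression
rejectRatio  = 0.15;   % stop when the best potential drops below this ratio

% Map the data into the unit hyperbox using its own min and max.
xMin = min(X, [], 1);
xMax = max(X, [], 1);
Xn = (X - xMin) ./ (xMax - xMin);

% Squared distances between every pair of normalized points.
n = size(Xn, 1);
D2 = zeros(n);
for k = 1:size(Xn, 2)
    D2 = D2 + (Xn(:,k) - Xn(:,k)').^2;
end

% Potential of each point: a density measure over its neighbors.
P = sum(exp(-4 * D2 / radii^2), 2);

centers = [];                      % row indices of accepted centers
[Pmax, idx] = max(P);
Pfirst = Pmax;                     % potential of the first center
while Pmax > rejectRatio * Pfirst
    centers(end+1) = idx;          %#ok<AGROW>
    % Suppress the potential of points near the new center.
    P = P - Pmax * exp(-4 * D2(:, idx) / (squashFactor * radii)^2);
    [Pmax, idx] = max(P);
end

C = X(centers, :);                 % centers in the original data units
disp(C)
```

With this toy data the loop picks the middle point of each group, (0.05, 0.05) and (0.95, 0.95), in either order, and then stops because every remaining potential is at or below zero.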

The matrix X contains the data to be clustered; each row of X is a data point. The variable radii is a vector of entries between 0 and 1 that specifies a cluster center's range of influence in each of the data dimensions, assuming the data falls within a unit hyperbox. Small radii values generally result in many small clusters, while large values result in a few large clusters. Good values for radii are usually between 0.2 and 0.5.

For example, if the data dimension is two (X has two columns), radii = [0.5 0.25] specifies that the range of influence in the first data dimension is half the width of the data space and the range of influence in the second data dimension is one quarter the width of the data space. If radii is a scalar, then the scalar value is applied to all data dimensions, i.e., each cluster center will have a spherical neighborhood of influence with the given radius.
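To make the fractions concrete, the following sketch converts radii into widths in the original data units. It assumes the width of the data space is taken from the minimum and maximum of the data itself; the sample data and variable names are hypothetical.

```matlab
% Hypothetical two-column data set.
X = [0 10; 4 30; 8 20];

radii = [0.5 0.25];                        % fraction of the range per dimension
dataRange = max(X, [], 1) - min(X, [], 1); % width of the data space: [8 20]
influence = radii .* dataRange;            % range of influence in data units: [4 5]

% A scalar radius is applied to every data dimension.
r = 0.5;
radiiExpanded = repmat(r, 1, size(X, 2));  % [0.5 0.5]

disp(influence)
disp(radiiExpanded)
```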

xBounds is a 2-by-N matrix that specifies how to map the data in X into a unit hyperbox, where N is the data dimension. This argument is optional if X is already normalized. The first row contains the minimum axis range values and the second row contains the maximum axis range values for scaling the data in each dimension. For example, xBounds = [-10 -5; 10 5] specifies that data values in the first data dimension are to be scaled from the range [-10 +10] into values in the range [0 1]; data values in the second data dimension are to be scaled from the range [-5 +5] into values in the range [0 1]. If xBounds is an empty matrix or not provided, then xBounds defaults to the minimum and maximum data values found in each data dimension.
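The scaling that xBounds describes can be written out directly. The sketch below is an illustration of the mapping, not the toolbox code; the sample data is hypothetical, and the implicit expansion it uses needs R2016b or later.

```matlab
% Hypothetical data in two dimensions.
X = [-10 -5; 0 0; 10 5; 5 2.5];

% First row: axis minimums. Second row: axis maximums.
xBounds = [-10 -5; 10 5];

% Fall back to the data's own min and max when xBounds is empty.
if isempty(xBounds)
    xBounds = [min(X, [], 1); max(X, [], 1)];
end

% Map each dimension from [min max] into [0 1].
Xn = (X - xBounds(1,:)) ./ (xBounds(2,:) - xBounds(1,:));

disp(Xn)   % rows: [0 0], [0.5 0.5], [1 1], [0.75 0.75]
```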

The options vector can be used to override the default values of the clustering algorithm parameters. Its components are:

options(1) = squashFactor: This is the factor used to multiply the radii values that determine the neighborhood of a cluster center, so as to squash the potential for outlying points to be considered as part of that cluster. (default: 1.25)
options(2) = acceptRatio: This sets the potential, as a fraction of the potential of the first cluster center, above which another data point will be accepted as a cluster center. (default: 0.5)
options(3) = rejectRatio: This sets the potential, as a fraction of the potential of the first cluster center, below which a data point will be rejected as a cluster center. (default: 0.15)
options(4) = verbose: If this term is not zero, then progress information will be printed as the clustering process proceeds. (default: 0)
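As a sketch of how the two ratios partition candidate points, the code below builds an options vector from the documented defaults, overrides two entries, and classifies a few potential ratios. The ratios are made-up values, and because this page does not say how a potential between the two thresholds is handled, the sketch simply marks that case as undecided.

```matlab
% Documented defaults: [squashFactor acceptRatio rejectRatio verbose].
defaults = [1.25 0.5 0.15 0];

% Override only the entries of interest.
options = defaults;
options(2) = 0.8;   % acceptRatio
options(3) = 0.7;   % rejectRatio

acceptRatio = options(2);
rejectRatio = options(3);

% Hypothetical candidate potentials, as fractions of the first center's potential.
ratios = [0.95 0.75 0.40];

decision = cell(size(ratios));
for i = 1:numel(ratios)
    if ratios(i) > acceptRatio
        decision{i} = 'accept';
    elseif ratios(i) < rejectRatio
        decision{i} = 'reject';
    else
        decision{i} = 'undecided';  % between thresholds: not specified here
    end
end

disp(decision)   % 'accept'  'undecided'  'reject'
```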

The function returns the cluster centers in the matrix C; each row of C contains the position of a cluster center. The returned S vector contains the sigma values that specify the range of influence of a cluster center in each of the data dimensions. All cluster centers share the same set of sigma values.
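One possible way to use C and S together is to assign each data point to the center that is nearest once every dimension is scaled by its sigma value. This is an illustrative assumption about how the outputs might be consumed, not something subclust does itself; the values of C, S, and X below are hypothetical, and implicit expansion (R2016b or later) is assumed.

```matlab
% Hypothetical outputs of a clustering run.
C = [0 0; 10 5];      % one center per row
S = [2 1];            % one sigma per data dimension, shared by all centers

% Hypothetical data points to label.
X = [1 0.5; 9 4; 0 4];

% Squared distance from every point to every center, scaled by S.
nC = size(C, 1);
d2 = zeros(size(X, 1), nC);
for j = 1:nC
    d2(:, j) = sum(((X - C(j,:)) ./ S).^2, 2);
end

% Index of the nearest center for each point.
[~, labels] = min(d2, [], 2);

disp(labels')   % 1 2 1
```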

Examples

```
[C,S] = subclust(X,0.5)
```

This is the minimum number of arguments needed to use this function. A range of influence of 0.5 has been specified for all data dimensions.
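For a version of this call that can be run end to end, the sketch below builds a small data set first. The data is hypothetical, the call requires Fuzzy Logic Toolbox, and no particular output is claimed here; the comments only restate what this page says about the shapes of C and S.

```matlab
% Requires Fuzzy Logic Toolbox for subclust.
% Hypothetical data: two tight groups of five points each.
X = [0 0; 0.1 0; 0 0.1; 0.1 0.1; 0.05 0.05; ...
     0.9 0.9; 1 0.9; 0.9 1; 1 1; 0.95 0.95];

% Range of influence of 0.5 in every data dimension.
[C,S] = subclust(X,0.5);

disp(C)   % one cluster center per row, one column per data dimension
disp(S)   % one sigma value per data dimension
```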

```
[C,S] = subclust(X,[0.5 0.25 0.3],[],[2.0 0.8 0.7])
```

This assumes the data dimension is 3 (X has 3 columns) and uses a range of influence of 0.5, 0.25, and 0.3 for the first, second, and third data dimensions, respectively. Because xBounds is passed as an empty matrix, the scaling factors for mapping the data into a unit hyperbox will be obtained from the minimum and maximum data values. The squashFactor is set to 2.0, indicating that we only want to find clusters that are far from each other. The acceptRatio is set to 0.8, indicating that we will only accept data points that have a very strong potential for being cluster centers. The rejectRatio is set to 0.7, indicating that we want to reject all data points without a strong potential.

See Also

genfis2

References

Chiu, S., "Fuzzy Model Identification Based on Cluster Estimation," Journal of Intelligent & Fuzzy Systems, Vol. 2, No. 3, Sept. 1994.

Yager, R. and D. Filev, "Generation of Fuzzy Rules by Mountain Clustering," Journal of Intelligent & Fuzzy Systems, Vol. 2, No. 3, pp. 209-219, 1994.
