File Exchange

image thumbnail

Generate Data for Clustering

version 1.2.0.2 (4.75 KB) by Nuno Fachada
Generates 2D data for clustering.

11 Downloads

Updated 27 Aug 2019

GitHub view license on GitHub

Generates 2D data for clustering; data is created along straight lines, which can be more or less parallel depending on slopeStd argument.

Inputs:

slope - Base direction of the lines on which clusters are based.
slopeStd - Standard deviation of the slope; used to obtain a random slope variation from the normal distribution, which is added to the base slope in order to obtain the final slope of each cluster.
numClusts - Number of clusters (and therefore of lines) to generate.
xClustAvgSep - Average separation of line centers along the X axis.
yClustAvgSep - Average separation of line centers along the Y axis.
lengthAvg - The base length of lines on which clusters are based.
lengthStd - Standard deviation of line length; used to obtain a random length variation from the normal distribution, which is added to the base length in order to obtain the final length of each line.
lateralStd - "Cluster fatness", i.e., the standard deviation of the distance from each point to the respective line, in both x and y directions; this distance is obtained from the normal distribution.
totalPoints - Total points in generated data (will be randomly divided among clusters).

Outputs:

data - Matrix (totalPoints x 2) with the generated data
clustPoints - Vector (numClusts x 1) containing number of points in each cluster
idx - Vector (totalPoints x 1) containing the cluster indices of each point
centers - Matrix (numClusts x 2) containing centers from where clusters were generated
slopes - Vector (numClusts x 1) containing the effective slopes used to generate clusters
lengths - Vector (numClusts x 1) containing the effective lengths used to generate clusters

-----------------
Usage example:

[data cp idx] = generateData(1, 0.5, 5, 15, 15, 5, 1, 2, 200);

This creates 5 clusters with a total of 200 points, with a base slope of 1 (std=0.5), separated in average by 15 units in both x and y directions, with average length of 5 units (std=1) and a "fatness" or spread of 2 units.

To take a quick look at the clusters just do:

scatter(data(:,1), data(:,2), 8, idx);
-----------------

If you use this script please cite the following reference:

Fachada, N., Figueiredo, M.A.T., Lopes, V.V., Martins, R.C., Rosa, A.C., Spectrometric differentiation of yeast strains using minimum volume increase and minimum direction change clustering criteria, Pattern Recognition Letters, vol. 45, pp. 55-61 (2014), doi: http://dx.doi.org/10.1016/j.patrec.2014.03.008

Cite As

Fachada, N., Figueiredo, M.A.T., Lopes, V.V., Martins, R.C., Rosa, A.C., Spectrometric differentiation of yeast strains using minimum volume increase and minimum direction change clustering criteria, Pattern Recognition Letters, vol. 45, pp. 55-61 (2014), doi: http://dx.doi.org/10.1016/j.patrec.2014.03.008

Comments and Ratings (4)

Ahmed Ragab

John

Very helpful!!

isaac

Updates

1.2.0.2

Small text update

1.2.0.1

Add paper reference

1.2.0.0

Link to GitHub page.

1.1.0.0

- Function returns more information.
- More comments w/ example.
- Fix x/yClustAvgSep to conform with specification in comment
- totalPoints is exact number of points
- No clusters with zero elements

MATLAB Release Compatibility
Created with R2011b
Compatible with any release
Platform Compatibility
Windows macOS Linux