generateData Generates 2D data for clustering; data is created along straight lines, which can be more or less parallel depending on the slopeStd argument.
slope - Base direction of the lines on which clusters are based.
slopeStd - Standard deviation of the slope; used to obtain a random slope variation from the normal distribution, which is added to the base slope in order to obtain the final slope of each cluster.
numClusts - Number of clusters (and therefore of lines) to generate.
xClustAvgSep - Average separation of line centers along the X axis.
yClustAvgSep - Average separation of line centers along the Y axis.
lengthAvg - The base length of lines on which clusters are based.
lengthStd - Standard deviation of line length; used to obtain a random length variation from the normal distribution, which is added to the base length in order to obtain the final length of each line.
lateralStd - "Cluster fatness", i.e., the standard deviation of the distance from each point to the respective line, in both x and y directions; this distance is obtained from the normal distribution.
totalPoints - Total points in generated data (will be randomly divided among clusters).
data - Matrix (totalPoints x 2) with the generated data
clustPoints - Vector (numClusts x 1) containing number of points in each cluster
idx - Vector (totalPoints x 1) containing the cluster indices of each point
centers - Matrix (numClusts x 2) containing centers from where clusters were generated
slopes - Vector (numClusts x 1) containing the effective slopes used to generate clusters
lengths - Vector (numClusts x 1) containing the effective lengths used to generate clusters
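The way slopeStd and lengthStd enter the per-cluster values can be sketched as follows. This is only an illustration of the sampling described in the parameter list, not the actual body of generateData; the parameter values are arbitrary, and whether the function guards against non-positive lengths is not stated here.

```matlab
% Sketch only: illustrates the sampling described in the parameter
% list; it is not the actual implementation of generateData.
slope = 1;      slopeStd = 0.5;   % base slope and its standard deviation
lengthAvg = 5;  lengthStd = 1;    % base length and its standard deviation
numClusts = 5;                    % one line (and one cluster) per entry

% Effective slope of each cluster: base slope plus a normal variation
slopes = slope + slopeStd * randn(numClusts, 1);

% Effective length of each line: base length plus a normal variation
lengths = lengthAvg + lengthStd * randn(numClusts, 1);
```

With slopeStd = 0 every entry of slopes equals slope, so the lines come out parallel; larger values spread the directions apart.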
[data, cp, idx] = generateData(1, 0.5, 5, 15, 15, 5, 1, 2, 200);
This creates 5 clusters with a total of 200 points, with a base slope of 1 (std=0.5), separated on average by 15 units in both x and y directions, with an average line length of 5 units (std=1) and a "fatness" or spread of 2 units.
To take a quick look at the clusters just do:
scatter(data(:,1), data(:,2), 8, idx);
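Given the output descriptions, the per-cluster counts should match the number of times each index appears in idx. A minimal sketch of that consistency check follows; it uses a small hypothetical idx vector so that it runs without generateData, and the variable name counts is introduced here for illustration only.

```matlab
% Stand-in values so the sketch runs on its own; with real output,
% use the idx and clustPoints vectors returned by generateData.
numClusts = 5;
idx = [1; 3; 3; 2; 5; 5; 5; 4; 1; 2];   % hypothetical cluster indices

% Count how many points carry each cluster index
counts = accumarray(idx, 1, [numClusts 1]);

% The counts add up to the total number of points
assert(sum(counts) == numel(idx));
```

For real output, isequal(counts, clustPoints) would be expected to hold, since both describe the number of points per cluster.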