Code covered by the BSD License  


File Size: 2.74 KB, File ID: #37435

Generate Data for Clustering

by Nuno Fachada

 

10 Jul 2012 (Updated 03 Aug 2012)

Generates 2D data for clustering.

File Information
Description

generateData generates 2D data for clustering. The data is created along straight lines, which can be more or less parallel depending on the slopeStd argument.

Inputs:

slope - Base direction of the lines on which clusters are based.
slopeStd - Standard deviation of the slope; used to obtain a random slope variation from the normal distribution, which is added to the base slope in order to obtain the final slope of each cluster.
numClusts - Number of clusters (and therefore of lines) to generate.
xClustAvgSep - Average separation of line centers along the X axis.
yClustAvgSep - Average separation of line centers along the Y axis.
lengthAvg - The base length of lines on which clusters are based.
lengthStd - Standard deviation of line length; used to obtain a random length variation from the normal distribution, which is added to the base length in order to obtain the final length of each line.
lateralStd - "Cluster fatness", i.e., the standard deviation of the distance from each point to the respective line, in both x and y directions; this distance is obtained from the normal distribution.
totalPoints - Total points in generated data (will be randomly divided among clusters).
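
The way slopeStd, lengthStd and lateralStd enter the picture can be sketched as follows. This is only an illustration of the descriptions above, not the source code of generateData: treating the slope as rise over run, centering the cluster at the origin, placing points uniformly along the line, and the variable names nPts, direction, t and pts are all assumptions made here.

```matlab
% Hypothetical sketch, not the submission's source code.
slope = 1;      slopeStd = 0.5;   % base slope and its standard deviation
lengthAvg = 5;  lengthStd = 1;    % base line length and its standard deviation
lateralStd = 2;                   % "cluster fatness"
numClusts = 5;
nPts = 40;                        % points in one illustrative cluster (assumed)

% One effective slope and length per cluster: base value plus normal noise.
slopes  = slope     + slopeStd  * randn(numClusts, 1);
lengths = lengthAvg + lengthStd * randn(numClusts, 1);

% Points for the first cluster, centered at the origin (assumed center).
% Assumption: slope is rise over run, so the line runs along (1, slopes(1)).
direction = [1, slopes(1)] / norm([1, slopes(1)]);

% Positions along the line; uniform placement is an assumption.
t = (rand(nPts, 1) - 0.5) * lengths(1);

% Points on the line, then a normal offset in both x and y.
pts = t * direction + lateralStd * randn(nPts, 2);
```

With slopeStd set to 0 every cluster would share the same direction, which is why smaller values give more nearly parallel lines.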

Outputs:

data - Matrix (totalPoints x 2) with the generated data
clustPoints - Vector (numClusts x 1) containing number of points in each cluster
idx - Vector (totalPoints x 1) containing the cluster indices of each point
centers - Matrix (numClusts x 2) containing centers from where clusters were generated
slopes - Vector (numClusts x 1) containing the effective slopes used to generate clusters
lengths - Vector (numClusts x 1) containing the effective lengths used to generate clusters
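
As a rough illustration of how these outputs relate to each other, the sketch below uses small stand-in values in place of a real call to generateData. Going by the descriptions above, counting the entries of idx per cluster should reproduce clustPoints; the stand-in idx and data values, and the names counts and means, are hypothetical.

```matlab
% Stand-in values; in practice these would come from generateData.
numClusts = 5;
idx  = [1 1 2 3 3 3 4 5 5 2]';     % hypothetical cluster index per point
data = rand(numel(idx), 2);        % hypothetical (totalPoints x 2) points

% Number of points per cluster, in the same shape as clustPoints.
counts = accumarray(idx, 1, [numClusts 1]);

% Per-cluster mean position (numClusts x 2), in the same shape as centers.
means = [accumarray(idx, data(:,1), [numClusts 1], @mean), ...
         accumarray(idx, data(:,2), [numClusts 1], @mean)];
```

For real output one would presumably find each row of means near, but not exactly on, the matching row of centers.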

-----------------
Usage example:

[data, cp, idx] = generateData(1, 0.5, 5, 15, 15, 5, 1, 2, 200);

This creates 5 clusters with a total of 200 points and a base slope of 1 (std=0.5). The clusters are separated on average by 15 units in both the x and y directions, and have an average length of 5 units (std=1) and a "fatness" or spread of 2 units.

To take a quick look at the clusters just do:

scatter(data(:,1), data(:,2), 8, idx);

Required Products: MATLAB
MATLAB release: MATLAB 7.13 (R2011b)
Updates
03 Aug 2012

- Function returns more information.
- More comments w/ example.
- Fixed xClustAvgSep and yClustAvgSep to conform with the specification in the comments.
- totalPoints is now the exact number of points generated.
- Clusters with zero elements are no longer generated.
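
One simple way to satisfy the last two points together is sketched below. It is not taken from the submission, whose actual method is not shown here; it assumes totalPoints is larger than numClusts, and the variable extra is a name introduced for illustration.

```matlab
% Hypothetical sketch: split totalPoints among numClusts clusters so that
% the counts sum exactly to totalPoints and no cluster is empty.
totalPoints = 200;
numClusts   = 5;

% Start every cluster with one point so that none ends up empty.
clustPoints = ones(numClusts, 1);

% Assign each remaining point to a uniformly random cluster.
extra = randi(numClusts, totalPoints - numClusts, 1);
clustPoints = clustPoints + accumarray(extra, 1, [numClusts 1]);
```

Seeding each cluster with one point before distributing the rest guarantees both properties by construction, at the cost of a slightly less uneven split than a fully random division.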
