4.4 | 8 ratings | 28 Downloads (last 30 days) | File Size: 3.43 KB | File ID: #35014
Brett Shoelson


27 Jun 2012 (Updated )

Clusters an MxN array of data into an unspecified number (P) of bins.

File Information

No a priori knowledge of the number of bins, or the distance between bins, is required. This approach relies on the relative difference between (sorted) elements of the data, and works well when the difference between clusters is bigger than the difference between elements within a cluster.
CLUSTERS = clusterData(DATA);
Operates column-by-column. An optional input allows you to specify the sensitivity of each columnwise clustering. Additional outputs specify the index of the cluster to which each row of data belongs, and the bounds used to separate the clusters.
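
The idea of splitting sorted data at unusually large gaps can be sketched for a single column as follows. This is a minimal illustration, not the clusterData source: the sample values, the variable names, and the threshold rule (a gap counts as a break when it exceeds `gapFactor` times the median gap) are all assumptions, and `gapFactor` is a made-up stand-in rather than the submission's sensitivity input.

```matlab
% Hypothetical illustration only; this is not the clusterData implementation.
x = [1.0 1.2 0.9 5.1 5.3 4.8 9.9 10.2];      % made-up 1-D sample
x = x(:);                                     % work with a column vector
gapFactor = 3;                                % assumed threshold multiplier

[sorted, order] = sort(x);                    % sort, remembering original positions
gaps = diff(sorted);                          % differences between sorted neighbours
isBreak = gaps > gapFactor * median(gaps);    % unusually large gaps separate clusters

% Label the sorted elements; the label increments after every break.
sortedLabels = cumsum([1; isBreak]);
labels = zeros(size(x));
labels(order) = sortedLabels;                 % map labels back to input order

nClusters = max(labels);
clusters = cell(nClusters, 1);                % Px1 cell array, one cell per cluster
for k = 1:nClusters
    clusters{k} = x(labels == k);
end
```

With this sample, the two large gaps (between 1.2 and 4.8, and between 5.3 and 9.9) separate three groups. Raising `gapFactor` makes the sketch less willing to split, so it yields fewer groups.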

Each column may have a different interpretation. For instance, an Mx4 array of data may represent x-data in the first column, y- in the second, z- in the third, and t- in the fourth. Returns a Px1 cell array, CLUSTERS, specifying the data points in each of the P clusters detected.
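
Because CLUSTERS is described as a Px1 cell array, the number of clusters P can be read off its length. The snippet below is a sketch that uses a hand-built stand-in cell array (made-up values) so that it runs without the submission on the path; with clusterData available, the first assignment would be replaced by `clusters = clusterData(data);`.

```matlab
% Stand-in for the CLUSTERS output (a Px1 cell array); values are made up.
clusters = {[1.0; 1.2; 0.9]; [5.1; 5.3; 4.8]; [9.9; 10.2]};

nClusters = numel(clusters);                  % number of clusters, P
clusterSizes = cellfun(@numel, clusters);     % number of points in each cluster
```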

The final clustering utilizes all columns.

NOTE: This submission incorporates, expands, and replaces my earlier submission ezCluster.


This file inspired Data Clustering Using Bat Algorithm.

MATLAB release: MATLAB 7.13 (R2011b)
Other requirements: Should be toolbox and platform independent.
Comments and Ratings (21)
20 Oct 2016 Brett Shoelson

clusterData works on vectors, not on matrices. Well, sort of. If you input a non-vector matrix, it clusters each column, and then clusters based on the columnwise clustering. What do your data represent?

Comment only
18 Oct 2016 sayanti sankhari

How can I cluster a dataset that is a 1700x400 matrix?
Should I run this code directly on my dataset?

Comment only
20 Nov 2015 Fritz

06 Jan 2015 Brett Shoelson

Comment only
06 Jan 2015 nadjoua

Could you please tell me how I can obtain the number of clusters?

Comment only
22 Oct 2013 tsan toso

Ah, thanks for the catch, Brett. I just got around to running the code.

21 Oct 2013 Brett Shoelson

Hi Tsan,

I didn’t spend a lot of time trying to understand your data, but I did manage to cluster them in less than 1 second, using clusterData. I noticed that your column 2 isn’t fully filled out. I think that’s why you’re seeing the long delay when you include column 2. If you were to exclude the pairs with missing values, it would process a lot faster. (In fact, I’m not sure how I treated missing variables. Maybe as NaNs.)

Let me know if the clustering you get with

[clusters,clusterInds,clusterBounds] = clusterData(Binningbydensity(1:3216,:));

works for you. (Those are the rows without missing column-two values.)
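
Excluding rows with missing values before clustering can be sketched as below. This assumes the missing entries are stored as NaN, which the comment above only guesses at; the sample matrix and variable names are made up for illustration.

```matlab
% Made-up Mx2 sample; NaN marks a missing column-two value (an assumption).
data = [1 10; 2 NaN; 3 12; 4 NaN; 5 11];

hasMissing = any(isnan(data), 2);     % true for rows containing any NaN
complete = data(~hasMissing, :);      % keep only fully populated rows

% With the submission on the path: clusters = clusterData(complete);
```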


Comment only
20 Oct 2013 tsan toso

Hi Brett,

If I use your suggested method would it just group data together based on densities and not consider the relative distance of the data between each other? For example let’s just say the data ranges from 1 to 10. The observations of 1 are the same as 10. Observations in between are markedly different, would your function then just put 1 & 10 in the same bin?

For my purposes, I would just want to group adjacent bins of the same or similar density together.

I also included a web link for my data just to give you an idea of what kind of data I am dealing with. I provided 2 cols, each is a different random variable.

Another question: for the dataset in the 2nd column, the code seems to run for a particularly long time. The data are just integers centered around 1, dispersed as far as 7. Is there any workaround for that?


Comment only
19 Oct 2013 Brett Shoelson

@tsan: Hi Tsan,
It's difficult to comment without seeing your data, but it sounds like you could just create and analyze a vector of densities. clusterData will spit out the indices for the groupings. (You may need to tweak the sensitivity.)

Comment only
19 Oct 2013 tsan toso

Hi Brett,

Great code. I have a question about how I could use it for my purposes:

How would you recommend I use the code if I am looking to bin a sample of data based on its density (number of occurrences / length of edge)? The lengths of the edges are determined by whether adjacent data groups have similar density. (Similar densities are grouped together, but if the neighboring bin is 40% more or less dense, it requires another bin.)

It seems like what your code is doing is grouping data based on how close they are to each other.


Comment only
23 Jul 2013 Hoi Wong

Haha. My data set is supposed to give me an array of numbers, but sometimes I got a singleton. That's how I found out. By the way, excellent submission!

Comment only
23 Jul 2013 Hoi Wong

20 Jul 2013 Xiong

thank you for your submission!

12 Jul 2013 Brett Shoelson

Hmmm. Well, that's clearly a "bug" in the sense that I could have dealt with that case more gracefully, but then--well, let's just say that I never anticipated that anyone would try to cluster a single scalar. :)

Comment only
11 Jul 2013 Hoi Wong

It seems like the program gets stuck (running forever) when I try to cluster a singleton, say clusterData(3).

Comment only
25 Jun 2013 Brett Shoelson

Han, did you find some problem with the submission that led you to rate this so poorly? Do you have any comments to share that might help me understand why it merits a two-star rating?

Comment only
25 Jun 2013 Han

13 May 2013 Deanna

11 May 2013 Joel

Excellent submission

18 Sep 2012 Venkat R

Very cool submission. I was searching for different options to find 'k' automatically in k-means. This submission does it nicely.

10 Aug 2012 Brett Shoelson

PLEASE NOTE that this code uses tildes for argument placeholders. As such, it will not work without modification on releases prior to R2009b. Feel free to edit the code, or upgrade to a newer MATLAB!!!
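
For reference, the kind of edit needed on older releases looks like the sketch below: each tilde placeholder is replaced by a named throwaway variable. The call to `sort` and the variable names are illustrative assumptions, not lines taken from the submission.

```matlab
x = [3 1 2];

[~, order] = sort(x);          % R2009b and later: tilde discards the first output
[unused, order2] = sort(x);    % earlier releases: name a throwaway variable instead
```

Both forms return the same sort order; the second simply leaves an extra variable (`unused`) in the workspace.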

Comment only
10 Jun 2013 1.1

Modified the help to correct a doc bug. Higher sensitivity results in fewer clusters, not more. (No code change.)

01 Sep 2016

Updated license
