counting the number of clusters

I have a list of pairs of numbers (see the example below). Looking at the first column, the number 1 repeats three times, paired with 54, 106, and 143. Similarly, the number 24 repeats twice, paired with 87 and 288. I want to group every repeated first-column value into one cluster. The example below has 12 pairs. I want to group 1 with 54, 106, and 143 and call that one cluster, and do the same for any other repeated value (24 in this example). In the end I will have 12 - 2 = 10 clusters. I would appreciate it if someone could help with MATLAB code for this.
Thanks,
Sudharsan
[1 54
1 106
1 143
5 90
24 87
64 244
5 202
7 270
24 288
25 176
26 206
27 161]

4 Comments

Maybe I did not understand. I see only 8 clusters here. Where is my mistake?
Hi Stephan,
There are actually 9 clusters in the listed example, not 10. The number 5 also repeats twice, with its pairs 90 and 202. So 12 pairs minus 3 groups = 9 clusters.
Thanks,
Sudharsan
Stephan on 8 Oct 2019
Edited: Stephan on 8 Oct 2019
Sorry, I still don't get it. Since 1 appears 3 times, and 5 and 24 each appear 2 times, for me these are 3 clusters built from 7 pairs. 5 pairs remain, which gives 8 clusters for me.
Hi Stephan, I think I counted the clusters the wrong way. I think you're right: 3 clusters + 5 pairs = 8.
Thanks, Sudharsan
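The count agreed on above (3 repeated values + 5 single pairs = 8) can be checked with a short sketch. It assumes the example pairs are stored in a matrix called data (the name used in the accepted answer); the other variable names are only illustrative.

```matlab
% Example pairs from the question; first column is the value to group on.
data = [1 54; 1 106; 1 143; 5 90; 24 87; 64 244; ...
        5 202; 7 270; 24 288; 25 176; 26 206; 27 161];

% Distinct first-column values, plus an index mapping each row to its value.
[keys, ~, ic] = unique(data(:,1));

% Number of rows that share each first-column value.
counts = accumarray(ic, 1);

numRepeated = sum(counts > 1);    % values occurring more than once: 3 (1, 5, 24)
numSingle   = sum(counts == 1);   % values occurring exactly once: 5
numClusters = numel(keys);        % 3 + 5 = 8
```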


 Accepted Answer

Fabio Freschi on 8 Oct 2019
Edited: Fabio Freschi on 8 Oct 2019
% unique indices
idx = unique(data(:,1));
% clusters
cluster = arrayfun(@(idx)data(data(:,1) == idx,2),idx,'UniformOutput',false);
Then you can access your clusters with idx(i) and cluster{i}, where i = 1:length(idx).

3 Comments

Hi Fabio, Thank you for your response. I don't quite understand this function, or what the number '2' is doing.
Thanks, Sudharsan
do you mean the 2 here?
cluster = arrayfun(@(idx)data(data(:,1) == idx,2),idx,'UniformOutput',false);
% ^
% |
I am taking the second column of data to check the values of the cluster.
arrayfun is not simple to understand for newbies, try with
doc arrayfun
Roughly speaking, it applies the comparison data(:,1) == idx to each value of idx, one element at a time. Then I extract the values of the second column of data using that comparison: data(data(:,1) == idx,2). 'UniformOutput',false is needed so that each result, which can have a different length, is stored in its own entry of the cell array cluster.
Hi Fabio, Yes, I meant the 2 where you pointed it out. Now it makes sense to me. Thanks for your time. Sudharsan
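If arrayfun is hard to read, the same result can be written as a plain loop. This is only a sketch of an equivalent form, assuming data holds the pairs from the question; rowsInCluster is an illustrative name.

```matlab
% Example pairs from the question.
data = [1 54; 1 106; 1 143; 5 90; 24 87; 64 244; ...
        5 202; 7 270; 24 288; 25 176; 26 206; 27 161];

% Distinct values of the first column, sorted ascending.
idx = unique(data(:,1));

% One cell per distinct value; each cell can hold a different number of partners.
cluster = cell(numel(idx), 1);
for i = 1:numel(idx)
    rowsInCluster = data(:,1) == idx(i);   % logical mask of rows with this value
    cluster{i} = data(rowsInCluster, 2);   % second-column values of those rows
end

numClusters = numel(idx);   % 8 for this example
```

For instance, idx(4) is 24 and cluster{4} holds 87 and 288.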


More Answers (1)

I agree with Stephan and findgroups() -- there are 8 "clusters."
Below I use findgroups() to find the groups, then I store all the rows (actually the values in the second column) into a cell array, where each cell has the second values for that group. Try this:
m = [
1 54
1 106
1 143
5 90
24 87
64 244
5 202
7 270
24 288
25 176
26 206
27 161]
groupIndexes = findgroups(m(:, 1))
% Make clusters as a cell array because every group might have a different number of members.
for k = 1 : max(groupIndexes)
    thisGroupRows = groupIndexes == k;
    groupValues{k} = m(thisGroupRows, 2);
end
celldisp(groupValues) % Report to the command window
You'll see this:
groupIndexes =
1
1
1
2
4
8
2
3
4
5
6
7
groupValues{1} =
54
106
143
groupValues{2} =
90
202
groupValues{3} =
270
groupValues{4} =
87
288
groupValues{5} =
176
groupValues{6} =
206
groupValues{7} =
161
groupValues{8} =
244
Is that what you want? If you want, you could put those values into the second column of the cell array and have the group value (the column 1 values) in the first column.
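A minimal sketch of that two-column layout, assuming the second output of findgroups() is used to recover the column 1 value for each group. The names groupKeys and clusters are illustrative, not part of the code above.

```matlab
m = [1 54; 1 106; 1 143; 5 90; 24 87; 64 244; ...
     5 202; 7 270; 24 288; 25 176; 26 206; 27 161];

% Second output lists the distinct column 1 values, one per group.
[groupIndexes, groupKeys] = findgroups(m(:, 1));
numGroups = numel(groupKeys);

% Column 1 of the cell array: the group value. Column 2: its paired values.
clusters = cell(numGroups, 2);
for k = 1 : numGroups
    clusters{k, 1} = groupKeys(k);
    clusters{k, 2} = m(groupIndexes == k, 2);
end
```

With this layout, clusters{2, 1} is 5 and clusters{2, 2} holds 90 and 202.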

9 Comments

Hi, yes, I was expecting something like this. Thank you for your idea and time.
Sudharsan
However, for a dataset like the one below I should have only one cluster: 1 is touching 2, 2 is touching 5, and so on. How do I solve this? Would MATLAB functions like linkage and cluster help?
[1 2
1 3
1 4
2 5
5 6
6 7]
Good to know the availability of this function: +1
Any idea of how to code this will be of great help!
Since every number always "touches" three to five other numbers, why do you not ALWAYS have one cluster? Please give a label matrix where you assign the cluster number to the position so we can try to figure out what constitutes a group.
Perhaps you need bwlabel() or bwconncomp() but I'm not sure because I don't understand your algorithm for defining clusters.
I have a dataset like the one below, for example. It differs from the previous example in that the number '2' now appears in both column one and column two. Using findgroups, I can group all the 1's and 2's (in column one) and end up with two clusters. But since the number '2' appears at m(1,2) as well as at m(4,1) and so on, I should eventually have only one cluster, because all of them are touching each other.
I think my question is clear now?
m=[
1 2
1 3
1 4
2 5
2 6
2 7]
Do you mean like this:
>> groupIDs = (m<=2)+1
groupIDs =
2 2
2 1
2 1
2 1
2 1
2 1
The top 2 touches the 1's and they have a path going down to the other 2's, so it seems like you want everything that's 2 or less in one group, and above 2 in another group.
Actually not. Imagine all the numbers in the matrix m as circles. There are 7 circles, where circle 1 touches 2, 3, 4 and circle 2 touches 5, 6, 7, and since all are connected they form one cluster. I want to check whether any number in column two appears again in column one (in the example above, the number '2'), and if so group them together and call it a cluster. Is that clearer?
Not totally, but try using this:
inBothColumns = m(:, 1) == m(:, 2);
This will give you a logical vector where both numbers in a row are the same.
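One possible reading of the "touching circles" description is that each row of m is a link between two numbers, and a cluster is a set of numbers connected through such links (a connected component). That interpretation is an assumption, not something confirmed in the thread. Under it, a sketch using graph() and conncomp() could look like this; nodeIDs, compactIdx and clusters are illustrative names.

```matlab
m = [1 2; 1 3; 1 4; 2 5; 2 6; 2 7];

% Map the numbers to compact node indices 1..N so that unused numbers
% (for example 2..53 in the original data) do not become isolated nodes.
[nodeIDs, ~, compactIdx] = unique(m(:));
compactIdx = reshape(compactIdx, size(m));

% Each row of m becomes an undirected edge between its two numbers.
G = graph(compactIdx(:, 1), compactIdx(:, 2));

% bins(j) is the component (cluster) number of node j.
bins = conncomp(G);
numClusters = max(bins);   % 1 for this example: everything is connected

% Numbers belonging to each cluster.
clusters = arrayfun(@(k) nodeIDs(bins == k), 1:numClusters, 'UniformOutput', false);
```

Run on the 12-pair example from the question, the same code gives 8 clusters, since none of those pairs share a number across columns.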
