Connected component analysis on undirected graphs, with thresholding and connectivity constraints.
Tristan Ursell
April 2012
Connected component analysis on an undirected graph, with various thresholding and connectivity constraints.
[groups,orphans] = graph_analysis(W);
[groups,orphans] = graph_analysis(W,'field',value,...);
[groups,~] = graph_analysis(W,'field',value,...);
W is the N x N adjaceny matrix for a symmetric graph. Thus W should be a symmetric matrix, if it is not, this function will give an error. Self connections (i.e. diagonal elements) are not allowed and will be removed automatically. The values of the parameters below are applied as a union set, that is, all original elements must meet all of the conditions specified by the parameters to be included in a group of connected components.
If only W is given, then all components with W > 0 will be analyzed and grouped, with the default constraints.
'min_conn' (0 <= min_conn <= max_conn) specifies the minimum degree of connectivity (not including itself) for any element in W to be included in a group. The default 'min_conn' value is 1.
'max_conn' (min_conn <= max_conn <= N) specifies the maximum degree of connectivity for any element in W to be included in a group. The default 'max_conn' value is N.
'min_like' is the minimum likelihood value for an element in W to be included in any group. The default value is 0. The 'likelihood' is not necessarily a probability, and hence is not bounded between zero and one. However, for any two elements in W, the ratio of likelihoods should be equal to their ratio of probabilities.
'max_link' is the maximum number of linkages to search for in the network. This parameter is useful when you know that the network has some maximum number of connections between the elements in the network (e.g. if the graph has the property that any two two nodes are no more than N connections away, you can seg max_lin = N, to speed up the code). Choosing smaller values of 'max_link' can significantly speed the code. The default value is the size of the current subblock.
'max_rank' (1 <= max_rank <= N) is the number of highest likelihood values to use in forming connected groups. For instance, if max_rank = 3, and a row of W is:
1 3 6 4 4 5 6 8 0 3 1
then the matrix row will become:
0 0 1 0 0 0 1 1 0 0 0
If a row of W is:
1 3 6 4 4 6 6 6 0 3 1
with max_rank = 1,2,3 or 4, the row becomes:
0 0 1 0 0 1 1 1 0 0 0
with max_rank = 5, the same row becomes:
0 0 1 1 1 1 1 1 0 0 0
The default value of 'max_rank' is N.
'plot' with value 1 will generate plots of the grouping algorithm as it creates block diagonal groups in from top left to bottom right in W.
The output 'groups' is a structure array with fields:
groups(i).num_els = number of elements in group i.
groups(i).block = subblock identity of group i.
groups(i).elements = elements of W that are in group i.
groups(i).degrees = degrees of connection for each element in group i.
orphans = elements of W that were not in any group, becasue they did not meet the constraints.
The number of distinct groups is length(groups).
Example with a block diagonal random matrix W:
W=blkdiag(rand(50,50),rand(100,100),rand(200,200),rand(300,300));
W=(W+W')/2;
figure;
imagesc(W)
axis equal tight
xlabel('Elements of W')
ylabel('Elements of W')
title('Random BlockDiagonal Adjacency Matrix')
%more inclusive connections, less inclusive probability
[groups,orphans]=graph_analysis(W,'min_like',0.95,'min_conn',3,'plot',1);
%less inclusive connections, more inclusive probability
[groups,orphans]=graph_analysis(W,'min_like',0.9,'min_conn',6,'plot',1);
1.9  improved memory usage, so that 32 bit systems can handle larger arrays. 

1.6  put 'modone' function inside of mfile as requested by multiple users. 

1.5  added new features, improved plot handling, reduced memory load and fixed precision error. 

1.4  fixed some typos in the directions 

1.1  Improved description. 