You are now following this question
- You will see updates in your followed content feed.
- You may receive emails, depending on your communication preferences.
Is there a way of creating a MLGraph from only linearized upper triangular of a symmetric adjacency matrix to save memory?
1 view (last 30 days)
Show older comments
Hi! I have the elements of an upper triangular part of a symmetric adjacency matrix and like this
a b d g
x c e h
x x f i
and the numbers are linearized like H = [a b c d e f g i]
And I need to create a MLGraph using matlab.internal.graph.MLGraph() function but I do not want to transform de H array into a full square symmetric matrix to save memory.
I could do this with
adjacency_matrix(triu(true(n))) = H;
diag = 1:n+1:n^2;
adjacency_matrix = adjacency_matrix + adjacency_matrix';
adjacency_matrix(diag) = adjacency_matrix(diag)/2;
but I'm working with very large arrays and need to save memory...
How could I pass just the linearized upper triangular elements H to matlab.internal.graph.MLGraph() and get the same result as of using the full matrix?
14 Comments
Steven Lord
on 25 Oct 2022
As stated in the "Internal Packages" section on this documentation page "MathWorks® reserves the use of packages named internal for utility functions used by internal MATLAB code. Functions that belong to an internal package are intended for MathWorks use only. Using functions or classes that belong to an internal package is discouraged. These functions and classes are not guaranteed to work in a consistent manner from one release to the next. Any of these functions and classes might be removed from the MATLAB software in any subsequent release without notice and without documentation in the product release notes."
Lorenco Santos Vasconcelos
on 25 Oct 2022
Hi @Steven Lord thanks for the reply. Actually I'm creating a graph from the adjacency matrix just to get the MinSpanningTree graph for later use. So I'm using the internal funciton to bypass all the verifications performed by the graph function once my adjacency matrix is well behaved. Just to speed up a little bit...
Bruno Luong
on 25 Oct 2022
If you want the minspaning tree, then the values of adjadcent matrix does not matter, you might work on logical array that save the space by factor of 8:
Lorenco Santos Vasconcelos
on 25 Oct 2022
You mean to create a sparse matrix from H (the upper triangular linearized) and then pass the sparse matrix to the graph function?
In this way, I think I will need to use sparse(I,J,H) and precompute the arrays I and J, right? Is there any fast and low memory consumption way to compute I and J without using something like ndgrid() and than keep only values with J>=I?
Lorenco Santos Vasconcelos
on 25 Oct 2022
Just detailing more
I'm doing this:
G.Underlying = matlab.internal.graph.MLGraph(adjacency_matrix);
weights = nonzeros(tril(adjacency_matrix));
clear adjacency_matrix
[~, edgeind] = primMinSpanningTree(G.Underlying, weights, 1, false);
ed = G.Underlying.Edges;
gMST = graph(ed(edgeind, 1), ed(edgeind, 2), pesos(edgeind));
clear G weights edgeind ed
sMST = adjacency(gMST,'weighted');
I'm doing this because latter on the code I need the graph of minspanstree gMST and the adjacency matrix of the minspantree graph sMST.
Do you have and ultra efficient suggestion? Because my memory usage is getting very big like 40 Gb
Bruno Luong
on 25 Oct 2022
Sparse matrix is a special storage to store matrix with a lot of 0s, said number of non-zeros elements < 0.01 total number of elements. This is typically used PDE discretization or graph where degree of nodes < number of nodes.
I'm not suggesting you use to store upper triangular part which is NOT a good idea. If you are thining ndgrid, then sparse is not suitable.
Steven Lord
on 25 Oct 2022
What sizes of graphs are you working with or planning / hoping to work with in the future?
Lorenco Santos Vasconcelos
on 25 Oct 2022
Agreed @Bruno Luong. In this case, the ratio between zeros and non zero elements will be 0.5, so it's not good to use sparse representation.
Also @Steven Lord, I'm dealing with weighted adjacency matrices of the order 50000 x 50000, which gives me memory usage about 20 Gb being allocated in each worker doing the job. After getting the minspantree, I need to find several djikstra shortest paths between nodes...
I'll try using the logical indexing suggested by @Bruno Luong to reduce the size. But any new suggestions are nice!
Christine Tobler
on 25 Oct 2022
The graph class is optimized for the case of a sparse adjacency matrix, so it's difficult to do much for this case of a very large and dense matrix.
To safe on intermediate memory when constructing the original graph class, you could try to compute indices I and J that match your linearized upper triangle:
a b d g
x c e h
x x f i
Numbers are linearized like H = [a b c d e f g i], with I = [1 1 2 1 2 3 1 2 3 4] and J = [1 2 2 3 3 3 4 4 4 4].
I think this piece of code should compute them fine, although I've only tested it for n=4:
I=zeros(n*(n+1)/2,1);
J = I;
k = 1;
for jj=1:n
for ii=1:jj
I(k) = ii;
J(k) = jj;
k = k+1;
end
end
I, J
Given these two inputs, you can then use
g = graph(I, J, linearizedUpperTriangle);
without needing to construct the whole adjacency matrix. This won't affect the size of the graph class itself, but could reduce some of the intermediate memory loss.
One thing to keep in mind: In this syntax, any zero entry in the linearizedUpperTriangle input will be interpreted as an edge with zero weight (in the adjacency matrix constructor, zeros in that matrix are interpreted as "there is no edge between the respective nodes).
So you may need to do I(linearizedUpperTriangle==0) = [] and the same for J and for linearizedUpperTriangle itself, if this contains zero entries which signify that there isn't an edge between the respective two nodes.
Bruno Luong
on 25 Oct 2022
Note thet the esquence I, J can be generated without loop
n = 4;
K=flip(nchoosek(n+1:-1:1,2),1);
I=K(:,2), J=K(:,1)-1,
I = 10×1
1
1
2
1
2
3
1
2
3
4
J = 10×1
1
2
2
3
3
3
4
4
4
4
Lorenco Santos Vasconcelos
on 28 Oct 2022
Thank you for all the suggestions! I just came to some ending on this mixing all of the information you provided.
I was trying to speed up the code by using almost all of it vectorized (this is the first tip to speed up MATLAB). However, as I'm dealing with BIG amount of data, vectorizing the code implied on having large matrices of double type. So I got speed but not enough memory.
I have to deal with sort of a trade off between fast vectorized code and avoid creating large matrices. So, some parts of the code I had to get back from vectorization and use basic loops.
@Bruno Luong's tip to construct the graph with the logical adjacency matrix was extremely valuable, once I was interested only in the minspantree, so i did:
G = matlab.internal.graph.MLGraph(true(n_samples));
Also, I computed the weights with a half-vectorized loop like and stored them in a linearized form of a lower triangular matrix with size N*(N-1)/2.
@Bruno Luong's tip on generating I and J with nchoosek was nice, but not practical in this case. With n_samples like 30,000 or 50,000 the combs function took a lot of time, so I did that, based on @Christine Tobler suggestion:
for ii = 1:n_samples
I = ii:n_samples;
J = ii*ones(n_samples-ii+1,1);
weights(count:(count + n_samples-ii)) = max(max(core_distances(I),core_distances(J)),sqrt(sum((X(I,:) - X(J,:)).^2,2)));
count = count + n_samples-ii+1;
end
Then I got the minspantree graph and its adjacency matrix (what I really needed in the rest of the code) like this
[~, edgeind] = primMinSpanningTree(G, weights, 1, false);
Gargs = {G.Edges(edgeind, 1), G.Edges(edgeind, 2), weights(edgeind)};
clear G weights edgeind
[gMST, gMSTW] = matlab.internal.graph.constructFromEdgeList(@matlab.internal.graph.MLGraph, 'graph', Gargs{:});
sMST = adjacency(gMST,gMSTW.Weight);
Things are running nice now, but there's still two things to speed up. It's the following:
First, I need to compute de shortest paths between all points belonging to a cluster A to all points belonging to a cluster B, and do that for all existing clusters. I'm doing this way:
function [] = cluster_density_separation()
for ii = 1:n_clusters
starts = clusteredX{ii};
n_starts = length(starts);
shortest_paths = zeros(n_starts,1);
for jj = ii+1:n_clusters
destinations = clusteredX{jj};
for k=1:n_starts
d = dijkstraShortestPaths(gMST, gMSTW.Weight, starts(k), destinations, Inf);
shortest_paths(k) = min(d(destinations));
end
density_separation(ii,j) = min(shortest_paths);
end
end
end
For now, this was the fastest way I could do that. Any suggestions?
Secondly, I need to do some operation with the mutual distances of every point in a cluster. To get these distances is easy like get one line of a pdist2 call, but for many points, this is a huge matrix.
So I'm using pdist() because the size is n*(n-1)/2 which is also big... For each cluster, I call pdist() and fill a new matrix like the one generated with pdist2 but without the diagonal of zeros. This way consumes less memory and only on pdist call for each cluster. I could compute the distances for each pair of points in a cluster but think it would be slow. Can you think of a faster way to do the folowing without sacrificing RAM?
function [] = compute_core_dist()
for ii=1:n_clusters
neighbors = clusteredX{ii};
D = pdist(X(neighbors,:));
n_neighbors = numel(neighbors);
distance_matrix = zeros(n_neighbors,n_neighbors-1);
for m = 1:n_neighbors
idx = [((2 * n_neighbors - (3:m+1)) .* (0:m-2) / 2 + m - 1), ((2 * n_neighbors - 2 - m) * (m-1) / 2 + (m:n_neighbors-1))];
distance_matrix(m,:) = D(idx);
end
numerator = sum((1./distance_matrix).^n_features,2);
core_distances(neighbors) = (numerator / (n_neighbors - 1)) .^(-1/n_features);
end
end
Lorenco Santos Vasconcelos
on 28 Oct 2022
For completeness, I'm sharing the entire function I'm trying to optimize for speed and memory.
It receives a large matrix of zscored data X(n_samples,n_features) of sizes like 50,000 x 15 and also receives the array of labels for each sample in X and an array of the clusters. This big matrix is generated once in each worker with WorkerObjWrapper...
This data in X was clustered by a method like DBSCAN. I make a lot of combinations of clustering varying the features. Like 10.000 different clustering combinations and call this function DBCV2 to evaluate the quality of each one. So this function is heavy processing (like 80-100 seconds per call) and called multiple times (10.000 times), so I'm calling it from a parfor loop. That is the reason I need to save memory, because I was getting things like 40 GB on each worker. Now I reduced to 15-20 GB per worker.
So this is it. The trade off between fast vectorized code and low memory usage code but slow
Lorenco Santos Vasconcelos
on 28 Oct 2022
One more complement... The big trouble is in the first execution in all workers because they all start runing almost in the same time and all allocate lots of memory. Then, one each call of the function lasts a different time, the next iterations will not be at same time and not all workers will allocate memory togheter.
I think this can be resolved like delaying a little bit only the first execution of each worker? Can this be done? How could I apply different delay on only the first execution of each worker?
Answers (0)
See Also
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!An Error Occurred
Unable to complete the action because of changes made to the page. Reload the page to see its updated state.
Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
- América Latina (Español)
- Canada (English)
- United States (English)
Europe
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom(English)
Asia Pacific
- Australia (English)
- India (English)
- New Zealand (English)
- 中国
- 日本Japanese (日本語)
- 한국Korean (한국어)