Most Efficient Way to Construct the Matrices to Extract the Lower and Upper Triangle from a Vectorized Matrix
Given a matrix X and its vectorized form vX = X(:), I am after the most efficient way to build the matrices L and U which extract the lower and upper triangles from X.
In MATLAB code it would be something like this:
clear();
numRows = 3;
numCols = numRows;
mX = randn(numRows, numCols);
vX = mX(:);
% Lower Triangle are indices 2, 3, 6
mL = [ 0, 1, 0, 0, 0, 0, 0, 0, 0 ; ...
0, 0, 1, 0, 0, 0, 0, 0, 0 ; ...
0, 0, 0, 0, 0, 1, 0, 0, 0 ];
% Upper Triangle are indices 4, 7, 8
mU = [ 0, 0, 0, 1, 0, 0, 0, 0, 0 ; ...
0, 0, 0, 0, 0, 0, 1, 0, 0 ; ...
0, 0, 0, 0, 0, 0, 0, 1, 0 ];
assert(isequal(mL * vX, mX(tril(true(numRows), -1))));
assert(isequal(mU * vX, mX(triu(true(numRows), 1))));
I am after a sparse representation of mU and mL, generated in the most efficient way.
My current implementation is given by:
function [ mLU ] = GenerateTriangleExtractorMatrix( numRows, triangleFlag, diagFlag )
EXTRACT_LOWER_TRIANGLE = 1;
EXTRACT_UPPER_TRIANGLE = 2;
INCLUDE_DIAGONAL = 1;
EXCLUDE_DIAGONAL = 2;
switch(diagFlag)
case(INCLUDE_DIAGONAL)
numElements = 0.5 * numRows * (numRows + 1);
diagIdx = 0;
case(EXCLUDE_DIAGONAL)
numElements = 0.5 * (numRows - 1) * numRows;
diagIdx = 1;
end
vJ = zeros(numElements, 1);
if(triangleFlag == EXTRACT_LOWER_TRIANGLE)
elmntIdx = 0;
for jj = 1:numRows
for ii = (jj + diagIdx):numRows
elmntIdx = elmntIdx + 1;
vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
end
end
elseif(triangleFlag == EXTRACT_UPPER_TRIANGLE)
elmntIdx = numElements + 1;
for jj = numRows:-1:1
for ii = (jj - diagIdx):-1:1
elmntIdx = elmntIdx - 1;
vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
end
end
end
mLU = sparse(1:numElements, vJ, 1, numElements, numRows * numRows, numElements);
end
Is there a more efficient way to generate vJ without extensive memory allocation (in order to allow generating really large matrices)?
Thank You.
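For reference, one fully vectorized way to build vJ for the strictly lower triangle (a sketch; note it allocates a full numRows-by-numRows logical mask, so it trades memory for brevity rather than answering the low-allocation question):

```matlab
numRows = 1000;
% Linear (column-major) indices of the strictly lower triangle,
% in the same order the double loop above produces them.
vJ = find(tril(true(numRows), -1));
numElements = numel(vJ);                 % numRows * (numRows - 1) / 2
mL = sparse(1:numElements, vJ, 1, numElements, numRows^2, numElements);
```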
Tommy
on 20 Apr 2020
Won't guarantee most efficient, but how about this?
function triangles
numRows = 1000;
numCols = numRows;
mX = randn(numRows, numCols);
vX = mX(:);
li = 1:numRows < (1:numRows)';
ln = sum(li(:));
mL = sparse(1:ln, find(li), 1, ln, numRows*numRows);
ui = 1:numRows > (1:numRows)';
un = sum(ui(:));
mU = sparse(1:un, find(ui), 1, un, numRows*numRows);
assert(isequal(mL * vX, mX(tril(true(numRows), -1))));
assert(isequal(mU * vX, mX(triu(true(numRows), 1))));
end
time test:
>> timeit(@triangles)
ans =
0.0827 % sec
Royi Avital
on 20 Apr 2020
Edited: Royi Avital
on 20 Apr 2020
Tommy
on 20 Apr 2020
I will say, after I read your comment the problems with my first suggestion are now pretty dang obvious...
At the risk of proposing another dud, I'll offer this:
function T = func(numRows)
numCols = numRows;
mX = randn(numRows, numCols);
vX = mX(:);
[mL, mU] = triangles;
assert(isequal(mL * vX, mX(tril(true(numRows), -1))));
assert(isequal(mU * vX, mX(triu(true(numRows), 1))));
function [mL, mU] = triangles
N = (numRows*(numRows-1))/2;
N2 = numRows*numRows;
idx = zeros(N,1);
begin = 1;
for i = 1:numRows
range = ((i-1)*(numRows+1) + 2):(i*numRows);
n = numel(range);
idx(begin:begin+n-1) = range;
begin = begin+n;
end
mL = sparse(1:N, idx, 1, N, N2);
mU = sparse(1:N, N2+1-flip(idx), 1, N, N2);
end
T = timeit(@triangles);
end
>> func(10000)
ans =
3.7114 % sec
>> func(20000)
ans =
27.0066 % sec
Royi Avital
on 20 Apr 2020
Is there a more efficient way to generate vJ without extensive memory allocation (in order to allow generating really large matrices)?
Even if there were, it doesn't look like you can get more than about 25% reduction in run time. As a quick profiling of the code will tell you, more than 75% of the run time is consumed by the call to sparse().
Royi Avital
on 22 Apr 2020
Matt J
on 22 Apr 2020
Well, 25% of a lot of a time is a lot to gain
Is it, though? Whether it runs for 12 hours or 9 hours, you're still going to be letting it run overnight.
Royi Avital
on 22 Apr 2020
James Tursa
on 22 Apr 2020
Edited: James Tursa
on 22 Apr 2020
How will these be used downstream? Are you actually going to be doing the mL * vX or mU * vX multiplies downstream? (In which case you probably want mL and mU to be double sparse.) Or would you want mU and mL to be logical sparse? (Less storage, as long as you don't subsequently turn it into a double sparse downstream.)
Royi Avital
on 22 Apr 2020
James Tursa
on 22 Apr 2020
Edited: James Tursa
on 22 Apr 2020
If you actually do the multiply, I suspect (although I have not tested) that the logical is first converted to a double before the multiply is done. If true, this would completely negate the original savings from the logical storage (and actually cost you more memory). So double sparse storage is probably best if you do the multiply.
That being said, why bother even doing these multiplies downstream if you already know the answer? I would just have a routine generate the answer to mU * vX directly without generating mU or doing the multiply, but maybe that is not an option for you if you need to pass this to an optimizer that is programmed to do a multiply. Is that the situation?
I will look into a mex routine that will generate mU or mL directly in sparse format.
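The suggestion of skipping the multiply entirely can be sketched as follows (logical indexing returns elements in column-major order, which matches the ordering the extractor matrices above produce):

```matlab
numRows = 500;
mX = randn(numRows);
% Same result as mL * mX(:), with no extractor matrix and no multiply.
vL = mX(tril(true(numRows), -1));
% Same result as mU * mX(:).
vU = mX(triu(true(numRows), 1));
```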
Matt J
on 22 Apr 2020
If they are to be used in an optimization problem, are you certain you need the operation in matrix form? Why would operator form not be good enough?
Royi Avital
on 22 Apr 2020
Take linprog() or quadprog() for instance, I don't think they accept anything but matrices to express linear constraints.
This might be an instance where you would benefit from using fmincon, with a nonlinear constraint function to express the constraint even though it is really linear. This would allow you to implement mL*x and mU*x in operator form. You would of course supply an analytical gradient and, if possible, a feasible initial guess, x0.
Eliminating the need for mL and mU is, in any case, the only thing that is likely to change the game significantly. Although, I now wonder whether the one-time step of building the mL and mU matrices is significant next to the time taken for the optimization iterations to converge. Building vJ would be still less significant. Your 25% improvement is now a 2% improvement at best, I would guess.
Royi Avital
on 22 Apr 2020
Edited: Royi Avital
on 22 Apr 2020
James Tursa
on 22 Apr 2020
Edited: James Tursa
on 22 Apr 2020
For an optimization problem, it appears to me that you would know the form of the constraint up front, so the sizes are set and the form of mU or mL doesn't change during the iterations. So this boils down to an up front memory allocation issue, and not really a speed issue.
@Matt, there are many cases for using those matrices.
I can't think of any. You can pursue this for hypothetical interest if you want, of course.
Regarding fmincon(), solving a problem that has a dedicated solver with a general solver is usually a really bad choice.
The situations when that is true are those where calculating the objective and derivatives are faster in matrix form than in operator form. For large problems that won't always be the case, because the computational cost of implementing things in matrix form can start to outweigh the benefits of using a specialized algorithm.
Not to mention defining a linear constraint in the form of a nonlinear one (think of the time spent on calculating the derivative when it is so well defined).
I don't think mU and mL are helpful for defining either linear or nonlinear constraints. A linear constraint on the lower triangle of your unknown matrix X will always be of the form sum(T.*X, 'all')<=b, where T is some lower triangular matrix that you know in advance. The matrix form of the constraint gradient is simply T(:), which doesn't require mL at all to set up.
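The linear case above can be sketched as follows (sum(T.*X, 'all') requires R2018b or later; T and b stand in for a constraint known in advance):

```matlab
numRows = 4;
T = tril(randn(numRows), -1);   % known lower-triangular coefficient matrix
X = randn(numRows);             % the unknown, here just random data
% The constraint sum(T.*X, 'all') <= b as one row of a linprog-style A matrix:
lhsMatrixForm  = T(:).' * X(:);
lhsElementwise = sum(T .* X, 'all');
% The two agree up to floating-point round-off; no mL is involved.
```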
For non-linear constraints c(mL*X)<=0 on the lower triangular part of X, the gradient can be expressed as mL.'*gradc(mL*X), but this could be implemented efficiently and without mL as follows:
B=tril(true(numRows),-1);
Bd=double(B);
g=Bd;
g(B)=gradc(X(B));
So, for this, you really only need to pre-compute B and Bd, which can be done with much less time and memory allocation than mL:
N=3000;
tic;
B=tril(true(N),-1);
Bd=double(B);
toc
%Elapsed time is 0.056492 seconds.
tic;
mL=GenerateTriangleExtractorMatrix( N, 1, 2);
toc
%Elapsed time is 0.264385 seconds.
>> whos B Bd mL
Name Size Kilobytes Class Attributes
B 3000x3000 8790 logical
Bd 3000x3000 70313 double
mL 4498500x9000000 140602 double sparse
Royi Avital
on 23 Apr 2020
Matt J
on 23 Apr 2020
We'll compare it against a commercial solver which asks for the constraint in matrix form... If the commercial solver is faster, it is useful.
But as I mentioned in my last comment, even providing linear constraints in matrix form should not require mL and mU. So I don't know what this comparison could prove. If you wanted to do the comparison for non-linear constraints, that might be interesting...
Royi Avital
on 23 Apr 2020
Yes, but the matrix you provide should not require mL to set up. Each row A(i,:) of the constraint matrix should simply be T_i(:).' where T_i is some row-dependent lower triangular matrix the same size as your unknown matrix X. I don't see why you would need mL as part of the process for generating T_i. You should be able to set them up more efficiently in the space of X.
Royi Avital
on 23 Apr 2020
Edited: Royi Avital
on 23 Apr 2020
Matt J
on 23 Apr 2020
But that would mean your constraints are of the form mL*X(:)<=b. But since each row of mL contains only a single non-zero element, this means the constraint is equivalent to a simple bound X(j)<=b. In Matlab, you would never have to construct a matrix to represent such a constraint. You would use the vector input arguments lb and ub to specify those. I assume Gurobi has something similar.
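As a sketch of that reduction (ub is the upper-bound argument accepted by linprog/quadprog; the index j is just an example):

```matlab
% A constraint row of mL has a single 1 in column j, so
% mL(r,:) * X(:) <= b(r) is just the element bound X(j) <= b(r).
numRows = 3;
ub = inf(numRows^2, 1);   % no bound on most elements
j  = 2;                   % linear index of the constrained element (example)
ub(j) = 0.5;              % pass ub to linprog/quadprog instead of building mL
```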
Royi Avital
on 23 Apr 2020
Accepted Answer
More Answers (2)
Another approach to consider is to use my MatrixObj class to construct an object that has the same effect as the operations mL*X and mL.'*Y, but doesn't require you to actually build the matrix:
N=5000;
tic;
mL0=GenerateTriangleExtractorMatrix( N, 1, 2);
toc
%Elapsed time is 0.678702 seconds.
tic;
B=tril(true(N),-1);
Bd=double(B(:));
mL=MatrixObj;
mL.Params.B=B;
mL.Params.Bd=Bd;
mL.Ops.mtimes=@(obj,z) z(obj.Params.B);
mL.Trans.mtimes=@mtimesT;
toc;
%Elapsed time is 0.086228 seconds.
function out=mtimesT(obj,z)
out=obj.Params.Bd;
out(obj.Params.B)=z;
end
In addition to requiring less time to construct, you can verify that it gives the same results as multiplications with mL and mL.',
>> X=rand(N^2,1); isequal(mL0.'*(mL0*X),mL.'*(mL*X))
ans =
logical
1
but with considerably less memory consumption:
>> whos mL mL0
Name Size Kilobytes Class Attributes
mL 1x1 219739 MatrixObj
mL0 12497500x25000000 390586 double sparse
Royi Avital
on 24 Apr 2020
Royi Avital
on 21 Apr 2020
Tommy
on 21 Apr 2020
The two methods are fairly similar - I also like that yours minimizes memory allocation. I ran a few simple fun tests:
[timing plots omitted]
I didn't dare try higher than 20,000 for numRows. It seems that your code may possibly perform better at higher values of numRows. In the second case (calculating both the upper and lower triangles) I had your code running both sets of for loops, one after the other (shown in red). In green is the result from your code if only the first set of loops runs, and you recognize that vJ for one triangle is easy to determine if you have vJ for the other triangle (N^2+1-flip(vJ)). So the only thing I'll conclude from this is, if you will eventually calculate both the lower and upper triangle matrices for a given size, it might be better to calculate them together and only find vJ once. I suppose it depends on how expensive N^2+1-flip(vJ) is.
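The reflection identity used there can be checked directly: the map k -> N^2 + 1 - k sends the element at (i, j) to (N+1-i, N+1-j), so reversing the lower-triangle index vector and subtracting from N^2 + 1 yields the upper-triangle index vector:

```matlab
N = 6;
vJlow = find(tril(true(N), -1));   % strictly lower triangle, column major
vJup  = find(triu(true(N),  1));   % strictly upper triangle, column major
assert(isequal(vJup, N^2 + 1 - flip(vJlow)));
```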
Royi Avital
on 21 Apr 2020