3d matrix multiplication with reduce computation time

Question

Abi Waqas on 28 Aug 2017

0
Link

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/354352-3d-matrix-multiplication-with-reduce-computation-time

Edited: James Tursa on 30 Aug 2017

Hello Everyone

I have two 3D-matrices A(M,N,I) and B(M,N,I) with complex numbers. Values of M,N and I are large say in the range 500-5000. An easy way to do matrix multiplication is to use for loop for I, but it hugely slows down my rest of the code because I have to do this multiplication thousand of time.

I have been searching the answer in MATLAB answer since two days. The recommended solutions are using following MEX files

1. ' ndfun ': It's very fast and doesn't support complex numbers.

2. ' MMX ': Seems like the version of 'ndfun' but also don't support complex numbers.

3. ' mtimesx ': This is the only solution I found which fits for my situation. It's really fast with real numbers but when I use the complex number the estimated time is greater than simply using for loop.

Can anyone here please recommend the solution in which I can do multiplication of two complex 3D-matrices without using the for the loop but reduces computation time?

I am really looking for a positive response.

Regards abi

3 Comments
Show 1 older commentHide 1 older comment

Abi Waqas on 29 Aug 2017

Open in MATLAB Online

I am using Windows 10, MATLAB 2016 b. I have placed the code below. Please note that the code I place here is but different from the original. I modified it to just to the timings information. The Matrix A and B, W1 and COEFF_S I keep them same for simplification but they are going to change for each iteration (comes from another part of the code). In the end Final and A matrices are same.

If you just use the real number in COFF(1:3) you will see a clear speed up up-to 3x. But when I am using complex numbers then "mtimesx" slows down drastically.

if true
  clear  ; close all ; clc 
W1=zeros(500,500,500) ;
PRODOTTO3 = ((rand(500,500,500)>0.97));
PD3 = sparse(PRODOTTO3(:,:)) ;
COEFF = zeros(round(npolinomi),1) ; 
COEFF(1:3) = [0.5+0.5i , 0.12+0.05i,0.001+0.001i] ; 
COEFF=sparse(COEFF) ; 
COEFF_S = sparse(size(COEFF,1),size(COEFF,1)) ; 
 tic ; 
 for l = 1:500
 COEFF_S=reshape( COEFF.'*(PD3(:,:)) , size(COEFF,1), []).';
 W1(:,:,l)=full(COEFF_S) ;
 end
 for j = 1 : 30
       A = mtimesx(W1,W1) ;
 end
 toc 
 tic ; 
 for l = 1:500 %freq.
 COEFF_S=reshape( COEFF.'*(PD3(:,:)) , size(COEFF,1), []).';
    for j = 1 : 30
      B = COEFF_S*COEFF_S ; 
    end
 Final(:,:,l)=full(B) ;
 end
 toc

Jan on 29 Aug 2017

Just a note: See the preceding question: https://www.mathworks.com/matlabcentral/answers/353797-can-i-reduce-computation-time-for-the-matrix-computation-sparse-matrices

Sign in to comment.

Sign in to answer this question.

Answer 1

James Tursa on 29 Aug 2017

0
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/354352-3d-matrix-multiplication-with-reduce-computation-time#answer_279596

Some testing issues I had:

I couldn't run the posted code because npolinomi was not defined. Looking at the code, I made a guess and set npolinomi = 500. The code then ran, but ran out of memory on my machine. So I dropped all of the 500's to 200's just to get things to run.

Speed observations:

It's appears to be somewhat of apples to oranges comparison here. You have sparse and full matrices in your testing mix, as well as logical variables. The fact that you have sparse matrices involved make a HUGE difference and this was not mentioned in your original post. For matrix multiplies, MTIMESX only has code for full double and full single variables. So you appear to be comparing a full*full matrix multiply (via MTIMESX) along with the W1 building vs a sparse*sparse multiply. The more sparse the matrices are, the better the sparse*sparse formulation will perform for speed. So if the matrices are sparse enough, it would not surprise me if the sparse code was faster. In addition, in my 32-bit R2011a PCWIN version the MTIMESX code runs faster, so it appears the newer MATLAB versions have been improved for sparse matrix multiply speed.

So, back to the original question of how to speed up your code given that there are sparse matrices involved. This is hard to say because the code you show is boiled down to the point that I cannot really tell if there is anything that could be sped up with a mex routine or with other methods. Working with sparse matrices in mex routines is not trivial, particularly if you are trying to combine them or incrementally build them. So I would have to see more detail about your real code before I could make any suggestions. E.g., how is COEFF_S really built? Reshaping a sparse matrix (even transposing a sparse vector) invokes a deep data copy, so maybe something could be changed here to avoid that data copy. Are there any calculations that could be moved outside the loop? Are there any full/sparse conversions that could be avoided? Etc, etc. We can't really advise on any of this based on the boiled down code posted above.

7 Comments
Show 5 older commentsHide 5 older comments

Abi Waqas on 29 Aug 2017

Open in MATLAB Online

Dear James

Thank you so much for the detialed response.

1.sorry for the mistake I made for not defining the npolinomi and your guess was correct.

2. The logical matrix can be avoided by just putting double. I really don't need logical matrices. (but this don't put any effect on timing of calculation)

3. I know it is a comparison of apple to orange but 'mtimesx' don't accept sparse matrix that's why I converted into full matrices. One thing to note here is, in the second case multiplication is done using for loop instead of using 'mtimes' which technically consume more time.

4. I have looked at the timing profile of the code and 98% of time spent is on loop below.

for j = 1 : 30
       A = mtimesx(W1,W1) ;
end

So there shouldn't be any concern regarding the reshaping and building of COEFF_S because here just 2% of the time is spent.

5. My real question actually was in these two cases when I use COEFF(1:3) = [0.5 , 0.12,0.001] instead of COEFF(1:3) = [0.5+0.5i , 0.12+0.05i,0.001+0.001i] the first case 4 times faster than the second case (even beating the sparse multiplication and that's great) but what I don't understand is what exactly happen when I use complex number.

6. In real code, the only difference is PRODOTTO3 is the result of some very complex integration which I choose not to show here but the end result of that integration if the sparse matrix of given dimension (500).

James Tursa on 29 Aug 2017

Open in MATLAB Online

Matt has already explained the 4x complex speed difference, so I won't get into that.

I don't understand this comment of yours at all:

"One thing to note here is, in the second case multiplication is done using for loop instead of using 'mtimes' which technically consume more time."

In your code above, the j=1:30 loop isn't needed for anything. You are just repeating the same calculation 30 times. Obviously to speed up this particular case I would simply advise getting rid of the loop entirely and do the code that is inside the loop only once. But I assume this loop is present because it is a boiled down version of your real code where this loop actually does something. Since I haven't seen your real code here is is impossible for me to speculate on how to speed it up. And that is the crux of the issue here ... without seeing what your real code is doing, we don't have enough information to offer advice on speeding it up.

Example, take the following statement:

Final(:,:,l)=full(B)

That assignment can be done directly in a mex routine without the need to physically form full(B) first. But how much difference will this small change make and will other techniques make a bigger difference? Hard to say since I don't know what is really going on with the other parts of the code inside the loops in your real code.

James Tursa on 29 Aug 2017

I need to think about the 131s --> 136s timing result you are seeing with the sparse code ...

James Tursa on 30 Aug 2017

Edited: James Tursa on 30 Aug 2017

Open in MATLAB Online

So, after thinking about it and making a few crude tests, I have a hypothesis (which may or may not be true).

The matrix multiply employed by MATLAB (and MTIMESX) for full matrices is contained in a 3rd party BLAS library that is highly optimized for speed, and internally employs multi-threading. E.g., the generic matrix multiply routine is called DGEMM for real double precision matrices. The complex counterpart to this routine is called ZGEMM, but this routine requires the real and imaginary parts of variables to be interleaved in memory like Fortran and C/C++. But in MATLAB, the real and imaginary parts of complex variables are stored in different parts of memory. So MATLAB had to make a choice when doing a complex matrix multiply ... copy the real and imaginary parts into temporary interleaved memory to call ZGEMM, or just call DGEMM on the appropriate real and imaginary parts? MATLAB chose the latter (and in fact that is what MTIMESX does also). Going from a real matrix multiply to a complex matrix multiply then means that DGEMM is called four times instead of once. In both cases the DGEMM code executes in a multi-threaded fashion as long as the sizes are large enough to justify it. So you see the 3x - 4x speed difference. That is all to be expected. On my system the CPU monitor is pegged at about 50% for both of these cases (I assume this represents 1/2 of the logical cores but all of the physical cores in the system).

For sparse real matrix multiply, it does not appear to me (when watching the CPU monitor) that the calculation is multi-threaded. The CPU usage is down at around 6% typically, which is what I would expect for my system is there is no multi-threading going on (50% / 8 cores).

For complex sparse matrix multiply, I suspect that this is not multi-threaded either, but the elements are calculated serially. Only those calculations where both operands are non-zero are done (see note below). My suspicion is that some type of appropriate SSE calculation(s) is done to calculate each individual complex element multiply, and that is where the complex speed increase is coming from. I.e., some SSE instruction(s) can essentially carry out the complex element multiply in about the same time as a single real element multiply. Hence you don't see much difference in the sparse real vs complex matrix multiply total times.

But ... THIS IS ALL JUST A GUESS!

Side Note: There is a mismatch in the underlying sparse matrix multiply algorithm used when there are NaN's involved. E.g.,

>> a
a =
       NaN         0         0         0         0
         0         0         0         0         0
         0         0         0         0         0
         0         0         0    0.9747         0
    0.9696         0         0         0         0
>> s = sparse(a);
>> a*a
ans =
       NaN       NaN       NaN       NaN       NaN
       NaN         0         0         0         0
       NaN         0         0         0         0
       NaN         0         0    0.9501         0
       NaN         0         0         0         0
>> full(s*s)
ans =
       NaN         0         0         0         0
         0         0         0         0         0
         0         0         0         0         0
         0         0         0    0.9501         0
       NaN         0         0         0         0

You can see that the NaN's did not propagate appropriately in the sparse matrix multiply case. The sparse algorithm did not do any of the NaN*0 calculations ... it simply assumed the result would be 0 since one of the operands was 0. This, I suppose, could probably be considered a bug in MATLAB. Maybe I will report it ...

Sign in to comment.

3d matrix multiplication with reduce computation time

3 Comments
Show 1 older commentHide 1 older comment

Accepted Answer

7 Comments
Show 5 older commentsHide 5 older comments

More Answers (0)

See Also

Categories

Tags

Products

Community Treasure Hunt

3d matrix multiplication with reduce computation time

3 Comments Show 1 older commentHide 1 older comment

Accepted Answer

7 Comments Show 5 older commentsHide 5 older comments

More Answers (0)

See Also

Categories

Tags

Products

Community Treasure Hunt

3 Comments
Show 1 older commentHide 1 older comment

7 Comments
Show 5 older commentsHide 5 older comments