
Thread Subject:
What code does matlab use for Matrix multiplication? Does it use Strassen?

Subject: What code does matlab use for Matrix multiplication? Does it use Strassen?

From: Nick

Date: 13 Mar, 2012 22:29:18

Message: 1 of 5

I am getting unexpectedly fast results from MATLAB's matrix multiplication, and I am wondering if it is doing something besides straight multiplication, such as using Strassen's algorithm.

I am benchmarking the matrix multiply at over 30 GFlops on my CPU, even though I am running only one thread (using -singleCompThread at startup) and my computer is clocked at 2.25 GHz. Since I am doing single precision, SSE instructions allow 4 multiplies or adds per instruction, which gives 9 GFlops. Most computers can execute multiple instructions at once, but I know from my own tests, and from consulting the Intel optimization manual at http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf (tables C-10 and C-10A), that my computer can issue at most 2 floating-point instructions in one cycle. That puts the theoretical maximum at 18 GFlops, yet MATLAB seems to give much higher performance. How can this be?
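The peak-rate arithmetic above can be spelled out explicitly. The numbers below are the ones stated in this post (2.25 GHz clock, 4-wide single-precision SSE, 2 floating-point instructions per cycle); this is just the poster's calculation restated, not a measurement:

```python
# Theoretical single-thread, single-precision peak from the post's figures.
clock_hz = 2.25e9       # 2.25 GHz CPU clock
simd_width = 4          # single-precision floats per SSE instruction
insns_per_cycle = 2     # FP instructions issued per cycle (Intel manual, table C-10)

peak_gflops = clock_hz * simd_width * insns_per_cycle / 1e9
print(peak_gflops)  # 18.0
```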

Below is the code I tested with:

sz=2.^(0:12)
for s=1:numel(sz)
    A=rand(sz(s),sz(s),'single');
    tic;
    A'*A;
    t=toc;
    fprintf('sz=%d, time=%3.3gms, GFlops=%3.3g\n', sz(s), t*1000, 2*size(A,1)^3/1e9/t)
end

OUTPUT

sz=1, time=0.0457ms, GFlops=4.37e-005
sz=2, time=0.0199ms, GFlops=0.000803
sz=4, time=0.00634ms, GFlops=0.0202
sz=8, time=0.0086ms, GFlops=0.119
sz=16, time=0.0086ms, GFlops=0.952
sz=32, time=0.029ms, GFlops=2.26
sz=64, time=0.0503ms, GFlops=10.4
sz=128, time=0.268ms, GFlops=15.7
sz=256, time=1.61ms, GFlops=20.9
sz=512, time=10.8ms, GFlops=24.8
sz=1024, time=76.4ms, GFlops=28.1
sz=2048, time=572ms, GFlops= 30
sz=4096, time=4.36e+003ms, GFlops=31.5
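The printed rates follow from the `2*n^3` flop count used in the loop's fprintf. As a sanity check on the last row (an illustration of the arithmetic, done here outside MATLAB):

```python
# Reproduce the GFlops figure for the sz=4096 row from the timing shown.
n = 4096
t_ms = 4.36e3                          # reported time in milliseconds
gflops = 2 * n**3 / 1e9 / (t_ms / 1e3) # 2*n^3 flops for a general n-by-n product
print(round(gflops, 1))  # 31.5
```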

Subject: What code does matlab use for Matrix multiplication? Does it use Strassen?

From: Nasser M. Abbasi

Date: 13 Mar, 2012 23:06:48

Message: 2 of 5

On 3/13/2012 5:29 PM, Nick wrote:
> I am getting unexpectedly fast results using matlab's matrix multiplication and
> I am wondering if it is doing something besides straight multiplication,
> such as using strassen's algorithm.


http://www.mathworks.com/help/techdoc/rn/f14-998197.html

"MATLAB uses Basic Linear Algebra Subprograms (BLAS) for
its vector inner product, matrix-vector product, matrix-matrix
product, and triangular solvers in \. MATLAB also uses BLAS
behind its core numerical linear algebra routines from
Linear Algebra Package (LAPACK), which are used in functions
like chol, lu, qr, and within the linear system solver \."

Subject: What code does matlab use for Matrix multiplication? Does it use Strassen?

From: Roger Stafford

Date: 13 Mar, 2012 23:25:18

Message: 3 of 5

"Nick" wrote in message <jjohnu$iv3$1@newscl01ah.mathworks.com>...
> I am getting unexpectedly fast results using matlab's matrix multiplication and I am wondering if it is doing something besides straight multiplication, such as using strassen's algorithm.
> .... This puts the theoretical maximum bound at 18Gigaflops, yet MATLAB seems to give much higher performance. How can this be?
> ......
> sz=4096, time=4.36e+003ms, GFlops=31.5
- - - - - - - - -
  If the MATLAB compiler is smart enough, it could recognize that your A'*A product is Hermitian and thereby reduce the number of flops needed by half, which would then fall below your theoretical max of 18 GFlops. I seem to recall hearing somewhere that that is just what the compiler does in that case. Try it with a general A*B and see how it performs.

Roger Stafford
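Roger's point is easy to check numerically: A'*A is always symmetric, so only one triangle of the result needs to be computed, halving the flop count. A small NumPy sketch of the accounting (an illustration, not MATLAB's actual code path; the 31.5 figure is the measured rate from the first post):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))
C = A.T @ A

# A'*A is symmetric: the upper triangle mirrors the lower one.
assert np.allclose(C, C.T)

# A general n-by-n product costs ~2*n^3 flops, but a symmetric product
# only needs ~n^3 flops for one triangle. Crediting the benchmark with
# the full 2*n^3 therefore doubles the apparent rate.
apparent_gflops = 31.5            # measured for A'*A at sz=4096
actual_gflops = apparent_gflops / 2
print(actual_gflops)  # 15.75, safely below the 18 GFlops peak
```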

Subject: What code does matlab use for Matrix multiplication? Does it use Strassen?

From: Nick

Date: 13 Mar, 2012 23:36:17

Message: 4 of 5

You're right. Thank you! I've been trying to figure that out for a long time.
A=rand(4096,4096,'single'); B =rand(4096,4096,'single');
tic; A'*B; toc
Elapsed time is 8.106026 seconds.
gFlops = size(A,1)^3*2/1e9/8.08793
gFlops =

   16.9931

Almost exactly as I expected

"Roger Stafford" wrote in message <jjol0u$sa0$1@newscl01ah.mathworks.com>...
> "Nick" wrote in message <jjohnu$iv3$1@newscl01ah.mathworks.com>...
> > I am getting unexpectedly fast results using matlab's matrix multiplication and I am wondering if it is doing something besides straight multiplication, such as using strassen's algorithm.
> > .... This puts the theoretical maximum bound at 18Gigaflops, yet MATLAB seems to give much higher performance. How can this be?
> > ......
> > sz=4096, time=4.36e+003ms, GFlops=31.5
> - - - - - - - - -
> If the matlab compiler is smart enough it could recognize that your A'*A matrix is Hermitian and thereby reduce the number of flops needed by half which would then fall below your theoretical max of 18 Gigaflops. I seem to recall hearing somewhere that that is just what the compiler does in that case. Try it for a general A*B and see how it performs.
>
> Roger Stafford

Subject: What code does matlab use for Matrix multiplication? Does it use Strassen?

From: James Tursa

Date: 14 Mar, 2012 00:34:11

Message: 5 of 5

"Nick" wrote in message <jjollh$10v$1@newscl01ah.mathworks.com>...
> you're right. Thank you! I've been trying to figure that out for a long time.
> A=rand(4096,4096,'single'); B =rand(4096,4096,'single');
> tic; A'*B; toc
> Elapsed time is 8.106026 seconds.
> gFlops = size(A,1)^3*2/1e9/8.08793
> gFlops =
>
> 16.9931
>
> Almost exactly as I expected
>
> "Roger Stafford" wrote in message <jjol0u$sa0$1@newscl01ah.mathworks.com>...
> > "Nick" wrote in message <jjohnu$iv3$1@newscl01ah.mathworks.com>...
> > > I am getting unexpectedly fast results using matlab's matrix multiplication and I am wondering if it is doing something besides straight multiplication, such as using strassen's algorithm.
> > > .... This puts the theoretical maximum bound at 18Gigaflops, yet MATLAB seems to give much higher performance. How can this be?
> > > ......
> > > sz=4096, time=4.36e+003ms, GFlops=31.5
> > - - - - - - - - -
> > If the matlab compiler is smart enough it could recognize that your A'*A matrix is Hermitian and thereby reduce the number of flops needed by half which would then fall below your theoretical max of 18 Gigaflops. I seem to recall hearing somewhere that that is just what the compiler does in that case. Try it for a general A*B and see how it performs.
> >
> > Roger Stafford

MATLAB uses the following BLAS routines for symmetric cases:

DSYRK and SSYRK perform one of the symmetric rank k operations

* C := alpha*A*A' + beta*C,
*
* or
*
* C := alpha*A'*A + beta*C,
*
* where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k
* matrix in the first case and a k by n matrix in the second case.

DSYR2K and SSYR2K perform one of the symmetric rank 2k operations

* C := alpha*A*B' + alpha*B*A' + beta*C,
*
* or
*
* C := alpha*A'*B + alpha*B'*A + beta*C,
*
* where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k
* matrices in the first case and k by n matrices in the second case.

I haven't checked recently, but as of a few years ago MATLAB used these routines for most symmetric cases, but not all possible symmetric cases.

James Tursa
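The semantics of the xSYRK operation `C := alpha*A'*A + beta*C` can be sketched as a plain loop that fills only the lower triangle, which is where the factor-of-two saving comes from. This is a NumPy illustration of what the routine computes, not the optimized Fortran BLAS code:

```python
import numpy as np

def syrk_lower(alpha, A, beta, C):
    """C := alpha*A'*A + beta*C, computing only the lower triangle.

    A is k-by-n and C is n-by-n symmetric; the strict upper triangle
    of the result is left untouched, as the real BLAS xSYRK does.
    """
    out = C.copy()
    n = A.shape[1]
    for i in range(n):
        for j in range(i + 1):          # j <= i: lower triangle only
            out[i, j] = alpha * (A[:, i] @ A[:, j]) + beta * C[i, j]
    return out

# Check against the full product on a small symmetric C.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
C0 = rng.standard_normal((3, 3))
C0 = C0 + C0.T                          # make C symmetric
res = syrk_lower(2.0, A, 0.5, C0)
ref = 2.0 * (A.T @ A) + 0.5 * C0
assert np.allclose(np.tril(res), np.tril(ref))
```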
