Asked by Adam Peterson
on 4 Nov 2019

I have two n-by-n matrices, X and Y. I want to perform the following procedure, where all multiplication steps are elementwise multiplication.

- The first result is obtained by multiplying both of the n-by-n matrices and then summing the result to yield a row vector. This row vector is saved into the first row of my results matrix.
- The second result is obtained by multiplying X(2:end, 2:end) by Y(1:end-1, 2:end) and summing the result to yield a row vector. This row vector is saved into the second row of my results matrix.
- The third result is obtained by multiplying X(3:end, 3:end) by Y(1:end-2, 3:end) and summing the result to yield a row vector. This row vector is saved into the third row of my results matrix.
- This pattern repeats until the nth result is obtained by multiplying the bottom right element from X by the upper right element from Y. This result is saved into the last row of my results matrix.

To remove any doubt about how my matrices are setup, here is a graphic showing my two original n-by-n matrices, and each of the submatrices:

It's important for later computations in my workflow that the result of each step is left-aligned in its respective row of the results matrix:

I currently calculate my results using a for loop. On each iteration, I index the respective submatrices from X and Y, multiply them, sum the result, and save the row vector to the respective row of the results matrix. I cannot imagine a more efficient approach, but the ingenuity of other MATLAB users has certainly impressed me before. Is there a faster way to do this? In my case, the matrices I use are pretty large, something like 10,000-by-10,000 elements.

Answer by James Tursa
on 4 Nov 2019

Edited by James Tursa
on 5 Nov 2019

Accepted Answer

Here is the naive mex code. Could probably be made faster by doing the for-loop in parallel or trying to optimize cache hits, but I didn't spend the time to code that. Avoids all temporary data copies and runs about 10x faster than m-code on my machine:

>> mex xymult.c -lmwblas -largeArrayDims

Building with 'Microsoft Visual C++ 2013 Professional (C)'.

MEX completed successfully.

>> n = 5000;

>> x = randi(10,n,n); y = randi(10,n,n);

>> tic;c=xymult(x,y);toc

Elapsed time is 49.459822 seconds.

>> tic;m=xymult_m(x,y);toc

Elapsed time is 480.140740 seconds.

>> isequal(c,m)

ans =

1

The mex code:

/* File: xymult.c */

/* Does a special multiply & summing from Answers question */

/* Programmer: James Tursa */

/* Date: 4-Nov-2019 */

/* Complile command: mex xymult.c -lmwblas -largeArrayDims */

/* Includes ----------------------------------------------------------- */

#include "mex.h"

#include "blas.h"

#if false /* true = use dotproduct code , false = use BLAS ddot */

#define xDOT dotproduct /* Hard-coded loop */

#elif defined(__OS2__) || defined(__WINDOWS__) || defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64) || defined(_MSC_VER)

#define xDOT ddot /* Win dot product name */

#else

#define xDOT ddot_ /* linux dot product name */

#endif

double dotproduct( mwSignedIndex *n, double *x, mwSignedIndex *incx,

double *y, mwSignedIndex *incy)

{

double result = 0.0;

mwSignedIndex i;

for( i=0; i<*n; i++ ) {

result += *x * *y;

x += *incx;

y += *incy;

}

return result;

}

/* Gateway ------------------------------------------------------------ */

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])

{

double *A, *B, *C, *a, *b, *c;

mwSignedIndex i, j, n, N, INCX=1, INCY=1;

if( nrhs != 2 ||

!mxIsDouble(prhs[0]) || !mxIsDouble(prhs[1]) ||

mxIsSparse(prhs[0]) || mxIsSparse(prhs[1]) ||

mxIsComplex(prhs[0]) || mxIsComplex(prhs[1]) ||

mxGetNumberOfDimensions(prhs[0]) != 2 ||

mxGetNumberOfDimensions(prhs[1]) != 2 ||

mxGetM(prhs[0]) != mxGetN(prhs[0]) ||

mxGetM(prhs[1]) != mxGetN(prhs[1]) ||

mxGetM(prhs[0]) != mxGetM(prhs[1]) ) {

mexErrMsgTxt("Need exactly two full double square inputs");

}

if( nlhs > 1 ) {

mexErrMsgTxt("Too many outputs");

}

N = n = mxGetM(prhs[0]);

plhs[0] = mxCreateDoubleMatrix(N,N,mxREAL);

A = (double *) mxGetData(prhs[0]);

B = (double *) mxGetData(prhs[1]);

C = (double *) mxGetData(plhs[0]);

for( i=0; i<N; i++ ) {

a = A + i * (N+1); /* step A to next diagonal element */

b = B + i * N; /* step B to next column */

c = C + i; /* step C to next row */

for( j=0; j<n; j++ ) { /* sum(a.*b) is just dot(a_column,b_column) in loop */

*c = xDOT( &n, a, &INCX, b, &INCY );

a += N; b += N; c += N; /* move all pointers to next columns */

}

n--;

}

}

The m-code:

function c = xymult_m(a,b)

c = zeros(size(a));

n = size(a,1);

for k=1:n

c(k,1:n-k+1) = sum(a(k:end,k:end).*b(1:end-k+1,k:end));

end

end

Adam Peterson
on 5 Nov 2019

Thanks for taking the time to do this. I've written some MEX functions in the past, but certainly nothing so complicated. My C/C++ skills aren't very strong.

I've compiled it and tested it and it works perfectly! It gives a full order of magnitude improvement in execution time, which is a huge help to me!

Sign in to comment.

Opportunities for recent engineering grads.

Apply Today
## 4 Comments

## JESUS DAVID ARIZA ROYETH (view profile)

## Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/489082-how-can-i-multiply-square-submatrices-more-efficiently#comment_763541

## Stephen Cobeldick (view profile)

## Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/489082-how-can-i-multiply-square-submatrices-more-efficiently#comment_763570

## James Tursa (view profile)

## Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/489082-how-can-i-multiply-square-submatrices-more-efficiently#comment_763581

## Adam Peterson (view profile)

## Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/489082-how-can-i-multiply-square-submatrices-more-efficiently#comment_763713

Sign in to comment.