MeX implementation of Tensorproduct multiplications

3 views (last 30 days)
Dear all,
I am searching for a more efficient way to calculate tensorproduct multiplications of arbitrary sizes. First of all, i have already searched the web to get inspired by different solutions, e.g. here Tensorproduct at stackexchange:
1.) External Libraries, e.g. Eigen C++.
2.) Pure Matlab solution based in reshape and shiftdim.
3.) Matlab Tensor Toolbox.
However, so far i ended up with a MeX-file implementation with OpenMP parallelization.
--
In the following, i will give a short introduction of the math behind the implementation:
Suppose you have a matrix vector multiplication, where a matrix C is constructed by a Kronecker product of matrices A with size (n x m) and B with size (p x q). More informations can be found here Wikipedia.
.
In two dimensions this operation can be performed with O(npq+qnm) operations instead of O(mqnp) operations.
--
So far so good. In following example the sizes of the arrays are
matrix_xi = 4x4,
matrix_eta = 4x4,
vector = 16x500000x5,
(new version: vector = 16x2500000)
and may be initialized with ones or zeros. Moreover, rather than calculating only one Kronecker matrix vector multiplication, i want to calculate e.g. 500000x5 operations with one call (see the size of vector).
--
My current MeX-C file implementation:
/*************************************************
* CALLING:
*
* out = tensorproduct(A,B,vector)
*
*************************************************/
#include "mex.h"
#include "omp.h"
#define PP_A prhs[0]
#define PP_B prhs[1]
#define PP_vector prhs[2]
#define PP_out plhs[0]
#define n 4
#define m 4
#define p 4
#define q 4
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
const mwSize *dim;
int i,j,k,s,l;
double temp[n*q];
double *A = mxGetPr(PP_A);
double *B = mxGetPr(PP_B);
double *vector = mxGetPr(PP_vector);
dim = mxGetDimensions(PP_vector);
l = dim[1];
PP_out = mxCreateDoubleMatrix(l*n*p,1,mxREAL);
double *out = mxGetPr(PP_out);
#pragma omp parallel for private(i,j,k,s,temp)
for(k=0; k<l; k++){
for(i=0; i<n; i++){
for(j=0; j<q; j++){
temp[i+j*n]=0;
}
}
for(s=0; s<m; s++){
for(i=0; i<n; i++){
for(j=0; j<m; j++){
temp[i+s*n]+=A[i+j*n]*vector[j+s*m+k*m*p];
}
}
}
for(s=0; s<n; s++){
for(i=0; i<p; i++){
for(j=0; j<q; j++){
out[i*n+s+k*n*p]+=B[i+j*p]*temp[j*n+s];
}
}
}
}
}
--
The code can be compiled with:
mex FFLAGS='$FFLAGS -fopenmp -Ofast -funroll-loops' ...
CFLAGS='$CFLAGS -fopenmp -Ofast -funroll-loops' ...
LDFLAGS='$LDFLAGS -fopenmp -Ofast -funroll-loops' ...
COPTIMFLAGS='$COPTIMFLAGS -fopenmp -Ofast -funroll-loops' ...
LDOPTIMFLAGS='$LDOPTIMFLAGS -fopenmp -Ofast -funroll-loops' ...
DEFINES='$DEFINES -fopenmp -Ofast -funroll-loops' ...
tensorproduct.c
--
Currently on my Notebook: (Ubuntu 18.04, GCC 7.5.0, 4 Cores)
matrix_xi = ones(4,4);
matrix_eta = ones(4,4);
vector = ones(16,500000,5);
n = 50;
tic
for i=1:n
vector_out = reshape(tensorproduct(matrix_xi,matrix_eta,vector),size(vector));
end
toc
% Elapsed time is 8.036490 seconds.
--
A quite simple Matlab implementation without the use of the Kronecker property gives
matrix=kron(matrix_xi,matrix_eta);
tic
for i=1:n
vector_out = reshape(matrix*vector(:,:),size(vector));
end
toc
% Elapsed time is 10.489606 seconds.
--
With a different size of n,m,p,q = 7 the difference surprisingly does not change and gets worse (here you have to change the constants in the C-file)
matrix_xi = ones(7,7);
matrix_eta = ones(7,7);
vector = ones(49,500000,5);
n = 50;
tic
for i=1:n
vector_out = reshape(tensorproduct(matrix_xi,matrix_eta,vector),size(vector));
end
toc
% Elapsed time is 31.436541 seconds.
matrix=kron(matrix_xi,matrix_eta);
tic
for i=1:n
vector_out = reshape(matrix*vector(:,:),size(vector));
end
toc
% Elapsed time is 35.829957 seconds.
Luckily I am still faster than Matlab.
--
Note that only n,m,p,q are constants during compile time. To make the code useable for arbitrary sizes i have written a Macro to generate many C-files with different const n,m,p,q which will be included later with a header and a main C-file. This will help the compiler to unrole and inline the code.
This is the most efficient way i have achieved so far. I hope there are some experts here in this forum who can help to optimize these few lines. I am also open for other solutions, e.g. Matlab based or based on external libraries.
Thank you!
  4 Comments
Bruno Luong
Bruno Luong on 6 Dec 2020
Edited: Bruno Luong on 6 Dec 2020
Yes I do know tensor and kron. But your notation is confusing. Actually your second statement
A = matrix_xi
B = matrix_eta
is wrong. Actually it's
B = matrix_xi.';
A = matrix_eta;
And your vector consists of a set of multiple X matrices.
Each reshape(vector(:,p,q), size(A,2), size(B,1)) is an X.
Your description is very confusing.
I knew what you want to compute. For the general reader, the description is careless and confusing.
ConvexHull
ConvexHull on 6 Dec 2020
Edited: ConvexHull on 6 Dec 2020
Sorry, I am a little bit inaccurate here. I hope you got it.
Summarizing, i simply want to use the Kronecker property to apply many (500000*5) efficient matrix vector multiplications using O(npq+qnm) with one call.
Thank you for your help.
Edit: My main concern is a more efficient C implementation.

Sign in to comment.

Accepted Answer

Bruno Luong
Bruno Luong on 6 Dec 2020
Edited: Bruno Luong on 22 Aug 2021
I propose two other methods of MATLAB.
The first simply replaces (:,:) by RESHAPE. The first doing some data copy, the second do not. It seems to be the fatest.
The second method uses MATLAB pagemtimes introduced in R2020b. (Edit: Jan corrects the release)
function benchtensor
matrix_xi = rand(7,7);
matrix_eta = rand(7,7);
vector = rand(49,500000,5);
n = 50;
tic
for i=1:n
vector_out = reshape(tensorproduct(matrix_xi,matrix_eta,vector),size(vector));
end
toc % Elapsed time is 19.429296 seconds.
matrix=kron(matrix_xi,matrix_eta);
tic
for i=1:n
vector_out = reshape(matrix*vector(:,:),size(vector));
end
toc % Elapsed time is 28.08963 seconds.
tic
for i=1:n
vector_out = reshape(matrix*reshape(vector,49,[]),size(vector));
end
toc % Elapsed time is 14.30025 seconds.
B = matrix_xi.';
A = matrix_eta;
tic
for i=1:n
X = reshape(vector, size(A,2), size(B,1), []);
CX = pagemtimes(pagemtimes(A, X), B); % Reshape CX if needed
end
toc % Elapsed time is 29.296709 seconds.
end
  17 Comments
ConvexHull
ConvexHull on 23 Aug 2021
Edited: ConvexHull on 23 Aug 2021
Note that i merged k/l loops in the original post as suggested. For the C-code you have to use vector=size(49x2500000) instead of vector=size(49x500000x5) .
The other two hints did not give me any speed up.

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!