version 1.10.0.0 (256 KB) by
James Tursa

Beats MATLAB 300% - 400% in some cases ... really!

MTIMESX is a fast general purpose matrix and scalar multiply routine that has the following features:

- Supports multi-dimensional (nD, n>2) arrays directly

- Supports Transpose, Conjugate Transpose, and Conjugate pre-operations

- Supports singleton expansion

- Utilizes BLAS calls, custom C loop code, or OpenMP multi-threaded C loop code

- Can match MATLAB results exactly or approximately as desired

- Can meet or beat MATLAB for speed in most cases

MTIMESX has six basic operating modes:

- BLAS: Always uses BLAS library calls

- LOOPS: Always uses C loops if available

- LOOPSOMP: Always uses OpenMP multi-threaded C loops if available

- MATLAB: Fastest BLAS or LOOPS method that matches MATLAB exactly (default)

- SPEED: Fastest BLAS or LOOPS method even if it doesn't match MATLAB exactly

- SPEEDOMP: Fastest BLAS, LOOPS, or LOOPOMP method even if it doesn't match MATLAB exactly

MTIMESX inputs can be:

single

double

double sparse

The general syntax is (arguments in brackets [ ] are optional):

mtimesx( [directive] )

mtimesx( A [,transa] ,B [,transb] [,directive] )

Where transa, transb, and directive are the optional inputs:

transa = A character indicating a pre-operation on A:

transb = A character indicating a pre-operation on B:

The pre-operation can be any of:

'N' or 'n' = No pre-operation (the default if trans_ is missing)

'T' or 't' = Transpose

'C' or 'c' = Conjugate Transpose

'G' or 'g' = Conjugate (no transpose)

directive = One of the modes listed above, or other directives

Examples:

C = mtimesx(A,B) % performs the calculation C = A * B

C = mtimesx(A,'T',B) % performs the calculation C = A.' * B

C = mtimesx(A,B,'g') % performs the calculation C = A * conj(B)

C = mtimesx(A,'c',B,'C') % performs the calculation C = A' * B'

mtimesx('SPEEDOMP','OMP_SET_NUM_THREADS(4)') % sets SPEEDOMP mode with number of threads = 4

For nD cases, the first two dimensions specify the matrix multiply involved. The remaining dimensions are duplicated and specify the number of individual matrix multiplies to perform for the result. i.e., MTIMESX treats these cases as arrays of 2D matrices and performs the operation on the associated parings. For example:

If A is (2,3,4,5) and B is (3,6,4,5), then

mtimesx(A,B) would result in C(2,6,4,5), where C(:,:,i,j) = A(:,:,i,j) * B(:,:,i,j), i=1:4, j=1:5

which would be equivalent to the MATLAB m-code:

C = zeros(2,6,4,5);

for m=1:4

for n=1:5

C(:,:,m,n) = A(:,:,m,n) * B(:,:,m,n);

end

end

The first two dimensions must conform using the standard matrix multiply rules taking the transa and transb pre-operations into account, and dimensions 3:end must match exactly or be singleton (equal to 1). If a dimension is singleton then it is virtually expanded to the required size (i.e., equivalent to a repmat operation to get it to a conforming size but without the actual data copy). This is equivalent to a bsxfun capability for matrix multiplication.

James Tursa (2020). MTIMESX - Fast Matrix Multiply with Multi-Dimensional Support (https://www.mathworks.com/matlabcentral/fileexchange/25977-mtimesx-fast-matrix-multiply-with-multi-dimensional-support), MATLAB Central File Exchange. Retrieved .

Created with
R2007a

Compatible with any release

**Inspired:**
Small size linear solver, frontal, Merton Structural Credit Model (Matrixwise Solver)

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!Create scripts with code, output, and formatted text in a single executable document.

BrandonThis function works well for me to multiply a list of matricies 3x3xN with a vector 3x1xN and get a 3x1xN result. Thanks. The build script is still broken this many years later, but the comments below tell how to fix it. When you try to run the mtimesx_build.m script, you get error compiler not set up. Fix this by changing line 166 to say mexopts = [prefdir '\mex_C_win64.xml]; and then re run the build script. Worked on Windows 10, Matlab 2019b.

Anton LoubmanVal SchmidtI have long been a big fan of this function and for a long time it was indeed 100's of times faster than MATLAB's native multiplication. But for some time MATLAB has improved their routines and for all a handful of cases this code proves to be much slower than native MATLAB.

That said, the one thing this code can do that I do not think MATLAB is capable of natively a built-in function call, is a single call to execute a series of multiplications. This comes up, for example, in rotating a point cloud of thousands or millions of XYZ coordinates. Prior to mtimesx, to my knowledge it required looping through individual multiplications - prohibitively slow for large data sets. Will be happy when MATLAB solves that problem too.

Ville-Veikko WettenhoviIf you are compiling the file on Windows then the following code should work:

libdir = 'microsoft';

comp = computer;

mext = mexext;

lc = length(comp);

lm = length(mext);

cbits = comp(max(1:lc-1):lc);

mbits = mext(max(1:lm-1):lm);

if( isequal(cbits,'64') || isequal(mbits,'64') )

compdir = 'win64';

largearraydims = '-largeArrayDims';

else

compdir = 'win32';

largearraydims = '';

end

lib_blas = [matlabroot '\extern\lib\' compdir '\' libdir '\libmwblas.lib'];

d = dir('mtimesx.m');

mname = [d.folder '/' d.name];

cname = [mname(1:end-2) '.c'];

mex(cname,largearraydims,lib_blas,'COMPFLAGS="$COMPFLAGS /openmp"');

If you are using MinGW64, simply replace 'microsoft' with 'mingw64'.

Chantel CharleboisI am having the same problem as shan liu when trying to compile. I have tried using MinGW64 Compiler (C) and Microsoft Visual C++ 2017 (C) neither of which works.

Aditya Kunapulishan liumex -setup

MEX is configured to use 'MinGW64 Compiler (C)' for C language compilation.

My version is matlab2016a,windows10. I am running ECO code.

When I Run mtimex_build.m, I get the following error:

Build routine for mtimesx

... Checking for PC

... Finding path of mtimesx C source code files

... Found file mtimesx.c in D:\MATLAB2016\bin\ECO-master\external_libs\mtimesx\mtimesx.c

... Found file mtimesx_RealTimesReal.c in D:\MATLAB2016\bin\ECO-master\external_libs\mtimesx\mtimesx_RealTimesReal.c

... Opened the mexopts.bat file in C:\Users\liush\AppData\Roaming\MathWorks\MATLAB\R2016a\mex_C++_win64.xml

... Reading the mexopts.bat file to find the compiler and options used.

warning: ... Supported C/C++ compiler has not been selected with mex -setup

> In mtimesx_build (line 251)

warning: ... Assuming that Selected Compiler will link with Microsoft libraries

> In mtimesx_build (line 252)

warning: ... Continuing at risk ...

> In mtimesx_build (line 253)

... OpenMP compiler not detected ... you may want to check this website:

http://openmp.org/wp/openmp-compilers/

... Using BLAS library lib_blas = 'D:\MATLAB2016\extern\lib\win64\microsoft\libmwblas.lib'

... Now attempting to compile ...

mex('D:\MATLAB2016\bin\ECO-master\external_libs\mtimesx\mtimesx.c','-largeArrayDims',lib_blas,'-DCOMPILER=(unknown)')

Compile with 'MinGW64 Compiler (C)'.

D:\MATLAB2016\bin\ECO-master\external_libs\mtimesx\mtimesx.c: In function 'mexFunction':

D:\MATLAB2016\bin\ECO-master\external_libs\mtimesx\mtimesx.c:592:10: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]

ans = mexGetVariablePtr("caller","OMP_SET_NUM_THREADS");

^

... Well, *that* didn't work ...

This may be because an OpenMP compile option was added that the

compiler did not like. Attempting to compile again, but this time

will not add the '' option.

... OpenMP compiler not detected ... you may want to check this website:

http://openmp.org/wp/openmp-compilers/

... Using BLAS library lib_blas = 'D:\MATLAB2016\extern\lib\win64\microsoft\libmwblas.lib'

... Now attempting to compile ...

mex('D:\MATLAB2016\bin\ECO-master\external_libs\mtimesx\mtimesx.c','-largeArrayDims',lib_blas,'-DCOMPILER=(unknown)')

Compile with 'MinGW64 Compiler (C)'.

D:\MATLAB2016\bin\ECO-master\external_libs\mtimesx\mtimesx.c: In function 'mexFunction':

D:\MATLAB2016\bin\ECO-master\external_libs\mtimesx\mtimesx.c:592:10: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]

ans = mexGetVariablePtr("caller","OMP_SET_NUM_THREADS");

^

... Well, *that* didn't work either!

The mex command failed. This may be because you have already run

mex -setup and selected a non-C compiler, such as Fortran. If this

is the case, then rerun mex -setup and select a C/C++ compiler.

Error using mtimesx_build (line 434)

Unable to compile mtimesx.c

Can anyone help me? thank you very much.

zb liHi everyone

I am very interested in using mtimesx , but I am new to compiling. When I Run mtimex_build.m, I get the following error:

>>Error using: mtimesx_build (line 446)

Unable to compile mtimesx.c

Can anybody help me? Thank you and best wishes,

Oussama GASSABis it possible to use this function to calculate the matrix inverse A^(-1)?

Paulo RibeiroFor Windows users this link provides a nice workaround for C/C++ library setup, which works fine with MTIMESX:

https://www.mathworks.com/matlabcentral/answers/95039-why-does-the-sdk-7-1-installation-fail-with-an-installation-failed-message-on-my-windows-system

Then you need the following steps:

1. Open Matlab and setup the C/C++ compiler with: mex -setup. This should result in:

MEX configured to use 'Microsoft Windows SDK 7.1 (C)' for C language compilation

2. Modify line 166 from mtimex_build.m to: mexopts = [prefdir '\mex_C_win64.xml'];

3. Run mtimex_build.m

4. Use the provided functions from this package

Simon LebastaSimon LebastaAbsence of accessibility to MAC users makes the package undesirable in my code. Not that I don't mind going through the steps, but I doubt the people that I share my code with won't...

Duc Vinh NGUYENI tried to use mtimesx in Octave as QUANLONG HE said it was supported, but there is an error like this:

error: .../mtimesx.mex: failed to load: .../mtimesx.mex: undefined symbol: mxCreateSharedDataCopy

How I can solve this problem? Thanks

Yiou Liproblem solved. Just simply download xcode and set the blas path as: /Applications/MATLAB_R2017b.app/bin/maci64/libmwblas.dylib

and everything works.

Yiou LiI am using 2017b version Matlab on a mac. I tried to use 'mtimesx' function, but it returns an error message:

Non-PC auto build is not currently supported. You will have to

manually compile the mex routine. E.g., as follows:

>> blas_lib = 'the_actual_path_and_name_of_your_systems_BLAS_library'

>> mex('-DDEFINEUNIX','mtimesx.c',blas_lib)

or

>> mex('-DDEFINEUNIX','-largeArrayDims','mtimesx.c',blas_lib)

I tried to find the blas library, but have no idea where it is saved. I found a 'blas00.c' that seems to come with R, and set it as the blas library. Then, I got another error message:

No supported compiler or SDK was found.

Does it mean I need to download something else, say xcode?

Thanks in advance!!!

BerkerVery fast indeed. I updated QUANLONG HE's version with Ethan Beyak's comment on the fix for releases beyond R2014a, can be found in https://github.com/nonkreon/mtimesx

Ethan BeyakTo rehash what Clemens Nyffeler 12 Dec 2017 has written, I have also had issues with mtimesx usage for Windows 10 MATLAB 2016b.

For a C/C++ compiler, I downloaded the free version of Microsoft Visual Studio 2017.

In MATLAB, I entered the following commands:

mex -setup

mex -setup C

For mtimesx, we need to choose C since the files are .c

At which point, I changed line 166 to the proper .xml file in my prefdir:

mexopts = [prefdir '\mex_C_win64.xml'];

And with that, it works like a charm. Thanks for the amazing package, James Tursa!

Clemens NyffelerPlease james Tursa, update this great package, I would love to be able to use it.

I can't get it to work. I have the same problem as described in this post <https://mathworks.com/matlabcentral/answers/267167-mtimesx-crashes-when-used> and by Jonathan Sullivan

on 8 Feb 2012 on this page here.

Getting the mex compilation to work is already tricky: I had to download mingw using the addon manager, which is quite straightforward. Then I edit the mtimesx_build.m file on line 166 to find the xml instead of the bat file:

mexopts = [prefdir '\mex_C++_win64.xml'];

I changed line 246 to find the comiler definition (g++ insteado of gcc)

x = findstr(tline,'COMPILER=gcc');

Finally the MEX completes "successfully" in the second attempt without the "-fopenmp" option, despite a compiler warning.

The mtimesx function us unusable, it crashes with an "access violation" error whenever supplied with two nonscalar arguments.

I also tried the code you provided here <https://ch.mathworks.com/matlabcentral/answers/290786-mtimesx-library-compile-error> last year. Still no luck.

this is the output:

mtimesx_build_new

... Using BLAS library lib_blas = 'C:\Program Files\MATLAB\R2016a\extern\lib\win64\microsoft\libmwblas.lib'

Building with 'MinGW64 Compiler (C)'.

E:\Workspaces\Matlab\lib\mtimesx\mtimesx.c: In function 'mexFunction':

E:\Workspaces\Matlab\lib\mtimesx\mtimesx.c:592:10: warning: assignment discards 'const' qualifier from pointer target type

ans = mexGetVariablePtr("caller","OMP_SET_NUM_THREADS");

^

MEX completed successfully.

Evangelos PapoutsellisHello everyone, I have tried several things mentioned on the comments about installing it on MAC, but no success. Has anyone installed it on mac el capitan, Matlab2016a

Abi WaqasWhen I am using the complex number the code is slowing down, even slower than the simple for loop. Any suggestion?

xiaounfortunatly, it is useless. I don't know why!

>> A=rand(5000)+rand(5000)*1i;

>> B=rand(5000,3)+rand(5000,3)*1i;

>> tic;A' * B;toc

Elapsed time is 0.045380 seconds.

>> tic;mtimesx(A,'c',B,'SPEED');toc

Elapsed time is 0.138179 seconds.

QUANLONG HELatest version can be found at https://github.com/cybertk/mtimesx, with Octave support

RobiHello everyone, I encountered difficulties when I tried to compile the c files with mex. Matlab sent me the following message.

(I used the Lcc-win32 C 2.4.1 compiler with Matlab R2011b on Windows Vista 32 bits) Do somebody has an idea ?

Thanks

mex 'C:\Users\Utilisateur\mtimesx_RealTimesReal.c'

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 57 invalid struct field declarations

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 57 syntax error; found `RealKind' expecting `}'

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 57 skipping `RealKind' `r'

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 57 syntax error; found `i' expecting `;'

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 57 unrecognized declaration

Warning C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 57 empty declaration

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 63 syntax error; found `*' expecting `)'

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 63 skipping `*' `x' `,' `mwSignedIndex' `n'

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 63 extraneous old-style parameter list

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 64 syntax error; found `*' expecting `)'

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 64 skipping `*' `Cpr' `,' `RealKind' `*' `Cpi' `,' `RealKind' ... up to `)'

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 65 extraneous old-style parameter list

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 66 syntax error; found `*' expecting `)'

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 66 skipping `*' `Cpr' `,' `RealKind' `*' `Cpi' `,' `RealKind' ... up to `)'

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 67 extraneous old-style parameter list

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 68 syntax error; found `*' expecting `)'

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 68 skipping `*' `Cpr' `,' `RealKind' `*' `Cpi' `,' `RealKind' ... up to `)'

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 69 extraneous old-style parameter list

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 70 syntax error; found `*' expecting `)'

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 70 skipping `*' `Cpr' `,' `RealKind' `*' `Cpi' `,' `RealKind' ... up to `)'

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 71 extraneous old-style parameter list

Error C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c: 72 too many errors

C:\PROGRA~1\MATLAB\R2011B\BIN\MEX.PL: Error: Compile of 'C:\USERS\UTILIS~1\mtimesx_RealTimesReal.c' failed.

Error using mex (line 206)

Unable to complete successfully.

Sama ShresthaRui LiI encountered issue:

Error using mtimesx_build (line 169)

A C/C++ compiler has not been selected with mex -setup

This is because i didn't install Microsoft Windows SDK for Windows 7 and .NET Framework 4, which can be downloaded:

https://www.microsoft.com/en-us/download/confirmation.aspx?id=8279

After this, everything works. Also, this link may be useful:

http://au.mathworks.com/matlabcentral/answers/96050-how-can-i-set-up-microsoft-visual-studio-2008-express-edition-for-use-with-matlab-7-7-r2008b-on-32

Thanks for sharing MTIMESX.

Chris Hooper@Vladislav @Iliya @James

It works for me with the following method:

1) change "mexopts = [prefdir '\mexopts.bat'];" in mtimesx_build.m to "mexopts = [prefdir '\mex_C_win64.xml'];"... or maybe it was "'\mex_C++_win64.xml'];", I'd have to check...

2) don't change anything inside the .xml file: no need to change the set COMPFLAGS line.

3) I have matlab version 2016b: use the compiler Microsoft professional visual C++ 2013 (a nice person did this for me on their computer and we were lucky that it worked on mine). MinGW-w64 compiler did not not work for either of us in this case.

GuillaumeHi, Did you try the logical * double to double process ?

the double conversion in matlab is very slow and I would be very good.

If not can you provide advice where I should modify the code ?

BR, guillaume

Earl VickersI'm sure it's great, but it doesn't work on 2015b & Mac.

armita maniI am trying to use mtimesx on matlab 2016a, windows 10 64-bit, but I get the error " A C/C++ compiler has not been selected with mex -setup error" and understand I should change mexopts = [prefdir '\mexopts.bat']; in mtimesx_build.m to mexopts = [prefdir '\mex_C++_win64.xml']; but in path C:\Users\..\AppData\Roaming\MathWorks\MATLAB\R2016a this xml dosent exit, how could I complime this library?

Thank you

Simon LeglaiveAndreas@Vladislav @Iliya @James Tursa Did you mean to set the flag in the .xml file like this?

set COMPFLAGS=-c -openmp $CFLAGS $DEFINES $MATLABMEX

For me it still crashes when I try and it feels hopeless. I have tried everything to make it work.

HanseDavid BeloFantastic work, James. Have you tried implementing matrix inversion on each slice as you alluded to in the past? I'm currently using multinv which works well but I wonder if there is a performance gain to be had using a similar approach as you've taken in mtimesx.

Levent KaracanPhlipDev-iL@Vladislav - See comment by Martin Lindfors from 26 May 2015. My solution is not something new and was suggested before.

Vladislav Yordanov@Iliya

I have the same issue. I can use the MTIMESX with Matlab 2010 and it works out great. Many thanks to James Tursa.

However I have the same issue as Tyler with Matlab 2014, could you please elaborate in detail how to fix it. I read your comment but I have no idea how to point to the xml.

Regards

Dev-iL@Tyler - On newer MATLAB versions, you don't need the mexopts.bat file but rather mex_C_win64.xml (or an XML with a similar name that fits your system).

Change the mtimesx_build.m to point to an .xml file and it should be ok. I manually added the "/openmp" to the "set COMPFLAGS ..." line in the xml. After this, compilation completed correctly and running mtimesx didn't crash my MATLAB (2015a).

Good luck!

@JamesTursa - I humbly suggest you modified the build script to also be compatible with newer MATLAB versions.

Tyler MillhouseI downloaded this mexopts.bat file, and the compiler now works.

https://gist.github.com/dgleich/1579415

Unfortunately, when I run mtimesx, Matlab crashes.

Tyler MillhouseHi James, thanks for putting this together. I keep getting this message: 'A C/C++ compiler has not been selected with mex -setup'. This is strange since I have run 'mex -setup' and 'mex --setup C++'. Mex selected the MinGW64 compiler. Do I need a different compiler? Thanks again!

FritzIs there any news on the cumprod implementation for MxMxK?

Thank you

http://www.mathworks.com/matlabcentral/newsreader/view_thread/159298

James Tursa@Michal: For generic 2D matrix multiply, there is no advertised speedup since mtimesx is calling exactly the same routine as MATLAB. Speedup would be if you were doing 2D page multiplies with nD matrices.

Michal KvasnickaFor MATLAB R2015b:

+++++++++++++++++++++++

>> a = rand(5000);

>> b = rand(5000);

>> tic, c2 = mtimesx(a,b,'speedomp'); toc

Elapsed time is 1.858891 seconds.

>> tic, c1 = a*b; toc

Elapsed time is 1.956402 seconds.

>> isequal(c1,c2)

ans =

1

So, where is the announced speedup???

NickMartin LindforsI fixed the issues by replacing line 166 in mtimesx_build.m from

mexopts = [prefdir '\mexopts.xml'];

to

mexopts = [prefdir '\mex_C++_win64.xml'];

PhilipI am having the same issues as Octavian. I am using 64-bit R2015a on 64-bit Windows 7. I have setup the compiler Microsoft Visual C++ 2013. However, when I run mtimesx_build, it produces the error:

Error using mtimesx_build (line 169)

A C/C++ compiler has not been selected with mex -setup.

I believe this may be related to Matlab switching to using xml files for mex from bat files. Thus mtimesx_build cannot find mexopts.bat. Any help resolving this issue would be greatly appreciated.

OctavianDear James,

I am very interested in using mtimesx for ND array operations, but I am new to compiling. When testing mtimesx, I get the following error:

Error using mtimesx_build (line 169)

A C/C++ compiler has not been selected with mex -setup.

I have a Windows 7 64-bit laptop. Which C/C++ compiler you would recommend to install

Thank you,

Octavian

Jeppe@Heather,

I'm running MATLABR2014b on a Macbook with Yosemite OS X. I installed Xcode from the AppStore and defined the blas:

blas_lib='/Applications/MATLAB_R2014b.app/bin/maci64/libmwblas.dylib'

mex('-DDEFINEUNIX','-largeArrayDims','mtimesx.c',blas_lib)

After running this in the command window I could use mtimesx.m

Hope this helps

Jeppe

HeatherThanks James for your great work; it is a really useful function. I am having trouble compiling mtimesx.c on my new macbook though. I cannot see from the comments below that anyone has managed to solve this.

My BLAS library is located at /usr/local/lib/ and is called libblas.a

Following the instructions given when I tried to call the mtimesx function in my matlab code I defined

blas_lib = '/usr/local/lib/libblas.a’

and then tried to manually compile the mex routine by

mex('-DDEFINEUNIX','-largeArrayDims','mtimesx.c',blas_lib)

after which I see `Building with 'Xcode with Clang’ followed by several lines of output and ultimately:

Error using mex

Undefined symbols for architecture x86_64:

"__gfortran_st_write", referenced from:

_xerbla_ in libblas.a(xerbla.o)

"__gfortran_st_write_done", referenced from:

_xerbla_ in libblas.a(xerbla.o)

"__gfortran_stop_string", referenced from:

_xerbla_ in libblas.a(xerbla.o)

"__gfortran_string_len_trim", referenced from:

_xerbla_ in libblas.a(xerbla.o)

"__gfortran_transfer_character_write", referenced from:

_xerbla_ in libblas.a(xerbla.o)

"__gfortran_transfer_integer_write", referenced from:

_xerbla_ in libblas.a(xerbla.o)

ld: symbol(s) not found for architecture x86_64

clang: error: linker command failed with exit code 1 (use -v to see invocation)

Can anybody help me? Thank you and best wishes,

Heather

Luca VivianinoaIs mtimesx working with mac? I have ios 10.9 and could not compile, got an the error - “Undefined symbols for architecture x86_64:...“.

Thank you.

David HRoyi AvitalCan anyone compare the performance to MATLAB R2014b?

Thank You.

OctavianIs mtimesx working on linux? I have ubuntu 12.04, and it cannot compile. Also, does it work with gpuArrays?

Thanks,

Octavian

Matt JHi James,

I had a question about how the flags 't' and 'c' are used by sparse matrix arguments. I assume that the flags are there to circumvent explicit transposes and its associated overhead. Yet, it doesn't appear that sparse mtimesx operations make use of them. In the test below, I see basically the same execution speeds in all 3 versions. Is it true that sparse operations can't make use of the flags, and is it deliberate/inevitable? It would be a great benefit to be able to avoid transposing tall sparse matrices because of the high memory consumption that comes with that.

J=sprand(3192027,3225,.0001);

tic;

mtimesx(J,'t',J);

toc

tic;

mtimesx(J.',J);

toc

tic;

J.'*J;

toc

JulienFrom my understanding, the file "mex_C_win64.xml" replaces "mexopts.bat".

"Error using mtimesx_build (line 169)

A C/C++ compiler has not been selected with mex -setup". I replaced line 169 by : mexopts = [prefdir '\mex_C_win64.xml'];. It did work with a few warnings...

Matt JThanks, James. This thread describes where the information has moved to

http://www.mathworks.com/matlabcentral/answers/67521#answer_138814

I can post those xml files somewhere, if you like. However, is there no way to get mtimesx to compile simply with the "mex" command, maybe with a small sacrifice in performance?

James TursaHmmm ... Well, I don't have access to R2014 so I will have to code in the blind on this. My understanding is that MATLAB will in real time make a guess as to the correct compiler to use based on source code extension. So the information that was in mexopts.bat (user had already selected a compiler) is no longer available for my automatic build stuff. I will put out an update shortly, but may have to ask for info direct from the user for compiler stuff. I will try to get it out there sometime next week.

Matt JThought I'd ping again. It doesn't look like mtimesx_build is compatible with R2014, since it looks for mexopts.bat, which is now gone. Can we hope to get a version that supports R2014? James?

Matt JI have the same problem as Safdar in R2014a. Even after running mex -setup, it appears as though mtimesx_build cannot find mexopts.bat. Possibly, the locations of relevant files/directories may have shifted?

StefanSafdarSorry for the multiple posts. It seems something went wrong when submitting the post. Sorry

SafdarI got the following message

... Build routine for mtimesx

... Checking for PC

... Finding path of mtimesx C source code files

... Found file mtimesx.c in D:\Documents\Projects\mtimesx_20110223\mtimesx.c

... Found file mtimesx_RealTimesReal.c in D:\Documents\Projects\mtimesx_20110223\mtimesx_RealTimesReal.c

Error using mtimesx_build (line 169)

A C/C++ compiler has not been selected with mex -setup

Error in mtimesx (line 271)

mtimesx_build;

SafdarHi James,

I unzipped the file in my working folder and tries to use mtimesx, I got the following message

... Build routine for mtimesx

... Checking for PC

... Finding path of mtimesx C source code files

... Found file mtimesx.c in D:\Documents\Projects\mtimesx_20110223\mtimesx.c

... Found file mtimesx_RealTimesReal.c in D:\Documents\Projects\mtimesx_20110223\mtimesx_RealTimesReal.c

Error using mtimesx_build (line 169)

A C/C++ compiler has not been selected with mex -setup

Error in mtimesx (line 271)

mtimesx_build;

SafdarHi James,

I unzipped the file in my working folder and tries to use mtimesx, I got the following message

... Build routine for mtimesx

... Checking for PC

... Finding path of mtimesx C source code files

... Found file mtimesx.c in D:\Documents\Projects\mtimesx_20110223\mtimesx.c

... Found file mtimesx_RealTimesReal.c in D:\Documents\Projects\mtimesx_20110223\mtimesx_RealTimesReal.c

Error using mtimesx_build (line 169)

A C/C++ compiler has not been selected with mex -setup

Error in mtimesx (line 271)

mtimesx_build;

SafdarHi James,

I unzipped the folder in my working path and then tried to used mtimesx, I got the following error

... Checking for PC

... Finding path of mtimesx C source code files

... Found file mtimesx.c in D:\Documents\Projects\mtimesx_20110223\mtimesx.c

Error using mtimesx_build (line 169)

A C/C++ compiler has not been selected with mex -setup

Error in mtimesx (line 271)

mtimesx_build;

Alireza@Evan: have you tired blas_lib = 'lmwblas'?

EvanI would like to second Matthieu's question. For blas_lib, I've tried both

blas_lib = '/Applications/MATLAB_R2013b.app/bin/maci64/lapack.spec';

and

blas_lib = '/Applications/MATLAB_R2013b.app/bin/maci64/blas.spec';

With both, I get:

ld: warning: ignoring file /Applications/MATLAB_R2013b.app/bin/maci64/blas.spec, file was built for unsupported file format

Anyone have any ideas?

Marc CrapeauWorks well for me on Linux.

If it can helps, I have installed the library libblas.so in my home following the version "shared library" of this page: http://gcc.gnu.org/wiki/GfortranBuild

Then I have compiled the mex file with the command return by the mtimesx routine when the automatic installation failed:

blas_lib = 'the_actual_path_and_name_of_your_systems_BLAS_library';

mex('-DDEFINEUNIX','-largeArrayDims','mtimesx.c',blas_lib)

Gained a factor 2 in speed for my 3D matrices multiplications, and a clearer matlab code :)

HannesHi all,

I tried to locate compliers via mex -setup, but got the answer => No supported SDK or compiler was found on this computer.

I installed Microsoft Windows SDK 7.1 but still not able to work. Does somebody know the path and install directory or what the problem is?

thanks for help

WKI don't know why, I did the test again in the linux cluster and the windows one, it works in the parallel computation now. So strange!

WKJames, by the way, what version of matlab last time you test this package? I place it in my labtop (windows 64) and linux cluster running matlab 2013, only few operations that mtimesx will beat matlab and it only beat is for less than 10%. I saw that this package was uploaded in 2011 so just wonder if matlab 2013 made lots of change to boost the performance.

James Tursa@WK: I don't have the parallel toolbox, so am unable to give much advice here. My understanding is that each worker in the parallel environment is a separate process, which I would expect would mean that MTIMESX could be used. But since I don't have the toolbox I can't verify this. Have you actually run some tests and gotten wrong results?

WKI just get it compiled and installed in my 64-bit matlab with windows SDK7 as well as the unix cluster. I find that it doesn't work in spmd, doesn't it?

WKThanks for the great library. I have two questions. Do I have to compile it to get library? So it means the library is really depending on the machine, right? Also, our lab has a multi-node cluster, each node are equipped with 8 cores. I think this library is based on multi-threading, if I run the matlab code in spmd and in each core, I run your library so will it conflict? Sorry, I didn't have any experience in compiling mex before, I tried but I don't have MSVC installed, I only have mingw.

JiaDaHow about announce another function for fast matrix inverse with multi-dimensional support? I wrote a version but I think your version will far fast than mine. :P

JiaDaMatthieuHello,

I'm trying to use mtimesx on mac OS 10.8 with matlab R2011b but I don't understand how to compile it.

I tryed several solutions described in comments but none worked.

blas_lib = '/Applications/MATLAB_R2011b.app/bin/maci64'

blas_lib =

/Applications/MATLAB_R2011b.app/bin/maci64

>> mex('-DDEFINEUNIX','mtimesx.c',blas_lib)

mtimesx.c: In function 'mexFunction':

mtimesx.c:592: warning: assignment discards qualifiers from pointer target type

ld: can't map file, errno=22 for architecture x86_64

collect2: ld returned 1 exit status

mex: link of ' "mtimesx.mexmaci64"' failed.

Error using mex (line 206)

Unable to complete successfully.

Can you help me ?

James Tursa@Agustin: It looks like you put the files in the lcc compiler directory. I would advise that you put the files in a work directory instead (not any of the "official" sys or bin etc directories that are installed with MATLAB) and try again. Make sure this work directory is on the MATLAB path. As a side note, in general it is *not* a good idea to put your own files into any of the "official" directories that are installed with MATLAB ... doing so risks breaking built-in MATLAB functions and capabilities.

agustin darrosathanks james but i have this problem

Build routine for mtimesx

... Checking for PC

... Finding path of mtimesx C source code files

... Found file mtimesx.c in C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx.c

... Found file mtimesx_RealTimesReal.c in C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx_RealTimesReal.c

... Opened the mexopts.bat file in C:\Users\Gusa\AppData\Roaming\MathWorks\MATLAB\R2008a\mexopts.bat

... Reading the mexopts.bat file to find the compiler and options used.

... LCC is the selected compiler

... OpenMP compiler not detected ... you may want to check this website:

http://openmp.org/wp/openmp-compilers/

... Using BLAS library lib_blas = 'C:\Program Files\MATLAB\R2008a\extern\lib\win32\lcc\libmwblas.lib'

... Now attempting to compile ...

mex('C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx.c',lib_blas,'-DCOMPILER=LCC')

it can´t find C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx.exp

it cant´find C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx.lib

... mex mtimesx.c build completed ... you may now use mtimesx.

then i introduce:

C = mtimesx(a,b)

??? Undefined function or method 'mtimesx' for input arguments of type 'double'.

what can i do?

James Tursa@Agustin: For Windows, generally you just put all the files in a directory on the MATLAB path and then type mtimesx at the prompt. For other systems, see other posts below.

agustin darrosaHi

How can i introduce this "function" in matlab??

I tried it but i have errors

Sorry for my ignorance

James Tursa@Tonio: No, MTIMESX does not do this. But from your description it sounds like something like this might work for you:

a = cell array of matrices

b = cell array of vectors

c = cellfun(@mtimes,a,b,'UniformOutput',false)

TonioCan I used that for multiplying an (m,n,k) * (n,k) = (m,k) where each k 'slice' needs to be multiplied together and they may contain variables number of elements, i.e. 'm' can vary in each slice? In my code each slice is stored in a cell in matlab.

James Tursa@Robert: Yes, MTIMESX is CPU based, not GPU.

RobertHello James,

nice work. A basically like to multiply n matrices B (400,400) with one Matrix A (400,400). I used arrayfun before, but your code is about 2.5 times faster. Soon I will get a Tesla NVidia graphic card and I would like to ask, is there any possibility to run the calculation on the GPU instead on the CPU?

I tried to call mtimesx with tww GPU Arrays but as my computation times is still fast (on my slow GPU) I guess he was not using my GPU right now.

Perhaps I also missed this, but to my understanding mtimesx is using the CPU only, or??

Thanks

Robert

Charles@Michael Thanks, that did it.

Michael VölkerCharles,

for my own record and to make it machine-searchable, I wrote earlier how I was able to compile mtimesx. Taking the liberties to quote myself:

> On a 64Bit Debian based Linux I managed to compile it with

> mex -DDEFINEUNIX CFLAGS="\$CFLAGS -march=native" -largeArrayDims -lmwblas -lmwlapack -lgomp mtimesx.c

> using gcc-4.5.

Can you try that, or did you already try?

Michael

Charles@Sebastian Thanks for the response. I think I wasn't linking to the right file. Now, I seem to be linking to the blas library, but I'm getting a new error. I am using the code "mex -DEFINEUNIX 'mtimesx.c' -lmwblas" but I get an error that reads "mtimesx.c:592: warning: assignment discards qualifiers from pointer target type ". I saw that someone else had gotten this error before and the code was fixed, so I downloaded the files again but I still am getting the error. When I tried to use mtimesx in my code I get an error and matlab crashes. I haven't had any problems with other mex files, so it seems to be specific to mtimesx.

Any help is appreciated. Thanks in advance.

Sebastiaan@Charles: under Unix, the BLAS library linked to is normally mwblas , not blas_lib.

CharlesThis program is great. Really useful. I am having great success with it on my PC. However, I cannot seem to create the mex file on a UNIX computer. I am defining the BLAS library as blas_lib. Then I enter the code "mex('-DDEFINEUNIX','mtimesx.c',blas_lib)". However, I get a string of errors that are all similar to "mtimesx.c;(.text+0x76a1)undefined reference to `saxpy_'". It says this for saxpy, sger, sdot, daxpy, dger, ddot, sgemv,ssyrk, ssyr2k, sgemm, dgemv, dsyrk and dgemm. Can anyone please help me with creating the .mexa64 file? Thanks.

James Tursa@KSIDHU: Why can't you use .* and ./ directly? What are the exact dimensions of your two matrices? If the matrices need singleton expansion, then look at the function bsxfun. What is the overall operation you are trying to do?

KSIDHUHi

How can I perform elementwise multiplication and division of two matrices?

I have looked through the manual but cannot find any information on it.

Is there a way to check if BLAS multi-threading is available? Is this better than OpenMP multi-thread

I need to perform .*, ./ on matrices with numel ~3000x100,000 so any tips would be great, and also is there a multi threaded sum function for large matrices (along dim 1 or 2).

MATLAB 2010a

Thanks in advance

KS

James TursaI like the reasoning you both have spelled out. I will make logical*other conform to the other, and make logical*logical output a double by default but have an option to do logical AND/OR operations and output a logical instead. Thanks for your inputs.

JohnI like Michael's reasoning and I think he is right about the 'double' result. So in general I think that the double should be the default. However, given that i enjoy being efficient with data, I would appreciate a 'logical' option.

Michael VölkerDamn, the E-Mail notification stopped to work, so I didn't know you replied to my message.

My spontaneous thought was that (logical matrix) * (logical matrix) should *of course* result in a double output, since matrix multiplication involves a sum, so we should expect to get results different than 0 or 1.

So, as you already guessed, I would like to have an auto-adjust-precision feature.

This is what I would expect:

[inputA] * [inputB] ==> [output]

------------------------------------------

logical | logical | double

logical | single | single

logical | double | double

single | logical | single

double | logical | double

(I hope it does not get formatted too ugly...)

I also like John's idea to make the behaviour more optional.

JohnPersonally, I do not understand why it should be a 'double' result. So, my vote goes towards a logical result.

However, would it be possible to add this as an option? (e.g.: mtimesx(a,b,'logical') will produce a logical result, if a and b are logical). This way the advanced user can be efficient with data, whereas the less experienced user will still get results that they are familiar with....

James Tursa@John: OK, that's two votes. Automatic logical conversion goes into the next version.

Q: Should a (logical matrix) * (logical matrix) give a double matrix result (ala MATLAB), or a logical matrix result (i.e., essentially treating the * - operations as AND OR operations)?

JohnExcellent code!

I experienced the same 'error' as Michael Völker, so I think automatically converting the logical/sparse data to double might be a good idea as an enhancement!

Nevertheless, great work! Saved me a lot of time/coding.

James Tursa@Michael Völker: The error message you are getting is not a bug. As stated in the doc, using arguments that are not double or single will cause MTIMESX to invoke the built-in mtimes function to do the matrix multiply. If MATLAB will do it then great ... you get that result. But if MATLAB will not do it then you will get whatever error message MATLAB puts out. This is the intended behavior. You can convert the logical to double first, of course, to avoid the error message. If you would like MTIMESX to do this automatically for logical inputs then I can take that as an enhancement request (seems reasonable to me).

Michael VölkerJames,

I found a bug with logical input.

Try:

foo = mtimesx( true(2,2), randn(1,1,5) );

And you probably get this error message:

Error using *

Inputs must be 2-D, or at least one input must be scalar.

To compute elementwise TIMES, use TIMES (.*) instead.

Error in mtimesx_sparse (line 47)

result = a * b;

Michael

Sam TJames:

I would like to thank you for coming up with such a brilliant piece of work. An excellent and well written code!!

Lately, I have been trying to use MTIMESX code to multiply two big n-dimensional matrices (A = 400x400x400 and B = 400x200). But for simplistic purposes, consider the following example:

A = 3 x 3 x 3

B = 3 x 2

and C = A*B;

such that dimension of C = 3 x 3 x 2

where C(i, j, l) = A(i, j, k) * B(k, l);

i.e.

C(:, :, 1) = A(:, :, k) * B(k, 1);

C(:, :, 2) = A(:, :, k) * B(k, 2);

Now, if I use mtimesx(A, B), then resultant matrix is (3 x 2 x 3) as compared (3 x 3 x 2).

This is because first two dimensions specify the matrix multiply involved. In this case it becomes 1st and 2nd dimension of matrix A while it should ideally be 2nd and 3rd dimension of matrix A to get the desired result.

I am wondering if there is a way around it? I look forward to hearing from you.

Thanks for your help.

James Tursa@Sam T: I don't think I get it yet. In your example you have:

C(:, :, 1) = A(:, :, k) * B(k, 1);

A(:,:,k) is 3x3

B(k,1) is 1x1

Are you trying to do an nD scalar*matrix multiply here? MTIMESX can do that (with some modifications of the inputs), but I am not quite sure that is really what you want. Can you clarify? Maybe write out an explicit for loop showing how C is to be filled in its entirety?

Sam TJames:

I would like to thank you for coming up with such a brilliant piece of work. An excellent and well written code!!

Lately, I have been trying to use MTIMESX code to multiply two big n-dimensional matrices (A = 400x400x400 and B = 400x200). But for simplistic purposes, consider the following example:

A = 3 x 3 x 3

B = 3 x 2

and C = A*B;

such that dimension of C = 3 x 3 x 2

where C(i, j, l) = A(i, j, k) * B(k, l);

i.e.

C(:, :, 1) = A(:, :, k) * B(k, 1);

C(:, :, 2) = A(:, :, k) * B(k, 2);

Now, if I use mtimesx(A, B), then resultant matrix is (3 x 2 x 3) as compared (3 x 3 x 2).

This is because first two dimensions specify the matrix multiply involved. In this case it becomes 1st and 2nd dimension of matrix A while it should ideally be 2nd and 3rd dimension of matrix A to get the desired result.

I am wondering if there is a way around it? I look forward to hearing from you.

Thanks for your help.

Sam TJames:

I would like to thank you for coming up with such a brilliant piece of work. An excellent and well written code!!

Lately, I have been trying to use MTIMESX code to multiply two big n-dimensional matrices (A = 400x400x400 and B = 400x200). But for simplistic purposes, consider the following example:

A = 3 x 3 x 3

B = 3 x 2

and C = A*B;

such that dimension of C = 3 x 3 x 2

where C(i, j, l) = A(i, j, k) * B(k, l);

i.e.

C(:, :, 1) = A(:, :, k) * B(k, 1);

C(:, :, 2) = A(:, :, k) * B(k, 2);

Now, if I use mtimesx(A, B), then resultant matrix is (3 x 2 x 3) as compared (3 x 3 x 2).

This is because first two dimensions specify the matrix multiply involved. In this case it becomes 1st and 2nd dimension of matrix A while it should ideally be 2nd and 3rd dimension of matrix A to get the desired result.

I am wondering if there is a way around it? I look forward to hearing from you.

Thanks for your help.

Sam TJames:

A = 3 x 3 x 3

B = 3 x 2

and C = A*B;

such that dimension of C = 3 x 3 x 2

i.e.

C(:, :, 1) = A(:, :, k) * B(k, 1);

C(:, :, 2) = A(:, :, k) * B(k, 2);

Now, if I use mtimesx(A, B), then resultant matrix is (3 x 2 x 3) as compared (3 x 3 x 2).

I am wondering if there is a way around it? I look forward to hearing from you.

Thanks for your help.

Boaz SchwartzClay Fulchermtimesx is an awesome code. Besides speeding up my structural dynamics and signal processing applications, it is much easier to apply than using loops. I'm also using a multi-D matrix inverter called multinv that can be downloaded from this forum. The two codes together allow me to operate on matrices of functions as easily as matrices of single values.

Clay Fulcher@ James Tursa: Thanks James! I greatly appreciate it!

James Tursa@ Clay Fulcher: Some options to possibly get more speed:

1) NDFUN by Peter Boettcher. E.g., see this thread:

http://www.mathworks.com/matlabcentral/newsreader/view_thread/294669#861239

2) FEX submissions. E.g., see this submission:

http://www.mathworks.com/matlabcentral/fileexchange/31222

3) Wait a couple of months for the next MTIMESX submission, when this capability will be included (along with in-place operations and nD matrix multiply versions of PROD and CUMPROD).

Clay FulcherJames, I need to invert a 3D array of function values. I have to step through each slice of the array in a loop, inverting each slice. Do you know of any way to do this more efficiently?

Jonathan Sullivan@ James

The array sizes I'm working with are much larger what I posted. The size of A is 100x33x3. The size of B is 33x1x3x5000x5. That might be why you had trouble recreating it. I'm running on Windows 7, 64 bit, MATLAB 2011b, compiler is cl. If it makes a difference, some of the values in the first matrix are -inf.

James Tursa@ Jonathan: Yes, weird indeed. For slices 100x100 in size, MTIMESX will always use BLAS dgemm calls regardless of the method setting (you can verify this by using the 'DEBUG' setting). That is, the exact same code path gets executed ragardless of whether 'MATLAB', 'SPEED', or 'SPEEDOMP' is set. That setting should have had no effect on this particular outcome. Also, I have been unable to reproduce this error on a 32-bit WinXP system. Can you provide me some specifics on your MATLAB version, machine OS, compiler used, etc? Does it crash if you restart MATLAB and immediatly try the example?

Jonathan Sullivan@ James

So the problem, at least in part, is caused by using the 'SPEEDOMP' flag. I removed the flag, and now it works. Weird, huh?

James Tursa@ Jonathan Sullivan: Your first example without repmat should have worked. Looks like I introduced a bug in the latest release. I will look into this right away ...

Jonathan SullivanJames,

Fantastic code. It has come in handy. It is well documented, incredibly fast, and extremely useful, especially for N-D arrays.

For the N-D case, this code would be even more powerful if the singleton dimension capability were expanded. Currently, all the dimensions from 3:end must be either the same size (A to B), or must all be singleton for one of the variables.

For example

A = rand(100,100,3);

B = rand(100,100,3,5);

C = mtimesx(A,B); % Does not work. Crashes MATLAB.

You can get around this by using repmat:

A = rand(100,100,3);

B = rand(100,100,3,5);

C = mtimesx(repmat(A,[1 1 1 5]),B); % Works, but slow

Unfortunately, explicitly expanding out large arrays has overhead which is more than I'd like.

Overall code. A+.

Michael VölkerJames, thank you for your hint to using 'lapack' from the FEX.

Indeed it produces the desired result, by calling

lapack('D=DSDOT(i,s,i,s,i)', length(A), A, 1, A, 1)

with 'A' from my example above. Unfortunately, this is ~5 times slower than calling mtimesx() with double input.

On double input, calling the equivalent 'DDOT' routine from the lapack-package takes more than 10 times longer than your mtimesx (wow, by the way...) to compute the very same result.

The difference appears to be even worse for complex input.

So I am still hoping that you might somehow get this feature in your code and maintain the speed...

James Tursa@ Michael Völker: FYI, another way to use the DSDOT routine is to use this submission:

http://www.mathworks.com/matlabcentral/fileexchange/16777-lapack

Not sure if it works on 64-bit systems, but you might give it a try.

James Tursa@ Michael Völker: Thanks for the comments. I will look into your request. There is a BLAS routine called DSDOT that does what you request. I will look into how easily I can incorporate this into MTIMESX for BLAS specific results. For the SPEED modes MTIMESX sometimes uses custom code to calculate the dot product (e.g., see function RealKindDotProduct), and in fact the accumulation is done in double precision regardless of input type (calculations themselves are single for single inputs), so it might not be too difficult to adapt this to return the double precision output if requested instead of always converting back to single for single inputs. Thanks for the suggestion.

Michael VölkerIncredibly great work. Thank you for the elaborate code, the very useful features, the intuitive syntax and behaviour and the docu.

On a 64Bit Debian based Linux I managed to compile it with

mex -DDEFINEUNIX CFLAGS="\$CFLAGS -march=native" -largeArrayDims -lmwblas -lmwlapack -lgomp mtimesx.c

using gcc-4.5.

After using mtimesx for a while I currently miss only one feature, which is to define the precision of the output independently from the input variables.

This would be useful when calculating the sum of squares of a huge single-precision vector where the correct result exceeds realmax('single') and prior casting takes a lot of time.

Example:

>> A = randn( 1e8, 1,'single'); % plenty of sane values

>> A(1) = 1e20; % only few outliers

>> sos = A' * A % or sos = mtimesx( A, 'C', A )

sos =

Inf

Doing A = double(A) beforehand solves the problem:

>> sos = A' * A

sos =

1.000000040081755e+40

but

>> tic, A = double(A); toc

Elapsed time is 0.472007 seconds.

In fact, that casting to double takes much more time than the actual computation in mtimesx. Since I do this repeatedly in an iteration, the overhead sums up to minutes.

So my wish would be like this:

C = mtimesx( A, B, 'double' ); % C is double even if A or B are single

Nevertheless, thank you so much for this code.

James Tursa@Gary: Follow-up, ndfun as I recall only works for real variables. If you have complex variables then the only efficient option I could suggest would be a custom mex routine.

James Tursa@Gary: For this type of matrix product you should be using something like ndfun('mprod',a) by Peter Boettcher. You can find the source code here:

http://www.mit.edu/~pwb/matlab/

If you need a 64-bit version of this you can see the hack I posted on this thread:

http://www.mathworks.com/matlabcentral/newsreader/view_thread/294669#791467

Using ndfun will avoid all the array slice copies you are doing in the above example code.

GaryThanks for the answer, but I think you misunderstood my problem or me your answer.

I have eg 10000 matrices and want to calculate the product. By dividing into 2*5000 I can use the advantage of your nD multiply. Here is the code:

a=rand(n,n,L)

tic

b=a(:,:,1);

for i=2:L

b=a(:,:,i)*b;

end

toc

c=a;

tic

while(L~=1)

L2=floor(L/2)*2;

d=c(:,:,end);

c = mtimesx(c(:,:,2:2:L2),c(:,:,1:2:L2-1));

if(L2~=L)

c(:,:,end+1) = d;

end

L=size(c,3);

end

toc

So will this never be faster than normal multiplies?

James Tursa@Gary: MTIMESX is generally not going to be any faster at generic matrix multiplies than MATLAB since it is calling exactly the same BLAS routines as MATLAB. The exceptions are small (4x4 or less) multiplies, dot products, outer products, and some matrix-vector multiplies since they can sometimes be done faster with inline code which MTIMESX uses in the SPEED or LOOP modes. Also, the nD multiply case is almost always faster with MTIMESX than the equivalent loop(s) in MATLAB for any size since array slice copies are avoided.

GaryHi,

I am trying to compute the product of many matrices (M1*M2*...). I used mtimesx to multiply half of all matrices by the other half until the result(e.g. a=M1*M2, b=M3*M4, a*b). It works very well when the matrix have small size like 4x4 but when for larger size it becomes slower than Matlab. Is that expected?

TomWas a bit to fast, sorry. Someone already asked this! It works fine now!

Great job!

dmGreat submission! Find it VERY useful for my work!

Also, I think you can change

"Beats MATLAB 300% - 400% in some cases ... really!"

in the title, and multiply the numbers with at least a factor of 25 (based on the results from mtimesx_test_nd).

James Tursa@ Dan Scholnik: Have you tried running the mtimesx_test_nd file yet? That should invoke small OpenMP test cases. For larger test cases it may in fact be the case that later and/or 64-bit versions of MATLAB are hard to improve on for speed. Alas, I don't have a 64-bit system to tinker with.

At the moment I am working on incorporating in-place operations in MTIMESX and that will be in the next major release, but I will put your dense triangular suggestion on the list of potential future improvements.

Dan ScholnikCompiled for Matlab 2011a on Fedora 14 x86_64 using

mex CFLAGS='$CFLAGS -fopenmp' LDFLAGS='$LDFLAGS -fopenmp' -largeArrayDims -O -DUNIX mtimesx.c -lmwblas

Not sure all of that was needed, but it works. Compiler was gcc 4.5.1.

mtimesx('OPENMP') returns 1 and mtimesx('OMP_GET_NUM_PROCS') returns 4 (I have a dual-core i7 w/ hyperthreadding), however mtimesx never reports using anything but the BLAS no matter which flags I pass in. Best I can tell the flags have no effect. The matlab-provided mwblas is multithreaded, so perhaps there's no improving on it. Speeds were generally not much different than the matlab multiply, except (as mentioned previously) for cases where the matlab transpose or conjugate operations themselves consumed significant portions of the total time.

I also tried using the fedora-provided BLAS via "-lblas" but it appears to be single-threaded and very slow. Here again mtimesx never used multiple processors that I could tell.

A possible feature suggestion, albeit one that deviates from the general-purpose nature of the current code: How about routines for dense triangular-matrix multiplication? Matlab doesn't seem to have any optimizations for this case (although it does for back-substitution, which is common with triangular matrices).

pinkcan provide your email address, because my code is too long to post

wahyoe_slipnot@yahoo.co.id

wahyoe

pinkyes, it can work, thanks a lot james

I wanted to ask

1. if the form below, how do I use mtimesx

for i = 1: m

A (:,:, i) = B (:,:, i). '* Ex (:,:, i) * C (:,:, i);

end

2. If I compile into an application (exe) files that I have to enter into deploytool, because I saw a lot of files in it

thanks

Wahyoe

James Tursa@ wahyoe Unggul: Did you get the build process to work?

James Tursa@ wahyoe Unggol: The error message is telling you that you need to select a C compiler first. So do this:

mex -setup

(Then press Enter)

(Then enter the number of a C compiler such as lcc)

(Then press Enter again)

Once this is done, try running mtimesx again and see if it builds OK.

To answer your specific question I would need to know the dimensions of each variable. I.e., what is the size of RT and Ex?

pinkhi all

I tried running "mtimesx_test_sdspeed.m", but an error occurs

? Error using ==> mtimesx_build at 169

AC / C + + compiler has not been selected with Mex-setup

Error in ==> mtimesx at 271

mtimesx_build;

Error in ==> mtimesx_test_sdspeed at 100

compver = [computer ',' version ', mtimesx mode' mtimesx ', the median of'

num2str (n) 'runs'];

I wanted to ask, how the use of "mtimesx" in the form like this

for i = 1:2000

Keg (:,:, i) = RT (:,:, i). '* Ex (:,:, i) * RT (:,:, i);

end

and I want to ask whether in addition to speed, can also reduce the data to be more efficient?

James Tursa@ Daniel Lyddy: Confirmed typo in the prototypes for these functions. Change the int * to mwSignedIndex *. So

void xSYRK(char *, char *, mwSignedIndex *, mwSignedIndex *, RealKind *, RealKind *,

mwSignedIndex *, RealKind *, RealKind *, int*);

void xSYR2K(char *, char *, int*, mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *,

RealKind *, mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *);

becomes

void xSYRK(char *, char *, mwSignedIndex *, mwSignedIndex *, RealKind *, RealKind *,

mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *);

void xSYR2K(char *, char *, mwSignedIndex *, mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *,

RealKind *, mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *);

Thanks for pointing this out. That being said, these typos in all likelihood should not affect the actual results since in both cases 64-bit pointers were being passed and the underlying data is in fact mwSignedIndex in spite of the prototype. I will upload fixed files soon.

Daniel LyddyI am getting some warnings about incompatible pointer types when compiling on a 64-bit Centos Linux 5.5 system. For example, in line 4909 referred to below, argument 10 is 'LDC' which is #defined to &ldc which is a pointer to mwSignedIndex. In the blas/lapack dsyrk function, argument 10 is also called 'LDC', but it is an integer. Pointers on 64-bit systems are typically larger than 32 bit ints ... maybe these blas/lapack arguments need to be cast to something else?

mex('-DDEFINEUNIX','-largeArrayDims','mtimesx.c',blas_lib)

mtimesx.c: In function ‘mexFunction’:

mtimesx.c:592: warning: assignment discards qualifiers from pointer target type

In file included from mtimesx.c:1303:

mtimesx_RealTimesReal.c: In function ‘DoubleTimesDouble’:

mtimesx_RealTimesReal.c:4209: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type

mtimesx_RealTimesReal.c:4211: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type

mtimesx_RealTimesReal.c:4212: warning: passing argument 3 of ‘dsyr2k_’ from incompatible pointer type

mtimesx_RealTimesReal.c:4225: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type

mtimesx_RealTimesReal.c:4227: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type

mtimesx_RealTimesReal.c:4228: warning: passing argument 3 of ‘dsyr2k_’ from incompatible pointer type

mtimesx_RealTimesReal.c:4241: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type

mtimesx_RealTimesReal.c:4243: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type

mtimesx_RealTimesReal.c:4257: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type

mtimesx_RealTimesReal.c:4259: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type

In file included from mtimesx.c:1481:

mtimesx_RealTimesReal.c: In function ‘FloatTimesFloat’:

mtimesx_RealTimesReal.c:4209: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type

mtimesx_RealTimesReal.c:4211: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type

mtimesx_RealTimesReal.c:4212: warning: passing argument 3 of ‘ssyr2k_’ from incompatible pointer type

mtimesx_RealTimesReal.c:4225: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type

mtimesx_RealTimesReal.c:4227: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type

mtimesx_RealTimesReal.c:4228: warning: passing argument 3 of ‘ssyr2k_’ from incompatible pointer type

mtimesx_RealTimesReal.c:4241: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type

mtimesx_RealTimesReal.c:4243: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type

mtimesx_RealTimesReal.c:4257: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type

mtimesx_RealTimesReal.c:4259: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type

Bruno LuongI'm not using deploytool, just MCC command.

James Tursa@ Bruno Luong: Sorry I am not going to be much help here since I do not have a Windows 7 platform to test with. Are you using deploytool, and then the result can't use mtimesx?

Bruno LuongThere is an annoying compatibility issue with deploying MTIMESX on Window7 platform, The MCR issues an error message "mex file in incorrect". It works fine under Matlab or other OS. I just can't use it when the issue is not resolved.

James Tursa@ Matt J: Issue mtimesx without any arguments to find out what the current calculation mode is. If you enter any of the OpenMP directives but mtimesx was not compiled with an OpenMP enabled compiler then mtimesx will revert to the nearest functionality. So SPEEDOMP will revert to SPEED, LOOPSOMP will revert to LOOPS, and the omp_etc functions will revert to substitute functions instead of the actual OpenMP functions. Your above example looks like what would happen if mtimesx was not compiled with an OpenMP enabled compiler. To see how mtimesx thinks it was compiled, issue the mtimesx('OPENMP') command and it will return a 1 (yes OpenMP) or 0 (No OpenMP).

Matt JJames, the latest version of mtimesx doesn't display at the command line any of the new modes (so of course I don't know if they're being used).

>> mtimesx SPEEDOMP

ans =

SPEED

KaiJanThanks James! See this thread for applying MTIMESX to calculate a fast 2-norm for the matrix along a specified dimension:

http://www.mathworks.com/matlabcentral/newsreader/view_thread/293747#787738

James Tursa@Matt J: I will look into it. Rather than pre-converting I may be able to fake out a call back to MATLAB with a pointer to the interior of the nD array and fudged up dimensions. Highly against the mex rules, but it might work. I should have taken advantage of the simple reshape method you use for the non-transposed cases (I didn't recognize it at the time I was updating the code) and that should not be too hard to implement. But the transpose cases for the 2D slices and the case for the sparse matrix being the right operand is going to be tricky to avoid a data copy. Guess I have a weekend project now ...

Matt JJames - I appreciate you taking my comments into account in the latest rev, but I'm still seeing weird stuff. From the manual:

"If you combine sparse inputs with full nD (n > 2) inputs, MTIMESX will convert the sparse input to full since MATLAB does not

support a sparse nD result."

Pre-converting the input from sparse to full leads to very slow performance, certainly slower than regular MATLAB as the comparison below shows. Is it really the only convenient solution? Also, MATLAB not supporting sparse nD is again not the issue, since even for n=2, sparse*full is always full in regular MATLAB.

%fake data

N=300;M=200;

A=speye(N);

B=rand(N,N,M);

Bs=single(B);

mtimesx SPEED;

tic;

ans1=mtimesx(A,B);

toc;

%Elapsed time is 0.681635 seconds.

tic;

ans2=reshape( A*reshape(B,N,[]) , size(B) );

toc;

%Elapsed time is 0.159543 seconds.

isequal(ans1,ans2)%1

"If you combine double sparse and single inputs, MTIMESX will convert the single input to double since MATLAB does not support a single sparse result."

As above, even if MATLAB did support sparse*fullsingle you would expect the result to be full single, not sparse single. And also, this feature is showing very slow behavior as compared to regular MATLAB, for some reason

tic;

ans3=mtimesx(A,Bs);

toc;

%Elapsed time is 0.817832 seconds.

tic;

ans4=single( reshape( A*reshape(double(Bs),N,[]) ,size(B)) );

toc;

%Elapsed time is 0.402111 seconds.

isequal(ans3,ans4),%1

James Tursa@Matt J: Thanks for the comments. I will get to work on the suggested changes right away.

Matt JJust a couple more remarks regarding some of the limitations mentioned in the user manual,

"You cannot combine double sparse and single inputs, since MATLAB does not support a single sparse result."

It would be worthwhile to silently cast the single input to double prior to the matrix multiply. I've never understood why MATLAB doesn't implement the same. It only forces the user to do it manually.

"You also cannot combine sparse inputs with full nD (n > 2) inputs, since MATLAB does not support a sparse nD result."

Don't understand that. Native MATLAB produces a full result when combining sparse matrix multiplicands with full ones. No reason why you couldn't do the same for nD, as far as I can see...

Matt JI like it. The only thing I find a little peculiar is that the output of

mtimesx('MODE') reports the previous mode rather than the current one.

SebastiaanSomething like A(A<1e-8) = 0 should automatically remove small numbers from the sparsity pattern.

If this is too slow, have a look at spfun; it may be faster.

However, if you mean if you can remove small elements from the output matrix, you should write something of your own, partially multiplying the matrix and removing small elements. As a tip, use full column vectors of the matrix, this will make the multiplication much faster than square blocks, since sparse matrices are stored by columns. E.g.:

C = spalloc(size(A,1), size(B,2), 0);

for j=1:size(B,2)

R = A*B(:,j);

R(R<1e-8) = 0;

C(:,j) = R;

end

is (probably) a fast implementation. Try to take multiple columns for speed of course.

JulioThanks Sebastiaan, I never thought to check the density of the resulting matrix and you are absolutely right: the memory blew up! By the way, is there any way to define an “operational zero” for sparse matrices operations in order to automatically make MATLAB assume a zero for numbers say < than 1e-8?

Thank you again.

SebastiaanCould it not be possible that you are really out of memory? Your result is a 460191 by 460191 matrix, which is not necessarily sparse. If the sparsity pattern is poor, the result may even be a dense matrix. Storing a full matrix this size requires 1578GiB of memory, so you have to be sure that the result is ~2% sparse. If neither of the input matrices are <1% sparse, this is likely your problem.

JulioThanks for your prompt reply. “mtimes” at the MATLAB prompt gives exactly the same error so, nothing wrong with “mtimesx”. Is it possible that this is a repetition of the “outer-product” problem? I am going to try partition the matrix “cb” in several chunks and then concatenate the results. Maybe it will work this way.

Thanks again.

James Tursa@ Julio: MTIMESX does not do generic sparse matrix multiplies internally, it just calls the MATLAB built-in function mtimes. i.e., it does the same thing as if you had just typed c21 * cb at the MATLAB prompt. Can you verify that c21 * cb works at the MATLAB prompt but mtimesx(c21,cb) does not work? Be sure that you start them both from the same state (e.g., clear all other big variables including ans before attempting the calculation). That would be surprising to me since it would indicate some type of mex limitation that I am currently unaware of.

JulioHi, I am new using mtimesx but, so far, I think is great. Meanwile, I am trting to multiply 2 large sparse matrices "ci=mtimesx(c21,cb)". c21 is 460191 by 5579 and cb is 5579 by 460191. After several minutes I got a error message (similar to mtimes):

======================

??? Error using ==> mtimes

Out of memory. Type HELP MEMORY for your options.

Error in ==> mtimesx_sparse at 47

result = a * b;

Error in ==> mtimesx at 277

[varargout{1:nargout}] = mtimesx(varargin{:});

======================

I don't think that the problem is lack of memory (I have a 64 bit system with 8 GB of RAM and the memory usage never went above 5 GB). Couls you please comment on this.

Thanks in advance.

Julio

James TursaAlso in today's release that I forgot to mention below: Added custom inline code for small matrix multiplies where all of the first two dimensions are <= 4 (in 'SPEED' mode).

D SenThanks! I am seeing about a 3.25x improvement in speed for my application.

James Tursa@ D Sen: In addition, you would need to reshape audio_dft to be 32 x 1 x 4097.

James Tursa@ D Sen: The E array slicing is not set up the way mtimesx needs. mtimesx needs the array slices to be contiguous in memory (a requirement for the underlying BLAS calls). So your E array would need to be a 25 x 32 x 4097 array, making the 25 x 32 slices contiguous in memory (the 2D slices need to be the first two dimensions). The way you currently have it set up, all of your 25 x 32 array slices of E are individually scattered throughout memory and are not the first two dimensions, and mtimesx cannot use them that way.

D SenI have been trying to use this to multiply a 3D matrix with a 2D matrix.

The matlab code which works is as follows:

C = zeros ( 25, 4097 );

for( i_ = 1:4097 )

C(:, i_) = squeeze( E( i_, :, : ) ) * audio_dft( :, i_ );

end

Here E and audio_dft have the following sizes:

size(E)

ans =

4097 25 32

size( audio_dft)

ans =

32 4097

I get:

mtimesx( E, audio_dft )

??? Error using ==> mtimesx

Inner matrix dimensions must agree.

Val SchmidtNever mind, I found my problem. I have not successfully built mtimesx on Snow Leopard 10.6 with 64 bit MATLAB.

To do so I edited

/Applications/MATLAB_R2009b.app/bin/gccopts.sh

In the section for "maci64" I changed "SDKROOT" and "MACOSX_DEPLOYMENT_TARGET" to reflect a change in OS from 10.5 to 10.6.

Then I selected this opts file with

mex -setup

selecting "1" at the prompt.

Then I compiled everything with:

mex -v -DDEFINEUNIX -largeArrayDims mtimesx.c -lmwblas -lmwlapack

(look out for the 2 D's in -DDEFINEUNIX - that was my error)

Val SchmidtI'm trying to compile on 64bit MATLAB on Mac OSX 10.6 without success. The only changes I've made to the default have been to modify the MATLABROOT/bin/gccopts.sh file such that the SDK and OS Version are set to the 10.6 variety. (and then ran mex -setup)

It seems the linker cannot find the linear algebra packages, but it is not clear to me why. The output is below and I would be thankful for ideas on what to try next.

thanks

---------------snip------------

-> gcc-4.0 -O -Wl,-twolevel_namespace -undefined error -arch x86_64 -Wl,-syslibroot,/Developer/SDKs/MacOSX10.6.sdk -mmacosx-version-min=10.6 -bundle -Wl,-exported_symbols_list,/Applications/MATLAB_R2009b.app/extern/lib/maci64/mexFunction.map -o "mtimesx.mexmaci64" mtimesx.o -L/Applications/MATLAB_R2009b.app/bin/maci64 -lmx -lmex -lmat -lstdc++

Undefined symbols:

"_dsyr2k", referenced from:

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

"_dgemm", referenced from:

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

"_dgemv", referenced from:

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

"_sgemm", referenced from:

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

"_sgemv", referenced from:

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

"_dsyrk", referenced from:

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

_DoubleTimesDouble in mtimesx.o

"_ssyrk", referenced from:

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

"_ssyr2k", referenced from:

_FloatTimesFloat in mtimesx.o

_FloatTimesFloat in mtimesx.o

"_sdot", referenced from:

_floatcomplexdotproduct in mtimesx.o

_floatcomplexdotproduct in mtimesx.o

_floatcomplexdotproduct in mtimesx.o

_floatcomplexdotproduct in mtimesx.o

_floatcomplexdotproduct in mtimesx.o

"_ddot", referenced from:

_doublecomplexdotproduct in mtimesx.o

_doublecomplexdotproduct in mtimesx.o

_doublecomplexdotproduct in mtimesx.o

_doublecomplexdotproduct in mtimesx.o

_doublecomplexdotproduct in mtimesx.o

ld: symbol(s) not found

collect2: ld returned 1 exit status

mex: link of ' "mtimesx.mexmaci64"' failed.

Roland KruseThe code is suprisingly fast for multiplication of a sparse matrix by a skalar. I really wonder why Matlab is so slow in this (very simple) case.

Val SchmidtJames,

This is code is proving to be a FANTASTIC speed up to my work. I am so grateful that I thought I would post and comment.

First, I am working on R2008b on an intel mac. I compiled your code with

mex -DEFINEUNIX mtimesx.c

and although I received a handful of warnings about redefined pointers, it compiled fine without error. I also tried compiling the code with a few extra optimizations, out of curiosity:

mex -DEFINEUNIX CFLAGS = '$CFLAGS -O3 -funroll-loops -fexpensive-optimizations -ffast-math' mtimesx.c

The result wasn't significantly faster than the original on my own processing test (I did not take the time to run it through your tests, however.).

I did run mtimesx_test_ddspeed initially to compare with MATLAB native routines, and while I didn't capture the results, I found that most things native MATLAB could do as fast or faster than mtimesx, with the notable exception of math involving complex conjugates, in which mtimesx was far, far faster.

However, the fantastic benefit for me was the native handling of 4D matrices by mtimesx. In my own code in which I am rotating several (8.3) million 3D vectors through a 3x3 rotation matrices, mtimesx increased the speed of my calculations 2100%!

Thanks so much for your time and effort. It is most gratefully appreciated.

-Val

James Tursa@ veltz: Thanks for your comments. Regarding openMP, I think the short answer is no. I will give my reasons shortly, but please be advised I don't do any openMP stuff myself so my comments might not be correct.

The basic loop for nd stuff is this loop:

for( ip=0; ip<p; ip++ ) {

However, this is just the loop for the blocks in the C result. The blocks from the A and B arrays are calculated inside the loop based on the singleton expansion code (which depends on the dimensions of A and B) and are not known ahead of time ... at the end of one iteration the calculation for the A and B blocks for the next iteration takes place (the C block pointer just advances by the same amount each time). So it is not set up for a parallel loop. I would have to change the code and create three new arrays of pointers so that ahead of time all of the individual block starting points are known. *Then* maybe it could be put in a parallel loop.

But I don't think this would do much good. The multi-threaded stuff in MATLAB seems to happen at the BLAS/LAPACK level, so mtimesx is already getting the benefit of multi-threading for those calls without doing anything special in the mtimesx code itself to get it. For example, if you compare a simple matrix multiply timing at the MATLAB prompt using two large matrices, you can see about a 1.6x improvement with 2 threads vs 1 thread. If I code up a very simple mex routine that calls dgemm and does nothing else (no pragma omp etc) I see the exact same timing benefit when switching the number of threads used. So the c-mex code gets the multi-threaded benefit by virtue of the BLAS/LAPACK routines used. Wrapping another layer of threading on top of that will likely not result in any timing benefit, and may (speaking from ignorance here) in fact conflict with the multi-threading that is already going on inside the BLAS/LAPACK routines themselves.

veltzFirst, thank you for this marvellous code :)

Is there a way to use openMP in your code ? For example, when doing

C = zeros(2,6,4,5);

for m=1:4

for n=1:5

C(:,:,m,n) = A(:,:,m,n) * B(:,:,m,n);

end

end

we could call un pragma omp parallel for. As the code is pretty big, where does such loop appear in your code ?

Jon CDebian 64-bit compiled right with:

mex CFLAGS="\$CFLAGS -std=c99" -DDEFINEUNIX -largeArrayDims -lmwblas mtimesx.c

(the functional syntax of mex did not parse the arguments properly)

The GCC -ansi switch included by the default mexopts apparently refers to C89 (which does not recognize // -- C++ style comments). I wonder what is wrong with C99 as a default in mexopts?

Jon CJoshua DillonVery useful code. Good work.

The following snippet may be useful to those compiling on Ubuntu:

libblas='/usr/lib/libblas.so';

if exist(libblas,'file')~=2

system('sudo aptitude install libblas-dev');

end

% This will compile the code on linux

mex('CFLAGS=-std=c99 -fPIC','-DDEFINEUNIX','-largeArrayDims','-lmwblas','mtimesx.c');

James Tursa@Fabio: Thanks very much for the 64-bit info. I was unaware that gcc did not like the // commenting style, which I prefer over the /* */. However, I will create additional files with changed comments and upload them in the next week or so. I would be very interested in seeing the details of what you had to change to get it to compile and the results of any benchmark tests you happen to run. Feel free to e-mail me at the address listed in the pdf file. Thanks!

Fabio VeroneseGreat job!

Thanks to mtimesx I changed a huge for loop to 3D matrix computation, reducing script execution time from 14 s to 1.5 s!

Tested and compiled on Gentoo Linux (64bit) 2.6.33 on amd64 system. Compiling with gcc 4.3.* (not the deprecated 4.2 as suggested by Mathworks) I had to delete all quotes from C code files. Probably "//" sequence is not friendly to newer gcc compiler, maybe you can replace it with " /*quote*/ " wich is correctly ignored. Also to successfully compile I had to use:

mex('-DDEFINEUNIX','-largeArrayDims', '-lmwblas','mtimesx.c')

wich is something different from default. Both '-lmwlapack', '-lmwblas' are the options suggested for compiling in UNIX systems (see "Building MEX-Files" in doc), but that was enough. I can say it works correctly, but I didn't run the benchmarks files.

Hope this helps!

Feel free to ask if some testing on LINUX 64bit is needed. I'm going to run tests soon.

Eric LarrieuxThanks for the quick fix!

James TursaConfirmed bug. The bug happens during the following conditions in MATLAB mode:

(row vector) * (non-square matrix)

(non-square matrix transposed) * (column vector)

Sorry about that! This was not caught in testing because I only used square matrices for these particular tests. The bug occurs because the BLAS routines DGEMV and SGEMV do not switch the dimension arguments when a transpose is involved. I had erroneously assumed that they did because that is how the BLAS routines DGEMM and SGEMM arguments work for general matrix multiplies. I have no idea why the interface for these routines is not set up the same. In any event, I will upload a fix tonight. In the meantime, you can manually put the fix in as follows:

Edit the mtimesx_RealTimesReal.c file.

Global replace the following text:

xGEMV(PTRANSA, M, K

with this text:

xGEMV(PTRANSA, &m1, &n1

And then also global replace the following text:

xGEMV(PTRANSB, N, K

with this:

xGEMV(PTRANSB, &m2, &n2

Those replacements will force the DGEMV and SGEMV calls to use the original dimensions of the first matrix argument instead of the switched dimensions.

Eric LarrieuxThanks! Also, FYI: I am running R2009a...

James TursaI will look into it.

Eric LarrieuxHi I running matlab on a 32-bit windows machine and I am getting incorrect (all 0) results when i perform some simple 3D matrix multiplication in "matlab' mode only ('speed' mode seeps yield the correct results in all the cases i have tested so far). i.e.

>> % define a and b

>> a=2*ones(1,2,2)

a(:,:,1) =

2 2

a(:,:,2) =

2 2

>> b=3*ones(2,3,2)

b(:,:,1) =

3 3 3

3 3 3

b(:,:,2) =

3 3 3

3 3 3

>>

>> % verify result using normal matrix multiply

>> a(:,:,1)*b(:,:,1)

ans =

12 12 12

>> mtimes(a(:,:,1),b(:,:,1))

ans =

12 12 12

>>

>> % test results using mtimesx

>> mtimesx(a(:,:,1),b(:,:,1),'matlab')

ans =

0 0 0

>> mtimesx(a(:,:,1),b(:,:,1),'speed')

ans =

12 12 12

I just downloaded and installed the package yesterday, so I believe the outer-product bug discussed about should be fixed. Does anyone have any idea what is causing this error? Thanks!

-eric

James Tursa@newpolaris: Thanks for the input! It's hard for me to debug the 64-bit stuff since I don't have a system to test with, so comments like yours are very much appreciated. I have uploaded a new set of files with the largearraydims typo in the build routine fixed.

newpolaris ?grate! but mis-typing exist on mtimesx_build.m for 64bit Windows

on line 247 -> largearraydims = '-largearraydims';

it cause error like,

Warning: MEX could not find the library "argearraydims"

specified with -l option on the path specified

with the -L option.

... mex mtimesx.c build completed ... you may now use mtimesx.

so it muse be replace by "largearraydims = '-largeArrayDims';"

newpolaris ?newpolaris ?in my case, result is all zero matrix :)

Tobias WoodCannot thank you enough for this. Got a huge speed improvement (10 seconds down to 0.6) for a script using this. I'm staggered it's not in the core of Matlab.

Bruno LuongIt must have some gut to tackle what Matlab pretends to do the best: matrix computation.

The package is nicely presented. This is no doubt a professional quality code using the junk of state-of-art BLAS, an accompaniment PDF document to explain the “why” and the “how” in great detail.

As with any well written MEX-based package, there is an installation file to make conveniently the compilation before the package can be used. Just type the name and the job is done. The main path (no subfolders) of the package is left to user to add to Matlab preference.

Obviously there are 8 separate files that can be used to benchmark with Matlab. Their names tell something that not obvious for new comers, but for those who are familiar with Matlab, there is no doubt: It tests all 4 combination of double/single and one for equality test, another for speed test. I choose the “dd” one (double/double) likely the most used combination, and run the “equal” test. It crashes. Too bad, then despite of the crash I took my chance to run the speed test. Well, a bunch of tests are tic/toc against Matlab, with clear outputs indicating if it’s slower or faster than Matlab and how much in percent the runtimes are comparable. Something strange happened: in some cases, the MTIMESX show few hundred times faster than Matlab. I told: “It cannot be true; why Matlab performs so poorly?”. I decided to go step-by-step the speed-test function, and discovered there may be another bug. I contacted the author, and bingo, I get his explanation and solution few hours later. The bug (just one line missing) is due to last minute change before submission. Followed James’s instruction, I modified the code, rebuilt the Mex file, and run the test code. The results look much more reasonable. And I rerun the “equal” test, there is no longer crash.

One more word on my setup before reading the next: Windows Vista 32-bit, 2009B, and an old loyal VS C++ 6.0. It is important because MTIMESX performance will depend greatly on the setup, and this is clearly stated by the author.

The main function MTIMESX can operate on two modes: “MATLAB” (default) and “SPEED”. The first one tries to be as much as compatible with Matlab, and the second one is all about – as the name suggests - speed oriented. In all my first two tests I run “MATLAB” modes, the last one will compare two modes.

PERTINENT:

I. 2D Arrays

Speed tests run with all possible combination of cases: complex/real; normal, transposed, conjugate, conjugate-transposed; then combinations of matrix/vector/scalar; all that apply for first or second matrix arguments. Each argument might have then 2*4*3=24 states, not counting a vector can be column or row. Combine both, the test run probably no less than 24^2=576 cases, and for 2D arrays only (ND arrays supported, we’ll be back on that).

Most of the time, MTIMESX (in Matlab mode) seems to perform more or less a comparable speed with Matlab, except for few cases where speed advantage is clearly significant. Here is one typical of such case:

>> n=2000;

>> A=rand(n)+1i*rand(n);

>> B=rand(1,n)+1i*rand(1,n);

>> tic; C=conj(A)*B.'; toc

Elapsed time is 0.094847 seconds.

>> tic; C=mtimesx(A,'G',B,'T'); toc

Elapsed time is 0.034242 seconds.

That about 3 times faster than Matlab. Taking a closer look, it seems Matlab is penalized by carrying out explicitly the transpose and conjugate before the product as show in the next

>> A1=conj(A);

>> B1=B.';

>> tic; C=A1*B1; toc

Elapsed time is 0.030122 seconds.

whereas MTIMESX performs those operations *on-the-fly* during the product. Very good. But I would argue that the speed gain is less in practice. In fact if the product is to be carried out once, careful developers usually manage to build conj(A) and B.’ and not A and B. In this case all the speed gain of MTIMESX is irrelevant. If A and conj(A) are used to perform product once each, then there is speed gain is about 1.5 (3/2). If they are used many times, then the developer might perform CONJ once and store it somewhere before product function MTIMES is invoked, and the speed gain get even closer to 1. Still there is clearly an advantage in MTIMEX, for speed and it avoid storing extra matrix conj(A).

My impression is this is the only (and significant) factor that contributes to the speed gain: performing CONJUGATE on the fly. Recent Matlab (2008A, i.e., v.7.6) benefits smart TRANSPOSE on the fly, according to Tim-Davies.

II. ND-arrays

The nicest feature of MTIMESX is the capability to perform multiple matrix products ranged in a ND-array. Previously there is such function available on Matlab File Exchange coded by Peter Boettcher also using BLAS, but it seems this package has now a minimum support, and users has trouble to compile it on 64-platform. So this new feature is very welcome.

When a Matlab developer wants to make product of multiple matrix of the same size, up to date he has 4 alternatives

1. Use for-loop

2. use a product based on sparse diagonal block (for example function SliceMultiProd available in http://www.mathworks.com/matlabcentral/fileexchange/24260-multiple-same-size-linear-solver

3. Use the package MULTIPROD where the element-wise product and sum are carried out, http://www.mathworks.com/matlabcentral/fileexchange/8773-multiple-matrix-multiplications-with-array-expansion-enabled

4. Use BLAS based, such as Peter Boettcher’s ndfun, and now James Tursa’s MTIMESX

Here is a code I use to benchmark the four approaches. Note that in small matrix size (of less than said 5) are often encountered in practice.

function mprodbench(m,n)

% Bench mark four approaches of product of n (m x m) matrices

A=rand(m,m,n);

B=rand(m,m,n);

C=zeros(size(A,1),size(B,2),size(B,3));

tic

for k=1:size(B,3)

C(:,:,k) = A(:,:,k)*B(:,:,k);

end

t1=toc;

tic

C=SliceMultiProd(A,B);

t2=toc;

tic

C=multiprod(A,B);

t3=toc;

tic

C=mtimesx(A,B);

t4=toc;

fprintf('Matrix size=%d, number matrices=%d\n', m, n);

fprintf('\tfor-loop = %f, rel speed gain = (%f)\n', t1, t1/t1);

fprintf('\tSliceMultiProd = %f, rel speed gain = (%f)\n', t2, t1/t2);

fprintf('\tmultiprod = %f, rel speed gain = (%f)\n', t3, t1/t3);

fprintf('\tmtimesx

end

And here is the results:

> mprodbench(3,1e4)

Matrix size=3, number matrices=10000

for-loop = 0.055119, rel speed gain = (1.000000)

SliceMultiProd = 0.014130, rel speed gain = (3.900815)

multiprod = 0.007815, rel speed gain = (7.052806)

mtimesx = 0.006185, rel speed gain = (8.911272)

>> mprodbench(10,1e4)

Matrix size=10, number matrices=10000

for-loop = 0.074665, rel speed gain = (1.000000)

SliceMultiProd = 0.146336, rel speed gain = (0.510228)

multiprod = 0.151051, rel speed gain = (0.494303)

mtimesx = 0.022569, rel speed gain = (3.308258)

That shows MTIMESX is the best and consistent approach of multiple same-size matrix product. It supports inputs with any multiple-trailing dimensions.

III. SPEED VS MATLAB MODES

From my test, speed mode is faster about 30-50% typically for tensorial product (column vector multiplied row vector). Beside that case, the speed gain is probably not spectacular.

CONCLUSION:

- For whom? Developers who are interested in optimizing speed and want to take advantage the last ounce of performance from their computers. I have not tested in older version of Matlab but MTIMES, but it will be more surely pertinent as Matlab makes constant progress such as smart matrix transposition. Those who run on latest Matlab (from 2008A) has limited benefits besides saving extra conjugate and transpose operations of their matrices. The syntax of MTIMESX is simple, though quite not lean as Matlab operator is the price to pay.

- MTIMESX is clearly a best approach for multiple-product of matrices. Go for it!

Fully justified, until Mathlab catching up.

Bruno

James TursaAs a workaround prior to Monday you can make this change to the mtimesx_RealTimesReal.c file near line 222:

scalarmultiply = 0; // <-- add this line

if( m1 == 1 && n1 == 1 ) {

scalarmultiply = 1;

James TursaThere is a bug in the outer product code. Will be fixed as soon as TMW posts updated version. Probably Monday Dec 7.