Beats MATLAB by 300% - 400% in some cases ... really!
MTIMESX is a fast general purpose matrix and scalar multiply routine that has the following features:
 Supports multidimensional (nD, n>2) arrays directly
 Supports Transpose, Conjugate Transpose, and Conjugate preoperations
 Supports singleton expansion
 Utilizes BLAS calls, custom C loop code, or OpenMP multithreaded C loop code
 Can match MATLAB results exactly or approximately as desired
 Can meet or beat MATLAB for speed in most cases
MTIMESX has six basic operating modes:
 BLAS: Always uses BLAS library calls
 LOOPS: Always uses C loops if available
 LOOPSOMP: Always uses OpenMP multithreaded C loops if available
 MATLAB: Fastest BLAS or LOOPS method that matches MATLAB exactly (default)
 SPEED: Fastest BLAS or LOOPS method even if it doesn't match MATLAB exactly
 SPEEDOMP: Fastest BLAS, LOOPS, or LOOPSOMP method even if it doesn't match MATLAB exactly
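The mode can be changed at any time with a directive call; for example (a sketch based on the syntax described below):
mtimesx('SPEED')    % use the fastest method from now on, possibly not bit-exact with MATLAB
C = mtimesx(A,B);   % this multiply runs in SPEED mode
mtimesx('MATLAB')   % restore the default mode that matches MATLAB exactly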
MTIMESX inputs can be:
single
double
double sparse
The general syntax is (arguments in brackets [ ] are optional):
mtimesx( [directive] )
mtimesx( A [,transa] ,B [,transb] [,directive] )
Where transa, transb, and directive are the optional inputs:
transa = A character indicating a preoperation on A:
transb = A character indicating a preoperation on B:
The preoperation can be any of:
'N' or 'n' = No preoperation (the default if trans_ is missing)
'T' or 't' = Transpose
'C' or 'c' = Conjugate Transpose
'G' or 'g' = Conjugate (no transpose)
directive = One of the modes listed above, or other directives
Examples:
C = mtimesx(A,B) % performs the calculation C = A * B
C = mtimesx(A,'T',B) % performs the calculation C = A.' * B
C = mtimesx(A,B,'g') % performs the calculation C = A * conj(B)
C = mtimesx(A,'c',B,'C') % performs the calculation C = A' * B'
mtimesx('SPEEDOMP','OMP_SET_NUM_THREADS(4)') % sets SPEEDOMP mode with number of threads = 4
For nD cases, the first two dimensions specify the matrix multiply involved. The remaining dimensions are duplicated and specify the number of individual matrix multiplies to perform for the result. That is, MTIMESX treats these cases as arrays of 2D matrices and performs the operation on the associated pairings. For example:
If A is (2,3,4,5) and B is (3,6,4,5), then
mtimesx(A,B) would result in C(2,6,4,5), where C(:,:,i,j) = A(:,:,i,j) * B(:,:,i,j), i=1:4, j=1:5
which would be equivalent to the MATLAB code:
C = zeros(2,6,4,5);
for m=1:4
for n=1:5
C(:,:,m,n) = A(:,:,m,n) * B(:,:,m,n);
end
end
The first two dimensions must conform using the standard matrix multiply rules taking the transa and transb preoperations into account, and dimensions 3:end must match exactly or be singleton (equal to 1). If a dimension is singleton then it is virtually expanded to the required size (i.e., equivalent to a repmat operation to get it to a conforming size but without the actual data copy). This is equivalent to a bsxfun capability for matrix multiplication.
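For example, singleton expansion lets a single 2D matrix multiply every page of an nD array. If A is (2,3,4) and B is (3,6), i.e. (3,6,1), then mtimesx(A,B) results in C(2,6,4), which would be equivalent to the MATLAB code:
C = zeros(2,6,4);
for m=1:4
C(:,:,m) = A(:,:,m) * B;   % B's singleton 3rd dimension is reused for every page m
end
No repmat of B is performed; the expansion is virtual.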
1.10  Fixed typos in the dsyrk, dsyr2k, ssyrk, and ssyr2k BLAS function prototypes. (These typo fixes should not change any results) 

1.9  Added OpenMP support for custom code


1.8  Added capability for nD scalar multiplies (i.e., 1x1xN * MxKxN).


1.7  Fixed a bug for some of the (row vector) * (matrix) and (matrix transposed) * (column vector) operations in MATLAB mode that involved incorrect dimensions in the dgemv and sgemv calls. 

1.6  Fixed a typo in the build routine for 64-bit systems: changed largearraydims to largeArrayDims

1.5  Added the multidimensional test routine mtimesx_test_nd.m for speed and equality tests. Added its description to the pdf file. 

1.4  Fixed a bug for empty transa or transb inputs. Now treats these the same as 'N'. Also simplified the nD singleton expansion code a bit. 

1.3  Added singleton expansion capability for multidimensional matrix multiplies. 

1.2  Fixed bug in scalar multiply code that was causing incorrect results & crashes. 

1.1  Fixed bug for (scalar) * (sparse) when the scalar contains an inf or NaN (in which case the zeros do not stay zero). Slight update to pdf doc. 
Rui Li:
I encountered this issue:
Error using mtimesx_build (line 169)
A C/C++ compiler has not been selected with mex -setup
This was because I hadn't installed the Microsoft Windows SDK for Windows 7 and .NET Framework 4, which can be downloaded here:
https://www.microsoft.com/en-us/download/confirmation.aspx?id=8279
After this, everything works. Also, this link may be useful:
http://au.mathworks.com/matlabcentral/answers/96050howcanisetupmicrosoftvisualstudio2008expresseditionforusewithmatlab77r2008bon32
Thanks for sharing MTIMESX.
Chris Hooper:
@Vladislav @Iliya @James
It works for me with the following method:
1) change "mexopts = [prefdir '\mexopts.bat'];" in mtimesx_build.m to "mexopts = [prefdir '\mex_C_win64.xml'];"... or maybe it was "'\mex_C++_win64.xml'];", I'd have to check...
2) don't change anything inside the .xml file: no need to change the set COMPFLAGS line.
3) I have MATLAB version 2016b: use the compiler Microsoft Visual C++ 2013 Professional (a nice person did this for me on their computer and we were lucky that it worked on mine). The MinGW-w64 compiler did not work for either of us in this case.
Guillaume:
Hi, did you try logical * double with a double result?
The double conversion in MATLAB is very slow, and this would be very useful.
If not, can you advise where I should modify the code?
BR, Guillaume
Earl Vickers:
I'm sure it's great, but it doesn't work on 2015b & Mac.
armita mani:
I am trying to use mtimesx on MATLAB 2016a, Windows 10 64-bit, but I get the error "A C/C++ compiler has not been selected with mex -setup" and understand I should change mexopts = [prefdir '\mexopts.bat']; in mtimesx_build.m to mexopts = [prefdir '\mex_C++_win64.xml']; but in the path C:\Users\..\AppData\Roaming\MathWorks\MATLAB\R2016a this xml doesn't exist. How can I compile this library?
Thank you
Andreas:
@Vladislav @Iliya @James Tursa Did you mean to set the flag in the .xml file like this?
set COMPFLAGS=/c /openmp $CFLAGS $DEFINES $MATLABMEX
For me it still crashes when I try and it feels hopeless. I have tried everything to make it work.
David Belo:
Fantastic work, James. Have you tried implementing matrix inversion on each slice as you alluded to in the past? I'm currently using multinv which works well but I wonder if there is a performance gain to be had using a similar approach as you've taken in mtimesx.
DeviL:
@Vladislav: See the comment by Martin Lindfors from 26 May 2015. My solution is not something new and was suggested before.
Vladislav Yordanov:
@Iliya
I have the same issue. I can use MTIMESX with MATLAB 2010 and it works out great. Many thanks to James Tursa.
However, I have the same issue as Tyler with MATLAB 2014. Could you please elaborate in detail how to fix it? I read your comment but I have no idea how to point to the xml.
Regards
DeviL:
@Tyler: On newer MATLAB versions, you don't need the mexopts.bat file but rather mex_C_win64.xml (or an XML with a similar name that fits your system).
Change the mtimesx_build.m to point to an .xml file and it should be ok. I manually added the "/openmp" to the "set COMPFLAGS ..." line in the xml. After this, compilation completed correctly and running mtimesx didn't crash my MATLAB (2015a).
Good luck!
@JamesTursa: I humbly suggest you modify the build script to also be compatible with newer MATLAB versions.
Tyler Millhouse:
I downloaded this mexopts.bat file, and the compiler now works.
https://gist.github.com/dgleich/1579415
Unfortunately, when I run mtimesx, Matlab crashes.
Tyler Millhouse:
Hi James, thanks for putting this together. I keep getting this message: 'A C/C++ compiler has not been selected with mex -setup'. This is strange since I have run 'mex -setup' and 'mex -setup C++'. Mex selected the MinGW64 compiler. Do I need a different compiler? Thanks again!
Fritz:
Is there any news on the cumprod implementation for MxMxK?
Thank you
http://www.mathworks.com/matlabcentral/newsreader/view_thread/159298
James Tursa:
@Michal: For generic 2D matrix multiply, there is no advertised speedup since mtimesx is calling exactly the same routine as MATLAB. Speedup would be if you were doing 2D page multiplies with nD matrices.
Michal Kvasnicka:
For MATLAB R2015b:
+++++++++++++++++++++++
>> a = rand(5000);
>> b = rand(5000);
>> tic, c2 = mtimesx(a,b,'speedomp'); toc
Elapsed time is 1.858891 seconds.
>> tic, c1 = a*b; toc
Elapsed time is 1.956402 seconds.
>> isequal(c1,c2)
ans =
1
So, where is the announced speedup???
Martin Lindfors:
I fixed the issues by replacing line 166 in mtimesx_build.m from
mexopts = [prefdir '\mexopts.xml'];
to
mexopts = [prefdir '\mex_C++_win64.xml'];
Philip:
I am having the same issues as Octavian. I am using 64-bit R2015a on 64-bit Windows 7. I have set up the compiler Microsoft Visual C++ 2013. However, when I run mtimesx_build, it produces the error:
Error using mtimesx_build (line 169)
A C/C++ compiler has not been selected with mex -setup.
I believe this may be related to MATLAB switching from .bat files to .xml files for mex, so mtimesx_build cannot find mexopts.bat. Any help resolving this issue would be greatly appreciated.
Octavian:
Dear James,
I am very interested in using mtimesx for nD array operations, but I am new to compiling. When testing mtimesx, I get the following error:
Error using mtimesx_build (line 169)
A C/C++ compiler has not been selected with mex -setup.
I have a Windows 7 64-bit laptop. Which C/C++ compiler would you recommend installing?
Thank you,
Octavian
Jeppe:
@Heather,
I'm running MATLAB R2014b on a MacBook with OS X Yosemite. I installed Xcode from the App Store and defined the BLAS library:
blas_lib='/Applications/MATLAB_R2014b.app/bin/maci64/libmwblas.dylib'
mex('-DDEFINEUNIX','-largeArrayDims','mtimesx.c',blas_lib)
After running this in the command window I could use mtimesx.m
Hope this helps
Jeppe
Heather:
Thanks James for your great work; it is a really useful function. I am having trouble compiling mtimesx.c on my new MacBook though. I cannot see from the comments below that anyone has managed to solve this.
My BLAS library is located at /usr/local/lib/ and is called libblas.a
Following the instructions given, when I tried to call the mtimesx function in my MATLAB code I defined
blas_lib = '/usr/local/lib/libblas.a'
and then tried to manually compile the mex routine by
mex('-DDEFINEUNIX','-largeArrayDims','mtimesx.c',blas_lib)
after which I see "Building with 'Xcode with Clang'" followed by several lines of output and ultimately:
Error using mex
Undefined symbols for architecture x86_64:
"__gfortran_st_write", referenced from:
_xerbla_ in libblas.a(xerbla.o)
"__gfortran_st_write_done", referenced from:
_xerbla_ in libblas.a(xerbla.o)
"__gfortran_stop_string", referenced from:
_xerbla_ in libblas.a(xerbla.o)
"__gfortran_string_len_trim", referenced from:
_xerbla_ in libblas.a(xerbla.o)
"__gfortran_transfer_character_write", referenced from:
_xerbla_ in libblas.a(xerbla.o)
"__gfortran_transfer_integer_write", referenced from:
_xerbla_ in libblas.a(xerbla.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Can anybody help me? Thank you and best wishes,
Heather
noa:
Is mtimesx working on Mac? I have OS X 10.9 and could not compile; I got the error "Undefined symbols for architecture x86_64: ...".
Thank you.
Royi Avital:
Can anyone compare the performance to MATLAB R2014b?
Thank You.
Octavian:
Is mtimesx working on Linux? I have Ubuntu 12.04, and I cannot compile it. Also, does it work with gpuArrays?
Thanks,
Octavian
Matt J:
Hi James,
I had a question about how the flags 't' and 'c' are used with sparse matrix arguments. I assume the flags are there to circumvent explicit transposes and the associated overhead. Yet it doesn't appear that sparse mtimesx operations make use of them. In the test below, I see basically the same execution speeds in all three versions. Is it true that sparse operations can't make use of the flags, and is that deliberate/inevitable? It would be a great benefit to be able to avoid transposing tall sparse matrices because of the high memory consumption that comes with doing so.
J=sprand(3192027,3225,.0001);
tic;
mtimesx(J,'t',J);
toc
tic;
mtimesx(J.',J);
toc
tic;
J.'*J;
toc
Julien:
From my understanding, the file "mex_C_win64.xml" replaces "mexopts.bat". I was getting:
"Error using mtimesx_build (line 169)
A C/C++ compiler has not been selected with mex -setup"
I replaced line 169 with: mexopts = [prefdir '\mex_C_win64.xml'];. It did work, with a few warnings...
Matt J:
Thanks, James. This thread describes where the information has moved to
http://www.mathworks.com/matlabcentral/answers/67521#answer_138814
I can post those xml files somewhere, if you like. However, is there no way to get mtimesx to compile simply with the "mex" command, maybe with a small sacrifice in performance?
James Tursa:
Hmmm ... Well, I don't have access to R2014, so I will have to code blind on this. My understanding is that MATLAB now guesses the correct compiler to use in real time based on the source code extension, so the information that was in mexopts.bat (which compiler the user had already selected) is no longer available to my automatic build code. I will put out an update shortly, but I may have to ask the user directly for compiler information. I will try to get it out there sometime next week.
Matt J:
Thought I'd ping again. It doesn't look like mtimesx_build is compatible with R2014, since it looks for mexopts.bat, which is now gone. Can we hope to get a version that supports R2014? James?
Matt J:
I have the same problem as Safdar in R2014a. Even after running mex -setup, it appears as though mtimesx_build cannot find mexopts.bat. Possibly the locations of the relevant files/directories have shifted?
Safdar:
Sorry for the multiple posts. It seems something went wrong when submitting the post. Sorry
Safdar:
I got the following message
... Build routine for mtimesx
... Checking for PC
... Finding path of mtimesx C source code files
... Found file mtimesx.c in D:\Documents\Projects\mtimesx_20110223\mtimesx.c
... Found file mtimesx_RealTimesReal.c in D:\Documents\Projects\mtimesx_20110223\mtimesx_RealTimesReal.c
Error using mtimesx_build (line 169)
A C/C++ compiler has not been selected with mex -setup
Error in mtimesx (line 271)
mtimesx_build;
Alireza:
@Evan: have you tried blas_lib = '-lmwblas'?
Evan:
I would like to second Matthieu's question. For blas_lib, I've tried both
blas_lib = '/Applications/MATLAB_R2013b.app/bin/maci64/lapack.spec';
and
blas_lib = '/Applications/MATLAB_R2013b.app/bin/maci64/blas.spec';
With both, I get:
ld: warning: ignoring file /Applications/MATLAB_R2013b.app/bin/maci64/blas.spec, file was built for unsupported file format
Anyone have any ideas?
Marc Crapeau:
Works well for me on Linux.
In case it helps: I installed the library libblas.so in my home directory following the "shared library" version of this page: http://gcc.gnu.org/wiki/GfortranBuild
Then I compiled the mex file with the command returned by the mtimesx routine when the automatic installation failed:
blas_lib = 'the_actual_path_and_name_of_your_systems_BLAS_library';
mex('-DDEFINEUNIX','-largeArrayDims','mtimesx.c',blas_lib)
Gained a factor of 2 in speed for my 3D matrix multiplications, and clearer MATLAB code :)
Hannes:
Hi all,
I tried to locate compilers via mex -setup, but got the answer => No supported SDK or compiler was found on this computer.
I installed Microsoft Windows SDK 7.1 but it still does not work. Does somebody know the path and install directory, or what the problem is?
Thanks for the help
WK:
I don't know why, but I ran the test again on the Linux cluster and the Windows machine, and it works with parallel computation now. So strange!
WK:
James, by the way, what version of MATLAB did you last test this package with? I ran it on my laptop (Windows 64-bit) and a Linux cluster running MATLAB 2013; only a few operations beat MATLAB, and then by less than 10%. I saw that this package was uploaded in 2011, so I wonder if MATLAB 2013 made a lot of changes that boost performance.
James Tursa:
@WK: I don't have the parallel toolbox, so am unable to give much advice here. My understanding is that each worker in the parallel environment is a separate process, which I would expect would mean that MTIMESX could be used. But since I don't have the toolbox I can't verify this. Have you actually run some tests and gotten wrong results?
WK:
I just got it compiled and installed on my 64-bit MATLAB with Windows SDK 7, as well as on the Unix cluster. I find that it doesn't work in spmd, does it?
WK:
Thanks for the great library. I have two questions. Do I have to compile it to get the library? That means the library really depends on the machine, right? Also, our lab has a multi-node cluster; each node is equipped with 8 cores. I think this library is based on multithreading, so if I run my MATLAB code in spmd and call your library on each core, will it conflict? Sorry, I don't have any experience compiling mex files; I tried, but I don't have MSVC installed, only MinGW.
JiaDa:
How about releasing another function for fast matrix inversion with multidimensional support? I wrote a version, but I think your version would be far faster than mine. :P
Matthieu:
Hello,
I'm trying to use mtimesx on Mac OS 10.8 with MATLAB R2011b, but I don't understand how to compile it.
I tried several solutions described in the comments but none worked.
blas_lib = '/Applications/MATLAB_R2011b.app/bin/maci64'
blas_lib =
/Applications/MATLAB_R2011b.app/bin/maci64
>> mex('-DDEFINEUNIX','mtimesx.c',blas_lib)
mtimesx.c: In function 'mexFunction':
mtimesx.c:592: warning: assignment discards qualifiers from pointer target type
ld: can't map file, errno=22 for architecture x86_64
collect2: ld returned 1 exit status
mex: link of ' "mtimesx.mexmaci64"' failed.
Error using mex (line 206)
Unable to complete successfully.
Can you help me ?
James Tursa:
@Agustin: It looks like you put the files in the lcc compiler directory. I would advise that you put the files in a work directory instead (not any of the "official" sys or bin etc directories that are installed with MATLAB) and try again. Make sure this work directory is on the MATLAB path. As a side note, in general it is *not* a good idea to put your own files into any of the "official" directories that are installed with MATLAB ... doing so risks breaking builtin MATLAB functions and capabilities.
agustin darrosa:
Thanks James, but I have this problem:
Build routine for mtimesx
... Checking for PC
... Finding path of mtimesx C source code files
... Found file mtimesx.c in C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx.c
... Found file mtimesx_RealTimesReal.c in C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx_RealTimesReal.c
... Opened the mexopts.bat file in C:\Users\Gusa\AppData\Roaming\MathWorks\MATLAB\R2008a\mexopts.bat
... Reading the mexopts.bat file to find the compiler and options used.
... LCC is the selected compiler
... OpenMP compiler not detected ... you may want to check this website:
http://openmp.org/wp/openmp-compilers/
... Using BLAS library lib_blas = 'C:\Program Files\MATLAB\R2008a\extern\lib\win32\lcc\libmwblas.lib'
... Now attempting to compile ...
mex('C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx.c',lib_blas,'DCOMPILER=LCC')
it can't find C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx.exp
it can't find C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx.lib
... mex mtimesx.c build completed ... you may now use mtimesx.
Then I enter:
C = mtimesx(a,b)
??? Undefined function or method 'mtimesx' for input arguments of type 'double'.
What can I do?
James Tursa:
@Agustin: For Windows, generally you just put all the files in a directory on the MATLAB path and then type mtimesx at the prompt. For other systems, see other posts below.
agustin darrosa:
Hi
How can I get this "function" working in MATLAB?
I tried, but I got errors.
Sorry for my ignorance
James Tursa:
@Tonio: No, MTIMESX does not do this. But from your description it sounds like something like this might work for you:
a = cell array of matrices
b = cell array of vectors
c = cellfun(@mtimes,a,b,'UniformOutput',false)
Tonio:
Can I use this for multiplying an (m,n,k) * (n,k) = (m,k) where each k 'slice' needs to be multiplied together, and the slices may contain variable numbers of elements, i.e. 'm' can vary in each slice? In my code each slice is stored in a cell in MATLAB.
James Tursa:
@Robert: Yes, MTIMESX is CPU based, not GPU.
Robert:
Hello James,
Nice work. I basically want to multiply n matrices B (400,400) with one matrix A (400,400). I used arrayfun before, but your code is about 2.5 times faster. Soon I will get an NVIDIA Tesla graphics card, and I would like to ask: is there any possibility to run the calculation on the GPU instead of the CPU?
I tried to call mtimesx with two gpuArrays, but as my computation time is still fast (on my slow GPU) I guess it was not using my GPU.
Perhaps I missed this, but to my understanding mtimesx uses the CPU only, right?
Thanks
Robert
Charles:
@Michael Thanks, that did it.
Michael Völker:
Charles,
For my own record and to make it machine-searchable, I wrote earlier how I was able to compile mtimesx. Taking the liberty to quote myself:
> On a 64-bit Debian-based Linux I managed to compile it with
> mex -DDEFINEUNIX CFLAGS="\$CFLAGS -march=native" -largeArrayDims -lmwblas -lmwlapack -lgomp mtimesx.c
> using gcc 4.5.
Can you try that, or did you already try?
Michael
Charles:
@Sebastiaan Thanks for the response. I think I wasn't linking to the right file. Now I seem to be linking to the BLAS library, but I'm getting a new error. I am using the command "mex -DDEFINEUNIX 'mtimesx.c' -lmwblas" but I get a message that reads "mtimesx.c:592: warning: assignment discards qualifiers from pointer target type". I saw that someone else had gotten this before and the code was fixed, so I downloaded the files again, but I still get it. When I try to use mtimesx in my code I get an error and MATLAB crashes. I haven't had any problems with other mex files, so it seems to be specific to mtimesx.
Any help is appreciated. Thanks in advance.
Sebastiaan:
@Charles: under Unix, the BLAS library linked to is normally mwblas, not blas_lib.
Charles:
This program is great. Really useful. I am having great success with it on my PC. However, I cannot seem to create the mex file on a UNIX computer. I am defining the BLAS library as blas_lib. Then I enter the command "mex('-DDEFINEUNIX','mtimesx.c',blas_lib)". However, I get a string of errors that are all similar to "mtimesx.c:(.text+0x76a1): undefined reference to `saxpy_'". It says this for saxpy, sger, sdot, daxpy, dger, ddot, sgemv, ssyrk, ssyr2k, sgemm, dgemv, dsyrk and dgemm. Can anyone please help me with creating the .mexa64 file? Thanks.
James Tursa:
@KSIDHU: Why can't you use .* and ./ directly? What are the exact dimensions of your two matrices? If the matrices need singleton expansion, then look at the function bsxfun. What is the overall operation you are trying to do?
KSIDHU:
Hi
How can I perform elementwise multiplication and division of two matrices?
I have looked through the manual but cannot find any information on it.
Is there a way to check if BLAS multithreading is available? Is it better than OpenMP multithreading?
I need to perform .*, ./ on matrices with numel ~3000x100,000, so any tips would be great. Also, is there a multithreaded sum function for large matrices (along dim 1 or 2)?
MATLAB 2010a
Thanks in advance
KS
James Tursa:
I like the reasoning you both have spelled out. I will make logical*other conform to the other, and make logical*logical output a double by default but have an option to do logical AND/OR operations and output a logical instead. Thanks for your inputs.
John:
I like Michael's reasoning and I think he is right about the 'double' result. So in general I think that double should be the default. However, given that I enjoy being efficient with data, I would appreciate a 'logical' option.
Michael Völker:
Damn, the e-mail notification stopped working, so I didn't know you had replied to my message.
My spontaneous thought was that (logical matrix) * (logical matrix) should *of course* result in a double output, since matrix multiplication involves a sum, so we should expect to get results different from 0 or 1.
So, as you already guessed, I would like an auto-adjust-precision feature.
This is what I would expect:
[inputA] * [inputB] ==> [output]
--------------------------------
logical  * logical  ==> double
logical  * single   ==> single
logical  * double   ==> double
single   * logical  ==> single
double   * logical  ==> double
(I hope it does not get formatted too ugly...)
I also like John's idea to make the behaviour more optional.
John:
Personally, I do not understand why it should be a 'double' result, so my vote goes to a logical result.
However, would it be possible to add this as an option? (e.g., mtimesx(a,b,'logical') would produce a logical result if a and b are logical). This way the advanced user can be efficient with data, whereas the less experienced user will still get results they are familiar with.
James Tursa:
@John: OK, that's two votes. Automatic logical conversion goes into the next version.
Q: Should a (logical matrix) * (logical matrix) give a double matrix result (a la MATLAB), or a logical matrix result (i.e., essentially treating the *,+ operations as AND,OR operations)?
John:
Excellent code!
I experienced the same 'error' as Michael Völker, so I think automatically converting the logical/sparse data to double might be a good idea as an enhancement!
Nevertheless, great work! Saved me a lot of time/coding.
James Tursa:
@Michael Völker: The error message you are getting is not a bug. As stated in the doc, using arguments that are not double or single will cause MTIMESX to invoke the builtin mtimes function to do the matrix multiply. If MATLAB will do it then great ... you get that result. But if MATLAB will not do it then you will get whatever error message MATLAB puts out. This is the intended behavior. You can convert the logical to double first, of course, to avoid the error message. If you would like MTIMESX to do this automatically for logical inputs then I can take that as an enhancement request (seems reasonable to me).
Michael Völker:
James,
I found a bug with logical input.
Try:
foo = mtimesx( true(2,2), randn(1,1,5) );
And you probably get this error message:
Error using *
Inputs must be 2D, or at least one input must be scalar.
To compute elementwise TIMES, use TIMES (.*) instead.
Error in mtimesx_sparse (line 47)
result = a * b;
Michael
Sam T:
James:
I would like to thank you for coming up with such a brilliant piece of work. An excellent and well written code!!
Lately, I have been trying to use the MTIMESX code to multiply two big n-dimensional matrices (A = 400x400x400 and B = 400x200). But for simplicity, consider the following example:
A = 3 x 3 x 3
B = 3 x 2
and C = A*B;
such that dimension of C = 3 x 3 x 2
where C(i, j, l) = A(i, j, k) * B(k, l);
i.e.
C(:, :, 1) = A(:, :, k) * B(k, 1);
C(:, :, 2) = A(:, :, k) * B(k, 2);
Now, if I use mtimesx(A, B), the resultant matrix is (3 x 2 x 3) as compared to (3 x 3 x 2).
This is because the first two dimensions specify the matrix multiply involved. In this case that is the 1st and 2nd dimensions of matrix A, while it should ideally be the 2nd and 3rd dimensions of matrix A to get the desired result.
I am wondering if there is a way around it? I look forward to hearing from you.
Thanks for your help.
James Tursa:
@Sam T: I don't think I get it yet. In your example you have:
C(:, :, 1) = A(:, :, k) * B(k, 1);
A(:,:,k) is 3x3
B(k,1) is 1x1
Are you trying to do an nD scalar*matrix multiply here? MTIMESX can do that (with some modifications of the inputs), but I am not quite sure that is really what you want. Can you clarify? Maybe write out an explicit for loop showing how C is to be filled in its entirety?
Boaz Schwartz (view profile)
Clay Fulcher (view profile)
mtimesx is an awesome code. Besides speeding up my structural dynamics and signal processing applications, it is much easier to apply than using loops. I'm also using a multi-D matrix inverter called multinv that can be downloaded from this forum. The two codes together allow me to operate on matrices of functions as easily as matrices of single values.
Clay Fulcher (view profile)
@ James Tursa: Thanks James! I greatly appreciate it!
James Tursa (view profile)
@ Clay Fulcher: Some options to possibly get more speed:
1) NDFUN by Peter Boettcher. E.g., see this thread:
http://www.mathworks.com/matlabcentral/newsreader/view_thread/294669#861239
2) FEX submissions. E.g., see this submission:
http://www.mathworks.com/matlabcentral/fileexchange/31222
3) Wait a couple of months for the next MTIMESX submission, when this capability will be included (along with in-place operations and nD matrix multiply versions of PROD and CUMPROD).
Clay Fulcher (view profile)
James, I need to invert a 3D array of function values. I have to step through each slice of the array in a loop, inverting each slice. Do you know of any way to do this more efficiently?
Jonathan Sullivan (view profile)
@ James
The array sizes I'm working with are much larger than what I posted. The size of A is 100x33x3. The size of B is 33x1x3x5000x5. That might be why you had trouble recreating it. I'm running on Windows 7, 64-bit, MATLAB 2011b; the compiler is cl. If it makes a difference, some of the values in the first matrix are inf.
James Tursa (view profile)
@ Jonathan: Yes, weird indeed. For slices 100x100 in size, MTIMESX will always use BLAS dgemm calls regardless of the method setting (you can verify this by using the 'DEBUG' setting). That is, the exact same code path gets executed regardless of whether 'MATLAB', 'SPEED', or 'SPEEDOMP' is set. That setting should have had no effect on this particular outcome. Also, I have been unable to reproduce this error on a 32-bit WinXP system. Can you provide me some specifics on your MATLAB version, machine OS, compiler used, etc.? Does it crash if you restart MATLAB and immediately try the example?
Jonathan Sullivan (view profile)
@ James
So the problem, at least in part, is caused by using the 'SPEEDOMP' flag. I removed the flag, and now it works. Weird, huh?
James Tursa (view profile)
@ Jonathan Sullivan: Your first example without repmat should have worked. Looks like I introduced a bug in the latest release. I will look into this right away ...
Jonathan Sullivan (view profile)
James,
Fantastic code. It has come in handy. It is well documented, incredibly fast, and extremely useful, especially for ND arrays.
For the ND case, this code would be even more powerful if the singleton dimension capability were expanded. Currently, all the dimensions from 3:end must be either the same size (A to B), or must all be singleton for one of the variables.
For example
A = rand(100,100,3);
B = rand(100,100,3,5);
C = mtimesx(A,B); % Does not work. Crashes MATLAB.
You can get around this by using repmat:
A = rand(100,100,3);
B = rand(100,100,3,5);
C = mtimesx(repmat(A,[1 1 1 5]),B); % Works, but slow
Unfortunately, explicitly expanding out large arrays has overhead which is more than I'd like.
Overall: A+.
Michael Völker (view profile)
James, thank you for your hint to using 'lapack' from the FEX.
Indeed it produces the desired result, by calling
lapack('D=DSDOT(i,s,i,s,i)', length(A), A, 1, A, 1)
with 'A' from my example above. Unfortunately, this is ~5 times slower than calling mtimesx() with double input.
On double input, calling the equivalent 'DDOT' routine from the lapack package takes more than 10 times longer than your mtimesx (wow, by the way...) to compute the very same result.
The difference appears to be even worse for complex input.
So I am still hoping that you might somehow get this feature in your code and maintain the speed...
James Tursa (view profile)
@ Michael Völker: FYI, another way to use the DSDOT routine is to use this submission:
http://www.mathworks.com/matlabcentral/fileexchange/16777-lapack
Not sure if it works on 64-bit systems, but you might give it a try.
James Tursa (view profile)
@ Michael Völker: Thanks for the comments. I will look into your request. There is a BLAS routine called DSDOT that does what you request. I will look into how easily I can incorporate this into MTIMESX for BLAS specific results. For the SPEED modes MTIMESX sometimes uses custom code to calculate the dot product (e.g., see function RealKindDotProduct), and in fact the accumulation is done in double precision regardless of input type (calculations themselves are single for single inputs), so it might not be too difficult to adapt this to return the double precision output if requested instead of always converting back to single for single inputs. Thanks for the suggestion.
Michael Völker (view profile)
Incredibly great work. Thank you for the elaborate code, the very useful features, the intuitive syntax and behaviour and the docu.
On a 64-bit Debian-based Linux I managed to compile it with
mex -DDEFINEUNIX CFLAGS="\$CFLAGS -march=native" -largeArrayDims -lmwblas -lmwlapack -lgomp mtimesx.c
using gcc 4.5.
After using mtimesx for a while I currently miss only one feature, which is to define the precision of the output independently from the input variables.
This would be useful when calculating the sum of squares of a huge single-precision vector where the correct result exceeds realmax('single') and prior casting takes a lot of time.
Example:
>> A = randn( 1e8, 1,'single'); % plenty of sane values
>> A(1) = 1e20; % only few outliers
>> sos = A' * A % or sos = mtimesx( A, 'C', A )
sos =
Inf
Doing A = double(A) beforehand solves the problem:
>> sos = A' * A
sos =
1.000000040081755e+40
but
>> tic, A = double(A); toc
Elapsed time is 0.472007 seconds.
In fact, that casting to double takes much more time than the actual computation in mtimesx. Since I do this repeatedly in an iteration, the overhead sums up to minutes.
So my wish would be like this:
C = mtimesx( A, B, 'double' ); % C is double even if A or B are single
Nevertheless, thank you so much for this code.
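Until an output-precision directive like the one wished for above exists, one stopgap (a sketch, under the assumption that only the scalar sum of squares is needed rather than a general product) is to cast and accumulate block-wise, which avoids materializing a full double copy of A:

```matlab
% Sketch: double-precision sum of squares of a single-precision vector
% without a full double(A) copy. The block size is an arbitrary choice.
A = randn(1e6, 1, 'single');   % same idea as the 1e8-element example above
A(1) = 1e20;                   % an outlier that overflows single
blk = 65536;
sos = 0;
for i = 1:blk:numel(A)
    j = min(i + blk - 1, numel(A));
    d = double(A(i:j));
    sos = sos + d.' * d;       % accumulate in double per block
end
```

Note this mainly bounds the memory overhead; the total casting work is similar, so it only partially addresses the timing complaint.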
James Tursa (view profile)
@Gary: Follow-up: ndfun as I recall only works for real variables. If you have complex variables then the only efficient option I could suggest would be a custom mex routine.
James Tursa (view profile)
@Gary: For this type of matrix product you should be using something like ndfun('mprod',a) by Peter Boettcher. You can find the source code here:
http://www.mit.edu/~pwb/matlab/
If you need a 64-bit version of this you can see the hack I posted on this thread:
http://www.mathworks.com/matlabcentral/newsreader/view_thread/294669#791467
Using ndfun will avoid all the array slice copies you are doing in the above example code.
Gary (view profile)
Thanks for the answer, but I think either you misunderstood my problem or I misunderstood your answer.
I have e.g. 10000 matrices and want to calculate their product. By dividing into 2*5000 I can use the advantage of your nD multiply. Here is the code:
a=rand(n,n,L)
tic
b=a(:,:,1);
for i=2:L
b=a(:,:,i)*b;
end
toc
c=a;
tic
while(L~=1)
L2=floor(L/2)*2;
d=c(:,:,end);
c = mtimesx(c(:,:,2:2:L2),c(:,:,1:2:L2-1));
if(L2~=L)
c(:,:,end+1) = d;
end
L=size(c,3);
end
toc
So will this never be faster than normal multiplies?
James Tursa (view profile)
@Gary: MTIMESX is generally not going to be any faster at generic matrix multiplies than MATLAB since it is calling exactly the same BLAS routines as MATLAB. The exceptions are small (4x4 or less) multiplies, dot products, outer products, and some matrix-vector multiplies, since they can sometimes be done faster with inline code which MTIMESX uses in the SPEED or LOOP modes. Also, the nD multiply case is almost always faster with MTIMESX than the equivalent loop(s) in MATLAB for any size since array slice copies are avoided.
Gary (view profile)
Hi,
I am trying to compute the product of many matrices (M1*M2*...). I used mtimesx to multiply half of all the matrices by the other half until the result (e.g. a=M1*M2, b=M3*M4, a*b). It works very well when the matrices have small sizes like 4x4, but for larger sizes it becomes slower than MATLAB. Is that expected?
Tom (view profile)
Was a bit too fast, sorry. Someone already asked this! It works fine now!
Great job!
dm (view profile)
Great submission! Find it VERY useful for my work!
Also, I think you can change
"Beats MATLAB 300%  400% in some cases ... really!"
in the title, and multiply the numbers with at least a factor of 25 (based on the results from mtimesx_test_nd).
James Tursa (view profile)
@ Dan Scholnik: Have you tried running the mtimesx_test_nd file yet? That should invoke small OpenMP test cases. For larger test cases it may in fact be the case that later and/or 64-bit versions of MATLAB are hard to improve on for speed. Alas, I don't have a 64-bit system to tinker with.
At the moment I am working on incorporating in-place operations in MTIMESX and that will be in the next major release, but I will put your dense triangular suggestion on the list of potential future improvements.
Dan Scholnik (view profile)
Compiled for Matlab 2011a on Fedora 14 x86_64 using
mex CFLAGS='$CFLAGS -fopenmp' LDFLAGS='$LDFLAGS -fopenmp' -largeArrayDims -O -DUNIX mtimesx.c -lmwblas
Not sure all of that was needed, but it works. Compiler was gcc 4.5.1.
mtimesx('OPENMP') returns 1 and mtimesx('OMP_GET_NUM_PROCS') returns 4 (I have a dual-core i7 w/ hyperthreading), however mtimesx never reports using anything but the BLAS no matter which flags I pass in. Best I can tell the flags have no effect. The MATLAB-provided mwblas is multithreaded, so perhaps there's no improving on it. Speeds were generally not much different from the MATLAB multiply, except (as mentioned previously) for cases where the MATLAB transpose or conjugate operations themselves consumed significant portions of the total time.
I also tried using the Fedora-provided BLAS via "-lblas" but it appears to be single-threaded and very slow. Here again mtimesx never used multiple processors that I could tell.
A possible feature suggestion, albeit one that deviates from the general-purpose nature of the current code: How about routines for dense triangular-matrix multiplication? MATLAB doesn't seem to have any optimizations for this case (although it does for back-substitution, which is common with triangular matrices).
pink (view profile)
Can you provide your email address? My code is too long to post.
wahyoe_slipnot@yahoo.co.id
wahyoe
pink (view profile)
yes, it can work, thanks a lot james
I wanted to ask
1. If I have the form below, how do I use mtimesx?
for i = 1:m
A(:,:,i) = B(:,:,i).' * Ex(:,:,i) * C(:,:,i);
end
2. If I compile into an application (exe), which files do I have to enter into deploytool? I ask because I saw a lot of files in it.
thanks
Wahyoe
James Tursa (view profile)
@ wahyoe Unggul: Did you get the build process to work?
James Tursa (view profile)
@ wahyoe Unggol: The error message is telling you that you need to select a C compiler first. So do this:
mex -setup
(Then press Enter)
(Then enter the number of a C compiler such as lcc)
(Then press Enter again)
Once this is done, try running mtimesx again and see if it builds OK.
To answer your specific question I would need to know the dimensions of each variable. I.e., what is the size of RT and Ex?
pink (view profile)
hi all
I tried running "mtimesx_test_sdspeed.m", but an error occurs
??? Error using ==> mtimesx_build at 169
A C/C++ compiler has not been selected with mex -setup
Error in ==> mtimesx at 271
mtimesx_build;
Error in ==> mtimesx_test_sdspeed at 100
compver = [computer ',' version ', mtimesx mode ' mtimesx ', the median of ' num2str(n) ' runs'];
I wanted to ask, how the use of "mtimesx" in the form like this
for i = 1:2000
Keg(:,:,i) = RT(:,:,i).' * Ex(:,:,i) * RT(:,:,i);
end
and I want to ask whether, in addition to speed, it can also handle the data more efficiently?
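For what it's worth, a loop of that quadratic-form shape can in principle be collapsed into two nested nD multiplies (a sketch; it assumes RT and Ex are both n x n x 2000 so the slices pair up, and it uses the 'T' preoperation for the transpose):

```matlab
% Sketch: replace
%   for i = 1:2000
%       Keg(:,:,i) = RT(:,:,i).' * Ex(:,:,i) * RT(:,:,i);
%   end
% with two nD multiplies over all 2000 slices at once.
Keg = mtimesx(mtimesx(RT, 'T', Ex), RT);
```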
James Tursa (view profile)
@ Daniel Lyddy: Confirmed typo in the prototypes for these functions. Change the int * to mwSignedIndex *. So
void xSYRK(char *, char *, mwSignedIndex *, mwSignedIndex *, RealKind *, RealKind *,
mwSignedIndex *, RealKind *, RealKind *, int*);
void xSYR2K(char *, char *, int*, mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *,
RealKind *, mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *);
becomes
void xSYRK(char *, char *, mwSignedIndex *, mwSignedIndex *, RealKind *, RealKind *,
mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *);
void xSYR2K(char *, char *, mwSignedIndex *, mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *,
RealKind *, mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *);
Thanks for pointing this out. That being said, these typos in all likelihood should not affect the actual results since in both cases 64-bit pointers were being passed and the underlying data is in fact mwSignedIndex in spite of the prototype. I will upload fixed files soon.
Daniel Lyddy (view profile)
I am getting some warnings about incompatible pointer types when compiling on a 64-bit CentOS Linux 5.5 system. For example, in line 4909 referred to below, argument 10 is 'LDC', which is #defined to &ldc, which is a pointer to mwSignedIndex. In the BLAS/LAPACK dsyrk function, argument 10 is also called 'LDC', but it is an integer. Pointers on 64-bit systems are typically larger than 32-bit ints ... maybe these BLAS/LAPACK arguments need to be cast to something else?
mex('-DDEFINEUNIX','-largeArrayDims','mtimesx.c',blas_lib)
mtimesx.c: In function ‘mexFunction’:
mtimesx.c:592: warning: assignment discards qualifiers from pointer target type
In file included from mtimesx.c:1303:
mtimesx_RealTimesReal.c: In function ‘DoubleTimesDouble’:
mtimesx_RealTimesReal.c:4209: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4211: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4212: warning: passing argument 3 of ‘dsyr2k_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4225: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4227: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4228: warning: passing argument 3 of ‘dsyr2k_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4241: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4243: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4257: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4259: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
In file included from mtimesx.c:1481:
mtimesx_RealTimesReal.c: In function ‘FloatTimesFloat’:
mtimesx_RealTimesReal.c:4209: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4211: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4212: warning: passing argument 3 of ‘ssyr2k_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4225: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4227: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4228: warning: passing argument 3 of ‘ssyr2k_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4241: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4243: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4257: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4259: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
Bruno Luong (view profile)
I'm not using deploytool, just MCC command.
James Tursa (view profile)
@ Bruno Luong: Sorry I am not going to be much help here since I do not have a Windows 7 platform to test with. Are you using deploytool, and then the result can't use mtimesx?
Bruno Luong (view profile)
There is an annoying compatibility issue with deploying MTIMESX on the Windows 7 platform. The MCR issues an error message "mex file is incorrect". It works fine under MATLAB or other OSes. I just can't use it while the issue is not resolved.
James Tursa (view profile)
@ Matt J: Issue mtimesx without any arguments to find out what the current calculation mode is. If you enter any of the OpenMP directives but mtimesx was not compiled with an OpenMP enabled compiler then mtimesx will revert to the nearest functionality. So SPEEDOMP will revert to SPEED, LOOPSOMP will revert to LOOPS, and the omp_etc functions will revert to substitute functions instead of the actual OpenMP functions. Your above example looks like what would happen if mtimesx was not compiled with an OpenMP enabled compiler. To see how mtimesx thinks it was compiled, issue the mtimesx('OPENMP') command and it will return a 1 (yes OpenMP) or 0 (No OpenMP).
Matt J (view profile)
James, the latest version of mtimesx doesn't display at the command line any of the new modes (so of course I don't know if they're being used).
>> mtimesx SPEEDOMP
ans =
SPEED
Kai (view profile)
Jan Simon (view profile)
Thanks James! See this thread for applying MTIMESX to calculate a fast 2norm for the matrix along a specified dimension:
http://www.mathworks.com/matlabcentral/newsreader/view_thread/293747#787738
James Tursa (view profile)
@Matt J: I will look into it. Rather than preconverting I may be able to fake out a call back to MATLAB with a pointer to the interior of the nD array and fudged up dimensions. Highly against the mex rules, but it might work. I should have taken advantage of the simple reshape method you use for the non-transposed cases (I didn't recognize it at the time I was updating the code) and that should not be too hard to implement. But the transpose cases for the 2D slices and the case for the sparse matrix being the right operand are going to be tricky to avoid a data copy. Guess I have a weekend project now ...
Matt J (view profile)
James  I appreciate you taking my comments into account in the latest rev, but I'm still seeing weird stuff. From the manual:
"If you combine sparse inputs with full nD (n > 2) inputs, MTIMESX will convert the sparse input to full since MATLAB does not
support a sparse nD result."
Preconverting the input from sparse to full leads to very slow performance, certainly slower than regular MATLAB as the comparison below shows. Is it really the only convenient solution? Also, MATLAB not supporting sparse nD is again not the issue, since even for n=2, sparse*full is always full in regular MATLAB.
%fake data
N=300;M=200;
A=speye(N);
B=rand(N,N,M);
Bs=single(B);
mtimesx SPEED;
tic;
ans1=mtimesx(A,B);
toc;
%Elapsed time is 0.681635 seconds.
tic;
ans2=reshape( A*reshape(B,N,[]) , size(B) );
toc;
%Elapsed time is 0.159543 seconds.
isequal(ans1,ans2)%1
"If you combine double sparse and single inputs, MTIMESX will convert the single input to double since MATLAB does not support a single sparse result."
As above, even if MATLAB did support sparse*full-single you would expect the result to be full single, not sparse single. And also, this feature is showing very slow behavior as compared to regular MATLAB, for some reason:
tic;
ans3=mtimesx(A,Bs);
toc;
%Elapsed time is 0.817832 seconds.
tic;
ans4=single( reshape( A*reshape(double(Bs),N,[]) ,size(B)) );
toc;
%Elapsed time is 0.402111 seconds.
isequal(ans3,ans4),%1
James Tursa (view profile)
@Matt J: Thanks for the comments. I will get to work on the suggested changes right away.
Matt J (view profile)
Just a couple more remarks regarding some of the limitations mentioned in the user manual,
"You cannot combine double sparse and single inputs, since MATLAB does not support a single sparse result."
It would be worthwhile to silently cast the single input to double prior to the matrix multiply. I've never understood why MATLAB doesn't implement the same. It only forces the user to do it manually.
"You also cannot combine sparse inputs with full nD (n > 2) inputs, since MATLAB does not support a sparse nD result."
Don't understand that. Native MATLAB produces a full result when combining sparse matrix multiplicands with full ones. No reason why you couldn't do the same for nD, as far as I can see...
Matt J (view profile)
I like it. The only thing I find a little peculiar is that the output of
mtimesx('MODE') reports the previous mode rather than the current one.
Sebastiaan (view profile)
Something like A(A<1e-8) = 0 should automatically remove small numbers from the sparsity pattern.
If this is too slow, have a look at spfun; it may be faster.
However, if you mean if you can remove small elements from the output matrix, you should write something of your own, partially multiplying the matrix and removing small elements. As a tip, use full column vectors of the matrix, this will make the multiplication much faster than square blocks, since sparse matrices are stored by columns. E.g.:
C = spalloc(size(A,1), size(B,2), 0);
for j=1:size(B,2)
R = A*B(:,j);
R(R<1e-8) = 0;
C(:,j) = R;
end
is (probably) a fast implementation. Try to take multiple columns for speed of course.
Julio (view profile)
Thanks Sebastiaan, I never thought to check the density of the resulting matrix and you are absolutely right: the memory blew up! By the way, is there any way to define an “operational zero” for sparse matrix operations, in order to automatically make MATLAB assume a zero for numbers, say, smaller than 1e-8?
Thank you again.
Sebastiaan (view profile)
Could it not be possible that you are really out of memory? Your result is a 460191 by 460191 matrix, which is not necessarily sparse. If the sparsity pattern is poor, the result may even be a dense matrix. Storing a full matrix this size requires 1578 GiB of memory, so you have to be sure that the result is ~2% sparse. If neither of the input matrices is <1% sparse, this is likely your problem.
Julio (view profile)
Thanks for your prompt reply. “mtimes” at the MATLAB prompt gives exactly the same error, so nothing is wrong with “mtimesx”. Is it possible that this is a repetition of the “outer product” problem? I am going to try to partition the matrix “cb” into several chunks and then concatenate the results. Maybe it will work this way.
Thanks again.
James Tursa (view profile)
@ Julio: MTIMESX does not do generic sparse matrix multiplies internally, it just calls the MATLAB built-in function mtimes. I.e., it does the same thing as if you had just typed c21 * cb at the MATLAB prompt. Can you verify that c21 * cb works at the MATLAB prompt but mtimesx(c21,cb) does not work? Be sure that you start them both from the same state (e.g., clear all other big variables including ans before attempting the calculation). That would be surprising to me since it would indicate some type of mex limitation that I am currently unaware of.
Julio (view profile)
Hi, I am new to using mtimesx but, so far, I think it is great. Meanwhile, I am trying to multiply 2 large sparse matrices, "ci=mtimesx(c21,cb)". c21 is 460191 by 5579 and cb is 5579 by 460191. After several minutes I got an error message (similar to mtimes):
======================
??? Error using ==> mtimes
Out of memory. Type HELP MEMORY for your options.
Error in ==> mtimesx_sparse at 47
result = a * b;
Error in ==> mtimesx at 277
[varargout{1:nargout}] = mtimesx(varargin{:});
======================
I don't think that the problem is lack of memory (I have a 64-bit system with 8 GB of RAM and the memory usage never went above 5 GB). Could you please comment on this.
Thanks in advance.
Julio
James Tursa (view profile)
Also in today's release that I forgot to mention below: Added custom inline code for small matrix multiplies where all of the first two dimensions are <= 4 (in 'SPEED' mode).
D Sen (view profile)
Thanks! I am seeing about a 3.25x improvement in speed for my application.
James Tursa (view profile)
@ D Sen: In addition, you would need to reshape audio_dft to be 32 x 1 x 4097.
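Putting those two adjustments together, the workaround might look like this (a sketch based on the sizes quoted in the original post; note that permute itself copies data, so its cost should be measured against the original loop):

```matlab
% Sketch: make the 25 x 32 slices of E contiguous in memory and give
% audio_dft a singleton second dimension so mtimesx can pair the slices.
Ep = permute(E, [2 3 1]);                 % 4097x25x32 -> 25x32x4097
ad = reshape(audio_dft, 32, 1, 4097);     % 32x4097    -> 32x1x4097
C  = reshape(mtimesx(Ep, ad), 25, 4097);  % each slice: (25x32)*(32x1)
```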
James Tursa (view profile)
@ D Sen: The E array slicing is not set up the way mtimesx needs. mtimesx needs the array slices to be contiguous in memory (a requirement for the underlying BLAS calls). So your E array would need to be a 25 x 32 x 4097 array, making the 25 x 32 slices contiguous in memory (the 2D slices need to be the first two dimensions). The way you currently have it set up, all of your 25 x 32 array slices of E are individually scattered throughout memory and are not the first two dimensions, and mtimesx cannot use them that way.
D Sen (view profile)
I have been trying to use this to multiply a 3D matrix with a 2D matrix.
The matlab code which works is as follows:
C = zeros ( 25, 4097 );
for( i_ = 1:4097 )
C(:, i_) = squeeze( E( i_, :, : ) ) * audio_dft( :, i_ );
end
Here E and audio_dft have the following sizes:
size(E)
ans =
4097 25 32
size( audio_dft)
ans =
32 4097
I get:
mtimesx( E, audio_dft )
??? Error using ==> mtimesx
Inner matrix dimensions must agree.
Val Schmidt (view profile)
Never mind, I found my problem. I have now successfully built mtimesx on Snow Leopard 10.6 with 64-bit MATLAB.
To do so I edited
/Applications/MATLAB_R2009b.app/bin/gccopts.sh
In the section for "maci64" I changed "SDKROOT" and "MACOSX_DEPLOYMENT_TARGET" to reflect a change in OS from 10.5 to 10.6.
Then I selected this opts file with
mex -setup
selecting "1" at the prompt.
Then I compiled everything with:
mex -v -DDEFINEUNIX -largeArrayDims mtimesx.c -lmwblas -lmwlapack
(look out for the 2 D's in -DDEFINEUNIX; that was my error)
Val Schmidt (view profile)
I'm trying to compile on 64-bit MATLAB on Mac OSX 10.6 without success. The only changes I've made to the default have been to modify the MATLABROOT/bin/gccopts.sh file such that the SDK and OS version are set to the 10.6 variety (and then ran mex -setup).
It seems the linker cannot find the linear algebra packages, but it is not clear to me why. The output is below and I would be thankful for ideas on what to try next.
thanks
snip
> gcc-4.0 -O -Wl,-twolevel_namespace -undefined error -arch x86_64 -Wl,-syslibroot,/Developer/SDKs/MacOSX10.6.sdk -mmacosx-version-min=10.6 -bundle -Wl,-exported_symbols_list,/Applications/MATLAB_R2009b.app/extern/lib/maci64/mexFunction.map -o "mtimesx.mexmaci64" mtimesx.o -L/Applications/MATLAB_R2009b.app/bin/maci64 -lmx -lmex -lmat -lstdc++
Undefined symbols:
"_dsyr2k", referenced from:
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
"_dgemm", referenced from:
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
"_dgemv", referenced from:
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
"_sgemm", referenced from:
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
"_sgemv", referenced from:
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
"_dsyrk", referenced from:
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
"_ssyrk", referenced from:
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
"_ssyr2k", referenced from:
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
"_sdot", referenced from:
_floatcomplexdotproduct in mtimesx.o
_floatcomplexdotproduct in mtimesx.o
_floatcomplexdotproduct in mtimesx.o
_floatcomplexdotproduct in mtimesx.o
_floatcomplexdotproduct in mtimesx.o
"_ddot", referenced from:
_doublecomplexdotproduct in mtimesx.o
_doublecomplexdotproduct in mtimesx.o
_doublecomplexdotproduct in mtimesx.o
_doublecomplexdotproduct in mtimesx.o
_doublecomplexdotproduct in mtimesx.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
mex: link of ' "mtimesx.mexmaci64"' failed.
Roland Kruse (view profile)
The code is surprisingly fast for multiplication of a sparse matrix by a scalar. I really wonder why MATLAB is so slow in this (very simple) case.
Val Schmidt (view profile)
James,
This code is proving to be a FANTASTIC speed up to my work. I am so grateful that I thought I would post and comment.
First, I am working on R2008b on an intel mac. I compiled your code with
mex -DEFINEUNIX mtimesx.c
and although I received a handful of warnings about redefined pointers, it compiled fine without error. I also tried compiling the code with a few extra optimizations, out of curiosity:
mex -DEFINEUNIX CFLAGS='$CFLAGS -O3 -funroll-loops -fexpensive-optimizations -ffast-math' mtimesx.c
The result wasn't significantly faster than the original on my own processing test (I did not take the time to run it through your tests, however.).
I did run mtimesx_test_ddspeed initially to compare with MATLAB native routines, and while I didn't capture the results, I found that most things native MATLAB could do as fast or faster than mtimesx, with the notable exception of math involving complex conjugates, in which mtimesx was far, far faster.
However, the fantastic benefit for me was the native handling of 4D matrices by mtimesx. In my own code, in which I am rotating several (8.3) million 3D vectors through 3x3 rotation matrices, mtimesx increased the speed of my calculations 2100%!
Thanks so much for your time and effort. It is most gratefully appreciated.
Val
James Tursa (view profile)
@veltz: Thanks for your comments. Regarding OpenMP, I think the short answer is no. I will give my reasons shortly, but please be advised that I don't do any OpenMP work myself, so my comments might not be correct.
The basic loop for the nD stuff is this:
for( ip=0; ip<p; ip++ ) {
However, this is just the loop for the blocks in the C result. The blocks from the A and B arrays are calculated inside the loop based on the singleton expansion code (which depends on the dimensions of A and B) and are not known ahead of time ... at the end of one iteration the calculation for the A and B blocks for the next iteration takes place (the C block pointer just advances by the same amount each time). So it is not set up for a parallel loop. I would have to change the code and create three new arrays of pointers so that ahead of time all of the individual block starting points are known. *Then* maybe it could be put in a parallel loop.
But I don't think this would do much good. The multithreaded stuff in MATLAB seems to happen at the BLAS/LAPACK level, so mtimesx is already getting the benefit of multithreading for those calls without doing anything special in the mtimesx code itself to get it. For example, if you compare a simple matrix multiply timing at the MATLAB prompt using two large matrices, you can see about a 1.6x improvement with 2 threads vs 1 thread. If I code up a very simple mex routine that calls dgemm and does nothing else (no pragma omp etc) I see the exact same timing benefit when switching the number of threads used. So the cmex code gets the multithreaded benefit by virtue of the BLAS/LAPACK routines used. Wrapping another layer of threading on top of that will likely not result in any timing benefit, and may (speaking from ignorance here) in fact conflict with the multithreading that is already going on inside the BLAS/LAPACK routines themselves.
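James's idea of precomputing the block pointers can be sketched in plain C. This is a hypothetical illustration, not mtimesx's actual code: the per-iteration A/B/C block start addresses (where the singleton-expansion logic would be resolved) are filled into pointer tables up front, so every iteration of the main loop is independent and the loop can safely carry a `#pragma omp parallel for`. The matrices are column-major, as in MATLAB; no singleton expansion is performed in this sketch.

```c
#include <stdlib.h>

/* Hypothetical sketch (not mtimesx's code): multiply p pairs of blocks,
 * C(:,:,ip) = A(:,:,ip) * B(:,:,ip), with block start pointers computed
 * ahead of time so the loop iterations are independent.               */
static void multiply_blocks(const double *A, const double *B, double *C,
                            int m, int k, int n, int p)
{
    /* Precompute pointer tables; singleton expansion would happen here,
     * before the parallel region, instead of inside the loop.          */
    const double **Ap = malloc(p * sizeof *Ap);
    const double **Bp = malloc(p * sizeof *Bp);
    double       **Cp = malloc(p * sizeof *Cp);
    for (int ip = 0; ip < p; ip++) {
        Ap[ip] = A + (size_t)ip * m * k;
        Bp[ip] = B + (size_t)ip * k * n;
        Cp[ip] = C + (size_t)ip * m * n;
    }

    #pragma omp parallel for   /* safe: each ip touches disjoint data */
    for (int ip = 0; ip < p; ip++) {
        for (int j = 0; j < n; j++)
            for (int i = 0; i < m; i++) {
                double s = 0.0;            /* column-major dot product */
                for (int kk = 0; kk < k; kk++)
                    s += Ap[ip][i + (size_t)kk * m] * Bp[ip][kk + (size_t)j * k];
                Cp[ip][i + (size_t)j * m] = s;
            }
    }
    free(Ap); free(Bp); free(Cp);
}
```

Without OpenMP enabled the pragma is simply ignored, so the function still produces the same results single-threaded.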
veltz (view profile)
First, thank you for this marvellous code :)
Is there a way to use openMP in your code ? For example, when doing
C = zeros(2,6,4,5);
for m=1:4
for n=1:5
C(:,:,m,n) = A(:,:,m,n) * B(:,:,m,n);
end
end
we could use a pragma omp parallel for. As the code is pretty big, where does such a loop appear in your code?
Jon C (view profile)
Debian 64bit compiled right with:
mex CFLAGS="\$CFLAGS -std=c99" -DDEFINEUNIX -largeArrayDims -lmwblas mtimesx.c
(the functional syntax of mex did not parse the arguments properly)
The GCC -ansi switch included by the default mexopts apparently refers to C89 (which does not recognize // C++-style comments). I wonder what is wrong with C99 as a default in mexopts?
Joshua Dillon (view profile)
Very useful code. Good work.
The following snippet may be useful to those compiling on Ubuntu:
libblas='/usr/lib/libblas.so';
if exist(libblas,'file')~=2
system('sudo aptitude install libblas-dev');
end
% This will compile the code on linux
mex('CFLAGS=-std=c99 -fPIC','-DDEFINEUNIX','-largeArrayDims','-lmwblas','mtimesx.c');
James Tursa (view profile)
@Fabio: Thanks very much for the 64-bit info. I was unaware that gcc did not like the // commenting style, which I prefer over /* */. However, I will create additional files with changed comments and upload them in the next week or so. I would be very interested in seeing the details of what you had to change to get it to compile, and the results of any benchmark tests you happen to run. Feel free to email me at the address listed in the pdf file. Thanks!
Fabio Veronese (view profile)
Great job!
Thanks to mtimesx I changed a huge for loop to 3D matrix computation, reducing script execution time from 14 s to 1.5 s!
Tested and compiled on Gentoo Linux (64-bit), kernel 2.6.33, on an amd64 system. Compiling with gcc 4.3.* (not the deprecated 4.2 suggested by Mathworks), I had to delete all // comments from the C code files. Probably the "//" sequence is not friendly to the newer gcc compiler; maybe you can replace it with " /* comment */ ", which is correctly ignored. Also, to compile successfully I had to use:
mex('-DDEFINEUNIX','-largeArrayDims','-lmwblas','mtimesx.c')
which is somewhat different from the default. Both '-lmwlapack' and '-lmwblas' are the options suggested for compiling on UNIX systems (see "Building MEX-Files" in the doc), but that was enough. I can say it works correctly, but I didn't run the benchmark files.
Hope this helps!
Feel free to ask if some testing on LINUX 64bit is needed. I'm going to run tests soon.
Eric Larrieux (view profile)
Thanks for the quick fix!
James Tursa (view profile)
Confirmed bug. The bug happens during the following conditions in MATLAB mode:
(row vector) * (nonsquare matrix)
(nonsquare matrix transposed) * (column vector)
Sorry about that! This was not caught in testing because I only used square matrices for these particular tests. The bug occurs because the BLAS routines DGEMV and SGEMV do not switch the dimension arguments when a transpose is involved. I had erroneously assumed that they did because that is how the BLAS routines DGEMM and SGEMM arguments work for general matrix multiplies. I have no idea why the interface for these routines is not set up the same. In any event, I will upload a fix tonight. In the meantime, you can manually put the fix in as follows:
Edit the mtimesx_RealTimesReal.c file.
Global replace the following text:
xGEMV(PTRANSA, M, K
with this text:
xGEMV(PTRANSA, &m1, &n1
And then also global replace the following text:
xGEMV(PTRANSB, N, K
with this:
xGEMV(PTRANSB, &m2, &n2
Those replacements will force the DGEMV and SGEMV calls to use the original dimensions of the first matrix argument instead of the switched dimensions.
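For reference, the GEMV convention James describes can be modeled in plain C. This is a sketch of the semantics only, not a real BLAS implementation: the `m` and `n` arguments always describe the *stored* matrix (m rows, n columns, column-major), even when the transpose flag is set, so the caller must not swap them for the transposed case the way the GEMM interface would.

```c
#include <stddef.h>

/* Reference model of the (D/S)GEMV dimension convention: m and n are the
 * dimensions of A as stored, regardless of trans.  With trans='N' the
 * result y has length m; with trans='T' it has length n.               */
static void ref_dgemv(char trans, int m, int n,
                      const double *A, const double *x, double *y)
{
    if (trans == 'N' || trans == 'n') {       /* y (m) = A (m x n) * x (n) */
        for (int i = 0; i < m; i++) {
            double s = 0.0;
            for (int j = 0; j < n; j++)
                s += A[i + (size_t)j * m] * x[j];
            y[i] = s;
        }
    } else {                                  /* y (n) = A' * x (m) */
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int i = 0; i < m; i++)
                s += A[i + (size_t)j * m] * x[i];
            y[j] = s;
        }
    }
}
```

Passing swapped dimensions in the transposed case, as the bug did, makes the routine walk the wrong storage layout, which is why only non-square inputs were affected.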
Eric Larrieux (view profile)
Thanks! Also, FYI: I am running R2009a...
James Tursa (view profile)
I will look into it.
Eric Larrieux (view profile)
Hi, I am running Matlab on a 32-bit Windows machine and I am getting incorrect (all 0) results when I perform some simple 3D matrix multiplication in 'matlab' mode only ('speed' mode seems to yield the correct results in all the cases I have tested so far). i.e.
>> % define a and b
>> a=2*ones(1,2,2)
a(:,:,1) =
2 2
a(:,:,2) =
2 2
>> b=3*ones(2,3,2)
b(:,:,1) =
3 3 3
3 3 3
b(:,:,2) =
3 3 3
3 3 3
>>
>> % verify result using normal matrix multiply
>> a(:,:,1)*b(:,:,1)
ans =
12 12 12
>> mtimes(a(:,:,1),b(:,:,1))
ans =
12 12 12
>>
>> % test results using mtimesx
>> mtimesx(a(:,:,1),b(:,:,1),'matlab')
ans =
0 0 0
>> mtimesx(a(:,:,1),b(:,:,1),'speed')
ans =
12 12 12
I just downloaded and installed the package yesterday, so I believe the outer product bug that was discussed should be fixed. Does anyone have any idea what is causing this error? Thanks!
eric
James Tursa (view profile)
@newpolaris: Thanks for the input! It's hard for me to debug the 64-bit stuff since I don't have a system to test with, so comments like yours are very much appreciated. I have uploaded a new set of files with the largearraydims typo in the build routine fixed.
newpolaris ? (view profile)
great! but a typo exists in mtimesx_build.m for 64-bit Windows
on line 247 > largearraydims = '-largearraydims';
it causes an error like,
Warning: MEX could not find the library "argearraydims"
specified with the -l option on the path specified
with the -L option.
... mex mtimesx.c build completed ... you may now use mtimesx.
so it must be replaced with "largearraydims = '-largeArrayDims';"
newpolaris ? (view profile)
in my case, the result is an all-zero matrix :)
Tobias Wood (view profile)
Cannot thank you enough for this. Got a huge speed improvement (10 seconds down to 0.6) for a script using this. I'm staggered it's not in the core of Matlab.
Bruno Luong (view profile)
It must take some guts to tackle what Matlab claims to do best: matrix computation.
The package is nicely presented. This is without doubt professional-quality code built on state-of-the-art BLAS, with an accompanying PDF document that explains the "why" and the "how" in great detail.
As with any well-written MEX-based package, there is an installation file that conveniently performs the compilation before the package can be used. Just type its name and the job is done. Adding the main path of the package (no subfolders) to the Matlab path is left to the user.
There are 8 separate files that can be used to benchmark against Matlab. Their names tell something that is not obvious to newcomers, but for those who are familiar with Matlab there is no doubt: they test all 4 combinations of double/single, with one file for the equality test and another for the speed test. I chose the "dd" (double/double) one, likely the most-used combination, and ran the "equal" test. It crashed. Too bad; despite the crash I took my chance and ran the speed test. Well, a bunch of tests are tic/toc-ed against Matlab, with clear output indicating whether it is slower or faster than Matlab and by what percentage the runtimes differ. Something strange happened: in some cases MTIMESX showed as a few hundred times faster than Matlab. I said: "It cannot be true; why would Matlab perform so poorly?" I decided to step through the speed-test function, and discovered there might be another bug. I contacted the author and, bingo, I got his explanation and solution a few hours later. The bug (just one missing line) was due to a last-minute change before submission. Following James's instructions, I modified the code, rebuilt the MEX file, and ran the test code. The results look much more reasonable. And I re-ran the "equal" test; it no longer crashes.
One more word on my setup before reading on: Windows Vista 32-bit, 2009B, and an old loyal VS C++ 6.0. It is important because MTIMESX performance will depend greatly on the setup, and this is clearly stated by the author.
The main function MTIMESX can operate in two modes: "MATLAB" (default) and "SPEED". The first tries to be as compatible with Matlab as possible; the second is, as the name suggests, all about speed. My first two tests run in "MATLAB" mode; the last one compares the two modes.
PERTINENT:
I. 2D Arrays
Speed tests run with all possible combinations of cases: complex/real; normal, transposed, conjugate, conjugate-transposed; then combinations of matrix/vector/scalar; all applying to either the first or the second matrix argument. Each argument can then have 2*4*3=24 states, not counting that a vector can be a column or a row. Combining both, the tests cover probably no fewer than 24^2=576 cases, and that is for 2D arrays only (nD arrays are supported; we'll come back to that).
Most of the time, MTIMESX (in Matlab mode) seems to perform at a speed more or less comparable with Matlab, except for a few cases where the speed advantage is clearly significant. Here is one typical such case:
>> n=2000;
>> A=rand(n)+1i*rand(n);
>> B=rand(1,n)+1i*rand(1,n);
>> tic; C=conj(A)*B.'; toc
Elapsed time is 0.094847 seconds.
>> tic; C=mtimesx(A,'G',B,'T'); toc
Elapsed time is 0.034242 seconds.
That is about 3 times faster than Matlab. Taking a closer look, it seems Matlab is penalized by carrying out the transpose and conjugate explicitly before the product, as shown next:
>> A1=conj(A);
>> B1=B.';
>> tic; C=A1*B1; toc
Elapsed time is 0.030122 seconds.
whereas MTIMESX performs those operations *on the fly* during the product. Very good. But I would argue that the speed gain is smaller in practice. In fact, if the product is to be carried out once, careful developers usually manage to build conj(A) and B.' rather than A and B; in that case all the speed gain of MTIMESX is irrelevant. If A and conj(A) are each used in one product, the speed gain is about 1.5x (3/2). If they are used many times, the developer might perform CONJ once and store the result before the product function MTIMES is invoked, and the speed gain gets even closer to 1. Still, there is clearly an advantage in MTIMESX: speed, and avoiding storage of the extra matrix conj(A).
My impression is that this is the only (and significant) factor contributing to the speed gain: performing the CONJUGATE on the fly. Recent Matlab (2008A, i.e., v.7.6) already benefits from smart on-the-fly TRANSPOSE, according to Tim Davies.
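The on-the-fly conjugation Bruno describes can be illustrated with a minimal C loop (illustrative only, not mtimesx's code): applying `conj` element by element inside the product loop costs essentially nothing extra, whereas an explicit `conj(A)*B` in MATLAB must first materialize a temporary conjugated copy of A.

```c
#include <complex.h>

/* Sketch of on-the-fly conjugation: computes sum(conj(a).*b) without
 * ever forming a conjugated copy of a.                               */
static double complex dot_conj(const double complex *a,
                               const double complex *b, int n)
{
    double complex s = 0.0;
    for (int i = 0; i < n; i++)
        s += conj(a[i]) * b[i];   /* conjugate applied per element */
    return s;
}
```

The negation of the imaginary part fuses into the multiply-add, which is why the fused version runs at roughly the speed of a plain dot product.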
II. NDarrays
The nicest feature of MTIMESX is the capability to perform multiple matrix products arranged in an ND-array. Previously there was such a function on the Matlab File Exchange, coded by Peter Boettcher and also using BLAS, but that package now seems to have minimal support, and users have trouble compiling it on 64-bit platforms. So this new feature is very welcome.
When a Matlab developer wants to compute products of multiple matrices of the same size, to date he has 4 alternatives:
1. Use a for-loop
2. Use a product based on sparse diagonal blocks (for example the function SliceMultiProd available at http://www.mathworks.com/matlabcentral/fileexchange/24260-multiple-same-size-linear-solver)
3. Use the package MULTIPROD, where element-wise products and sums are carried out: http://www.mathworks.com/matlabcentral/fileexchange/8773-multiple-matrix-multiplications-with-array-expansion-enabled
4. Use a BLAS-based routine, such as Peter Boettcher's ndfun, and now James Tursa's MTIMESX
Here is the code I use to benchmark the four approaches. Note that small matrix sizes (say, less than 5) are often encountered in practice.
function mprodbench(m,n)
% Benchmark four approaches to the product of n (m x m) matrices
A=rand(m,m,n);
B=rand(m,m,n);
C=zeros(size(A,1),size(B,2),size(B,3));
tic
for k=1:size(B,3)
C(:,:,k) = A(:,:,k)*B(:,:,k);
end
t1=toc;
tic
C=SliceMultiProd(A,B);
t2=toc;
tic
C=multiprod(A,B);
t3=toc;
tic
C=mtimesx(A,B);
t4=toc;
fprintf('Matrix size=%d, number matrices=%d\n', m, n);
fprintf('\tforloop = %f, rel speed gain = (%f)\n', t1, t1/t1);
fprintf('\tSliceMultiProd = %f, rel speed gain = (%f)\n', t2, t1/t2);
fprintf('\tmultiprod = %f, rel speed gain = (%f)\n', t3, t1/t3);
fprintf('\tmtimesx = %f, rel speed gain = (%f)\n', t4, t1/t4);
end
And here are the results:
>> mprodbench(3,1e4)
Matrix size=3, number matrices=10000
forloop = 0.055119, rel speed gain = (1.000000)
SliceMultiProd = 0.014130, rel speed gain = (3.900815)
multiprod = 0.007815, rel speed gain = (7.052806)
mtimesx = 0.006185, rel speed gain = (8.911272)
>> mprodbench(10,1e4)
Matrix size=10, number matrices=10000
forloop = 0.074665, rel speed gain = (1.000000)
SliceMultiProd = 0.146336, rel speed gain = (0.510228)
multiprod = 0.151051, rel speed gain = (0.494303)
mtimesx = 0.022569, rel speed gain = (3.308258)
That shows MTIMESX is the best and most consistent approach for multiple same-size matrix products. It supports inputs with any number of trailing dimensions.
III. SPEED VS MATLAB MODES
From my tests, SPEED mode is typically about 30-50% faster for the tensor product (column vector multiplied by row vector). Besides that case, the speed gain is probably not spectacular.
CONCLUSION:
- For whom? Developers who are interested in optimizing speed and want to squeeze the last ounce of performance from their computers. I have not tested MTIMESX on older versions of Matlab, but it will surely be even more pertinent there, as Matlab makes constant progress such as smart matrix transposition. Those who run the latest Matlab (from 2008A) get limited benefits besides saving extra conjugate and transpose operations on their matrices. The syntax of MTIMESX is simple, though not quite as lean as the Matlab operator; that is the price to pay.
- MTIMESX is clearly the best approach for multiple products of matrices. Go for it!
Fully justified, until Matlab catches up.
Bruno
James Tursa (view profile)
As a workaround prior to Monday you can make this change to the mtimesx_RealTimesReal.c file near line 222:
scalarmultiply = 0; // <-- add this line
if( m1 == 1 && n1 == 1 ) {
scalarmultiply = 1;
James Tursa (view profile)
There is a bug in the outer product code. It will be fixed as soon as TMW posts the updated version, probably Monday, Dec 7.