Code covered by the BSD License  


MTIMESX - Fast Matrix Multiply with Multi-Dimensional Support

by James Tursa

30 Nov 2009 (Updated )

Beats MATLAB 300% - 400% in some cases ... really!


File Information
Description

MTIMESX is a fast general purpose matrix and scalar multiply routine that has the following features:

- Supports multi-dimensional (nD, n>2) arrays directly
- Supports Transpose, Conjugate Transpose, and Conjugate pre-operations
- Supports singleton expansion
- Utilizes BLAS calls, custom C loop code, or OpenMP multi-threaded C loop code
- Can match MATLAB results exactly or approximately as desired
- Can meet or beat MATLAB for speed in most cases

MTIMESX has six basic operating modes:

- BLAS: Always uses BLAS library calls
- LOOPS: Always uses C loops if available
- LOOPSOMP: Always uses OpenMP multi-threaded C loops if available
- MATLAB: Fastest BLAS or LOOPS method that matches MATLAB exactly (default)
- SPEED: Fastest BLAS or LOOPS method even if it doesn't match MATLAB exactly
- SPEEDOMP: Fastest BLAS, LOOPS, or LOOPSOMP method even if it doesn't match MATLAB exactly

MTIMESX inputs can be:

single
double
double sparse

The general syntax is (arguments in brackets [ ] are optional):

    mtimesx( [directive] )
    mtimesx( A [,transa] ,B [,transb] [,directive] )

Where transa, transb, and directive are the optional inputs:

    transa = A character indicating a pre-operation on A:
    transb = A character indicating a pre-operation on B:
             The pre-operation can be any of:
             'N' or 'n' = No pre-operation (the default if trans_ is missing)
             'T' or 't' = Transpose
             'C' or 'c' = Conjugate Transpose
             'G' or 'g' = Conjugate (no transpose)
    directive = One of the modes listed above, or other directives

Examples:

 C = mtimesx(A,B) % performs the calculation C = A * B
 C = mtimesx(A,'T',B) % performs the calculation C = A.' * B
 C = mtimesx(A,B,'g') % performs the calculation C = A * conj(B)
 C = mtimesx(A,'c',B,'C') % performs the calculation C = A' * B'
 mtimesx('SPEEDOMP','OMP_SET_NUM_THREADS(4)') % sets SPEEDOMP mode with number of threads = 4
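
A directive can also be given on its own to set or query the mode; for example (an illustrative sketch based on the directive-only syntax above):

 mtimesx('SPEED') % selects SPEED mode for subsequent calls
 mtimesx % returns the current calculation mode as a string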

For nD cases, the first two dimensions specify the matrix multiply involved. The remaining dimensions are duplicated and specify the number of individual matrix multiplies to perform for the result. That is, MTIMESX treats these cases as arrays of 2D matrices and performs the operation on the associated pairings. For example:

    If A is (2,3,4,5) and B is (3,6,4,5), then

    mtimesx(A,B) would result in C(2,6,4,5), where C(:,:,i,j) = A(:,:,i,j) * B(:,:,i,j), i=1:4, j=1:5

    which would be equivalent to the MATLAB m-code:

    C = zeros(2,6,4,5);
    for m=1:4
        for n=1:5
            C(:,:,m,n) = A(:,:,m,n) * B(:,:,m,n);
        end
    end

The first two dimensions must conform using the standard matrix multiply rules taking the transa and transb pre-operations into account, and dimensions 3:end must match exactly or be singleton (equal to 1). If a dimension is singleton then it is virtually expanded to the required size (i.e., equivalent to a repmat operation to get it to a conforming size but without the actual data copy). This is equivalent to a bsxfun capability for matrix multiplication.
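
For instance, a sketch of how the singleton expansion behaves (illustrative m-code equivalent; the sizes here are chosen only for demonstration):

    % If A is (2,3,4,1) and B is (3,6,1,5), then C = mtimesx(A,B) is (2,6,4,5),
    % with the singleton dimensions expanded virtually:
    C = zeros(2,6,4,5);
    for m=1:4
        for n=1:5
            C(:,:,m,n) = A(:,:,m,1) * B(:,:,1,n);
        end
    end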

Acknowledgements

This file inspired Mmx Multithreaded Matrix Operations On N D Matrices, Small Size Linear Solver, Frontal, and Merton Structural Credit Model (Matrixwise Solver).

MATLAB release: MATLAB 7.4 (R2007a)
Other requirements: A C compiler, such as the built-in lcc compiler. (You don't have to know anything about C to use mtimesx ... it is self-building.)
Comments and Ratings (122)
23 Apr 2014 Safdar

Sorry for the multiple posts. It seems something went wrong when submitting the post. Sorry

23 Apr 2014 Safdar

Hi James,
I unzipped the folder in my working path and then tried to use mtimesx, and I got the following error:

... Build routine for mtimesx
... Checking for PC
... Finding path of mtimesx C source code files
... Found file mtimesx.c in D:\Documents\Projects\mtimesx_20110223\mtimesx.c
... Found file mtimesx_RealTimesReal.c in D:\Documents\Projects\mtimesx_20110223\mtimesx_RealTimesReal.c
Error using mtimesx_build (line 169)
A C/C++ compiler has not been selected with mex -setup

Error in mtimesx (line 271)
mtimesx_build;

26 Mar 2014 Alireza

@Evan: have you tried blas_lib = 'lmwblas'?

09 Mar 2014 Evan

I would like to second Matthieu's question. For blas_lib, I've tried both

blas_lib = '/Applications/MATLAB_R2013b.app/bin/maci64/lapack.spec';

and

blas_lib = '/Applications/MATLAB_R2013b.app/bin/maci64/blas.spec';

With both, I get:

ld: warning: ignoring file /Applications/MATLAB_R2013b.app/bin/maci64/blas.spec, file was built for unsupported file format

Anyone have any ideas?

05 Dec 2013 Marc Crapeau

Works well for me on Linux.

In case it helps: I installed the library libblas.so in my home directory, following the "shared library" version on this page: http://gcc.gnu.org/wiki/GfortranBuild

Then I compiled the mex file with the command returned by the mtimesx routine when the automatic installation failed:
blas_lib = 'the_actual_path_and_name_of_your_systems_BLAS_library';
mex('-DDEFINEUNIX','-largeArrayDims','mtimesx.c',blas_lib)

Gained a factor of 2 in speed for my 3D matrix multiplications, and clearer MATLAB code :)

22 Aug 2013 Hannes

Hi all,

I tried to locate compilers via mex -setup, but got the answer: "No supported SDK or compiler was found on this computer."
I installed Microsoft Windows SDK 7.1, but it still does not work. Does somebody know the path and install directory, or what the problem is?
Thanks for the help.

18 Jul 2013 WK

I don't know why, but I did the test again on the Linux cluster and the Windows machine, and it works in the parallel computation now. So strange!

18 Jul 2013 WK

James, by the way, what version of MATLAB did you last test this package with? I ran it on my laptop (Windows 64) and a Linux cluster running MATLAB 2013; only a few operations have mtimesx beating MATLAB, and then only by less than 10%. I saw that this package was uploaded in 2011, so I just wonder if MATLAB 2013 made a lot of changes to boost the performance.

17 Jul 2013 James Tursa

@WK: I don't have the parallel toolbox, so am unable to give much advice here. My understanding is that each worker in the parallel environment is a separate process, which I would expect would mean that MTIMESX could be used. But since I don't have the toolbox I can't verify this. Have you actually run some tests and gotten wrong results?

17 Jul 2013 WK

I just got it compiled and installed in my 64-bit MATLAB with Windows SDK 7, as well as on the Unix cluster. I find that it doesn't work in spmd, or does it?

15 Jul 2013 WK

Thanks for the great library. I have two questions. Do I have to compile it to get the library? That means the library really depends on the machine, right? Also, our lab has a multi-node cluster, and each node is equipped with 8 cores. I think this library is based on multi-threading; if I run the MATLAB code in spmd and run your library on each core, will they conflict? Sorry, I don't have any experience compiling mex; I tried, but I don't have MSVC installed, only MinGW.

25 Jun 2013 JiaDa

How about announcing another function for fast matrix inverse with multi-dimensional support? I wrote a version, but I think your version would be far faster than mine. :P

25 Jun 2013 JiaDa  
04 May 2013 Matthieu

Hello,

I'm trying to use mtimesx on Mac OS 10.8 with MATLAB R2011b, but I don't understand how to compile it.

I tried several solutions described in the comments, but none worked.

blas_lib = '/Applications/MATLAB_R2011b.app/bin/maci64'

blas_lib =

/Applications/MATLAB_R2011b.app/bin/maci64

>> mex('-DDEFINEUNIX','mtimesx.c',blas_lib)
mtimesx.c: In function 'mexFunction':
mtimesx.c:592: warning: assignment discards qualifiers from pointer target type
ld: can't map file, errno=22 for architecture x86_64
collect2: ld returned 1 exit status

mex: link of ' "mtimesx.mexmaci64"' failed.

Error using mex (line 206)
Unable to complete successfully.

Can you help me ?

15 Mar 2013 James Tursa

@Agustin: It looks like you put the files in the lcc compiler directory. I would advise that you put the files in a work directory instead (not any of the "official" sys or bin etc directories that are installed with MATLAB) and try again. Make sure this work directory is on the MATLAB path. As a side note, in general it is *not* a good idea to put your own files into any of the "official" directories that are installed with MATLAB ... doing so risks breaking built-in MATLAB functions and capabilities.

15 Mar 2013 agustin darrosa

Thanks James, but I have this problem:

Build routine for mtimesx
... Checking for PC
... Finding path of mtimesx C source code files
... Found file mtimesx.c in C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx.c
... Found file mtimesx_RealTimesReal.c in C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx_RealTimesReal.c
... Opened the mexopts.bat file in C:\Users\Gusa\AppData\Roaming\MathWorks\MATLAB\R2008a\mexopts.bat
... Reading the mexopts.bat file to find the compiler and options used.
... LCC is the selected compiler
... OpenMP compiler not detected ... you may want to check this website:
http://openmp.org/wp/openmp-compilers/
... Using BLAS library lib_blas = 'C:\Program Files\MATLAB\R2008a\extern\lib\win32\lcc\libmwblas.lib'
... Now attempting to compile ...

mex('C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx.c',lib_blas,'-DCOMPILER=LCC')

it can't find C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx.exp
it can't find C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx.lib
... mex mtimesx.c build completed ... you may now use mtimesx.

Then I enter:

C = mtimesx(a,b)
??? Undefined function or method 'mtimesx' for input arguments of type 'double'.

What can I do?

12 Mar 2013 James Tursa

@Agustin: For Windows, generally you just put all the files in a directory on the MATLAB path and then type mtimesx at the prompt. For other systems, see other posts below.

12 Mar 2013 agustin darrosa

Hi

How can I use this "function" in MATLAB?

I tried it but I get errors.

Sorry for my ignorance.

08 Mar 2013 James Tursa

@Tonio: No, MTIMESX does not do this. But from your description it sounds like something like this might work for you:

a = cell array of matrices
b = cell array of vectors
c = cellfun(@mtimes,a,b,'UniformOutput',false)
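
For instance, a concrete sketch of this approach (illustrative only; the sizes here are made up):

    a = { rand(5,3), rand(7,3), rand(4,3) };              % matrices whose row count varies per slice
    b = { rand(3,1), rand(3,1), rand(3,1) };              % matching vectors
    c = cellfun(@mtimes, a, b, 'UniformOutput', false);   % c{k} = a{k} * b{k}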

06 Mar 2013 Tonio

Can I use that for multiplying (m,n,k) * (n,k) = (m,k), where each k 'slice' needs to be multiplied together and the slices may contain variable numbers of elements, i.e. 'm' can vary in each slice? In my code each slice is stored in a cell in MATLAB.

04 Feb 2013 James Tursa

@Robert: Yes, MTIMESX is CPU based, not GPU.

04 Feb 2013 Robert

Hello James,

Nice work. I basically want to multiply n matrices B (400,400) with one matrix A (400,400). I used arrayfun before, but your code is about 2.5 times faster. Soon I will get an NVIDIA Tesla graphics card, and I would like to ask: is there any possibility to run the calculation on the GPU instead of the CPU?
I tried to call mtimesx with two GPU arrays, but as my computation time is still fast (on my slow GPU) I guess it was not using my GPU.
Perhaps I also missed this, but to my understanding mtimesx uses the CPU only, or??

Thanks

Robert

23 Jan 2013 Charles

@Michael Thanks, that did it.

22 Jan 2013 Michael Völker

Charles,

for my own record and to make it machine-searchable, I wrote earlier how I was able to compile mtimesx. Taking the liberty of quoting myself:

> On a 64Bit Debian based Linux I managed to compile it with
> mex -DDEFINEUNIX CFLAGS="\$CFLAGS -march=native" -largeArrayDims -lmwblas -lmwlapack -lgomp mtimesx.c
> using gcc-4.5.

Can you try that, or did you already try?

Michael

22 Jan 2013 Charles

@Sebastian Thanks for the response. I think I wasn't linking to the right file. Now, I seem to be linking to the blas library, but I'm getting a new error. I am using the code "mex -DEFINEUNIX 'mtimesx.c' -lmwblas" but I get an error that reads "mtimesx.c:592: warning: assignment discards qualifiers from pointer target type ". I saw that someone else had gotten this error before and the code was fixed, so I downloaded the files again but I still am getting the error. When I tried to use mtimesx in my code I get an error and matlab crashes. I haven't had any problems with other mex files, so it seems to be specific to mtimesx.

Any help is appreciated. Thanks in advance.

21 Jan 2013 Sebastiaan

@Charles: under Unix, the BLAS library linked to is normally mwblas , not blas_lib.

20 Jan 2013 Charles

This program is great. Really useful. I am having great success with it on my PC. However, I cannot seem to create the mex file on a UNIX computer. I am defining the BLAS library as blas_lib. Then I enter the code "mex('-DDEFINEUNIX','mtimesx.c',blas_lib)". However, I get a string of errors that are all similar to "mtimesx.c;(.text+0x76a1)undefined reference to `saxpy_'". It says this for saxpy, sger, sdot, daxpy, dger, ddot, sgemv,ssyrk, ssyr2k, sgemm, dgemv, dsyrk and dgemm. Can anyone please help me with creating the .mexa64 file? Thanks.

30 Oct 2012 James Tursa

@KSIDHU: Why can't you use .* and ./ directly? What are the exact dimensions of your two matrices? If the matrices need singleton expansion, then look at the function bsxfun. What is the overall operation you are trying to do?
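
For instance, a sketch of the bsxfun approach (illustrative only; the sizes and variable names here are made up, and smaller than the real problem):

    A = rand(3000, 1000);          % full matrix
    v = rand(3000, 1);             % column vector to be expanded along dim 2
    P = bsxfun(@times,   A, v);    % elementwise multiply with singleton expansion
    Q = bsxfun(@rdivide, A, v);    % elementwise divide with singleton expansion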

29 Oct 2012 KSIDHU

Hi
How can I perform elementwise multiplication and division of two matrices?
I have looked through the manual but cannot find any information on it.
Is there a way to check if BLAS multi-threading is available? Is this better than OpenMP multi-threading?

I need to perform .* and ./ on matrices of size ~3000x100,000, so any tips would be great. Also, is there a multi-threaded sum function for large matrices (along dim 1 or 2)?

MATLAB 2010a
Thanks in advance
KS

13 Sep 2012 James Tursa

I like the reasoning you both have spelled out. I will make logical*other conform to the other, and make logical*logical output a double by default but have an option to do logical AND/OR operations and output a logical instead. Thanks for your inputs.

13 Sep 2012 John

I like Michael's reasoning and I think he is right about the 'double' result. So in general I think that double should be the default. However, given that I enjoy being efficient with data, I would appreciate a 'logical' option.

12 Sep 2012 Michael Völker

Damn, the E-Mail notification stopped to work, so I didn't know you replied to my message.

My spontaneous thought was that (logical matrix) * (logical matrix) should *of course* result in a double output, since matrix multiplication involves a sum, so we should expect to get results different than 0 or 1.

So, as you already guessed, I would like to have an auto-adjust-precision feature.
This is what I would expect:

[inputA] * [inputB] ==> [output]
------------------------------------------
logical | logical | double
logical | single | single
logical | double | double
single | logical | single
double | logical | double

(I hope it does not get formatted too ugly...)

I also like John's idea to make the behaviour more optional.

11 Sep 2012 John

Personally, I do not understand why it should be a 'double' result. So, my vote goes towards a logical result.

However, would it be possible to add this as an option? (e.g.: mtimesx(a,b,'logical') will produce a logical result, if a and b are logical). This way the advanced user can be efficient with data, whereas the less experienced user will still get results that they are familiar with....

28 Aug 2012 James Tursa

@John: OK, that's two votes. Automatic logical conversion goes into the next version.

Q: Should a (logical matrix) * (logical matrix) give a double matrix result (ala MATLAB), or a logical matrix result (i.e., essentially treating the * - operations as AND OR operations)?

28 Aug 2012 John

Excellent code!

I experienced the same 'error' as Michael Völker, so I think automatically converting the logical/sparse data to double might be a good idea as an enhancement!

Nevertheless, great work! Saved me a lot of time/coding.

16 Aug 2012 James Tursa

@Michael Völker: The error message you are getting is not a bug. As stated in the doc, using arguments that are not double or single will cause MTIMESX to invoke the built-in mtimes function to do the matrix multiply. If MATLAB will do it then great ... you get that result. But if MATLAB will not do it then you will get whatever error message MATLAB puts out. This is the intended behavior. You can convert the logical to double first, of course, to avoid the error message. If you would like MTIMESX to do this automatically for logical inputs then I can take that as an enhancement request (seems reasonable to me).
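
I.e., something like this sketch (using the same example as in the report below) should avoid the error by converting the logical input up front:

    foo = mtimesx( double(true(2,2)), randn(1,1,5) );   % logical converted to double before the nD multiply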

16 Aug 2012 Michael Völker

James,

I found a bug with logical input.
Try:

foo = mtimesx( true(2,2), randn(1,1,5) );

And you probably get this error message:

Error using *
Inputs must be 2-D, or at least one input must be scalar.
To compute elementwise TIMES, use TIMES (.*) instead.

Error in mtimesx_sparse (line 47)
result = a * b;

Michael

28 Jul 2012 Sam T

James:

I would like to thank you for coming up with such a brilliant piece of work. An excellent and well written code!!

Lately, I have been trying to use MTIMESX code to multiply two big n-dimensional matrices (A = 400x400x400 and B = 400x200). But for simplistic purposes, consider the following example:

A = 3 x 3 x 3
B = 3 x 2

and C = A*B;
such that dimension of C = 3 x 3 x 2

where C(i, j, l) = A(i, j, k) * B(k, l);
i.e.
C(:, :, 1) = A(:, :, k) * B(k, 1);
C(:, :, 2) = A(:, :, k) * B(k, 2);

Now, if I use mtimesx(A, B), the resultant matrix is (3 x 2 x 3) instead of (3 x 3 x 2).

This is because the first two dimensions specify the matrix multiply involved. In this case that means the 1st and 2nd dimensions of matrix A, while it should ideally be the 2nd and 3rd dimensions of matrix A to get the desired result.

I am wondering if there is a way around it? I look forward to hearing from you.

Thanks for your help.

25 Jul 2012 James Tursa

@Sam T: I don't think I get it yet. In your example you have:

C(:, :, 1) = A(:, :, k) * B(k, 1);

A(:,:,k) is 3x3
B(k,1) is 1x1

Are you trying to do an nD scalar*matrix multiply here? MTIMESX can do that (with some modifications of the inputs), but I am not quite sure that is really what you want. Can you clarify? Maybe write out an explicit for loop showing how C is to be filled in its entirety?

02 Jul 2012 Boaz Schwartz  
01 Jun 2012 Clay Fulcher

mtimesx is an awesome code. Besides speeding up my structural dynamics and signal processing applications, it is much easier to apply than using loops. I'm also using a multi-D matrix inverter called multinv that can be downloaded from this forum. The two codes together allow me to operate on matrices of functions as easily as matrices of single values.

30 May 2012 Clay Fulcher

@ James Tursa: Thanks James! I greatly appreciate it!

14 May 2012 James Tursa

@ Clay Fulcher: Some options to possibly get more speed:

1) NDFUN by Peter Boettcher. E.g., see this thread:

http://www.mathworks.com/matlabcentral/newsreader/view_thread/294669#861239

2) FEX submissions. E.g., see this submission:

http://www.mathworks.com/matlabcentral/fileexchange/31222

3) Wait a couple of months for the next MTIMESX submission, when this capability will be included (along with in-place operations and nD matrix multiply versions of PROD and CUMPROD).

13 May 2012 Clay Fulcher

James, I need to invert a 3D array of function values. I have to step through each slice of the array in a loop, inverting each slice. Do you know of any way to do this more efficiently?

08 Feb 2012 Jonathan Sullivan

@ James
The array sizes I'm working with are much larger than what I posted. The size of A is 100x33x3. The size of B is 33x1x3x5000x5. That might be why you had trouble recreating it. I'm running on Windows 7, 64 bit, MATLAB 2011b; the compiler is cl. If it makes a difference, some of the values in the first matrix are -inf.

08 Feb 2012 James Tursa

@ Jonathan: Yes, weird indeed. For slices 100x100 in size, MTIMESX will always use BLAS dgemm calls regardless of the method setting (you can verify this by using the 'DEBUG' setting). That is, the exact same code path gets executed regardless of whether 'MATLAB', 'SPEED', or 'SPEEDOMP' is set. That setting should have had no effect on this particular outcome. Also, I have been unable to reproduce this error on a 32-bit WinXP system. Can you provide me some specifics on your MATLAB version, machine OS, compiler used, etc? Does it crash if you restart MATLAB and immediately try the example?

08 Feb 2012 Jonathan Sullivan

@ James
So the problem, at least in part, is caused by using the 'SPEEDOMP' flag. I removed the flag, and now it works. Weird, huh?

08 Feb 2012 James Tursa

@ Jonathan Sullivan: Your first example without repmat should have worked. Looks like I introduced a bug in the latest release. I will look into this right away ...

08 Feb 2012 Jonathan Sullivan

James,

Fantastic code. It has come in handy. It is well documented, incredibly fast, and extremely useful, especially for N-D arrays.

For the N-D case, this code would be even more powerful if the singleton dimension capability were expanded. Currently, all the dimensions from 3:end must be either the same size (A to B), or must all be singleton for one of the variables.

For example
A = rand(100,100,3);
B = rand(100,100,3,5);
C = mtimesx(A,B); % Does not work. Crashes MATLAB.

You can get around this by using repmat:
A = rand(100,100,3);
B = rand(100,100,3,5);
C = mtimesx(repmat(A,[1 1 1 5]),B); % Works, but slow

Unfortunately, explicitly expanding out large arrays has overhead which is more than I'd like.

Overall code. A+.

30 Oct 2011 Michael Völker

James, thank you for your hint to using 'lapack' from the FEX.
Indeed it produces the desired result, by calling

lapack('D=DSDOT(i,s,i,s,i)', length(A), A, 1, A, 1)

with 'A' from my example above. Unfortunately, this is ~5 times slower than calling mtimesx() with double input.

On double input, calling the equivalent 'DDOT' routine from the lapack-package takes more than 10 times longer than your mtimesx (wow, by the way...) to compute the very same result.
The difference appears to be even worse for complex input.

So I am still hoping that you might somehow get this feature in your code and maintain the speed...

19 Oct 2011 James Tursa

@ Michael Völker: FYI, another way to use the DSDOT routine is to use this submission:

http://www.mathworks.com/matlabcentral/fileexchange/16777-lapack

Not sure if it works on 64-bit systems, but you might give it a try.

19 Oct 2011 James Tursa

@ Michael Völker: Thanks for the comments. I will look into your request. There is a BLAS routine called DSDOT that does what you request. I will look into how easily I can incorporate this into MTIMESX for BLAS specific results. For the SPEED modes MTIMESX sometimes uses custom code to calculate the dot product (e.g., see function RealKindDotProduct), and in fact the accumulation is done in double precision regardless of input type (calculations themselves are single for single inputs), so it might not be too difficult to adapt this to return the double precision output if requested instead of always converting back to single for single inputs. Thanks for the suggestion.

19 Oct 2011 Michael Völker

Incredibly great work. Thank you for the elaborate code, the very useful features, the intuitive syntax and behaviour and the docu.

On a 64Bit Debian based Linux I managed to compile it with
mex -DDEFINEUNIX CFLAGS="\$CFLAGS -march=native" -largeArrayDims -lmwblas -lmwlapack -lgomp mtimesx.c
using gcc-4.5.

After using mtimesx for a while I currently miss only one feature, which is to define the precision of the output independently from the input variables.
This would be useful when calculating the sum of squares of a huge single-precision vector where the correct result exceeds realmax('single') and prior casting takes a lot of time.

Example:
>> A = randn( 1e8, 1,'single'); % plenty of sane values
>> A(1) = 1e20; % only few outliers
>> sos = A' * A % or sos = mtimesx( A, 'C', A )

sos =

Inf

Doing A = double(A) beforehand solves the problem:
>> sos = A' * A

sos =

1.000000040081755e+40

but
>> tic, A = double(A); toc
Elapsed time is 0.472007 seconds.

In fact, that casting to double takes much more time than the actual computation in mtimesx. Since I do this repeatedly in an iteration, the overhead sums up to minutes.

So my wish would be like this:
C = mtimesx( A, B, 'double' ); % C is double even if A or B are single

Nevertheless, thank you so much for this code.

21 Jun 2011 James Tursa

@Gary: Follow-up, ndfun as I recall only works for real variables. If you have complex variables then the only efficient option I could suggest would be a custom mex routine.

21 Jun 2011 James Tursa

@Gary: For this type of matrix product you should be using something like ndfun('mprod',a) by Peter Boettcher. You can find the source code here:

http://www.mit.edu/~pwb/matlab/

If you need a 64-bit version of this you can see the hack I posted on this thread:

http://www.mathworks.com/matlabcentral/newsreader/view_thread/294669#791467

Using ndfun will avoid all the array slice copies you are doing in the above example code.

21 Jun 2011 Gary

Thanks for the answer, but I think either you misunderstood my problem or I misunderstood your answer.
I have e.g. 10000 matrices and want to calculate their product. By dividing them into 2*5000 I can take advantage of your nD multiply. Here is the code:

a = rand(n,n,L);

tic
b = a(:,:,1);
for i=2:L
    b = a(:,:,i)*b;
end
toc

c = a;
tic
while (L~=1)
    L2 = floor(L/2)*2;
    d = c(:,:,end);
    c = mtimesx(c(:,:,2:2:L2), c(:,:,1:2:L2-1));
    if (L2~=L)
        c(:,:,end+1) = d;
    end
    L = size(c,3);
end
toc

So will this never be faster than normal multiplies?

20 Jun 2011 James Tursa

@Gary: MTIMESX is generally not going to be any faster at generic matrix multiplies than MATLAB since it is calling exactly the same BLAS routines as MATLAB. The exceptions are small (4x4 or less) multiplies, dot products, outer products, and some matrix-vector multiplies since they can sometimes be done faster with inline code which MTIMESX uses in the SPEED or LOOP modes. Also, the nD multiply case is almost always faster with MTIMESX than the equivalent loop(s) in MATLAB for any size since array slice copies are avoided.

20 Jun 2011 Gary

Hi,
I am trying to compute the product of many matrices (M1*M2*...). I used mtimesx to multiply half of the matrices by the other half, repeating until the result is reached (e.g. a=M1*M2, b=M3*M4, a*b). It works very well when the matrices have a small size like 4x4, but for larger sizes it becomes slower than MATLAB. Is that expected?

30 May 2011 Tom

Was a bit too fast, sorry. Someone already asked this! It works fine now!

Great job!

05 May 2011 dm

Great submission! Find it VERY useful for my work!

Also, I think you can change

"Beats MATLAB 300% - 400% in some cases ... really!"

in the title, and multiply the numbers by at least a factor of 25 (based on the results from mtimesx_test_nd).

14 Apr 2011 James Tursa

@ Dan Scholnik: Have you tried running the mtimesx_test_nd file yet? That should invoke small OpenMP test cases. For larger test cases it may in fact be the case that later and/or 64-bit versions of MATLAB are hard to improve on for speed. Alas, I don't have a 64-bit system to tinker with.

At the moment I am working on incorporating in-place operations in MTIMESX and that will be in the next major release, but I will put your dense triangular suggestion on the list of potential future improvements.

14 Apr 2011 Dan Scholnik

Compiled for Matlab 2011a on Fedora 14 x86_64 using

mex CFLAGS='$CFLAGS -fopenmp' LDFLAGS='$LDFLAGS -fopenmp' -largeArrayDims -O -DUNIX mtimesx.c -lmwblas

Not sure all of that was needed, but it works. Compiler was gcc 4.5.1.

mtimesx('OPENMP') returns 1 and mtimesx('OMP_GET_NUM_PROCS') returns 4 (I have a dual-core i7 with hyperthreading), however mtimesx never reports using anything but the BLAS no matter which flags I pass in. Best I can tell the flags have no effect. The MATLAB-provided mwblas is multithreaded, so perhaps there's no improving on it. Speeds were generally not much different than the MATLAB multiply, except (as mentioned previously) for cases where the MATLAB transpose or conjugate operations themselves consumed significant portions of the total time.

I also tried using the fedora-provided BLAS via "-lblas" but it appears to be single-threaded and very slow. Here again mtimesx never used multiple processors that I could tell.

A possible feature suggestion, albeit one that deviates from the general-purpose nature of the current code: How about routines for dense triangular-matrix multiplication? Matlab doesn't seem to have any optimizations for this case (although it does for back-substitution, which is common with triangular matrices).

05 Apr 2011 pink

Can you provide your email address? My code is too long to post.
wahyoe_slipnot@yahoo.co.id
wahyoe

05 Apr 2011 pink

Yes, it works now, thanks a lot James.
I wanted to ask:
1. For the form below, how do I use mtimesx?
    for i = 1:m
        A(:,:,i) = B(:,:,i).' * Ex(:,:,i) * C(:,:,i);
    end
2. If I compile this into an application (exe), which files do I have to enter into deploytool? I saw a lot of files in the package.
Thanks
Wahyoe
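
(For example, would a nested call along these lines be the way to express item 1? Just an untested sketch:)

    A = mtimesx( mtimesx(B, 'T', Ex), C );   % B(:,:,i).' * Ex(:,:,i) * C(:,:,i) for every page i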

03 Apr 2011 James Tursa

@ wahyoe Unggul: Did you get the build process to work?

28 Mar 2011 James Tursa

@ wahyoe Unggol: The error message is telling you that you need to select a C compiler first. So do this:

mex -setup
(Then press Enter)
(Then enter the number of a C compiler such as lcc)
(Then press Enter again)

Once this is done, try running mtimesx again and see if it builds OK.

To answer your specific question I would need to know the dimensions of each variable. I.e., what is the size of RT and Ex?

28 Mar 2011 pink

hi all
I tried running "mtimesx_test_sdspeed.m", but an error occurs

??? Error using ==> mtimesx_build at 169
A C/C++ compiler has not been selected with mex -setup

Error in ==> mtimesx at 271
mtimesx_build;

Error in ==> mtimesx_test_sdspeed at 100
compver = [computer ',' version ', mtimesx mode' mtimesx ', the median of'
num2str (n) 'runs'];

I wanted to ask how to use "mtimesx" in a form like this:
    for i = 1:2000
        Keg(:,:,i) = RT(:,:,i).' * Ex(:,:,i) * RT(:,:,i);
    end
and I also want to ask whether, in addition to speed, it can also handle the data more efficiently?

23 Feb 2011 James Tursa

@ Daniel Lyddy: Confirmed typo in the prototypes for these functions. Change the int * to mwSignedIndex *. So

void xSYRK(char *, char *, mwSignedIndex *, mwSignedIndex *, RealKind *, RealKind *,
mwSignedIndex *, RealKind *, RealKind *, int*);
void xSYR2K(char *, char *, int*, mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *,
RealKind *, mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *);

becomes

void xSYRK(char *, char *, mwSignedIndex *, mwSignedIndex *, RealKind *, RealKind *,
mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *);
void xSYR2K(char *, char *, mwSignedIndex *, mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *,
RealKind *, mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *);

Thanks for pointing this out. That being said, these typos in all likelihood should not affect the actual results since in both cases 64-bit pointers were being passed and the underlying data is in fact mwSignedIndex in spite of the prototype. I will upload fixed files soon.

22 Feb 2011 Daniel Lyddy

I am getting some warnings about incompatible pointer types when compiling on a 64-bit Centos Linux 5.5 system. For example, in line 4909 referred to below, argument 10 is 'LDC' which is #defined to &ldc which is a pointer to mwSignedIndex. In the blas/lapack dsyrk function, argument 10 is also called 'LDC', but it is an integer. Pointers on 64-bit systems are typically larger than 32 bit ints ... maybe these blas/lapack arguments need to be cast to something else?

mex('-DDEFINEUNIX','-largeArrayDims','mtimesx.c',blas_lib)
mtimesx.c: In function ‘mexFunction’:
mtimesx.c:592: warning: assignment discards qualifiers from pointer target type
In file included from mtimesx.c:1303:
mtimesx_RealTimesReal.c: In function ‘DoubleTimesDouble’:
mtimesx_RealTimesReal.c:4209: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4211: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4212: warning: passing argument 3 of ‘dsyr2k_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4225: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4227: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4228: warning: passing argument 3 of ‘dsyr2k_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4241: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4243: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4257: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4259: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
In file included from mtimesx.c:1481:
mtimesx_RealTimesReal.c: In function ‘FloatTimesFloat’:
mtimesx_RealTimesReal.c:4209: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4211: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4212: warning: passing argument 3 of ‘ssyr2k_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4225: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4227: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4228: warning: passing argument 3 of ‘ssyr2k_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4241: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4243: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4257: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4259: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type

05 Jan 2011 Bruno Luong

I'm not using deploytool, just MCC command.

05 Jan 2011 James Tursa

@ Bruno Luong: Sorry I am not going to be much help here since I do not have a Windows 7 platform to test with. Are you using deploytool, and then the result can't use mtimesx?

05 Jan 2011 Bruno Luong

There is an annoying compatibility issue with deploying MTIMESX on the Windows 7 platform. The MCR issues the error message "mex file is incorrect". It works fine under MATLAB or on other OSes. I just can't use it until the issue is resolved.

16 Nov 2010 James Tursa

@ Matt J: Issue mtimesx without any arguments to find out what the current calculation mode is. If you enter any of the OpenMP directives but mtimesx was not compiled with an OpenMP enabled compiler then mtimesx will revert to the nearest functionality. So SPEEDOMP will revert to SPEED, LOOPSOMP will revert to LOOPS, and the omp_etc functions will revert to substitute functions instead of the actual OpenMP functions. Your above example looks like what would happen if mtimesx was not compiled with an OpenMP enabled compiler. To see how mtimesx thinks it was compiled, issue the mtimesx('OPENMP') command and it will return a 1 (yes OpenMP) or 0 (No OpenMP).

16 Nov 2010 Matt J

James, the latest version of mtimesx doesn't display at the command line any of the new modes (so of course I don't know if they're being used).

>> mtimesx SPEEDOMP

ans =

SPEED

28 Oct 2010 Kai  
14 Oct 2010 Jan Simon

Thanks James! See this thread for applying MTIMESX to calculate a fast 2-norm for the matrix along a specified dimension:
http://www.mathworks.com/matlabcentral/newsreader/view_thread/293747#787738

08 Oct 2010 James Tursa

@Matt J: I will look into it. Rather than pre-converting I may be able to fake out a call back to MATLAB with a pointer to the interior of the nD array and fudged up dimensions. Highly against the mex rules, but it might work. I should have taken advantage of the simple reshape method you use for the non-transposed cases (I didn't recognize it at the time I was updating the code) and that should not be too hard to implement. But the transpose cases for the 2D slices and the case for the sparse matrix being the right operand is going to be tricky to avoid a data copy. Guess I have a weekend project now ...

08 Oct 2010 Matt J

James - I appreciate you taking my comments into account in the latest rev, but I'm still seeing weird stuff. From the manual:

"If you combine sparse inputs with full nD (n > 2) inputs, MTIMESX will convert the sparse input to full since MATLAB does not
support a sparse nD result."

Pre-converting the input from sparse to full leads to very slow performance, certainly slower than regular MATLAB as the comparison below shows. Is it really the only convenient solution? Also, MATLAB not supporting sparse nD is again not the issue, since even for n=2, sparse*full is always full in regular MATLAB.

%fake data
N=300;M=200;
A=speye(N);
B=rand(N,N,M);
Bs=single(B);

mtimesx SPEED;

tic;
ans1=mtimesx(A,B);
toc;
%Elapsed time is 0.681635 seconds.

tic;
ans2=reshape( A*reshape(B,N,[]) , size(B) );
toc;
%Elapsed time is 0.159543 seconds.

isequal(ans1,ans2)%1

"If you combine double sparse and single inputs, MTIMESX will convert the single input to double since MATLAB does not support a single sparse result."

As above, even if MATLAB did support sparse*fullsingle you would expect the result to be full single, not sparse single. And also, this feature is showing very slow behavior as compared to regular MATLAB, for some reason

tic;
ans3=mtimesx(A,Bs);
toc;
%Elapsed time is 0.817832 seconds.

tic;
ans4=single( reshape( A*reshape(double(Bs),N,[]) ,size(B)) );
toc;
%Elapsed time is 0.402111 seconds.

isequal(ans3,ans4),%1

01 Oct 2010 James Tursa

@Matt J: Thanks for the comments. I will get to work on the suggested changes right away.

30 Sep 2010 Matt J

Just a couple more remarks regarding some of the limitations mentioned in the user manual,

"You cannot combine double sparse and single inputs, since MATLAB does not support a single sparse result."

It would be worthwhile to silently cast the single input to double prior to the matrix multiply. I've never understood why MATLAB doesn't implement the same. It only forces the user to do it manually.

"You also cannot combine sparse inputs with full nD (n > 2) inputs, since MATLAB does not support a sparse nD result."

Don't understand that. Native MATLAB produces a full result when combining sparse matrix multiplicands with full ones. No reason why you couldn't do the same for nD, as far as I can see...

30 Sep 2010 Matt J

I like it. The only thing I find a little peculiar is that the output of
mtimesx('MODE') reports the previous mode rather than the current one.

27 Aug 2010 Sebastiaan

Something like A(A<1e-8) = 0 should automatically remove small numbers from the sparsity pattern.

If this is too slow, have a look at spfun; it may be faster.

However, if you mean if you can remove small elements from the output matrix, you should write something of your own, partially multiplying the matrix and removing small elements. As a tip, use full column vectors of the matrix, this will make the multiplication much faster than square blocks, since sparse matrices are stored by columns. E.g.:
C = spalloc(size(A,1), size(B,2), 0);
for j = 1:size(B,2)
    R = A*B(:,j);
    R(R<1e-8) = 0;
    C(:,j) = R;
end
is (probably) a fast implementation. Try to take multiple columns for speed of course.

27 Aug 2010 Julio

Thanks Sebastiaan, I never thought to check the density of the resulting matrix and you are absolutely right: the memory blew up! By the way, is there any way to define an “operational zero” for sparse matrix operations, in order to automatically make MATLAB assume a zero for numbers, say, < 1e-8?
Thank you again.

27 Aug 2010 Sebastiaan

Could it not be possible that you are really out of memory? Your result is a 460191 by 460191 matrix, which is not necessarily sparse. If the sparsity pattern is poor, the result may even be a dense matrix. Storing a full matrix this size requires 1578GiB of memory, so you have to be sure that the result is ~2% sparse. If neither of the input matrices are <1% sparse, this is likely your problem.

27 Aug 2010 Julio

Thanks for your prompt reply. “mtimes” at the MATLAB prompt gives exactly the same error, so nothing is wrong with “mtimesx”. Is it possible that this is a repetition of the “outer-product” problem? I am going to try partitioning the matrix “cb” into several chunks and then concatenating the results. Maybe it will work that way.
Thanks again.

26 Aug 2010 James Tursa

@ Julio: MTIMESX does not do generic sparse matrix multiplies internally, it just calls the MATLAB built-in function mtimes. i.e., it does the same thing as if you had just typed c21 * cb at the MATLAB prompt. Can you verify that c21 * cb works at the MATLAB prompt but mtimesx(c21,cb) does not work? Be sure that you start them both from the same state (e.g., clear all other big variables including ans before attempting the calculation). That would be surprising to me since it would indicate some type of mex limitation that I am currently unaware of.

26 Aug 2010 Julio

Hi, I am new to mtimesx but, so far, I think it is great. Meanwhile, I am trying to multiply 2 large sparse matrices: "ci=mtimesx(c21,cb)". c21 is 460191 by 5579 and cb is 5579 by 460191. After several minutes I got an error message (similar to mtimes):
======================
??? Error using ==> mtimes
Out of memory. Type HELP MEMORY for your options.
Error in ==> mtimesx_sparse at 47
result = a * b;

Error in ==> mtimesx at 277
[varargout{1:nargout}] = mtimesx(varargin{:});
======================

I don't think that the problem is lack of memory (I have a 64-bit system with 8 GB of RAM and the memory usage never went above 5 GB). Could you please comment on this?

Thanks in advance.

Julio

10 Aug 2010 James Tursa

Also in today's release that I forgot to mention below: Added custom inline code for small matrix multiplies where all of the first two dimensions are <= 4 (in 'SPEED' mode).

22 Jul 2010 D Sen

Thanks! I am seeing about a 3.25x improvement in speed for my application.

21 Jul 2010 James Tursa

@ D Sen: In addition, you would need to reshape audio_dft to be 32 x 1 x 4097.

21 Jul 2010 James Tursa

@ D Sen: The E array slicing is not set up the way mtimesx needs. mtimesx needs the array slices to be contiguous in memory (a requirement for the underlying BLAS calls). So your E array would need to be a 25 x 32 x 4097 array, making the 25 x 32 slices contiguous in memory (the 2D slices need to be the first two dimensions). The way you currently have it set up, all of your 25 x 32 array slices of E are individually scattered throughout memory and are not the first two dimensions, and mtimesx cannot use them that way.
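
In other words, something like this rearrangement might work (an illustrative sketch only, not tested against your actual data):

    E2   = permute(E, [2 3 1]);                  % 25 x 32 x 4097, slices now contiguous in memory
    adft = reshape(audio_dft, 32, 1, 4097);      % 32 x 1 x 4097
    C    = reshape(mtimesx(E2, adft), 25, 4097); % each page: (25x32)*(32x1) -> 25x1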

21 Jul 2010 D Sen

I have been trying to use this to multiply a 3D matrix with a 2D matrix.

The matlab code which works is as follows:

C = zeros ( 25, 4097 );
for( i_ = 1:4097 )
C(:, i_) = squeeze( E( i_, :, : ) ) * audio_dft( :, i_ );
end

Here E and audio_dft have the following sizes:

size(E)

ans =

4097 25 32

size( audio_dft)

ans =

32 4097

I get:

mtimesx( E, audio_dft )
??? Error using ==> mtimesx
Inner matrix dimensions must agree.

10 May 2010 Val Schmidt

Never mind, I found my problem. I have now successfully built mtimesx on Snow Leopard 10.6 with 64-bit MATLAB.

To do so I edited
/Applications/MATLAB_R2009b.app/bin/gccopts.sh
In the section for "maci64" I changed "SDKROOT" and "MACOSX_DEPLOYMENT_TARGET" to reflect a change in OS from 10.5 to 10.6.

Then I selected this opts file with
mex -setup
selecting "1" at the prompt.

Then I compiled everything with:
mex -v -DDEFINEUNIX -largeArrayDims mtimesx.c -lmwblas -lmwlapack

(look out for the 2 D's in -DDEFINEUNIX - that was my error)

10 May 2010 Val Schmidt

I'm trying to compile on 64bit MATLAB on Mac OSX 10.6 without success. The only changes I've made to the default have been to modify the MATLABROOT/bin/gccopts.sh file such that the SDK and OS Version are set to the 10.6 variety. (and then ran mex -setup)

It seems the linker cannot find the linear algebra packages, but it is not clear to me why. The output is below and I would be thankful for ideas on what to try next.

thanks

---------------snip------------
-> gcc-4.0 -O -Wl,-twolevel_namespace -undefined error -arch x86_64 -Wl,-syslibroot,/Developer/SDKs/MacOSX10.6.sdk -mmacosx-version-min=10.6 -bundle -Wl,-exported_symbols_list,/Applications/MATLAB_R2009b.app/extern/lib/maci64/mexFunction.map -o "mtimesx.mexmaci64" mtimesx.o -L/Applications/MATLAB_R2009b.app/bin/maci64 -lmx -lmex -lmat -lstdc++

Undefined symbols:
"_dsyr2k", referenced from:
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
"_dgemm", referenced from:
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
"_dgemv", referenced from:
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
"_sgemm", referenced from:
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
"_sgemv", referenced from:
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
"_dsyrk", referenced from:
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
"_ssyrk", referenced from:
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
"_ssyr2k", referenced from:
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
"_sdot", referenced from:
_floatcomplexdotproduct in mtimesx.o
_floatcomplexdotproduct in mtimesx.o
_floatcomplexdotproduct in mtimesx.o
_floatcomplexdotproduct in mtimesx.o
_floatcomplexdotproduct in mtimesx.o
"_ddot", referenced from:
_doublecomplexdotproduct in mtimesx.o
_doublecomplexdotproduct in mtimesx.o
_doublecomplexdotproduct in mtimesx.o
_doublecomplexdotproduct in mtimesx.o
_doublecomplexdotproduct in mtimesx.o
ld: symbol(s) not found
collect2: ld returned 1 exit status

mex: link of ' "mtimesx.mexmaci64"' failed.

03 May 2010 Roland Kruse

The code is surprisingly fast for multiplication of a sparse matrix by a scalar. I really wonder why MATLAB is so slow in this (very simple) case.

26 Apr 2010 Val Schmidt

James,
This code is proving to be a FANTASTIC speed-up for my work. I am so grateful that I thought I would post and comment.

First, I am working on R2008b on an intel mac. I compiled your code with

mex -DEFINEUNIX mtimesx.c

and although I received a handful of warnings about redefined pointers, it compiled fine without error. I also tried compiling the code with a few extra optimizations, out of curiosity:

mex -DEFINEUNIX CFLAGS = '$CFLAGS -O3 -funroll-loops -fexpensive-optimizations -ffast-math' mtimesx.c

The result wasn't significantly faster than the original on my own processing test (I did not take the time to run it through your tests, however.).

I did run mtimesx_test_ddspeed initially to compare with MATLAB native routines, and while I didn't capture the results, I found that most things native MATLAB could do as fast or faster than mtimesx, with the notable exception of math involving complex conjugates, in which mtimesx was far, far faster.

However, the fantastic benefit for me was the native handling of 4D matrices by mtimesx. In my own code, in which I am rotating several (8.3) million 3D vectors through 3x3 rotation matrices, mtimesx increased the speed of my calculations by 2100%!

Thanks so much for your time and effort. It is most gratefully appreciated.

-Val

23 Apr 2010 James Tursa

@ veltz: Thanks for your comments. Regarding openMP, I think the short answer is no. I will give my reasons shortly, but please be advised I don't do any openMP stuff myself so my comments might not be correct.

The basic loop for nd stuff is this loop:

for( ip=0; ip<p; ip++ ) {

However, this is just the loop for the blocks in the C result. The blocks from the A and B arrays are calculated inside the loop based on the singleton expansion code (which depends on the dimensions of A and B) and are not known ahead of time ... at the end of one iteration the calculation for the A and B blocks for the next iteration takes place (the C block pointer just advances by the same amount each time). So it is not set up for a parallel loop. I would have to change the code and create three new arrays of pointers so that ahead of time all of the individual block starting points are known. *Then* maybe it could be put in a parallel loop.

But I don't think this would do much good. The multi-threaded stuff in MATLAB seems to happen at the BLAS/LAPACK level, so mtimesx is already getting the benefit of multi-threading for those calls without doing anything special in the mtimesx code itself to get it. For example, if you compare a simple matrix multiply timing at the MATLAB prompt using two large matrices, you can see about a 1.6x improvement with 2 threads vs 1 thread. If I code up a very simple mex routine that calls dgemm and does nothing else (no pragma omp etc) I see the exact same timing benefit when switching the number of threads used. So the c-mex code gets the multi-threaded benefit by virtue of the BLAS/LAPACK routines used. Wrapping another layer of threading on top of that will likely not result in any timing benefit, and may (speaking from ignorance here) in fact conflict with the multi-threading that is already going on inside the BLAS/LAPACK routines themselves.

21 Apr 2010 veltz

First, thank you for this marvellous code :)
Is there a way to use openMP in your code ? For example, when doing
C = zeros(2,6,4,5);
for m=1:4
for n=1:5
C(:,:,m,n) = A(:,:,m,n) * B(:,:,m,n);
end
end
we could call un pragma omp parallel for. As the code is pretty big, where does such loop appear in your code ?

01 Apr 2010 Jon C

Debian 64-bit compiled right with:
mex CFLAGS="\$CFLAGS -std=c99" -DDEFINEUNIX -largeArrayDims -lmwblas mtimesx.c
(the functional syntax of mex did not parse the arguments properly)

The GCC -ansi switch included by the default mexopts apparently refers to C89 (which does not recognize // -- C++ style comments). I wonder what is wrong with C99 as a default in mexopts?

01 Apr 2010 Jon C  
22 Mar 2010 Joshua Dillon

Very useful code. Good work.

The following snippet may be useful to those compiling on Ubuntu:
libblas = '/usr/lib/libblas.so';
if exist(libblas,'file') ~= 2
    system('sudo aptitude install libblas-dev');
end

% This will compile the code on linux
mex('CFLAGS=-std=c99 -fPIC','-DDEFINEUNIX','-largeArrayDims','-lmwblas','mtimesx.c');

19 Mar 2010 James Tursa

@Fabio: Thanks very much for the 64-bit info. I was unaware that gcc did not like the // commenting style, which I prefer over the /* */. However, I will create additional files with changed comments and upload them in the next week or so. I would be very interested in seeing the details of what you had to change to get it to compile and the results of any benchmark tests you happen to run. Feel free to e-mail me at the address listed in the pdf file. Thanks!

19 Mar 2010 Fabio Veronese

Great job!
Thanks to mtimesx I changed a huge for-loop into a 3D matrix computation, reducing the script execution time from 14 s to 1.5 s!

Tested and compiled on Gentoo Linux (64-bit, kernel 2.6.33) on an amd64 system. Compiling with gcc 4.3.* (not the deprecated 4.2 suggested by MathWorks), I had to delete all the // comments from the C code files. The "//" sequence is apparently not friendly to the newer gcc compiler; maybe you could replace it with "/* comment */", which is handled correctly. Also, to compile successfully I had to use:

mex('-DDEFINEUNIX','-largeArrayDims', '-lmwblas','mtimesx.c')

which is somewhat different from the default. Both '-lmwlapack' and '-lmwblas' are the options suggested for compiling on UNIX systems (see "Building MEX-Files" in the doc), but '-lmwblas' alone was enough. I can say it works correctly, but I didn't run the benchmark files.

Hope this helps!
Feel free to ask if some testing on LINUX 64bit is needed. I'm going to run tests soon.

24 Feb 2010 Eric Larrieux

Thanks for the quick fix!

23 Feb 2010 James Tursa

Confirmed bug. The bug happens under the following conditions in MATLAB mode:

(row vector) * (non-square matrix)

(non-square matrix transposed) * (column vector)

Sorry about that! This was not caught in testing because I only used square matrices for these particular tests. The bug occurs because the BLAS routines DGEMV and SGEMV do not switch the dimension arguments when a transpose is involved. I had erroneously assumed that they did because that is how the BLAS routines DGEMM and SGEMM arguments work for general matrix multiplies. I have no idea why the interface for these routines is not set up the same. In any event, I will upload a fix tonight. In the meantime, you can manually put the fix in as follows:

Edit the mtimesx_RealTimesReal.c file.

Global replace the following text:

xGEMV(PTRANSA, M, K

with this text:

xGEMV(PTRANSA, &m1, &n1

And then also global replace the following text:

xGEMV(PTRANSB, N, K

with this:

xGEMV(PTRANSB, &m2, &n2

Those replacements will force the DGEMV and SGEMV calls to use the original dimensions of the first matrix argument instead of the switched dimensions.

23 Feb 2010 Eric Larrieux

Thanks! Also, FYI: I am running R2009a...

23 Feb 2010 James Tursa

I will look into it.

23 Feb 2010 Eric Larrieux

Hi, I am running MATLAB on a 32-bit Windows machine and I am getting incorrect (all 0) results when I perform some simple 3D matrix multiplications in 'matlab' mode only ('speed' mode seems to yield the correct results in all the cases I have tested so far), i.e.:

>> % define a and b
>> a=2*ones(1,2,2)
a(:,:,1) =
2 2
a(:,:,2) =
2 2
>> b=3*ones(2,3,2)
b(:,:,1) =
3 3 3
3 3 3
b(:,:,2) =
3 3 3
3 3 3
>>
>> % verify result using normal matrix multiply
>> a(:,:,1)*b(:,:,1)
ans =
12 12 12
>> mtimes(a(:,:,1),b(:,:,1))
ans =
12 12 12
>>
>> % test results using mtimesx
>> mtimesx(a(:,:,1),b(:,:,1),'matlab')
ans =
0 0 0
>> mtimesx(a(:,:,1),b(:,:,1),'speed')
ans =
12 12 12

I just downloaded and installed the package yesterday, so I believe the outer-product bug discussed earlier should already be fixed. Does anyone have any idea what is causing this error? Thanks!

-eric

16 Feb 2010 James Tursa

@newpolaris: Thanks for the input! It's hard for me to debug the 64-bit stuff since I don't have a system to test with, so comments like yours are very much appreciated. I have uploaded a new set of files with the largearraydims typo in the build routine fixed.

12 Feb 2010 newpolaris

Great! But there is a typo in mtimesx_build.m for 64-bit Windows.

Line 247 reads: largearraydims = '-largearraydims';
It causes an error like:
Warning: MEX could not find the library "argearraydims"
specified with -l option on the path specified
with the -L option.
... mex mtimesx.c build completed ... you may now use mtimesx.

So it must be replaced with: largearraydims = '-largeArrayDims';

12 Feb 2010 newpolaris

in my case, the result is an all-zero matrix :)

11 Jan 2010 Tobias Wood

Cannot thank you enough for this. Got a huge speed improvement (10 seconds down to 0.6) for a script using this. I'm staggered it's not in the core of Matlab.

06 Dec 2009 Bruno Luong

It takes some guts to tackle what MATLAB claims to do best: matrix computation.

The package is nicely presented. This is without doubt professional-quality code built on state-of-the-art BLAS, with an accompanying PDF document explaining the "why" and the "how" in great detail.

As with any well-written MEX-based package, there is an installation file that conveniently handles the compilation before the package can be used. Just type its name and the job is done. Adding the package's main path (no subfolders) to the MATLAB path is left to the user.

There are 8 separate files that can be used to benchmark against MATLAB. Their names may not be obvious to newcomers, but for those who are familiar with MATLAB there is no doubt: they cover all 4 combinations of double/single, with one file for equality tests and another for speed tests. I chose the "dd" (double/double) combination, likely the most used, and ran the "equal" test. It crashed. Too bad; despite the crash I took my chances and ran the speed test. A bunch of tests are tic/toc'd against MATLAB, with clear output indicating whether each case is slower or faster than MATLAB and by what percentage the runtimes compare. Something strange happened: in some cases MTIMESX showed up a few hundred times faster than MATLAB. I told myself: "It cannot be true; why would MATLAB perform so poorly?" I decided to step through the speed-test function and discovered there might be another bug. I contacted the author and, bingo, I got his explanation and solution a few hours later. The bug (just one missing line) was due to a last-minute change before submission. Following James's instructions, I modified the code, rebuilt the MEX file, and reran the test code. The results look much more reasonable. And when I reran the "equal" test, there was no longer a crash.

One more word on my setup before you read on: Windows Vista 32-bit, R2009b, and an old but loyal VS C++ 6.0. This matters because MTIMESX performance depends greatly on the setup, as the author clearly states.

The main function MTIMESX can operate in two modes: "MATLAB" (default) and "SPEED". The first tries to be as compatible with MATLAB as possible; the second is, as the name suggests, all about speed. My first two tests use "MATLAB" mode; the last compares the two modes.

PERTINENT:

I. 2D Arrays
Speed tests are run with all possible combinations of cases: complex/real; normal, transposed, conjugated, conjugate-transposed; then matrix/vector/scalar; all applied to either the first or the second argument. Each argument can thus be in 2*4*3 = 24 states, not counting that a vector can be a column or a row. Combining both arguments, the tests cover probably no fewer than 24^2 = 576 cases, and that is for 2D arrays only (nD arrays are supported; we'll come back to that).
Most of the time, MTIMESX (in MATLAB mode) runs at a speed more or less comparable to MATLAB, except for a few cases where the speed advantage is clearly significant. Here is one typical such case:

>> n=2000;
>> A=rand(n)+1i*rand(n);
>> B=rand(1,n)+1i*rand(1,n);
>> tic; C=conj(A)*B.'; toc
Elapsed time is 0.094847 seconds.

>> tic; C=mtimesx(A,'G',B,'T'); toc
Elapsed time is 0.034242 seconds.

That is about 3 times faster than MATLAB. Taking a closer look, it seems MATLAB is penalized by explicitly carrying out the conjugate and transpose before the product, as the next timing shows:

>> A1=conj(A);
>> B1=B.';
>> tic; C=A1*B1; toc
Elapsed time is 0.030122 seconds.

whereas MTIMESX performs those operations *on the fly* during the product. Very good. But I would argue that the speed gain is smaller in practice. If the product is to be carried out once, careful developers usually manage to build conj(A) and B.' directly rather than A and B, in which case the speed gain of MTIMESX is irrelevant. If A and conj(A) are each used in one product, the speed gain is about 1.5 (3/2). If they are used many times, the developer can compute CONJ once and store the result before MTIMES is invoked, and the speed gain gets even closer to 1. Still, there is clearly an advantage in MTIMESX, both for speed and for avoiding storage of the extra matrix conj(A).

My impression is that this is the only (though significant) factor contributing to the speed gain: performing the conjugate on the fly. Recent MATLAB (R2008a, i.e., v7.6) already benefits from a smart on-the-fly transpose, according to Tim Davis.
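
That claim is easy to check at the prompt (a rough sketch; timings will of course vary with the machine and MATLAB version):

% If MATLAB folds the transpose into the BLAS call, these two products
% should take about the same time.
n = 2000;
A = rand(n); B = rand(n);

tic; C1 = A.' * B; toc       % transpose written inside the product

At = A.';                    % explicit transposed copy made first
tic; C2 = At * B; toc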

II. ND-arrays
The nicest feature of MTIMESX is the capability to perform multiple matrix products arranged in an nD array. There was previously such a function on the MATLAB File Exchange, coded by Peter Boettcher and also using BLAS, but that package now has minimal support and users have trouble compiling it on 64-bit platforms. So this new feature is very welcome.
When a MATLAB developer wants to compute products of multiple matrices of the same size, to date there are 4 alternatives:
1. Use a for-loop
2. Use a product based on a sparse block-diagonal matrix (for example the function SliceMultiProd available at http://www.mathworks.com/matlabcentral/fileexchange/24260-multiple-same-size-linear-solver)
3. Use the package MULTIPROD, where element-wise products and sums are carried out, http://www.mathworks.com/matlabcentral/fileexchange/8773-multiple-matrix-multiplications-with-array-expansion-enabled
4. Use a BLAS-based routine, such as Peter Boettcher's ndfun, and now James Tursa's MTIMESX

Here is the code I used to benchmark the four approaches. Note that small matrix sizes (less than, say, 5) are often encountered in practice.

function mprodbench(m,n)
% Benchmark four approaches to the product of n (m x m) matrices

A=rand(m,m,n);
B=rand(m,m,n);

C=zeros(size(A,1),size(B,2),size(B,3));
tic
for k=1:size(B,3)
C(:,:,k) = A(:,:,k)*B(:,:,k);
end
t1=toc;

tic
C=SliceMultiProd(A,B);
t2=toc;

tic
C=multiprod(A,B);
t3=toc;

tic
C=mtimesx(A,B);
t4=toc;

fprintf('Matrix size=%d, number matrices=%d\n', m, n);
fprintf('\tfor-loop = %f, rel speed gain = (%f)\n', t1, t1/t1);
fprintf('\tSliceMultiProd = %f, rel speed gain = (%f)\n', t2, t1/t2);
fprintf('\tmultiprod = %f, rel speed gain = (%f)\n', t3, t1/t3);
fprintf('\tmtimesx = %f, rel speed gain = (%f)\n', t4, t1/t4);

end

And here are the results:

>> mprodbench(3,1e4)
Matrix size=3, number matrices=10000
for-loop = 0.055119, rel speed gain = (1.000000)
SliceMultiProd = 0.014130, rel speed gain = (3.900815)
multiprod = 0.007815, rel speed gain = (7.052806)
mtimesx = 0.006185, rel speed gain = (8.911272)

>> mprodbench(10,1e4)
Matrix size=10, number matrices=10000
for-loop = 0.074665, rel speed gain = (1.000000)
SliceMultiProd = 0.146336, rel speed gain = (0.510228)
multiprod = 0.151051, rel speed gain = (0.494303)
mtimesx = 0.022569, rel speed gain = (3.308258)

This shows that MTIMESX is the best and most consistent approach for multiple same-size matrix products. It supports inputs with any number of trailing dimensions.

III. SPEED VS MATLAB MODES

From my tests, SPEED mode is typically about 30-50% faster for the tensor (outer) product (a column vector multiplied by a row vector). Beyond that case, the speed gain is probably not spectacular.
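
A quick way to see that difference (a sketch; u and v are just illustrative names):

% Outer (tensor) product of a column vector and a row vector, timed in
% both modes.
u = rand(2000, 1);
v = rand(1, 2000);

tic; C1 = mtimesx(u, v, 'MATLAB'); toc   % default mode, matches MATLAB exactly
tic; C2 = mtimesx(u, v, 'SPEED');  toc   % fastest method, may differ slightly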

CONCLUSION:
- For whom? Developers who are interested in optimizing for speed and want to squeeze the last ounce of performance out of their computers. I have not tested it against MTIMES in older versions of MATLAB, but MTIMESX will surely be even more pertinent there, since MATLAB keeps making progress such as smart matrix transposition. Those running the latest MATLAB (from R2008a on) get limited benefits beyond saving the extra conjugate and transpose operations on their matrices. The syntax of MTIMESX is simple, though not quite as lean as the MATLAB operator, which is the price to pay.
- MTIMESX is clearly the best approach for multiple products of matrices. Go for it!

Fully justified, until MATLAB catches up.

Bruno

05 Dec 2009 James Tursa

As a workaround prior to Monday you can make this change to the mtimesx_RealTimesReal.c file near line 222:

scalarmultiply = 0; // <-- add this line
if( m1 == 1 && n1 == 1 ) {
scalarmultiply = 1;

05 Dec 2009 James Tursa

There is a bug in the outer product code. Will be fixed as soon as TMW posts updated version. Probably Monday Dec 7.

Updates
04 Dec 2009

Fixed bug for (scalar) * (sparse) when the scalar contains an inf or NaN (in which case the zeros do not stay zero). Slight update to pdf doc.

07 Dec 2009

Fixed bug in scalar multiply code that was causing incorrect results & crashes.

10 Dec 2009

Added singleton expansion capability for multi-dimensional matrix multiplies.

11 Dec 2009

Fixed a bug for empty transa or transb inputs. Now treats these the same as 'N'. Also simplified the nD singleton expansion code a bit.

07 Jan 2010

Added the multi-dimensional test routine mtimesx_test_nd.m for speed and equality tests. Added its description to the pdf file.

16 Feb 2010

Fixed a typo in the build routine for 64-bit systems, changed -largearraydims to -largeArrayDims

23 Feb 2010

Fixed a bug for some of the (row vector) * (matrix) and (matrix transposed) * (column vector) operations in MATLAB mode that involved incorrect dimensions in the dgemv and sgemv calls.

10 Aug 2010

Added capability for nD scalar multiplies (i.e., 1x1xN * MxKxN).
Replaced the buggy mxRealloc API routine with custom code.
Updated mtimesx_test_nd.m file.

07 Oct 2010

Added OpenMP support for the custom code.
Expanded sparse * single and sparse * nD support.
Fixed an (nD complex scalar)C * (nD array) bug.

23 Feb 2011

Fixed typos in the dsyrk, dsyr2k, ssyrk, and ssyr2k BLAS function prototypes. (These typo fixes should not change any results)
