Code covered by the BSD License


4.8 | 41 ratings | 144 downloads (last 30 days) | File Size: 256 KB | File ID: #25977 | Version: 1.10

MTIMESX - Fast Matrix Multiply with Multi-Dimensional Support

James Tursa

30 Nov 2009 (updated)

Beats MATLAB 300% - 400% in some cases ... really!

File Information
Description

MTIMESX is a fast general purpose matrix and scalar multiply routine that has the following features:

- Supports multi-dimensional (nD, n>2) arrays directly
- Supports Transpose, Conjugate Transpose, and Conjugate pre-operations
- Supports singleton expansion
- Utilizes BLAS calls, custom C loop code, or OpenMP multi-threaded C loop code
- Can match MATLAB results exactly or approximately as desired
- Can meet or beat MATLAB for speed in most cases

MTIMESX has six basic operating modes:

- BLAS: Always uses BLAS library calls
- LOOPS: Always uses C loops if available
- LOOPSOMP: Always uses OpenMP multi-threaded C loops if available
- MATLAB: Fastest BLAS or LOOPS method that matches MATLAB exactly (default)
- SPEED: Fastest BLAS or LOOPS method even if it doesn't match MATLAB exactly
- SPEEDOMP: Fastest BLAS, LOOPS, or LOOPSOMP method even if it doesn't match MATLAB exactly
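
A mode directive applies to all subsequent calls until changed. A minimal sketch (assuming mtimesx is already built and on the path; A and B are any conforming matrices):

mtimesx('SPEED') % switch to SPEED mode
C = mtimesx(A,B) % this and later multiplies use the SPEED mode
mtimesx('MATLAB') % restore the default mode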

MTIMESX inputs can be:

- single
- double
- double sparse

The general syntax is (arguments in brackets [ ] are optional):

mtimesx( [directive] )
mtimesx( A [,transa] ,B [,transb] [,directive] )

Where transa, transb, and directive are the optional inputs:

transa = A character indicating a pre-operation on A:
transb = A character indicating a pre-operation on B:
The pre-operation can be any of:
'N' or 'n' = No pre-operation (the default if trans_ is missing)
'T' or 't' = Transpose
'C' or 'c' = Conjugate Transpose
'G' or 'g' = Conjugate (no transpose)
directive = One of the modes listed above, or other directives

Examples:

C = mtimesx(A,B) % performs the calculation C = A * B
C = mtimesx(A,'T',B) % performs the calculation C = A.' * B
C = mtimesx(A,B,'g') % performs the calculation C = A * conj(B)
C = mtimesx(A,'c',B,'C') % performs the calculation C = A' * B'
mtimesx('SPEEDOMP','OMP_SET_NUM_THREADS(4)') % sets SPEEDOMP mode with number of threads = 4

For nD cases, the first two dimensions specify the matrix multiply involved. The remaining dimensions are duplicated and specify the number of individual matrix multiplies to perform for the result. That is, MTIMESX treats these cases as arrays of 2D matrices and performs the operation on the associated pairings. For example:

If A is (2,3,4,5) and B is (3,6,4,5), then

mtimesx(A,B) would result in C(2,6,4,5), where C(:,:,i,j) = A(:,:,i,j) * B(:,:,i,j), i=1:4, j=1:5

which would be equivalent to the MATLAB m-code:

C = zeros(2,6,4,5);
for m=1:4
    for n=1:5
        C(:,:,m,n) = A(:,:,m,n) * B(:,:,m,n);
    end
end

The first two dimensions must conform using the standard matrix multiply rules taking the transa and transb pre-operations into account, and dimensions 3:end must match exactly or be singleton (equal to 1). If a dimension is singleton then it is virtually expanded to the required size (i.e., equivalent to a repmat operation to get it to a conforming size but without the actual data copy). This is equivalent to a bsxfun capability for matrix multiplication.
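
The singleton expansion above can be illustrated with plain m-code. A sketch (the sizes are illustrative; mtimesx itself performs no actual repmat copy):

% A is (2,3,4,5), B is (3,6,1,5): dimension 3 of B is singleton
A = rand(2,3,4,5);
B = rand(3,6,1,5);
C = mtimesx(A,B); % C is (2,6,4,5)

% Equivalent m-code with an explicit expansion:
Bx = repmat(B,[1 1 4 1]); % expand singleton dimension 3 to size 4
C2 = zeros(2,6,4,5);
for m=1:4
    for n=1:5
        C2(:,:,m,n) = A(:,:,m,n) * Bx(:,:,m,n);
    end
end
% In the default MATLAB mode, C and C2 match exactly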

Acknowledgements
MATLAB release: MATLAB 7.4 (R2007a)
Other requirements: A C compiler, such as the built-in lcc compiler. (You don't have to know anything about C to use mtimesx ... it is self-building)
Comments and Ratings (155)
19 Jun 2016 armita mani

I am trying to use mtimesx on MATLAB 2016a, Windows 10 64-bit, but I get the error "A C/C++ compiler has not been selected with mex -setup". I understand I should change mexopts = [prefdir '\mexopts.bat']; in mtimesx_build.m to mexopts = [prefdir '\mex_C++_win64.xml'];, but in the path C:\Users\..\AppData\Roaming\MathWorks\MATLAB\R2016a this xml doesn't exist. How can I compile this library?
Thank you

14 Jun 2016 Simon Leglaive

26 May 2016 Andreas

@Vladislav @Iliya @James Tursa Did you mean to set the flag in the .xml file like this?

set COMPFLAGS=-c -openmp $CFLAGS $DEFINES $MATLABMEX

For me it still crashes when I try and it feels hopeless. I have tried everything to make it work.

19 May 2016 Hanse

06 May 2016 David Belo

Fantastic work, James. Have you tried implementing matrix inversion on each slice as you alluded to in the past? I'm currently using multinv which works well but I wonder if there is a performance gain to be had using a similar approach as you've taken in mtimesx.

26 Apr 2016 Levent Karacan

07 Apr 2016 Phlip

30 Mar 2016 Dev-iL

@Vladislav - See comment by Martin Lindfors from 26 May 2015. My solution is not something new and was suggested before.

16 Mar 2016 Vladislav Yordanov

@Iliya

I have the same issue. I can use MTIMESX with MATLAB 2010 and it works out great. Many thanks to James Tursa.
However, I have the same issue as Tyler with MATLAB 2014. Could you please elaborate in detail on how to fix it? I read your comment, but I have no idea how to point to the xml.

Regards

29 Feb 2016 Dev-iL

@Tyler - On newer MATLAB versions, you don't need the mexopts.bat file but rather mex_C_win64.xml (or an XML with a similar name that fits your system).

Change the mtimesx_build.m to point to an .xml file and it should be ok. I manually added the "/openmp" to the "set COMPFLAGS ..." line in the xml. After this, compilation completed correctly and running mtimesx didn't crash my MATLAB (2015a).

Good luck!

@JamesTursa - I humbly suggest you modify the build script to also be compatible with newer MATLAB versions.

05 Jan 2016 Tyler Millhouse

I downloaded this mexopts.bat file, and the compiler now works.

https://gist.github.com/dgleich/1579415

Unfortunately, when I run mtimesx, Matlab crashes.

05 Jan 2016 Tyler Millhouse

Hi James, thanks for putting this together. I keep getting this message: 'A C/C++ compiler has not been selected with mex -setup'. This is strange since I have run 'mex -setup' and 'mex --setup C++'. Mex selected the MinGW64 compiler. Do I need a different compiler? Thanks again!

24 Dec 2015 Fritz

Is there any news on the cumprod implementation for MxMxK?

Thank you

23 Dec 2015 James Tursa

@Michal: For generic 2D matrix multiply, there is no advertised speedup since mtimesx is calling exactly the same routine as MATLAB. Speedup would be if you were doing 2D page multiplies with nD matrices.

23 Sep 2015 Michal Kvasnicka

For MATLAB R2015b:
+++++++++++++++++++++++

>> a = rand(5000);
>> b = rand(5000);
>> tic, c2 = mtimesx(a,b,'speedomp'); toc
Elapsed time is 1.858891 seconds.
>> tic, c1 = a*b; toc
Elapsed time is 1.956402 seconds.
>> isequal(c1,c2)

ans =

1

So, where is the announced speedup???

03 Jun 2015 Nick

26 May 2015 Martin Lindfors

I fixed the issues by replacing line 166 in mtimesx_build.m from

mexopts = [prefdir '\mexopts.xml'];

to

mexopts = [prefdir '\mex_C++_win64.xml'];

24 Apr 2015 Philip

I am having the same issues as Octavian. I am using 64-bit R2015a on 64-bit Windows 7. I have set up the Microsoft Visual C++ 2013 compiler. However, when I run mtimesx_build, it produces the error:

Error using mtimesx_build (line 169)
A C/C++ compiler has not been selected with mex -setup.

I believe this may be related to Matlab switching to using xml files for mex from bat files. Thus mtimesx_build cannot find mexopts.bat. Any help resolving this issue would be greatly appreciated.

17 Apr 2015 Octavian

Dear James,
I am very interested in using mtimesx for ND array operations, but I am new to compiling. When testing mtimesx, I get the following error:

Error using mtimesx_build (line 169)
A C/C++ compiler has not been selected with mex -setup.

I have a Windows 7 64-bit laptop. Which C/C++ compiler would you recommend installing?
Thank you,
Octavian

16 Apr 2015 Jeppe

@Heather,

I'm running MATLAB R2014b on a MacBook with OS X Yosemite. I installed Xcode from the App Store and defined the BLAS library:

blas_lib='/Applications/MATLAB_R2014b.app/bin/maci64/libmwblas.dylib'
mex('-DDEFINEUNIX','-largeArrayDims','mtimesx.c',blas_lib)

After running this in the command window I could use mtimesx.m

Hope this helps

Jeppe

20 Mar 2015 Heather

Thanks James for your great work; it is a really useful function. I am having trouble compiling mtimesx.c on my new macbook though. I cannot see from the comments below that anyone has managed to solve this.

My BLAS library is located at /usr/local/lib/ and is called libblas.a
Following the instructions given, when I tried to call the mtimesx function in my MATLAB code I defined

blas_lib = '/usr/local/lib/libblas.a'

and then tried to manually compile the mex routine by

mex('-DDEFINEUNIX','-largeArrayDims','mtimesx.c',blas_lib)

after which I see `Building with 'Xcode with Clang'` followed by several lines of output and ultimately:

Error using mex
Undefined symbols for architecture x86_64:
"__gfortran_st_write", referenced from:
_xerbla_ in libblas.a(xerbla.o)
"__gfortran_st_write_done", referenced from:
_xerbla_ in libblas.a(xerbla.o)
"__gfortran_stop_string", referenced from:
_xerbla_ in libblas.a(xerbla.o)
"__gfortran_string_len_trim", referenced from:
_xerbla_ in libblas.a(xerbla.o)
"__gfortran_transfer_character_write", referenced from:
_xerbla_ in libblas.a(xerbla.o)
"__gfortran_transfer_integer_write", referenced from:
_xerbla_ in libblas.a(xerbla.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Can anybody help me? Thank you and best wishes,
Heather

17 Feb 2015 Luca Viviani

14 Jan 2015 noa

Is mtimesx working with Mac? I have OS X 10.9 and could not compile; I got the error "Undefined symbols for architecture x86_64: ...".
Thank you.

07 Jan 2015 David H

13 Dec 2014 Royi Avital

Can anyone compare the performance to MATLAB R2014b?

Thank You.

11 Dec 2014 Octavian

Is mtimesx working on Linux? I have Ubuntu 12.04, and it does not compile. Also, does it work with gpuArrays?
Thanks,

Octavian

06 Nov 2014 Matt J

Hi James,
I had a question about how the flags 't' and 'c' are used by sparse matrix arguments. I assume that the flags are there to circumvent explicit transposes and their associated overhead. Yet, it doesn't appear that sparse mtimesx operations make use of them. In the test below, I see basically the same execution speeds in all 3 versions. Is it true that sparse operations can't make use of the flags, and is it deliberate/inevitable? It would be a great benefit to be able to avoid transposing tall sparse matrices because of the high memory consumption that comes with that.

J=sprand(3192027,3225,.0001);

tic;
mtimesx(J,'t',J);
toc

tic;
mtimesx(J.',J);
toc

tic;
J.'*J;
toc

04 Jul 2014 Julien

From my understanding, the file "mex_C_win64.xml" replaces "mexopts.bat". I was getting:
"Error using mtimesx_build (line 169)
A C/C++ compiler has not been selected with mex -setup". I replaced line 169 with: mexopts = [prefdir '\mex_C_win64.xml'];. It did work, with a few warnings...

18 Jun 2014 Matt J

Thanks, James. This thread describes where the information has moved to

I can post those xml files somewhere, if you like. However, is there no way to get mtimesx to compile simply with the "mex" command, maybe with a small sacrifice in performance?

07 Jun 2014 James Tursa

Hmmm ... Well, I don't have access to R2014 so I will have to code in the blind on this. My understanding is that MATLAB will in real time make a guess as to the correct compiler to use based on source code extension. So the information that was in mexopts.bat (user had already selected a compiler) is no longer available for my automatic build stuff. I will put out an update shortly, but may have to ask for info direct from the user for compiler stuff. I will try to get it out there sometime next week.

03 Jun 2014 Matt J

Thought I'd ping again. It doesn't look like mtimesx_build is compatible with R2014, since it looks for mexopts.bat, which is now gone. Can we hope to get a version that supports R2014? James?

29 May 2014 Matt J

I have the same problem as Safdar in R2014a. Even after running mex -setup, it appears as though mtimesx_build cannot find mexopts.bat. Possibly, the locations of relevant files/directories may have shifted?

22 May 2014 Stefan

23 Apr 2014 Safdar

Sorry for the multiple posts. It seems something went wrong when submitting the post. Sorry

23 Apr 2014 Safdar

I got the following message

... Build routine for mtimesx
... Checking for PC
... Finding path of mtimesx C source code files
... Found file mtimesx.c in D:\Documents\Projects\mtimesx_20110223\mtimesx.c
... Found file mtimesx_RealTimesReal.c in D:\Documents\Projects\mtimesx_20110223\mtimesx_RealTimesReal.c
Error using mtimesx_build (line 169)
A C/C++ compiler has not been selected with mex -setup

Error in mtimesx (line 271)
mtimesx_build;

26 Mar 2014 Alireza

@Evan: have you tried blas_lib = 'lmwblas'?

09 Mar 2014 Evan

I would like to second Matthieu's question. For blas_lib, I've tried both

blas_lib = '/Applications/MATLAB_R2013b.app/bin/maci64/lapack.spec';

and

blas_lib = '/Applications/MATLAB_R2013b.app/bin/maci64/blas.spec';

With both, I get:

ld: warning: ignoring file /Applications/MATLAB_R2013b.app/bin/maci64/blas.spec, file was built for unsupported file format

Anyone have any ideas?

05 Dec 2013 Marc Crapeau

Works well for me on Linux.

If it helps, I installed the library libblas.so in my home directory following the "shared library" section of this page: http://gcc.gnu.org/wiki/GfortranBuild

Then I compiled the mex file with the command returned by the mtimesx routine when the automatic installation failed:
blas_lib = 'the_actual_path_and_name_of_your_systems_BLAS_library';
mex('-DDEFINEUNIX','-largeArrayDims','mtimesx.c',blas_lib)

Gained a factor of 2 in speed for my 3D matrix multiplications, and clearer MATLAB code :)

22 Aug 2013 Hannes

Hi all,

I tried to locate compilers via mex -setup, but got the answer => No supported SDK or compiler was found on this computer.
I installed Microsoft Windows SDK 7.1, but it still does not work. Does somebody know the path and install directory, or what the problem is?
Thanks for the help.

18 Jul 2013 WK

I don't know why, but I ran the test again on the Linux cluster and the Windows one, and it works in the parallel computation now. So strange!

18 Jul 2013 WK

James, by the way, what version of MATLAB did you last test this package with? I put it on my laptop (Windows 64) and a Linux cluster running MATLAB 2013; only a few operations let mtimesx beat MATLAB, and by less than 10%. I saw that this package was uploaded in 2011, so I wonder if MATLAB 2013 made a lot of changes to boost performance.

17 Jul 2013 James Tursa

@WK: I don't have the parallel toolbox, so am unable to give much advice here. My understanding is that each worker in the parallel environment is a separate process, which I would expect would mean that MTIMESX could be used. But since I don't have the toolbox I can't verify this. Have you actually run some tests and gotten wrong results?

17 Jul 2013 WK

I just got it compiled and installed in my 64-bit MATLAB with Windows SDK 7, as well as on the Unix cluster. I find that it doesn't work in spmd; is that right?

15 Jul 2013 WK

Thanks for the great library. I have two questions. Do I have to compile it to get the library? So the library really depends on the machine, right? Also, our lab has a multi-node cluster, and each node is equipped with 8 cores. I think this library is based on multi-threading; if I run my MATLAB code in spmd and call your library in each core, will they conflict? Sorry, I don't have any experience compiling mex files; I tried, but I don't have MSVC installed, only MinGW.

25 Jun 2013 JiaDa

How about another function for fast matrix inversion with multi-dimensional support? I wrote a version, but I think your version would be far faster than mine. :P

25 Jun 2013 JiaDa

04 May 2013 Matthieu

Hello,

I'm trying to use mtimesx on Mac OS 10.8 with MATLAB R2011b, but I don't understand how to compile it.

I tried several solutions described in the comments, but none worked.

blas_lib = '/Applications/MATLAB_R2011b.app/bin/maci64'

blas_lib =

/Applications/MATLAB_R2011b.app/bin/maci64

>> mex('-DDEFINEUNIX','mtimesx.c',blas_lib)
mtimesx.c: In function 'mexFunction':
mtimesx.c:592: warning: assignment discards qualifiers from pointer target type
ld: can't map file, errno=22 for architecture x86_64
collect2: ld returned 1 exit status

mex: link of ' "mtimesx.mexmaci64"' failed.

Error using mex (line 206)
Unable to complete successfully.

Can you help me ?

15 Mar 2013 James Tursa

@Agustin: It looks like you put the files in the lcc compiler directory. I would advise that you put the files in a work directory instead (not any of the "official" sys or bin etc directories that are installed with MATLAB) and try again. Make sure this work directory is on the MATLAB path. As a side note, in general it is *not* a good idea to put your own files into any of the "official" directories that are installed with MATLAB ... doing so risks breaking built-in MATLAB functions and capabilities.

15 Mar 2013 agustin darrosa

Thanks James, but I have this problem:

Build routine for mtimesx
... Checking for PC
... Finding path of mtimesx C source code files
... Found file mtimesx.c in C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx.c
... Found file mtimesx_RealTimesReal.c in C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx_RealTimesReal.c
... Opened the mexopts.bat file in C:\Users\Gusa\AppData\Roaming\MathWorks\MATLAB\R2008a\mexopts.bat
... Reading the mexopts.bat file to find the compiler and options used.
... LCC is the selected compiler
... OpenMP compiler not detected ... you may want to check this website:
http://openmp.org/wp/openmp-compilers/
... Using BLAS library lib_blas = 'C:\Program Files\MATLAB\R2008a\extern\lib\win32\lcc\libmwblas.lib'
... Now attempting to compile ...

mex('C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx.c',lib_blas,'-DCOMPILER=LCC')

it can't find C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx.exp
it can't find C:\Program Files\MATLAB\R2008a\sys\lcc\mtimesx\mtimesx.lib
... mex mtimesx.c build completed ... you may now use mtimesx.

Then I enter:

C = mtimesx(a,b)
??? Undefined function or method 'mtimesx' for input arguments of type 'double'.

What can I do?

12 Mar 2013 James Tursa

@Agustin: For Windows, generally you just put all the files in a directory on the MATLAB path and then type mtimesx at the prompt. For other systems, see other posts below.

12 Mar 2013 agustin darrosa

Hi

How can I use this "function" in MATLAB?

I tried it, but I get errors.

Sorry for my ignorance

08 Mar 2013 James Tursa

@Tonio: No, MTIMESX does not do this. But from your description it sounds like something like this might work for you:

a = cell array of matrices
b = cell array of vectors
c = cellfun(@mtimes,a,b,'UniformOutput',false)

06 Mar 2013 Tonio

Can I use that for multiplying an (m,n,k) * (n,k) = (m,k), where each k 'slice' needs to be multiplied together and they may contain variable numbers of elements, i.e. 'm' can vary in each slice? In my code each slice is stored in a cell in MATLAB.

04 Feb 2013 James Tursa

@Robert: Yes, MTIMESX is CPU based, not GPU.

04 Feb 2013 Robert

Hello James,

nice work. I basically would like to multiply n matrices B (400,400) with one matrix A (400,400). I used arrayfun before, but your code is about 2.5 times faster. Soon I will get an NVIDIA Tesla graphics card, and I would like to ask: is there any possibility to run the calculation on the GPU instead of the CPU?
I tried to call mtimesx with two gpuArrays, but as my computation time was still fast (on my slow GPU) I guess it was not using my GPU.
Perhaps I missed this, but to my understanding mtimesx uses the CPU only, right?

Thanks

Robert

23 Jan 2013 Charles

@Michael Thanks, that did it.

22 Jan 2013 Michael Völker

Charles,

for my own record and to make it machine-searchable, I wrote earlier how I was able to compile mtimesx. Taking the liberty of quoting myself:

> On a 64Bit Debian based Linux I managed to compile it with
> mex -DDEFINEUNIX CFLAGS="\$CFLAGS -march=native" -largeArrayDims -lmwblas -lmwlapack -lgomp mtimesx.c
> using gcc-4.5.

Can you try that, or did you already try?

Michael

22 Jan 2013 Charles

@Sebastian Thanks for the response. I think I wasn't linking to the right file. Now, I seem to be linking to the blas library, but I'm getting a new error. I am using the code "mex -DEFINEUNIX 'mtimesx.c' -lmwblas" but I get an error that reads "mtimesx.c:592: warning: assignment discards qualifiers from pointer target type ". I saw that someone else had gotten this error before and the code was fixed, so I downloaded the files again but I still am getting the error. When I tried to use mtimesx in my code I get an error and matlab crashes. I haven't had any problems with other mex files, so it seems to be specific to mtimesx.

Any help is appreciated. Thanks in advance.

21 Jan 2013 Sebastiaan

@Charles: under Unix, the BLAS library linked to is normally mwblas, not blas_lib.

20 Jan 2013 Charles

This program is great. Really useful. I am having great success with it on my PC. However, I cannot seem to create the mex file on a UNIX computer. I am defining the BLAS library as blas_lib. Then I enter the code "mex('-DDEFINEUNIX','mtimesx.c',blas_lib)". However, I get a string of errors that are all similar to "mtimesx.c;(.text+0x76a1)undefined reference to `saxpy_'". It says this for saxpy, sger, sdot, daxpy, dger, ddot, sgemv,ssyrk, ssyr2k, sgemm, dgemv, dsyrk and dgemm. Can anyone please help me with creating the .mexa64 file? Thanks.

30 Oct 2012 James Tursa

@KSIDHU: Why can't you use .* and ./ directly? What are the exact dimensions of your two matrices? If the matrices need singleton expansion, then look at the function bsxfun. What is the overall operation you are trying to do?

29 Oct 2012 KSIDHU

Hi
How can I perform elementwise multiplication and division of two matrices?
I have looked through the manual but cannot find any information on it.
Is there a way to check if BLAS multi-threading is available? Is it better than OpenMP multi-threading?

I need to perform .*, ./ on matrices of size ~3000x100,000, so any tips would be great. Also, is there a multi-threaded sum function for large matrices (along dim 1 or 2)?

MATLAB 2010a
KS

13 Sep 2012 James Tursa

I like the reasoning you both have spelled out. I will make logical*other conform to the other, and make logical*logical output a double by default but have an option to do logical AND/OR operations and output a logical instead. Thanks for your inputs.

13 Sep 2012 John

I like Michael's reasoning and I think he is right about the 'double' result. So in general I think that the double should be the default. However, given that I enjoy being efficient with data, I would appreciate a 'logical' option.

12 Sep 2012 Michael Völker

Damn, the e-mail notification stopped working, so I didn't know you had replied to my message.

My spontaneous thought was that (logical matrix) * (logical matrix) should *of course* result in a double output, since matrix multiplication involves a sum, so we should expect to get results different than 0 or 1.

So, as you already guessed, I would like to have an auto-adjust-precision feature.
This is what I would expect:

[inputA] * [inputB] ==> [output]
------------------------------------------
logical | logical | double
logical | single | single
logical | double | double
single | logical | single
double | logical | double

(I hope it does not get formatted too ugly...)

I also like John's idea to make the behaviour more optional.

11 Sep 2012 John

Personally, I do not understand why it should be a 'double' result. So, my vote goes towards a logical result.

However, would it be possible to add this as an option? (e.g.: mtimesx(a,b,'logical') will produce a logical result, if a and b are logical). This way the advanced user can be efficient with data, whereas the less experienced user will still get results that they are familiar with....

28 Aug 2012 James Tursa

@John: OK, that's two votes. Automatic logical conversion goes into the next version.

Q: Should a (logical matrix) * (logical matrix) give a double matrix result (ala MATLAB), or a logical matrix result (i.e., essentially treating the * - operations as AND OR operations)?

28 Aug 2012 John

Excellent code!

I experienced the same 'error' as Michael Völker, so I think automatically converting the logical/sparse data to double might be a good idea as an enhancement!

Nevertheless, great work! Saved me a lot of time/coding.

16 Aug 2012 James Tursa

@Michael Völker: The error message you are getting is not a bug. As stated in the doc, using arguments that are not double or single will cause MTIMESX to invoke the built-in mtimes function to do the matrix multiply. If MATLAB will do it then great ... you get that result. But if MATLAB will not do it then you will get whatever error message MATLAB puts out. This is the intended behavior. You can convert the logical to double first, of course, to avoid the error message. If you would like MTIMESX to do this automatically for logical inputs then I can take that as an enhancement request (seems reasonable to me).

16 Aug 2012 Michael Völker

James,

I found a bug with logical input.
Try:

foo = mtimesx( true(2,2), randn(1,1,5) );

And you probably get this error message:

Error using *
Inputs must be 2-D, or at least one input must be scalar.
To compute elementwise TIMES, use TIMES (.*) instead.

Error in mtimesx_sparse (line 47)
result = a * b;

Michael

28 Jul 2012 Sam T

James:

I would like to thank you for coming up with such a brilliant piece of work. An excellent and well written code!!

Lately, I have been trying to use MTIMESX code to multiply two big n-dimensional matrices (A = 400x400x400 and B = 400x200). But for simplistic purposes, consider the following example:

A = 3 x 3 x 3
B = 3 x 2

and C = A*B;
such that dimension of C = 3 x 3 x 2

where C(i, j, l) = A(i, j, k) * B(k, l);
i.e.
C(:, :, 1) = A(:, :, k) * B(k, 1);
C(:, :, 2) = A(:, :, k) * B(k, 2);

Now, if I use mtimesx(A, B), then the resultant matrix is (3 x 2 x 3) as compared to (3 x 3 x 2).

This is because first two dimensions specify the matrix multiply involved. In this case it becomes 1st and 2nd dimension of matrix A while it should ideally be 2nd and 3rd dimension of matrix A to get the desired result.

I am wondering if there is a way around it? I look forward to hearing from you.

Thanks for your help.

25 Jul 2012 James Tursa

@Sam T: I don't think I get it yet. In your example you have:

C(:, :, 1) = A(:, :, k) * B(k, 1);

A(:,:,k) is 3x3
B(k,1) is 1x1

Are you trying to do an nD scalar*matrix multiply here? MTIMESX can do that (with some modifications of the inputs), but I am not quite sure that is really what you want. Can you clarify? Maybe write out an explicit for loop showing how C is to be filled in its entirety?

Comment only
24 Jul 2012 Sam T

Sam T (view profile)

James:

I would like to thank you for coming up with such a brilliant piece of work. An excellent and well written code!!

Lately, I have been trying to use MTIMESX code to multiply two big n-dimensional matrices (A = 400x400x400 and B = 400x200). But for simplistic purposes, consider the following example:

A = 3 x 3 x 3
B = 3 x 2

and C = A*B;
such that dimension of C = 3 x 3 x 2

where C(i, j, l) = A(i, j, k) * B(k, l);
i.e.
C(:, :, 1) = A(:, :, k) * B(k, 1);
C(:, :, 2) = A(:, :, k) * B(k, 2);

Now, if I use mtimesx(A, B), the resulting matrix is 3 x 2 x 3 instead of the desired 3 x 3 x 2.

This is because the first two dimensions specify the matrix multiply involved. In this case that is the 1st and 2nd dimensions of A, while it would ideally be the 2nd and 3rd dimensions of A to get the desired result.

I am wondering if there is a way around it? I look forward to hearing from you.

Thanks for your help.

Comment only

02 Jul 2012 Boaz Schwartz

Boaz Schwartz (view profile)

01 Jun 2012 Clay Fulcher

Clay Fulcher (view profile)

mtimesx is an awesome code. Besides speeding up my structural dynamics and signal processing applications, it is much easier to apply than using loops. I'm also using a multi-D matrix inverter called multinv that can be downloaded from this forum. The two codes together allow me to operate on matrices of functions as easily as matrices of single values.

30 May 2012 Clay Fulcher

Clay Fulcher (view profile)

@ James Tursa: Thanks James! I greatly appreciate it!

Comment only
14 May 2012 James Tursa

James Tursa (view profile)

@ Clay Fulcher: Some options to possibly get more speed:

1) NDFUN by Peter Boettcher. E.g., see this thread:

2) FEX submissions. E.g., see this submission:

http://www.mathworks.com/matlabcentral/fileexchange/31222

3) Wait a couple of months for the next MTIMESX submission, when this capability will be included (along with in-place operations and nD matrix multiply versions of PROD and CUMPROD).

Comment only
13 May 2012 Clay Fulcher

Clay Fulcher (view profile)

James, I need to invert a 3D array of function values. I have to step through each slice of the array in a loop, inverting each slice. Do you know of any way to do this more efficiently?

Comment only
08 Feb 2012 Jonathan Sullivan

Jonathan Sullivan (view profile)

@ James
The array sizes I'm working with are much larger than what I posted. The size of A is 100x33x3. The size of B is 33x1x3x5000x5. That might be why you had trouble recreating it. I'm running on Windows 7, 64 bit, MATLAB 2011b, compiler is cl. If it makes a difference, some of the values in the first matrix are -inf.

Comment only
08 Feb 2012 James Tursa

James Tursa (view profile)

@ Jonathan: Yes, weird indeed. For slices 100x100 in size, MTIMESX will always use BLAS dgemm calls regardless of the method setting (you can verify this by using the 'DEBUG' setting). That is, the exact same code path gets executed regardless of whether 'MATLAB', 'SPEED', or 'SPEEDOMP' is set. That setting should have had no effect on this particular outcome. Also, I have been unable to reproduce this error on a 32-bit WinXP system. Can you provide me some specifics on your MATLAB version, machine OS, compiler used, etc? Does it crash if you restart MATLAB and immediately try the example?

Comment only
08 Feb 2012 Jonathan Sullivan

Jonathan Sullivan (view profile)

@ James
So the problem, at least in part, is caused by using the 'SPEEDOMP' flag. I removed the flag, and now it works. Weird, huh?

Comment only
08 Feb 2012 James Tursa

James Tursa (view profile)

@ Jonathan Sullivan: Your first example without repmat should have worked. Looks like I introduced a bug in the latest release. I will look into this right away ...

Comment only
08 Feb 2012 Jonathan Sullivan

Jonathan Sullivan (view profile)

James,

Fantastic code. It has come in handy. It is well documented, incredibly fast, and extremely useful, especially for N-D arrays.

For the N-D case, this code would be even more powerful if the singleton dimension capability were expanded. Currently, all the dimensions from 3:end must be either the same size (A to B), or must all be singleton for one of the variables.

For example
A = rand(100,100,3);
B = rand(100,100,3,5);
C = mtimesx(A,B); % Does not work. Crashes MATLAB.

You can get around this by using repmat:
A = rand(100,100,3);
B = rand(100,100,3,5);
C = mtimesx(repmat(A,[1 1 1 5]),B); % Works, but slow

Unfortunately, explicitly expanding out large arrays has overhead which is more than I'd like.

Overall: A+ code.

30 Oct 2011 Michael Völker

Michael Völker (view profile)

James, thank you for your hint to using 'lapack' from the FEX.
Indeed it produces the desired result, by calling

lapack('D=DSDOT(i,s,i,s,i)', length(A), A, 1, A, 1)

with 'A' from my example above. Unfortunately, this is ~5 times slower than calling mtimesx() with double input.

On double input, calling the equivalent 'DDOT' routine from the lapack-package takes more than 10 times longer than your mtimesx (wow, by the way...) to compute the very same result.
The difference appears to be even worse for complex input.

So I am still hoping that you might somehow get this feature in your code and maintain the speed...

Comment only
19 Oct 2011 James Tursa

James Tursa (view profile)

@ Michael Völker: FYI, another way to use the DSDOT routine is to use this submission:

http://www.mathworks.com/matlabcentral/fileexchange/16777-lapack

Not sure if it works on 64-bit systems, but you might give it a try.

Comment only
19 Oct 2011 James Tursa

James Tursa (view profile)

@ Michael Völker: Thanks for the comments. I will look into your request. There is a BLAS routine called DSDOT that does what you request. I will look into how easily I can incorporate this into MTIMESX for BLAS specific results. For the SPEED modes MTIMESX sometimes uses custom code to calculate the dot product (e.g., see function RealKindDotProduct), and in fact the accumulation is done in double precision regardless of input type (calculations themselves are single for single inputs), so it might not be too difficult to adapt this to return the double precision output if requested instead of always converting back to single for single inputs. Thanks for the suggestion.

Comment only
19 Oct 2011 Michael Völker

Michael Völker (view profile)

Incredibly great work. Thank you for the elaborate code, the very useful features, the intuitive syntax and behaviour and the docu.

On a 64Bit Debian based Linux I managed to compile it with
mex -DDEFINEUNIX CFLAGS="\\$CFLAGS -march=native" -largeArrayDims -lmwblas -lmwlapack -lgomp mtimesx.c
using gcc-4.5.

After using mtimesx for a while I currently miss only one feature, which is to define the precision of the output independently from the input variables.
This would be useful when calculating the sum of squares of a huge single-precision vector where the correct result exceeds realmax('single') and prior casting takes a lot of time.

Example:
>> A = randn( 1e8, 1,'single'); % plenty of sane values
>> A(1) = 1e20; % only few outliers
>> sos = A' * A % or sos = mtimesx( A, 'C', A )

sos =

Inf

Doing A = double(A) beforehand solves the problem:
>> sos = A' * A

sos =

1.000000040081755e+40

but
>> tic, A = double(A); toc
Elapsed time is 0.472007 seconds.

In fact, that casting to double takes much more time than the actual computation in mtimesx. Since I do this repeatedly in an iteration, the overhead sums up to minutes.

So my wish would be like this:
C = mtimesx( A, B, 'double' ); % C is double even if A or B are single

Nevertheless, thank you so much for this code.

21 Jun 2011 James Tursa

James Tursa (view profile)

@Gary: Follow-up, ndfun as I recall only works for real variables. If you have complex variables then the only efficient option I could suggest would be a custom mex routine.

Comment only
21 Jun 2011 James Tursa

James Tursa (view profile)

@Gary: For this type of matrix product you should be using something like ndfun('mprod',a) by Peter Boettcher. You can find the source code here:

http://www.mit.edu/~pwb/matlab/

If you need a 64-bit version of this you can see the hack I posted on this thread:

Using ndfun will avoid all the array slice copies you are doing in the above example code.

Comment only
21 Jun 2011 Gary

Gary (view profile)

Thanks for the answer, but I think you misunderstood my problem or me your answer.
I have, e.g., 10000 matrices and want to calculate their product. By dividing them into 2*5000 pairs I can use the advantage of your nD multiply. Here is the code:

a = rand(n,n,L);

tic
b = a(:,:,1);
for i = 2:L
    b = a(:,:,i)*b;
end
toc

c = a;
tic
while (L ~= 1)
    L2 = floor(L/2)*2;
    d = c(:,:,end);
    c = mtimesx(c(:,:,2:2:L2), c(:,:,1:2:L2-1));
    if (L2 ~= L)
        c(:,:,end+1) = d;
    end
    L = size(c,3);
end
toc

So will this never be faster than normal multiplies?

Comment only
20 Jun 2011 James Tursa

James Tursa (view profile)

@Gary: MTIMESX is generally not going to be any faster at generic matrix multiplies than MATLAB since it is calling exactly the same BLAS routines as MATLAB. The exceptions are small (4x4 or less) multiplies, dot products, outer products, and some matrix-vector multiplies since they can sometimes be done faster with inline code which MTIMESX uses in the SPEED or LOOP modes. Also, the nD multiply case is almost always faster with MTIMESX than the equivalent loop(s) in MATLAB for any size since array slice copies are avoided.

Comment only
20 Jun 2011 Gary

Gary (view profile)

Hi,
I am trying to compute the product of many matrices (M1*M2*...). I used mtimesx to multiply half of the matrices by the other half repeatedly until only the result remained (e.g. a=M1*M2, b=M3*M4, then a*b). It works very well when the matrices are small, like 4x4, but for larger sizes it becomes slower than MATLAB. Is that expected?

Comment only
30 May 2011 Tom

Tom (view profile)

Was a bit too fast, sorry. Someone already asked this! It works fine now!

Great job!

05 May 2011 dm

dm (view profile)

Great submission! Find it VERY useful for my work!

Also, I think you can change

"Beats MATLAB 300% - 400% in some cases ... really!"

in the title, and multiply the numbers by at least a factor of 25 (based on the results from mtimesx_test_nd).

14 Apr 2011 James Tursa

James Tursa (view profile)

@ Dan Scholnik: Have you tried running the mtimesx_test_nd file yet? That should invoke small OpenMP test cases. For larger test cases it may in fact be the case that later and/or 64-bit versions of MATLAB are hard to improve on for speed. Alas, I don't have a 64-bit system to tinker with.

At the moment I am working on incorporating in-place operations in MTIMESX and that will be in the next major release, but I will put your dense triangular suggestion on the list of potential future improvements.

Comment only
14 Apr 2011 Dan Scholnik

Dan Scholnik (view profile)

Compiled for Matlab 2011a on Fedora 14 x86_64 using

mex CFLAGS='\$CFLAGS -fopenmp' LDFLAGS='\$LDFLAGS -fopenmp' -largeArrayDims -O -DUNIX mtimesx.c -lmwblas

Not sure all of that was needed, but it works. Compiler was gcc 4.5.1.

mtimesx('OPENMP') returns 1 and mtimesx('OMP_GET_NUM_PROCS') returns 4 (I have a dual-core i7 w/ hyperthreading), however mtimesx never reports using anything but the BLAS no matter which flags I pass in. Best I can tell, the flags have no effect. The MATLAB-provided mwblas is multithreaded, so perhaps there's no improving on it. Speeds were generally not much different than the MATLAB multiply, except (as mentioned previously) for cases where the MATLAB transpose or conjugate operations themselves consumed significant portions of the total time.

I also tried using the fedora-provided BLAS via "-lblas" but it appears to be single-threaded and very slow. Here again mtimesx never used multiple processors that I could tell.

A possible feature suggestion, albeit one that deviates from the general-purpose nature of the current code: How about routines for dense triangular-matrix multiplication? Matlab doesn't seem to have any optimizations for this case (although it does for back-substitution, which is common with triangular matrices).

Comment only
05 Apr 2011 pink

pink (view profile)

wahyoe_slipnot@yahoo.co.id
wahyoe

Comment only
05 Apr 2011 pink

pink (view profile)

yes, it can work, thanks a lot james
I wanted to ask:
1. For the loop below, how do I use mtimesx?
for i = 1:m
    A(:,:,i) = B(:,:,i).' * Ex(:,:,i) * C(:,:,i);
end
2. If I compile this into an application (exe) with deploytool, which files do I have to include? I saw a lot of files in the package.
thanks
Wahyoe

03 Apr 2011 James Tursa

James Tursa (view profile)

@ wahyoe Unggul: Did you get the build process to work?

Comment only
28 Mar 2011 James Tursa

James Tursa (view profile)

@ wahyoe Unggol: The error message is telling you that you need to select a C compiler first. So do this:

mex -setup
(Then press Enter)
(Then enter the number of a C compiler such as lcc)
(Then press Enter again)

Once this is done, try running mtimesx again and see if it builds OK.

To answer your specific question I would need to know the dimensions of each variable. I.e., what is the size of RT and Ex?

Comment only
28 Mar 2011 pink

pink (view profile)

hi all
I tried running "mtimesx_test_sdspeed.m", but an error occurs

??? Error using ==> mtimesx_build at 169
A C/C++ compiler has not been selected with mex -setup

Error in ==> mtimesx at 271
mtimesx_build;

Error in ==> mtimesx_test_sdspeed at 100
compver = [computer ',' version ', mtimesx mode ' mtimesx ', the median of ' num2str(n) ' runs'];

I wanted to ask how to use "mtimesx" in a form like this:
for i = 1:2000
    Keg(:,:,i) = RT(:,:,i).' * Ex(:,:,i) * RT(:,:,i);
end
and I also want to ask whether, in addition to speed, it can reduce the data to be more efficient?

23 Feb 2011 James Tursa

James Tursa (view profile)

@ Daniel Lyddy: Confirmed typo in the prototypes for these functions. Change the int * to mwSignedIndex *. So

void xSYRK(char *, char *, mwSignedIndex *, mwSignedIndex *, RealKind *, RealKind *,
mwSignedIndex *, RealKind *, RealKind *, int*);
void xSYR2K(char *, char *, int*, mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *,
RealKind *, mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *);

becomes

void xSYRK(char *, char *, mwSignedIndex *, mwSignedIndex *, RealKind *, RealKind *,
mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *);
void xSYR2K(char *, char *, mwSignedIndex *, mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *,
RealKind *, mwSignedIndex *, RealKind *, RealKind *, mwSignedIndex *);

Thanks for pointing this out. That being said, these typos in all likelihood should not affect the actual results since in both cases 64-bit pointers were being passed and the underlying data is in fact mwSignedIndex in spite of the prototype. I will upload fixed files soon.

Comment only
22 Feb 2011 Daniel Lyddy

Daniel Lyddy (view profile)

I am getting some warnings about incompatible pointer types when compiling on a 64-bit Centos Linux 5.5 system. For example, in line 4909 referred to below, argument 10 is 'LDC' which is #defined to &ldc which is a pointer to mwSignedIndex. In the blas/lapack dsyrk function, argument 10 is also called 'LDC', but it is an integer. Pointers on 64-bit systems are typically larger than 32 bit ints ... maybe these blas/lapack arguments need to be cast to something else?

mex('-DDEFINEUNIX','-largeArrayDims','mtimesx.c',blas_lib)
mtimesx.c: In function ‘mexFunction’:
mtimesx.c:592: warning: assignment discards qualifiers from pointer target type
In file included from mtimesx.c:1303:
mtimesx_RealTimesReal.c: In function ‘DoubleTimesDouble’:
mtimesx_RealTimesReal.c:4209: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4211: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4212: warning: passing argument 3 of ‘dsyr2k_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4225: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4227: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4228: warning: passing argument 3 of ‘dsyr2k_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4241: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4243: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4257: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4259: warning: passing argument 10 of ‘dsyrk_’ from incompatible pointer type
In file included from mtimesx.c:1481:
mtimesx_RealTimesReal.c: In function ‘FloatTimesFloat’:
mtimesx_RealTimesReal.c:4209: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4211: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4212: warning: passing argument 3 of ‘ssyr2k_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4225: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4227: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4228: warning: passing argument 3 of ‘ssyr2k_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4241: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4243: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4257: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type
mtimesx_RealTimesReal.c:4259: warning: passing argument 10 of ‘ssyrk_’ from incompatible pointer type

Comment only
05 Jan 2011 Bruno Luong

Bruno Luong (view profile)

I'm not using deploytool, just MCC command.

Comment only
05 Jan 2011 James Tursa

James Tursa (view profile)

@ Bruno Luong: Sorry I am not going to be much help here since I do not have a Windows 7 platform to test with. Are you using deploytool, and then the result can't use mtimesx?

Comment only
05 Jan 2011 Bruno Luong

Bruno Luong (view profile)

There is an annoying compatibility issue with deploying MTIMESX on Window7 platform, The MCR issues an error message "mex file in incorrect". It works fine under Matlab or other OS. I just can't use it when the issue is not resolved.

Comment only
16 Nov 2010 James Tursa

James Tursa (view profile)

@ Matt J: Issue mtimesx without any arguments to find out what the current calculation mode is. If you enter any of the OpenMP directives but mtimesx was not compiled with an OpenMP enabled compiler then mtimesx will revert to the nearest functionality. So SPEEDOMP will revert to SPEED, LOOPSOMP will revert to LOOPS, and the omp_etc functions will revert to substitute functions instead of the actual OpenMP functions. Your above example looks like what would happen if mtimesx was not compiled with an OpenMP enabled compiler. To see how mtimesx thinks it was compiled, issue the mtimesx('OPENMP') command and it will return a 1 (yes OpenMP) or 0 (No OpenMP).

Comment only
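[Editor's note] James's fallback behavior described above can be wrapped in a small guard. A sketch using only the directives mentioned in this thread (untested):

```
% Query how mtimesx was compiled: 1 = OpenMP support, 0 = no OpenMP.
if mtimesx('OPENMP')
    mtimesx('SPEEDOMP');   % multi-threaded loops permitted
else
    mtimesx('SPEED');      % the mode SPEEDOMP would silently revert to anyway
end
mtimesx                    % with no arguments, reports the current mode
```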
16 Nov 2010 Matt J

Matt J (view profile)

James, the latest version of mtimesx doesn't display at the command line any of the new modes (so of course I don't know if they're being used).

>> mtimesx SPEEDOMP

ans =

SPEED

Comment only
28 Oct 2010 Kai

Kai (view profile)

14 Oct 2010 Jan Simon

Jan Simon (view profile)

Thanks James! See this thread for applying MTIMESX to calculate a fast 2-norm for the matrix along a specified dimension:

08 Oct 2010 James Tursa

James Tursa (view profile)

@Matt J: I will look into it. Rather than pre-converting I may be able to fake out a call back to MATLAB with a pointer to the interior of the nD array and fudged up dimensions. Highly against the mex rules, but it might work. I should have taken advantage of the simple reshape method you use for the non-transposed cases (I didn't recognize it at the time I was updating the code) and that should not be too hard to implement. But the transpose cases for the 2D slices and the case for the sparse matrix being the right operand is going to be tricky to avoid a data copy. Guess I have a weekend project now ...

Comment only
08 Oct 2010 Matt J

Matt J (view profile)

James - I appreciate you taking my comments into account in the latest rev, but I'm still seeing weird stuff. From the manual:

"If you combine sparse inputs with full nD (n > 2) inputs, MTIMESX will convert the sparse input to full since MATLAB does not
support a sparse nD result."

Pre-converting the input from sparse to full leads to very slow performance, certainly slower than regular MATLAB as the comparison below shows. Is it really the only convenient solution? Also, MATLAB not supporting sparse nD is again not the issue, since even for n=2, sparse*full is always full in regular MATLAB.

%fake data
N=300;M=200;
A=speye(N);
B=rand(N,N,M);
Bs=single(B);

mtimesx SPEED;

tic;
ans1=mtimesx(A,B);
toc;
%Elapsed time is 0.681635 seconds.

tic;
ans2=reshape( A*reshape(B,N,[]) , size(B) );
toc;
%Elapsed time is 0.159543 seconds.

isequal(ans1,ans2)%1

"If you combine double sparse and single inputs, MTIMESX will convert the single input to double since MATLAB does not support a single sparse result."

As above, even if MATLAB did support sparse*fullsingle you would expect the result to be full single, not sparse single. And also, this feature is showing very slow behavior as compared to regular MATLAB, for some reason

tic;
ans3=mtimesx(A,Bs);
toc;
%Elapsed time is 0.817832 seconds.

tic;
ans4=single( reshape( A*reshape(double(Bs),N,[]) ,size(B)) );
toc;
%Elapsed time is 0.402111 seconds.

isequal(ans3,ans4),%1

Comment only
01 Oct 2010 James Tursa

James Tursa (view profile)

@Matt J: Thanks for the comments. I will get to work on the suggested changes right away.

Comment only
30 Sep 2010 Matt J

Matt J (view profile)

Just a couple more remarks regarding some of the limitations mentioned in the user manual,

"You cannot combine double sparse and single inputs, since MATLAB does not support a single sparse result."

It would be worthwhile to silently cast the single input to double prior to the matrix multiply. I've never understood why MATLAB doesn't implement the same. It only forces the user to do it manually.

"You also cannot combine sparse inputs with full nD (n > 2) inputs, since MATLAB does not support a sparse nD result."

Don't understand that. Native MATLAB produces a full result when combining sparse matrix multiplicands with full ones. No reason why you couldn't do the same for nD, as far as I can see...

Comment only
30 Sep 2010 Matt J

Matt J (view profile)

I like it. The only thing I find a little peculiar is that the output of
mtimesx('MODE') reports the previous mode rather than the current one.

27 Aug 2010 Sebastiaan

Sebastiaan (view profile)

Something like A(A<1e-8) = 0 should automatically remove small numbers from the sparsity pattern.

If this is too slow, have a look at spfun; it may be faster.

However, if you mean if you can remove small elements from the output matrix, you should write something of your own, partially multiplying the matrix and removing small elements. As a tip, use full column vectors of the matrix, this will make the multiplication much faster than square blocks, since sparse matrices are stored by columns. E.g.:
C = spalloc(size(A,1), size(B,2), 0);
for j = 1:size(B,2)
    R = A*B(:,j);
    R(R < 1e-8) = 0;
    C(:,j) = R;
end
is (probably) a fast implementation. Try to take multiple columns for speed, of course.

Comment only
27 Aug 2010 Julio

Julio (view profile)

Thanks Sebastiaan, I never thought to check the density of the resulting matrix and you are absolutely right: the memory blew up! By the way, is there any way to define an “operational zero” for sparse matrix operations, in order to automatically make MATLAB assume a zero for numbers smaller than, say, 1e-8?
Thank you again.

Comment only
27 Aug 2010 Sebastiaan

Sebastiaan (view profile)

Could it not be possible that you are really out of memory? Your result is a 460191 by 460191 matrix, which is not necessarily sparse. If the sparsity pattern is poor, the result may even be a dense matrix. Storing a full matrix this size requires 1578GiB of memory, so you have to be sure that the result is ~2% sparse. If neither of the input matrices are <1% sparse, this is likely your problem.

Comment only
27 Aug 2010 Julio

Julio (view profile)

Thanks for your prompt reply. “mtimes” at the MATLAB prompt gives exactly the same error so, nothing wrong with “mtimesx”. Is it possible that this is a repetition of the “outer-product” problem? I am going to try partition the matrix “cb” in several chunks and then concatenate the results. Maybe it will work this way.
Thanks again.

Comment only
26 Aug 2010 James Tursa

James Tursa (view profile)

@ Julio: MTIMESX does not do generic sparse matrix multiplies internally, it just calls the MATLAB built-in function mtimes. i.e., it does the same thing as if you had just typed c21 * cb at the MATLAB prompt. Can you verify that c21 * cb works at the MATLAB prompt but mtimesx(c21,cb) does not work? Be sure that you start them both from the same state (e.g., clear all other big variables including ans before attempting the calculation). That would be surprising to me since it would indicate some type of mex limitation that I am currently unaware of.

Comment only
26 Aug 2010 Julio

Julio (view profile)

Hi, I am new to using mtimesx but, so far, I think it is great. Meanwhile, I am trying to multiply 2 large sparse matrices: ci = mtimesx(c21, cb). c21 is 460191 by 5579 and cb is 5579 by 460191. After several minutes I got an error message (similar to mtimes):
======================
??? Error using ==> mtimes
Out of memory. Type HELP MEMORY for your options.
Error in ==> mtimesx_sparse at 47
result = a * b;

Error in ==> mtimesx at 277
[varargout{1:nargout}] = mtimesx(varargin{:});
======================

I don't think that the problem is lack of memory (I have a 64-bit system with 8 GB of RAM and the memory usage never went above 5 GB). Could you please comment on this?

Julio

Comment only
10 Aug 2010 James Tursa

James Tursa (view profile)

Also in today's release that I forgot to mention below: Added custom inline code for small matrix multiplies where all of the first two dimensions are <= 4 (in 'SPEED' mode).

Comment only
22 Jul 2010 D Sen

D Sen (view profile)

Thanks! I am seeing about a 3.25x improvement in speed for my application.

Comment only
21 Jul 2010 James Tursa

James Tursa (view profile)

@ D Sen: In addition, you would need to reshape audio_dft to be 32 x 1 x 4097.

Comment only
21 Jul 2010 James Tursa

James Tursa (view profile)

@ D Sen: The E array slicing is not set up the way mtimesx needs. mtimesx needs the array slices to be contiguous in memory (a requirement for the underlying BLAS calls). So your E array would need to be a 25 x 32 x 4097 array, making the 25 x 32 slices contiguous in memory (the 2D slices need to be the first two dimensions). The way you currently have it set up, all of your 25 x 32 array slices of E are individually scattered throughout memory and are not the first two dimensions, and mtimesx cannot use them that way.

Comment only
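[Editor's note] Following James's layout advice, a permute of E puts the 25 x 32 slices first and contiguous in memory, and reshaping audio_dft supplies the singleton second dimension. A sketch, untested, with sizes taken from D Sen's post below:

```
% E is 4097 x 25 x 32 and audio_dft is 32 x 4097 (per the post below).
Ep = permute(E, [2 3 1]);                 % 25 x 32 x 4097, slices now contiguous
ad = reshape(audio_dft, 32, 1, 4097);     % 32 x 1 x 4097
C  = reshape(mtimesx(Ep, ad), 25, 4097);  % C(:,i) = squeeze(E(i,:,:)) * audio_dft(:,i)
```

Note the permute itself copies data, so the gain over the original loop depends on reusing Ep across calls.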
21 Jul 2010 D Sen

D Sen (view profile)

I have been trying to use this to multiply a 3D matrix with a 2D matrix.

The matlab code which works is as follows:

C = zeros(25, 4097);
for i_ = 1:4097
    C(:,i_) = squeeze(E(i_,:,:)) * audio_dft(:,i_);
end

Here E and audio_dft have the following sizes:

size(E)

ans =

4097 25 32

size( audio_dft)

ans =

32 4097

I get:

mtimesx( E, audio_dft )
??? Error using ==> mtimesx
Inner matrix dimensions must agree.

Comment only
10 May 2010 Val Schmidt

Val Schmidt (view profile)

Never mind, I found my problem. I have now successfully built mtimesx on Snow Leopard 10.6 with 64-bit MATLAB.

To do so I edited
/Applications/MATLAB_R2009b.app/bin/gccopts.sh
In the section for "maci64" I changed "SDKROOT" and "MACOSX_DEPLOYMENT_TARGET" to reflect a change in OS from 10.5 to 10.6.

Then I selected this opts file with
mex -setup
selecting "1" at the prompt.

Then I compiled everything with:
mex -v -DDEFINEUNIX -largeArrayDims mtimesx.c -lmwblas -lmwlapack

(look out for the 2 D's in -DDEFINEUNIX - that was my error)

Comment only
10 May 2010 Val Schmidt

Val Schmidt (view profile)

I'm trying to compile on 64bit MATLAB on Mac OSX 10.6 without success. The only changes I've made to the default have been to modify the MATLABROOT/bin/gccopts.sh file such that the SDK and OS Version are set to the 10.6 variety. (and then ran mex -setup)

It seems the linker cannot find the linear algebra packages, but it is not clear to me why. The output is below and I would be thankful for ideas on what to try next.

thanks

---------------snip------------
-> gcc-4.0 -O -Wl,-twolevel_namespace -undefined error -arch x86_64 -Wl,-syslibroot,/Developer/SDKs/MacOSX10.6.sdk -mmacosx-version-min=10.6 -bundle -Wl,-exported_symbols_list,/Applications/MATLAB_R2009b.app/extern/lib/maci64/mexFunction.map -o "mtimesx.mexmaci64" mtimesx.o -L/Applications/MATLAB_R2009b.app/bin/maci64 -lmx -lmex -lmat -lstdc++

Undefined symbols:
"_dsyr2k", referenced from:
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
"_dgemm", referenced from:
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
"_dgemv", referenced from:
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
"_sgemm", referenced from:
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
"_sgemv", referenced from:
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
"_dsyrk", referenced from:
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
_DoubleTimesDouble in mtimesx.o
"_ssyrk", referenced from:
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
"_ssyr2k", referenced from:
_FloatTimesFloat in mtimesx.o
_FloatTimesFloat in mtimesx.o
"_sdot", referenced from:
_floatcomplexdotproduct in mtimesx.o
_floatcomplexdotproduct in mtimesx.o
_floatcomplexdotproduct in mtimesx.o
_floatcomplexdotproduct in mtimesx.o
_floatcomplexdotproduct in mtimesx.o
"_ddot", referenced from:
_doublecomplexdotproduct in mtimesx.o
_doublecomplexdotproduct in mtimesx.o
_doublecomplexdotproduct in mtimesx.o
_doublecomplexdotproduct in mtimesx.o
_doublecomplexdotproduct in mtimesx.o
collect2: ld returned 1 exit status

mex: link of ' "mtimesx.mexmaci64"' failed.

Comment only
03 May 2010 Roland Kruse

Roland Kruse (view profile)

The code is surprisingly fast for multiplication of a sparse matrix by a scalar. I really wonder why Matlab is so slow in this (very simple) case.

26 Apr 2010 Val Schmidt

Val Schmidt (view profile)

James,
This code is proving to be a FANTASTIC speed-up for my work. I am so grateful that I thought I would post a comment.

First, I am working on R2008b on an intel mac. I compiled your code with

mex -DEFINEUNIX mtimesx.c

and although I received a handful of warnings about redefined pointers, it compiled fine without error. I also tried compiling the code with a few extra optimizations, out of curiosity:

mex -DEFINEUNIX CFLAGS='\$CFLAGS -O3 -funroll-loops -fexpensive-optimizations -ffast-math' mtimesx.c

The result wasn't significantly faster than the original on my own processing test (I did not take the time to run it through your tests, however.).

I did run mtimesx_test_ddspeed initially to compare with the native MATLAB routines, and while I didn't capture the results, I found that native MATLAB could do most things as fast as or faster than mtimesx, with the notable exception of math involving complex conjugates, where mtimesx was far, far faster.

However, the fantastic benefit for me was the native handling of 4D matrices by mtimesx. In my own code, in which I am rotating several (8.3) million 3D vectors through 3x3 rotation matrices, mtimesx increased the speed of my calculations by 2100%!

Thanks so much for your time and effort. It is most gratefully appreciated.

-Val

23 Apr 2010 James Tursa

James Tursa (view profile)

@ veltz: Thanks for your comments. Regarding openMP, I think the short answer is no. I will give my reasons shortly, but please be advised I don't do any openMP stuff myself so my comments might not be correct.

The basic loop for nd stuff is this loop:

for( ip=0; ip<p; ip++ ) {

However, this is just the loop for the blocks in the C result. The blocks from the A and B arrays are calculated inside the loop based on the singleton expansion code (which depends on the dimensions of A and B) and are not known ahead of time ... at the end of one iteration the calculation for the A and B blocks for the next iteration takes place (the C block pointer just advances by the same amount each time). So it is not set up for a parallel loop. I would have to change the code and create three new arrays of pointers so that ahead of time all of the individual block starting points are known. *Then* maybe it could be put in a parallel loop.

Comment only
21 Apr 2010 veltz

veltz (view profile)

First, thank you for this marvellous code :)
Is there a way to use openMP in your code ? For example, when doing
C = zeros(2,6,4,5);
for m=1:4
for n=1:5
C(:,:,m,n) = A(:,:,m,n) * B(:,:,m,n);
end
end
we could add a #pragma omp parallel for. As the code is pretty big, where does such a loop appear in your code?

Comment only
01 Apr 2010 Jon C

Jon C (view profile)

Debian 64-bit compiled right with:
mex CFLAGS="\\$CFLAGS -std=c99" -DDEFINEUNIX -largeArrayDims -lmwblas mtimesx.c
(the functional syntax of mex did not parse the arguments properly)

The GCC -ansi switch included by the default mexopts apparently selects C89 (which does not recognize // C++-style comments). I wonder what is wrong with C99 as a default in mexopts?

Comment only
22 Mar 2010 Joshua Dillon

Joshua Dillon (view profile)

Very useful code. Good work.

The following snippet may be useful to those compiling on Ubuntu:
libblas='/usr/lib/libblas.so';
if exist(libblas,'file')~=2
    system('sudo aptitude install libblas-dev');
end

% This will compile the code on linux
mex('CFLAGS=-std=c99 -fPIC','-DDEFINEUNIX','-largeArrayDims','-lmwblas','mtimesx.c');

19 Mar 2010 James Tursa

James Tursa (view profile)

@Fabio: Thanks very much for the 64-bit info. I was unaware that gcc did not like the // commenting style, which I prefer over the /* */. However, I will create additional files with changed comments and upload them in the next week or so. I would be very interested in seeing the details of what you had to change to get it to compile and the results of any benchmark tests you happen to run. Feel free to e-mail me at the address listed in the pdf file. Thanks!

Comment only
19 Mar 2010 Fabio Veronese

Fabio Veronese (view profile)

Great job!
Thanks to mtimesx I changed a huge for loop to 3D matrix computation, reducing script execution time from 14 s to 1.5 s!

Tested and compiled on Gentoo Linux (64-bit) 2.6.33 on an amd64 system. Compiling with gcc 4.3.* (not the deprecated 4.2 suggested by MathWorks), I had to delete all the // comments from the C code files. The "//" sequence is apparently not friendly to the newer gcc setup; maybe you can replace it with /* comment */, which is correctly ignored. Also, to compile successfully I had to use:

mex('-DDEFINEUNIX','-largeArrayDims','-lmwblas','mtimesx.c')

which is somewhat different from the default. Both '-lmwlapack' and '-lmwblas' are the options suggested for compiling on UNIX systems (see "Building MEX-Files" in the doc), but '-lmwblas' alone was enough. I can say it works correctly, but I didn't run the benchmark files.

Hope this helps!
Feel free to ask if some testing on 64-bit Linux is needed. I'm going to run tests soon.

24 Feb 2010 Eric Larrieux

Eric Larrieux (view profile)

Thanks for the quick fix!

Comment only
23 Feb 2010 James Tursa

James Tursa (view profile)

Confirmed bug. The bug happens during the following conditions in MATLAB mode:

(row vector) * (non-square matrix)

(non-square matrix transposed) * (column vector)

Sorry about that! This was not caught in testing because I only used square matrices for these particular tests. The bug occurs because the BLAS routines DGEMV and SGEMV do not switch the dimension arguments when a transpose is involved. I had erroneously assumed that they did because that is how the BLAS routines DGEMM and SGEMM arguments work for general matrix multiplies. I have no idea why the interface for these routines is not set up the same. In any event, I will upload a fix tonight. In the meantime, you can manually put the fix in as follows:

Edit the mtimesx_RealTimesReal.c file.

Global replace the following text:

xGEMV(PTRANSA, M, K

with this text:

xGEMV(PTRANSA, &m1, &n1

And then also global replace the following text:

xGEMV(PTRANSB, N, K

with this:

xGEMV(PTRANSB, &m2, &n2

Those replacements will force the DGEMV and SGEMV calls to use the original dimensions of the first matrix argument instead of the switched dimensions.

Comment only
23 Feb 2010 Eric Larrieux

Eric Larrieux (view profile)

Thanks! Also, FYI: I am running R2009a...

Comment only
23 Feb 2010 James Tursa

James Tursa (view profile)

I will look into it.

Comment only
23 Feb 2010 Eric Larrieux

Eric Larrieux (view profile)

Hi, I am running MATLAB on a 32-bit Windows machine and I am getting incorrect (all-zero) results when I perform some simple 3D matrix multiplications in 'matlab' mode only ('speed' mode seems to yield the correct results in all the cases I have tested so far), i.e.:

>> % define a and b
>> a=2*ones(1,2,2)
a(:,:,1) =
2 2
a(:,:,2) =
2 2
>> b=3*ones(2,3,2)
b(:,:,1) =
3 3 3
3 3 3
b(:,:,2) =
3 3 3
3 3 3
>>
>> % verify result using normal matrix multiply
>> a(:,:,1)*b(:,:,1)
ans =
12 12 12
>> mtimes(a(:,:,1),b(:,:,1))
ans =
12 12 12
>>
>> % test results using mtimesx
>> mtimesx(a(:,:,1),b(:,:,1),'matlab')
ans =
0 0 0
>> mtimesx(a(:,:,1),b(:,:,1),'speed')
ans =
12 12 12

I just downloaded and installed the package yesterday, so I believe the outer-product bug discussed above should be fixed. Does anyone have any idea what is causing this error? Thanks!

-eric

Comment only
16 Feb 2010 James Tursa

James Tursa (view profile)

@newpolaris: Thanks for the input! It's hard for me to debug the 64-bit stuff since I don't have a system to test with, so comments like yours are very much appreciated. I have uploaded a new set of files with the largearraydims typo in the build routine fixed.

Comment only
12 Feb 2010 newpolaris ?

newpolaris ? (view profile)

Great! But there is a typo in mtimesx_build.m for 64-bit Windows.

On line 247 -> largearraydims = '-largearraydims';
It causes an error like:
Warning: MEX could not find the library "argearraydims"
specified with -l option on the path specified
with the -L option.
... mex mtimesx.c build completed ... you may now use mtimesx.

So it must be replaced by: largearraydims = '-largeArrayDims';

12 Feb 2010 newpolaris ?

newpolaris ? (view profile)

In my case, the result is an all-zero matrix :)

11 Jan 2010 Tobias Wood

Tobias Wood (view profile)

Cannot thank you enough for this. Got a huge speed improvement (10 seconds down to 0.6) for a script using this. I'm staggered it's not in the core of Matlab.

06 Dec 2009 Bruno Luong

Bruno Luong (view profile)

It must take some guts to tackle what MATLAB claims to do best: matrix computation.

The package is nicely presented. This is without doubt professional-quality code, built on state-of-the-art BLAS, with an accompanying PDF document that explains the "why" and the "how" in great detail.

As with any well-written MEX-based package, there is an installation file that conveniently performs the compilation before the package can be used. Just type its name and the job is done. Adding the package's main path (no subfolders) to the MATLAB path is left to the user.

There are 8 separate files that can be used to benchmark against MATLAB. Their names tell something not obvious to newcomers, but for those familiar with MATLAB there is no doubt: they test all four combinations of double/single, with one file for equality tests and another for speed tests. I chose the "dd" (double/double) file, likely the most used combination, and ran the "equal" test. It crashed. Too bad; despite the crash I took my chances and ran the speed test. A bunch of tests are tic/toc'ed against MATLAB, with clear output indicating whether each case is slower or faster than MATLAB and by what percentage. Something strange happened: in some cases MTIMESX showed up a few hundred times faster than MATLAB. I said: "It cannot be true; why would MATLAB perform so poorly?" I stepped through the speed-test function and discovered there might be another bug. I contacted the author and, bingo, I got his explanation and solution a few hours later. The bug (just one missing line) was due to a last-minute change before submission. Following James's instructions, I modified the code, rebuilt the MEX file, and ran the test code. The results look much more reasonable, and when I reran the "equal" test there was no longer a crash.

One more word on my setup before reading on: Windows Vista 32-bit, R2009b, and an old loyal VS C++ 6.0. This is important because MTIMESX performance depends greatly on the setup, as clearly stated by the author.

The main function MTIMESX can operate in two modes: "MATLAB" (default) and "SPEED". The first tries to be as compatible with MATLAB as possible; the second is, as the name suggests, all about speed. My first two tests use "MATLAB" mode; the last one compares the two modes.

PERTINENT:

I. 2D Arrays
Speed tests run over all possible combinations of cases: complex/real; normal, transposed, conjugate, or conjugate-transposed; then matrix/vector/scalar; all applying to either the first or second matrix argument. Each argument can thus have 2*4*3 = 24 states, not counting that a vector can be a column or a row. Combining both arguments, the test runs probably no fewer than 24^2 = 576 cases, and that is for 2D arrays only (nD arrays are supported; we'll come back to that).
Most of the time, MTIMESX (in MATLAB mode) seems to perform at a speed more or less comparable to MATLAB's, except for a few cases where the speed advantage is clearly significant. Here is one typical such case:

>> n=2000;
>> A=rand(n)+1i*rand(n);
>> B=rand(1,n)+1i*rand(1,n);
>> tic; C=conj(A)*B.'; toc
Elapsed time is 0.094847 seconds.

>> tic; C=mtimesx(A,'G',B,'T'); toc
Elapsed time is 0.034242 seconds.

That is about 3 times faster than MATLAB. Taking a closer look, it seems MATLAB is penalized by carrying out the transpose and conjugate explicitly before the product, as shown next:

>> A1=conj(A);
>> B1=B.';
>> tic; C=A1*B1; toc
Elapsed time is 0.030122 seconds.

whereas MTIMESX performs those operations *on the fly* during the product. Very good. But I would argue that the speed gain is smaller in practice. In fact, if the product is to be carried out once, careful developers usually manage to build conj(A) and B.' in the first place rather than A and B; in that case all the speed gain of MTIMESX is irrelevant. If A and conj(A) are each used to perform one product, the speed gain is about 1.5 (3/2). If they are used many times, the developer might perform CONJ once and store the result before MTIMES is invoked, and the speed gain gets even closer to 1. Still, there is clearly an advantage for MTIMESX: it is faster and it avoids storing the extra matrix conj(A).

My impression is that this is the only (and significant) factor contributing to the speed gain: performing the CONJUGATE on the fly. Recent MATLAB (R2008a, i.e., v7.6) already benefits from a smart TRANSPOSE on the fly, according to Tim Davis.

II. ND-arrays
The nicest feature of MTIMESX is its capability to perform multiple matrix products arranged in an ND-array. There was previously such a function on the MATLAB File Exchange, coded by Peter Boettcher and also using BLAS, but that package now seems to have minimal support, and users have trouble compiling it on 64-bit platforms. So this new feature is very welcome.
When a MATLAB developer wants to compute products of multiple matrices of the same size, to date he has 4 alternatives:
1. Use a for-loop
2. Use a product based on sparse diagonal blocks (for example the function SliceMultiProd available at http://www.mathworks.com/matlabcentral/fileexchange/24260-multiple-same-size-linear-solver)
3. Use the package MULTIPROD, where the element-wise products and sums are carried out: http://www.mathworks.com/matlabcentral/fileexchange/8773-multiple-matrix-multiplications-with-array-expansion-enabled
4. Use a BLAS-based tool, such as Peter Boettcher's ndfun, and now James Tursa's MTIMESX

Here is the code I use to benchmark the four approaches. Note that small matrix sizes (say, less than 5) are often encountered in practice.

function mprodbench(m,n)
% Benchmark four approaches to the product of n (m x m) matrices

A=rand(m,m,n);
B=rand(m,m,n);

C=zeros(size(A,1),size(B,2),size(B,3));
tic
for k=1:size(B,3)
    C(:,:,k) = A(:,:,k)*B(:,:,k);
end
t1=toc;

tic
C=SliceMultiProd(A,B);
t2=toc;

tic
C=multiprod(A,B);
t3=toc;

tic
C=mtimesx(A,B);
t4=toc;

fprintf('Matrix size=%d, number matrices=%d\n', m, n);
fprintf('\tfor-loop       = %f, rel speed gain = (%f)\n', t1, t1/t1);
fprintf('\tSliceMultiProd = %f, rel speed gain = (%f)\n', t2, t1/t2);
fprintf('\tmultiprod      = %f, rel speed gain = (%f)\n', t3, t1/t3);
fprintf('\tmtimesx        = %f, rel speed gain = (%f)\n', t4, t1/t4);

end

And here is the results:

>> mprodbench(3,1e4)
Matrix size=3, number matrices=10000
for-loop = 0.055119, rel speed gain = (1.000000)
SliceMultiProd = 0.014130, rel speed gain = (3.900815)
multiprod = 0.007815, rel speed gain = (7.052806)
mtimesx = 0.006185, rel speed gain = (8.911272)

>> mprodbench(10,1e4)
Matrix size=10, number matrices=10000
for-loop = 0.074665, rel speed gain = (1.000000)
SliceMultiProd = 0.146336, rel speed gain = (0.510228)
multiprod = 0.151051, rel speed gain = (0.494303)
mtimesx = 0.022569, rel speed gain = (3.308258)

This shows that MTIMESX is the best and most consistent approach for multiple same-size matrix products. It supports inputs with any number of trailing dimensions.
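For readers outside MATLAB, the multiple-product semantics benchmarked above can be sketched with NumPy's batched matmul. This is an illustrative analogue only, not part of the package; note that NumPy broadcasts leading batch dimensions, whereas MATLAB stacks pages in trailing dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((10000, 3, 3))   # 10000 pages of 3x3 matrices
B = rng.random((10000, 3, 3))

C = np.matmul(A, B)             # one call replaces the page-by-page loop
C_loop = np.stack([A[k] @ B[k] for k in range(A.shape[0])])
assert np.allclose(C, C_loop)

# analogue of singleton expansion: a single 3x3 matrix R is applied
# to every page of V without replicating R in memory
R = rng.random((3, 3))
V = rng.random((10000, 3, 1))
W = np.matmul(R, V)             # R broadcasts across all 10000 pages
print(C.shape, W.shape)         # (10000, 3, 3) (10000, 3, 1)
```

The broadcast in the last line is the same idea as MTIMESX multiplying a 3x3 array by a 3x1xN array.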

III. SPEED VS MATLAB MODES

From my tests, SPEED mode is typically about 30-50% faster for the tensorial product (a column vector multiplied by a row vector). Besides that case, the speed gain is probably not spectacular.

CONCLUSION:
- For whom? Developers who are interested in optimizing speed and want to squeeze the last ounce of performance out of their computers. I have not tested on older versions of MATLAB, but MTIMESX will surely be even more pertinent there, since MATLAB makes constant progress such as the smart matrix transposition. Those who run a recent MATLAB (from R2008a) get limited benefits beyond saving extra conjugate and transpose operations on their matrices. The syntax of MTIMESX is simple, though being not quite as lean as the MATLAB operator is the price to pay.
- MTIMESX is clearly the best approach for multiple products of matrices. Go for it!

Fully justified, until MATLAB catches up.

Bruno

05 Dec 2009 James Tursa

James Tursa (view profile)

As a workaround prior to Monday you can make this change to the mtimesx_RealTimesReal.c file near line 222:

scalarmultiply = 0; // <-- add this line
if( m1 == 1 && n1 == 1 ) {
scalarmultiply = 1;

Comment only
05 Dec 2009 James Tursa

James Tursa (view profile)

There is a bug in the outer product code. Will be fixed as soon as TMW posts updated version. Probably Monday Dec 7.

Comment only
04 Dec 2009 1.1

Fixed bug for (scalar) * (sparse) when the scalar contains an inf or NaN (in which case the zeros do not stay zero). Slight update to pdf doc.

07 Dec 2009 1.2

Fixed bug in scalar multiply code that was causing incorrect results & crashes.

10 Dec 2009 1.3

Added singleton expansion capability for multi-dimensional matrix multiplies.

11 Dec 2009 1.4

Fixed a bug for empty transa or transb inputs. Now treats these the same as 'N'. Also simplified the nD singleton expansion code a bit.

07 Jan 2010 1.5

Added the multi-dimensional test routine mtimesx_test_nd.m for speed and equality tests. Added its description to the pdf file.

16 Feb 2010 1.6

Fixed a typo in the build routine for 64-bit systems, changed -largearraydims to -largeArrayDims

23 Feb 2010 1.7

Fixed a bug for some of the (row vector) * (matrix) and (matrix transposed) * (column vector) operations in MATLAB mode that involved incorrect dimensions in the dgemv and sgemv calls.

10 Aug 2010 1.8

Added capability for nD scalar multiplies (i.e., 1x1xN * MxKxN).
Replaced the buggy mxRealloc API routine with custom code.
Updated mtimesx_test_nd.m file.

07 Oct 2010 1.9

Added OpenMP support for custom code
Expanded sparse * single and sparse * nD support
Fixed (nD complex scalar)C * (nD array) bug

23 Feb 2011 1.10

Fixed typos in the dsyrk, dsyr2k, ssyrk, and ssyr2k BLAS function prototypes. (These typo fixes should not change any results.)