Solution:
As of MATLAB 7.4 (R2007a), MATLAB supports multithreaded computation for a number of linear algebra functions (e.g. matrix multiply), element-wise numerical functions (e.g. cos), and expressions that are combinations of element-wise functions (e.g. y = 4*x.*(sin(x) + x.^3)). These functions automatically execute on multiple threads, and you do not need to explicitly create threads in your code.
For a function or expression to execute faster (speed up) on multiple CPUs, all of the following conditions must be true:
1) The operations in the algorithm carried out by the function are easily partitioned into sections that can be executed concurrently, and with little communication or few sequential operations required. This is the case for all element-wise operations. Matrix operations using BLAS are threaded only if the BLAS library used on the system is itself threaded.
2) The data size is large enough so that any advantages of concurrent execution outweigh the time required to partition the data and manage separate execution threads. For example, most functions speed up only when the array is greater than several thousand elements.
3) The operation is not memory-bound, that is, its processing time is not dominated by memory access time, as it is for simple operations such as element-wise addition. As a general rule, more complex functions speed up better than simple functions.
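Condition 3 can be checked informally by timing a simple and a more complex element-wise operation with one thread and with the default number of threads. The following is a minimal sketch, not a rigorous benchmark: the array size, the repeat count, and the variable names are illustrative assumptions, and the measured ratios depend on the machine and MATLAB version.

```matlab
% Sketch: compare the speed up of a memory-bound operation (addition)
% with that of a more complex element-wise function (acos).
n = 1e6;                          % assumed size, well above the listed thresholds
x = rand(n, 1);
reps = 10;                        % keep the best of several runs
ops = {@(v) v + v, @(v) acos(v)};
names = {'v + v', 'acos(v)'};

nThreads = maxNumCompThreads;     % current (default) thread setting
threadSettings = [1, nThreads];
t = inf(numel(ops), 2);           % best time per operation and thread setting

for i = 1:numel(ops)
    f = ops{i};
    for j = 1:2
        maxNumCompThreads(threadSettings(j));
        for k = 1:reps
            tic; y = f(x); t(i, j) = min(t(i, j), toc);
        end
    end
end
maxNumCompThreads(nThreads);      % restore the original setting

for i = 1:numel(ops)
    fprintf('%-8s speed up: %.2f\n', names{i}, t(i, 1) / t(i, 2));
end
```

On a single-core machine both ratios should be close to 1, since the two thread settings are then identical.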
The results (graph) in the attached resolution documents were obtained by running tests on two different machines: a dual-core system and a dual-processor system. Because the increase in speed also depends on the processor's memory and cache architecture, as well as on the operating system, we cannot guarantee that you will experience the same results.
The following functions exhibit an increase in speed by a factor of 1.2 to 2.0 on the 2-CPU machines tested when processing double-precision arrays with more elements than the thresholds listed below. Note that this list is not exhaustive and reflects testing on particular computers with a particular version of MATLAB.
Element-Wise Functions and Expressions:
------------------------------------------------------------------------------------------------
Functions that speed up for double arrays > 20k elements
1) Trigonometric: ACOS(x), ACOSH(x), ASIN(x), ASINH(x), ATAN(x), ATAND(x), ATANH(x), COS(x), COSH(x), SIN(x), SINH(x), TAN(x), TANH(x)
2) Exponential: EXP(x), POW2(x), SQRT(x)
3) Operators: x.^y
For example: 3*x.^3 + 2*x.^2 + 4*x + 6, sqrt(tan(x).*sin(x).*3 + 8)
Functions that speed up for double arrays > 200k elements
4) Trigonometric: HYPOT(x,y), TAND(x)
5) Complex: ABS(x)
6) Rounding and remainder: UNWRAP(x), CEIL(x), FIX(x), FLOOR(x), MOD(x,N), ROUND(x)
7) Basic and array operations: LOGICAL(X), ISINF(X), ISNAN(X), INT8(X), INT16(X), INT32(X)
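As an illustration of how such expressions are written, the two example expressions from the list above can be evaluated on an array larger than the 20k-element threshold. This is a sketch only; the array size is an assumption chosen for illustration. Note that no threading commands appear in the code, and that the element-wise operators (.^ and .*) are what make the expressions element-wise.

```matlab
% Sketch: element-wise expressions on a double array above the 20k threshold.
x = rand(1, 50000);                    % 50k elements (illustrative size)

y1 = 3*x.^3 + 2*x.^2 + 4*x + 6;        % polynomial, evaluated element by element
y2 = sqrt(tan(x).*sin(x).*3 + 8);      % combination of element-wise functions
```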
Linear Algebra Functions:
------------------------------------------------------------------------------------------------
Functions that speed up for double arrays > 40k elements (200 square)
1) Operators: X*Y (Matrix Multiply), X^N (Matrix Power)
2) Reduction Operations: MAX and MIN (Three Input), PROD, SUM
3) Matrix Analysis: DET(X), RCOND(X), HESS(X), EXPM(X)
4) Linear Equations: INV(X), LSCOV(X,x), LINSOLVE(X,Y), A\b (backslash)
5) Matrix Factorizations: LU(X), QR(X) for sparse matrix inputs
6) Other Operations: FFT and IFFT of multiple columns of data, FFTN, IFFTN, SORT, BSXFUN, GAMMA, GAMMALN, ERF, ERFC, ERFCX, ERFINV, ERFCINV, FILTER
------------------------------------------------------------------------------------------------
A speed up of more than a factor of 2 on a 2-CPU system can be observed for certain data sizes due to cache effects. This happens when the data is too large to fit into the cache of one processor, but each half of the data fits into the cache of one of the two processors. See the June 2007 News and Notes article "Maximizing Code Performance by Optimizing Memory Access":
http://www.mathworks.com/company/newsletters/news_notes/june07/patterns.html
Example plot of speed up by data size:
(Please refer to the attached resolution document.)
The graph shows the speed up of an example function, ACOS(x), for double arrays of varying size on dual-core machine A.
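A comparable curve can be produced on your own machine with a sketch along the following lines. It is not the script used to generate the attached graph; the range of sizes, the repeat count, and the variable names are assumptions made for illustration.

```matlab
% Sketch: speed up of acos(x) as a function of array size.
sizes = round(logspace(3, 6, 7));   % 1e3 to 1e6 elements (assumed range)
reps = 20;                          % keep the best of several runs

nThreads = maxNumCompThreads;       % current (default) thread setting
threadSettings = [1, nThreads];
t = inf(numel(sizes), 2);           % best time per size and thread setting

for i = 1:numel(sizes)
    x = rand(sizes(i), 1);
    for j = 1:2
        maxNumCompThreads(threadSettings(j));
        for k = 1:reps
            tic; y = acos(x); t(i, j) = min(t(i, j), toc);
        end
    end
end
maxNumCompThreads(nThreads);        % restore the original setting

speedup = t(:, 1) ./ t(:, 2);       % single-thread time / multithread time
semilogx(sizes, speedup, '-o');
xlabel('Number of elements');
ylabel('Speed up');
```

For small arrays the ratio is expected to stay near 1 or below, because the cost of partitioning the data and managing threads outweighs the gain.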
Tests were carried out on the following machines:
------------------------------------------------------------------------------------------------
1) Dual-core machine A: Intel Core Duo 1.83GHz T2400 processor, Lenovo T60 ThinkPad laptop, 2MB L2 cache, 2GB RAM, 32-bit Windows XP
2) Dual-processor machine B: Dual 2.1GHz AMD Opteron 248 desktop, 1MB L2 cache, 1GB RAM, 64-bit Linux
Functions that have been multithreaded in MATLAB 7.8 (R2009a)
FFT, FFT2, FFTN, IFFT, IFFT2, IFFTN, PROD, SUM, MAX, MIN
Functions that have been multithreaded in MATLAB 7.9 (R2009b)
SORT, BSXFUN, MLDIVIDE for sparse matrix input, QR for sparse matrix input, FILTER, GAMMA, GAMMALN, ERF, ERFC, ERFCX, ERFINV, ERFCINV