Because the matrix multiplication is associative; the product can be carried with different order, leading to the same result up to round-off error, MMTIMES usings "optimal" order of binary product to reduce the computational effort (probably accuracy is also improved)

The function assumes the cost of the product of (m x n) with (n x p) matrices is (m*n*p). This assumption is typically true for full matrix.

Dear Derek,
Great educational writing, I enjoy reading it a lot.
Yes, the complexity of the implemented code is not optimal. It's still perfectly alright for product of less than 10 matrices, which is sufficient in most of the practical use. When I have time I'll take a look at methods based on non-overlapping polygonal partition.

I have found that most mathematicians and 'numerical' people do not know that the <<complexity>> of matrix chain multiplication is not associative. Here is a Matlab exercise I gave to students 5 years ago to demonstrate this point:

My mathematical friends liked this exercise and were very impressed that the solution to the optimum ordering involved Catalan Numbers. Also, they pointed out that these numbers did not come from Catalonia.

I agree with Ged Ridgway's comments and thank him for the pointer to a more efficient algorithm.

I'm glad someone has implemented this. Two comments:

Apparently an O(N*log(N)) algorithm is available:
http://en.wikipedia.org/wiki/Matrix_chain_multiplication#An_Even_More_Efficient_Algorithm
though I don't know how this compares to the current O(N^3) one for typical values of N (e.g. the constants could make N*log(N) worse until N gets above a certain level).

More importantly (for me, anyway), I think the computation of the ordering should be a separate function, and a set ordering should then be usable in a multiplication function. E.g. if I have some code which repeatedly computes a chain multiplication in a loop, where the matrices' sizes are always the same and are known in advance, but their values vary, then much better than using mmtimes(A,B,C,...) would be to find the correct ordering first, and then hard-code that ordering in the loop. Similarly, if their sizes are constant but not known in advance, then a two-function division of labour would allow the ordering to be computed on the first iteration of the loop, and then reused without recomputation in successive iterations.

Updates

21 Jun 2010

quicker top-down algorithm

23 Jun 2010

Treat the case of scalar matrices

16 Aug 2010

Update, optimal order can be input/output according to Ged Ridgway's suggestion