From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: How to make the function 'norm' treat its input as vectors?
Date: Thu, 14 Oct 2010 19:35:03 +0000 (UTC)
Organization: Boeing Co
Lines: 38
Message-ID: <i97m17$er4$>
References: <i938ok$smb$> <i93kdo$o8f$> <i94def$3h4$> <i94mh3$9c3$> <i94v8c$r8p$> <i958vn$cjb$> <i95ak3$srq$> <i96neg$hsc$> <i96r2q$d0f$> <i96sss$c0c$> <i970h5$dlh$> <i979nj$arb$> <i97g3s$eij$> <i97imb$4cl$> <i97j77$9db$>
Reply-To: <HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: 1287084903 15204 (14 Oct 2010 19:35:04 GMT)
NNTP-Posting-Date: Thu, 14 Oct 2010 19:35:03 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 756104
Xref: comp.soft-sys.matlab:678542

"Matt J " <mattjacREMOVE@THISieee.spam> wrote in message <i97j77$9db$>...
> "Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <i97imb$4cl$>...
> > "Matt J " <mattjacREMOVE@THISieee.spam> wrote in message <i97g3s$eij$>...
> > 
> > > 
> > > The speed of MTIMESX won't matter. If you take norms along anything but columns using mtimesx, you will need to first permute/transpose the data, 
> > 
> > Is that true? Are you sure any explicit transposition is carried out? May be James can confirm it.
> ======
> According to my best understanding of how MTIMESX works, yes. The partitioning of  an nD array into submatrices by mtimesx is always in memory-contiguous blocks. Since rows of a matrix are not contiguous, I don't see how you can get mtimesx(A,B) to do operations between corresponding rows of A and B.

Sorry I am late!  I just noticed Jan Simon's post to my mtimesx FEX submission, which pointed me to this thread. So I am just now reading this thread for the first time and am not yet up to speed on the issues. So I will start by making some general comments about mtimesx as it applies to this calculation in one of Bruno's posts:

> b=mtimesx(reshape(A,[m 1 n]),'t',reshape(A,[m 1 n]));

The reshape function of course happens at the MATLAB level so this is transparent to mtimesx. The reshapes are pretty quick since they result in a shared data copy of A. So mtimesx will get these inputs

A1(m,1,n) T * A2(m,1,n)

So the end result is an nD dot product calculation of the columns. How mtimesx does this depends on the calculation mode chosen:

'BLAS': Uses calls to DDOT in a loop for each column dot product.
'MATLAB':  Uses calls to DDOT in a loop for each column dot product.
'SPEED': Uses custom C coded loops or DDOT calls, depending on which method it thinks may be faster (depends on complexity of inputs, whether it is a symmetric case with A1 & A2 actually pointing to same data area, etc.)
'LOOPS': Uses custom C coded loops.
'LOOPSOMP': Uses multi-threaded C coded loops if m is large enough to warrant the extra overhead, else uses C coded loops.
'SPEEDOMP': Makes a guess as to which of 'BLAS', 'LOOPS', or 'LOOPSOMP' is likely to be fastest and uses that.

For this dot product of columns case, there is of course no need to physically transpose any input, since it is mainly a dimension bookkeeping issue (an m-by-1 vector has the same layout in memory as its transpose).
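To make the bookkeeping concrete, here is a minimal sketch in plain MATLAB (no mtimesx needed) of what the call quoted above computes. The sizes and data are made up for illustration, and the loop is only a reference for the result, not a description of how mtimesx does the work internally.

```matlab
% Illustrative sizes and data (assumptions, not taken from the thread)
m = 4; n = 3;
A = reshape(1:m*n, m, n);

% reshape changes only the dimensions; the element order is unchanged
A3 = reshape(A, [m 1 n]);
same_order = isequal(A3(:), A(:));

% A column vector and its transpose list their elements in the same order
v  = A(:,1);
vt = v.';
same_vector = isequal(v(:), vt(:));

% Reference for A1(m,1,n) T * A2(m,1,n): one 1x1 result per page k
b = zeros(1, 1, n);
for k = 1:n
    b(1,1,k) = A3(:,1,k).' * A3(:,1,k);   % dot product of column k with itself
end

% Same numbers from built-in functions; the square root gives the column 2-norms
b_check   = sum(A.^2, 1);
col_norms = sqrt(b(:)).';
```

Each page of the reshaped array is one column of A, so the result is a 1-by-1-by-n array holding the squared norms of the columns.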

The multi-threaded OpenMP code in mtimesx is very new; it was added only a couple of weeks ago, and I have not yet implemented everything that I plan to. For example, in the above calculation mtimesx will only use OpenMP if the value of m is sufficiently large to warrant the extra overhead of multi-threading the individual dot products, i.e., it only multi-threads over the first two dimensions of the calculation. What about the case of small m and large n? Obviously in that case one should not attempt to multi-thread the dot product calculation itself, but should instead multi-thread over the third index. That is a future enhancement that I am currently working on, but it is *not* yet implemented in the current version of mtimesx.
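As a rough illustration of dividing the work over the third index, the sketch below splits the n dot products into contiguous chunks and keeps each dot product serial. It runs serially in plain MATLAB; the chunk count and variable names are hypothetical, and it only shows how the work could be divided, not how mtimesx will implement it.

```matlab
% Hypothetical sizes: short vectors (small m), many of them (large n)
m = 3; n = 10; nchunks = 4;
A = reshape(1:m*n, m, n);

% Split the index range 1:n into contiguous chunks, one per (hypothetical) thread
edges = round(linspace(0, n, nchunks + 1));

b = zeros(1, n);
for c = 1:nchunks                      % each pass stands in for one thread
    for k = edges(c)+1 : edges(c+1)    % dot products owned by this chunk
        b(k) = A(:,k).' * A(:,k);      % serial dot product, no threading inside
    end
end
```

Since no two chunks write to the same element of b, the chunks are independent of one another.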

What about cases where a transpose operation involves a matrix and not a vector? In that case it is not just a bookkeeping issue ... there is a real transpose involved. In these cases mtimesx will typically just call a BLAS routine to do the work, with the appropriate inputs to indicate the transpose ... no physical transpose of the inputs is done beforehand; the transpose is simply handled as part of the matrix multiply inside the BLAS routine itself.
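For reference, the result in the matrix case can be written page by page in plain MATLAB. The sizes below are made up, and the second loop shows the explicit transposed copy that a transpose flag makes unnecessary. This is only a sketch of the arithmetic, not of what mtimesx or the BLAS do internally.

```matlab
% Illustrative sizes (assumptions): n pages of m-by-p and m-by-q matrices
m = 3; p = 2; q = 4; n = 5;
A = reshape(1:m*p*n, [m p n]);
B = reshape(1:m*q*n, [m q n]);

% Reference result for "A transposed times B", one page at a time
C = zeros(p, q, n);
for k = 1:n
    C(:,:,k) = A(:,:,k).' * B(:,:,k);
end

% The alternative being avoided: build a transposed copy of A first
At = permute(A, [2 1 3]);            % physically transposes every page
C2 = zeros(p, q, n);
for k = 1:n
    C2(:,:,k) = At(:,:,k) * B(:,:,k);
end
```

Both loops give the same numbers; the difference is that the second one allocates and fills At before any multiplication happens.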

What about taking dot products of rows instead of columns? This is a different problem because of the contiguous data issue that has already been pointed out earlier in this thread. For the contiguous column case it was simple because the inputs could be reshaped into nD "vectors". Not so for the row case. It will hinge on whether or not the problem can be reformulated into a matrix multiply. I don't know how to do this for the general case, so at first look I think I agree with Matt that trying to use a matrix multiply for this will not work.
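The contiguity difference can be seen from linear indices alone. The small sketch below uses made-up sizes and built-in functions only: a column occupies consecutive linear indices, while a row is strided by m. The row dot products are then computed without any matrix multiply, either directly or by transposing first so that the rows become contiguous columns.

```matlab
% Illustrative sizes and data (assumptions)
m = 3; n = 4;
A = reshape(1:m*n, m, n);

% Column 2 occupies consecutive linear indices; row 2 is strided by m
col_idx = (2-1)*m + (1:m);     % 4 5 6
row_idx = 2 : m : m*n;         % 2 5 8 11
col_ok = isequal(A(col_idx), A(:,2).');
row_ok = isequal(A(row_idx), A(2,:));

% Row dot products without a matrix multiply
row_dots = sum(A.^2, 2);

% Or pay for a physical transpose so that rows become contiguous columns
At = A.';
row_dots_t = sum(At.^2, 1).';
```

The second route is the permute/transpose step mentioned earlier in the thread: once the data are transposed, the column approach applies again.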

James Tursa