From: kem <kemelmi@gmail.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: making an evaluation of a VERY long expression much faster
Date: Sun, 5 Jul 2009 01:38:59 -0700 (PDT)


On Jul 5, 11:12 am, "James Tursa" <aclassyguywithakno...@hotmail.com>
wrote:
> kem <keme...@gmail.com> wrote in message <932c3692-54a8-453b-bcf5-e6e0f69ca...@h18g2000yqj.googlegroups.com>...
>
> > Hi Arun and James,
> > Thanks for your response.
>
> > Here's the expression I am evaluating:
>
> > E=ux.*vy.^2.*c.*wx+2.*ux.*b.*wy.^2+ux.*c.*wx+2.*ux.*B.*wy+vy.^2.*c.*wx
> > +vx.^2.*a.*wy.^2+ux.^2.*b.*wy.^2+ux.^2.*g.*vy.^2+vy.*B.*wy+2.*vy.*c.*wx
> > +2.*ux.*vy.*B.*zy+2.*ux.*vy.*c.*wx+ux.*wx.*A.*wy+ux.*vy.^2.*c.*zx
> > +ux.*wx.*A.*zy+ux.*wy.*A.*zx+4.*ux.*wy.*b.*zy+2.*ux.*vy.*B.*wy
> > +ux.^2.*B.*wy+2.*ux.^2.*g.*vy+2.*ux.*vy.*c.*zx-ux.*vx.*c.*zy-
> > ux.*vx.*A.*zy.^2-ux.*vx.*c.*wy+2.*ux.*g.*vy.^2+4.*ux.*g.*vy
> > +uy.^2.*g.*vx.^2+uy.^2.*b.*wx.^2+wx.*A.*wy
> > +vy.^2.*a.*wx.^2+2.*vy.*a.*wx.^2+ux.*vy.*wx.*A.*wy+ux.*vy.*A.*zx.*zy
> > +ux.*vy.*wy.*A.*zx-ux.*vx.*A.*wy.^2+ux.*vy.*wx.*A.*zy-ux.*uy.*B.*zx-
> > ux.*uy.*B.*wx-ux.*uy.*vy.*B.*zx-
> > ux.*uy.*vy.*B.*wx-2.*ux.*vx.*wy.*A.*zy-2.*ux.*uy.*wx.*b.*wy-2.*ux.*uy.*b.*zx.*zy-
> > ux.*uy.*vx.*B.*wy-ux.*vx.*vy.*c.*wy-
> > ux.*vx.*vy.*c.*zy-2.*ux.*uy.*vx.*g.*vy-2.*ux.*uy.*vx.*g-
> > ux.*uy.*vx.*B.*zy-2.*ux.*uy.*b.*wy.*zx-2.*ux.*uy.*wx.*b.*zy
> > +uy.^2.*vx.*B.*zx+uy.^2.*vx.*B.*wx+2.*uy.^2.*b.*wx.*zx
> > +2.*vy.^2.*wx.*a.*zx+2.*vx.^2.*a.*wy.*zy+4.*vy.*wx.*a.*zx+vy.*wx.*A.*wy
> > +vy.*wy.*A.*zx+vy.*wx.*A.*zy-vx.*A.*wy.^2-vx.*c.*wy+2.*ux.^2.*wy.*b.*zy
> > +ux.^2.*vy.*B.*zy-uy.*B.*wx-
> > uy.*A.*wx.^2+ux.^2.*vy.*B.*wy-2.*vx.*wx.*a.*wy-2.*vx.*wx.*a.*zy-
> > uy.*vy.*A.*wx.^2-uy.*vy.*B.*zx-uy.*vy.*A.*zx.^2-
> > uy.*vy.*B.*wx-2.*vx.*wy.*A.*zy-2.*vx.*wy.*a.*zx-2.*uy.*wx.*b.*wy-
> > uy.*vx.*c.*zx-uy.*vx.*B.*wy-vx.*vy.*c.*wy-
> > vx.*vy.*c.*zy-2.*uy.*vx.*g.*vy-uy.*vx.*B.*zy-2.*uy.*b.*wy.*zx-
> > uy.*vx.*c.*wx-2.*vx.*vy.*wy.*a.*zx-2.*vx.*vy.*a.*zx.*zy-2.*vx.*vy.*wx.*a.*wy-2.*uy.*vx.*g-2.*uy.*vy.*wx.*A.*zx-
> > uy.*vx.*vy.*c.*wx+uy.*vx.*wy.*A.*zx-uy.*vx.*vy.*c.*zx+uy.*vx.*A.*zx.*zy
> > +uy.*vx.*wx.*A.*wy
> > +uy.*vx.*wx.*A.*zy-2.*uy.*wx.*A.*zx-2.*vx.*vy.*wx.*a.*zy
> > +uy.*vx.^2.*c.*wy-2.*uy.*wx.*b.*zy+uy.*vx.^2.*c.*zy;
>
> > Each of the variables is a 200x200 matrix. and I am calculating these
> > before the evaluation of this expression, in each iteration of the
> > optimization. Essentially I'd like to minimize this expression.
>
> > I was thinking to write this line in mex, will it be evaluated faster?
>
> Well, I have to admit the subject title you picked for this thread is *very* accurate ... that is indeed a very long expression! It has a lot of repeated variables, and a lot of intermediate variables, which potentially makes it a good candidate for speedup in a custom mex routine, particularly if you are using a good optimizing C compiler. To give you an example, I just coded up the first line in your expression. Here is the test data I used:
>
>     ux = rand(200,200);
>     vy = rand(200,200);
>     c  = rand(200,200);
>     wx = rand(200,200);
>     b  = rand(200,200);
>     wy = rand(200,200);
>     B  = rand(200,200);
>
> Here is the mex routine (bare bones, no argument checking):
>
> #include "mex.h"
> void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
> {
>     double *ux, *vy, *c, *wx, *b, *wy, *B, *E;
>     mwSize i, n;
>     /* All inputs are assumed to be real double arrays of the same size. */
>     n = mxGetNumberOfElements(prhs[0]);
>     ux = mxGetPr(prhs[0]);
>     vy = mxGetPr(prhs[1]);
>     c  = mxGetPr(prhs[2]);
>     wx = mxGetPr(prhs[3]);
>     b  = mxGetPr(prhs[4]);
>     wy = mxGetPr(prhs[5]);
>     B  = mxGetPr(prhs[6]);
>     /* The output has the same dimensions as the first input. */
>     plhs[0] = mxCreateDoubleMatrix(mxGetM(prhs[0]), mxGetN(prhs[0]), mxREAL);
>     E = mxGetPr(plhs[0]);
>     /* One pass over the elements, one logical expression per element. */
>     for( i=0; i<n; i++ ) {
>         E[i] = ux[i]*vy[i]*vy[i]*c[i]*wx[i]+
>                2.*ux[i]*b[i]*wy[i]*wy[i]+
>                ux[i]*c[i]*wx[i]+
>                2.*ux[i]*B[i]*wy[i]+
>                vy[i]*vy[i]*c[i]*wx[i];
>     }
> }
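A minimal sketch of how the loop in the quoted routine could be kept as an ordinary C function, so that the long expression can be compiled and checked outside MATLAB before it is wrapped in mexFunction. The function name first_line_terms and the assert-based pointer check are my own assumptions for illustration, not part of the quoted routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (the name is an assumption): evaluates the first
 * five terms of E element-wise, with the same arithmetic as the quoted
 * mex loop. All arrays must hold at least n doubles. */
void first_line_terms(const double *ux, const double *vy, const double *c,
                      const double *wx, const double *b, const double *wy,
                      const double *B, double *E, size_t n)
{
    size_t i;

    /* Debug-build guard only; it is compiled out when NDEBUG is defined. */
    assert(n == 0 || (ux && vy && c && wx && b && wy && B && E));

    for (i = 0; i < n; i++) {
        /* One logical expression: no temporaries, one store per element. */
        E[i] = ux[i]*vy[i]*vy[i]*c[i]*wx[i] +
               2.*ux[i]*b[i]*wy[i]*wy[i] +
               ux[i]*c[i]*wx[i] +
               2.*ux[i]*B[i]*wy[i] +
               vy[i]*vy[i]*c[i]*wx[i];
    }
}
```

Inside mexFunction the loop would then reduce to a single call that passes the mxGetPr pointers and n, and the remaining terms of E can be appended to the same expression.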
>
> And here are the results (WinXP, MSVC8):
>
> >> tic;ux.*vy.^2.*c.*wx+2.*ux.*b.*wy.^2+ux.*c.*wx+2.*ux.*B.*wy+vy.^2.*c.*wx;toc
>
> Elapsed time is 0.006098 seconds.
>
> >> tic;dottimestest(ux,vy,c,wx,b,wy,B);toc
>
> Elapsed time is 0.001312 seconds.
>
> So there is quite a bit of speedup here. We have eliminated the *very* many intermediate variables created with the MATLAB expression and all of the associated extra allocation/deallocation and conversion between 64-bit double and 80-bit extended precision. In essence, the compiler is very likely doing the entire expression in 80-bit extended precision and only one conversion each way. Also presumably the compiler is taking advantage of common sub-expressions and eliminating some of the computations. So for your case, yes it looks like a mex routine will yield a significant speedup if you are willing to put in the effort to build it. Using my example as a starting point it probably will not take you too long.
>
> One piece of advice ... although it will look ugly be sure to put the entire expression on one logical line. Don't use temporary variables to store sub-expressions and then use those variables to sum up later on. By doing everything on one logical line (i.e., it can be on several physical lines like my example above but don't put a semi-colon until the very end) you will give the compiler the freedom to do everything in 80-bit extended precision (if you are on a PC) and this will minimize extra unnecessary conversions and loss of precision between 64-bit double and 80-bit extended.
>
> But be advised that you will in all likelihood *not* get exactly the same result as the MATLAB expression. This is to be expected because you are not doing all of the unnecessary multiple 64-bit / 80-bit conversions for the intermediate variables like the MATLAB expression does. The results will probably only differ in a few trailing bits which will probably not be significant (max relative difference likely in the 1e-15 range), but in any event the mex routine will be the more accurate of the two because of the reasons spelled out above.
>
> James Tursa
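A rough sketch of how the "few trailing bits" difference could be measured in plain C. It assumes the MATLAB expression is evaluated left to right with .^ applied first and every intermediate rounded to a 64-bit double, which is what staged() imitates. The names fused, staged and max_rel_diff are hypothetical. On a compiler that already does everything in 64-bit double the difference may come only from the different grouping of the products, or may be exactly zero.

```c
#include <assert.h>
#include <stddef.h>

/* One logical expression, as in the quoted mex loop. */
static double fused(double ux, double vy, double c, double wx,
                    double b, double wy, double B)
{
    return ux*vy*vy*c*wx + 2.*ux*b*wy*wy + ux*c*wx + 2.*ux*B*wy + vy*vy*c*wx;
}

/* Same terms, but every intermediate is stored in a double and the
 * squares are formed first (assumed to mirror the MATLAB evaluation). */
static double staged(double ux, double vy, double c, double wx,
                     double b, double wy, double B)
{
    double vy2 = vy*vy, wy2 = wy*wy;
    double t1 = ux*vy2, t2 = 2.*ux, t3 = ux*c, t4 = 2.*ux, t5 = vy2*c;
    double s;

    t1 = t1*c;  t1 = t1*wx;
    t2 = t2*b;  t2 = t2*wy2;
    t3 = t3*wx;
    t4 = t4*B;  t4 = t4*wy;
    t5 = t5*wx;
    s = t1 + t2;  s = s + t3;  s = s + t4;  s = s + t5;
    return s;
}

/* Largest relative difference between the two evaluations over n elements.
 * Falls back to the absolute difference where the fused value is zero. */
double max_rel_diff(const double *ux, const double *vy, const double *c,
                    const double *wx, const double *b, const double *wy,
                    const double *B, size_t n)
{
    double worst = 0.0;
    size_t i;

    assert(n == 0 || (ux && vy && c && wx && b && wy && B));

    for (i = 0; i < n; i++) {
        double f = fused(ux[i], vy[i], c[i], wx[i], b[i], wy[i], B[i]);
        double s = staged(ux[i], vy[i], c[i], wx[i], b[i], wy[i], B[i]);
        double d = (f > s) ? f - s : s - f;
        double ref = (f < 0.0) ? -f : f;
        double rel = (ref != 0.0) ? d / ref : d;
        if (rel > worst)
            worst = rel;
    }
    return worst;
}
```

The same check can be done from MATLAB by comparing the mex output against the original expression, for example with max(abs(E_mex(:)-E_m(:))./abs(E_m(:))), where E_mex and E_m are placeholder names for the two results.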

many thanks!! will try it now.