Asked by Knut
on 6 Sep 2011

I want to calculate a reasonable short vector many times with variable inputs a and len

%y = [a a^2 a^3...a^(len)]

My methods so far are the following:

y1 = cumprod(a(ones(len,1))) y2 = a.^(1:len)

y1 seems to be faster than y2 for len>11 or so. Any suggestions for doing this even faster? The knowledge that y(n) = y(n-1)*y(1) should be exploitable somehow, I just dont see how.

*No products are associated with this question.*

Answer by James Tursa
on 6 Sep 2011

Or this variation, which I don't see in the current posts:

y = cumprod(a.*ones(1,n));

I think the MATLAB JIT may recognize this formulation and not actually do the multiply, but just populate the result ala repmat.

And if this calculation is dominating your overall runtime, you can always turn to a simple mex function to do it and avoid the imtermediate array formulations.

Answer by Oleg Komarov
on 6 Sep 2011

time = zeros(100,3); for len = 1:100 a = rand(1); o = ones(1,len); v = 1:len;

tic for s = 1:1000; y1 = cumprod(a(o)); end time(len,1) = toc;

tic for s = 1:1000; y2 = a.^(v); end time(len,2) = toc;

tic for s = 1:1000; y3 = bsxfun(@power,a,v); end time(len,3) = toc; end

plot(time) legend('cumprod','.^','bsxfun')

**EDIT** cumprod, power, explog, cumprodx, cumprodpow

James' `cumprodpow` is the fastest.

num_iter = 3000; time = zeros(100,5); for len = 1:100 a = rand(1); tic for s = 1:num_iter y1 = cumprod(a(ones(1,len))); end time(len,1) = toc; pause(0.001) tic for s = 1:num_iter y2 = a.^(1:len); end time(len,2) = toc; pause(0.001) tic for s = 1:num_iter y3 = exp(log(a)*(1:len)); end time(len,3) = toc; pause(0.001) tic for s = 1:num_iter y4 = cumprodx(a,len); end time(len,4) = toc; pause(0.001) tic for s = 1:num_iter y5 = cumprod(a.*ones(1,len)); end time(len,5) = toc; end plot(time) legend('cumprod','power','explog','cumprodx','cumprodpow')

Andrei Bobrov
on 6 Sep 2011

Firstly, +1;

secondly, small additions:

time = zeros(100,5);

for len = 1:100

a = rand(1);

o = ones(1,len);

v = 1:len;

tic

for s = 1:1000; y1 = cumprod(a(o)); end

time(len,1) = toc;

tic

for s = 1:1000; y2 = a.^(v); end

time(len,2) = toc;

tic

for s = 1:1000; y3 = bsxfun(@power,a,v); end

time(len,3) = toc;

tic

for s = 1:1000; y4 = arrayfun(@(x)a^x,v); end

time(len,4) = toc;

tic

for s = 1:1000, for v = len:-1:1, y5(v) = a^v; end; end

time(len,5) = toc;

end

plot(time)

legend('cumprod','power','bsxfun','arrayfun','for..end')

Answer by Derek O'Connor
on 6 Sep 2011

Nobody has followed Knut's suggestion:

y(n) = y(n-1)*y(1)

It is in fact a special case of Horner's Method for evaluating a polynomial

yn(x) = a0 + a1*x + a2*x^2 + ... + an*x^n.

The time behavior of the two methods below is very erratic but the loop or Horner method always seems to beat the CumProd method by 30% to 50% up to about n = 150. After that, CumProd gets better and better. At n = 1000, Horner's time is about 5*CumProd time.

If you have the time try [yc,yh,time] = CumProdvsHorner(1000, 10000);

Has anyone got an explanation for the spikes in the execution times?

function [yc,yh,yx,time] = CumProdvsHorner(nmax,ns);

% Calculation of y = [a a^2 ... a^n] % USE: [yc,yh,yx,time] = CumProdvsHorner(300,1000);

nmeths=3; time = zeros(nmax,nmeths); ntime = zeros(nmax,nmeths);

for n = 2:nmax x = rand; xsave = x; yc = zeros(1,n); yh = zeros(1,n); yx = zeros(1,n);

tc = tic; for s = 1:ns yc = cumprod(x.*ones(1,n)); end time(n,1) = toc(tc);

tr = tic; x = xsave; for s = 1:ns yh(1) = x; for k = 2:n yh(k) = x*yh(k-1); end end time(n,2) = toc(tr);

tx = tic; for s = 1:ns yx = cumprodx(x,n); end time(n,3) = toc(tx);

end % for n

figure plot(20:nmax,time(20:nmax,:)) legend('CumProd','Horner','CumProdx') xlabel('n'); ylabel('Time (secs)') title('Calculate y(x,n) = [x x^2 x^3 ... x^n]')

for m = 1:nmeths ntime(:,m) = time(:,m)./(1:nmax)'; end

figure plot(20:nmax,ntime(20:nmax,:)) legend('CumProd','Horner','CumProdx') xlabel('n'); ylabel('Normalized Time T(n)/n') title('Calculate y(x,n) = [x x^2 x^3 ... x^n]') % % END CumProdvsHorner

EDIT: Added plots for normalized times, renamed some variables.

Show 2 older comments

Derek O'Connor
on 7 Sep 2011

@proecsm

Take a look at the normalized times with the updated function using:

[yc,yh,time] = CumProdvsHorner(1000,1000);

CumProd's behavior is very strange. As Walter says, this may be due to switching in and out of BLAS.

Answer by James Tursa
on 7 Sep 2011

Here is the quick and dirty mex solution:

// File cumprodx.c // c = cumprodx(a,n) does equivalent of c = cumprod(a.*ones(1,n)) // a = real double scalar // n = real double scalar with integral value > 0 // Caution: No argument checking #include "mex.h" void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) { double a = mxGetScalar(prhs[0]); mwSize i, n = mxGetScalar(prhs[1]); double *pr; plhs[0] = mxCreateDoubleMatrix(1,n,mxREAL); pr = mxGetPr(plhs[0]); pr[0] = a; for( i=1; i<n; i++ ) { pr[i] = pr[i-1] * a; } }

Derek O'Connor
on 7 Sep 2011

James,

For [yc,yh,yx,time] = CumProdvsHorner(1000,10000);

Horner beats cumprod up to n = 150, cumprodx up to n = 250.

After that cumprod beats cumprodx but they seem to be converging to equality at about n = 1500 - 2000.

James Tursa
on 7 Sep 2011

Derek O'Connor
on 7 Sep 2011

A good point. I had overlooked the function call overhead, which

can be a significant part of the time to do the necessary (n-1)

multiplications.

However, the timings serve as a warning not to forget the simpler,

loopy methods, especially with (an improving?) JIT compiler.

Vectorization has become something of a fetish with some Matlab

programmers, which blinds them to simpler methods.

Answer by Knut
on 7 Sep 2011

Thank you for all your help. There seems to be many different ways to accomplish this, each with different speed-tradeoffs. The mex-file might run faster with single-presicion and optimization flags, but I did not bother.

For lengths < ~80 elements, 'Horners Method' seems to be the best choice. I did notice some strange "warm-up" effects: increasing the number of iterations caused the relative performance to shift quite a lot. Is this the JIT playing games on me?

num_iter = 100; len_span = [1:50 100:100:2000 1e4 3e4 1e5 3e5 1e6]; num_iter_span = 1e6./len_span; time = zeros(length(len_span),7); for len_idx = 1:length(len_span) num_iter = num_iter_span(len_idx); len = len_span(len_idx); eps = 1e-7; a = rand(1);

tic for s = 1:num_iter; y1 = cumprod(a*ones(1, len)); end time(len_idx,1) = toc; assert(max(abs(y1-y1))<eps);

tic for s = 1:num_iter; y2 = a.^(1:len); end time(len_idx,2) = toc; assert(max(abs(y1-y2))<eps);

tic for s = 1:num_iter; y3 = bsxfun(@power,a,1:len); end time(len_idx,3) = toc; assert(max(abs(y1-y3))<eps);

tic for s = 1:num_iter, for v = len:-1:1 y4(v) = a^v; end; end time(len_idx,4) = toc; assert(max(abs(y1-y4))<eps);

tic for s = 1:num_iter, y5(1) = a; for v = 2:len y5(v) = a*y5(v-1); end; end time(len_idx,5) = toc; assert(max(abs(y1-y5))<eps);

tic for s = 1:num_iter, y6 = cumprodx(a,len); end time(len_idx,6) = toc; assert(max(abs(y1-y6))<eps);

tic for s = 1:num_iter, y7 = exp(log(a)*(1:len)); end time(len_idx,7) = toc; assert(max(abs(y1-y7))<eps); end

figure semilogx(len_span, time) xlabel('vector length') ylabel('time/vector length') grid on axis tight legend('cumprod(a([1 1,...]))','power','bsxfun','for..end', 'Horners Method', 'mex', 'log')

Knut
on 7 Sep 2011

Because this is similar to how I use the code in my application:

len_vect = [9 20 19 42 50];

a_vect = [0.7 0.8 0.9 1 1.1];

for t = 1:length(len_vect)

y = a_vect(t).^(1:len_vect(t));

end

As both a and len change for each iteration, it makes sense to include them within the tic and toc: I dont see how I can hide the cost of creating the vectors outside the loop?

Oleg Komarov
on 7 Sep 2011

Anyways, in this case I find that James' cumprod(a.*ones(1,len)) is the fastest, always.

Opportunities for recent engineering grads.

## 0 Comments