
Thread Subject:
Powers are slow, multiplies fast; optimized badly?

Subject: Powers are slow, multiplies fast; optimized badly?

From: Irl

Date: 19 Jul, 2010 17:49:04

Message: 1 of 11

MATLAB seems to implement x^2 as a multiply but x^3, or higher power, via something much slower (exp of number times log?). For large enough n, exp n log will be faster than multiplying n times, but the internal code generator seems to err wildly on the side of using the slow method, whatever it is, rather than just repeated multiplies. But it does do x^2, apparently, as x*x. Is there a way to make it use repeated multiplies for higher exponents? I wrote a Zernike polynomial program and it's very tiresome to go change all the powers to multiply strings.
Example program, including comments giving the execution times:
function TestIntOpt(NumEl)
if nargin<1; NumEl = 1e7; end
% Commented results are for 1e7 as input argument
tt=rand(1,NumEl);
tic
x = tt; %zero ms
T1=toc;fprintf('Simple assign, %.0f elements: %.0f ms\n',NumEl,1000*T1);
tic
x = tt.^2; %takes 70 msec
T1=toc;fprintf('Second power (square): %.0f ms\n',1000*T1);
tic
x = tt.^3; %takes 1200 msec
T1=toc;fprintf('Third power (cube): %.0f ms\n',1000*T1);
tic
x = tt.^2.*tt; %takes 100 msec
T1=toc;fprintf('Cube via square times first power: %.0f ms\n',1000*T1);
tic
x = tt.^10; %takes 1200 msec
T1=toc;fprintf('Tenth power: %.0f ms\n',1000*T1);
tic
x = tt.*tt.*tt.*tt.*tt.*tt.*tt.*tt.*tt.*tt; %takes 170 msec
T1=toc;fprintf('Tenth power via nine multiplies: %.0f ms\n',1000*T1);
tic
x = exp(10*log(tt)); %takes 660 msec
T1=toc;fprintf('Tenth power via exp 10 log: %.0f ms\n',1000*T1);

Subject: Powers are slow, multiplies fast; optimized badly?

From: Walter Roberson

Date: 19 Jul, 2010 18:25:46

Message: 2 of 11

Irl wrote:
> MATLAB seems to implement x^2 as a multiply but x^3, or higher power,
> via something much slower (exp of number times log?). For large enough
> n, exp n log will be faster than multiplying n times, but the internal
> code generator seems to err wildly on the side of using the slow method,
> whatever it is, rather than just repeated multiplies. But it does do
> x^2, apparently, as x*x. Is there a way to make it use repeated
> multiplies for higher exponents?

What you say about the speed agrees with my tests of 2007a. When I noticed
this, I implemented a routine that tested the power and some conditions on the
base value and used the most efficient routine (e.g., sometimes
multiplication, sometimes realpow(), falling back to .^ for cases uncommon
enough not to be worth special handling in my usage.)

However, after a while of using that, I encountered some subtle accuracy
issues that took a while to track down (partly because I "knew" the "right
answer"). In time I abandoned this approach and went back to .^ when I realized:

For floating point values, repeated multiplication to form x^n is not the same
as exp(n*log(x)) due to differences in the propagation of errors (with
repeated multiplication being less accurate.)
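Walter's accuracy point can be sketched outside MATLAB as well. The following Python illustration (the values of x and n are arbitrary, and the helper names are my own) compares both floating-point routes against an exact rational reference:

```python
import math
from fractions import Fraction

def pow_repeated(x, n):
    # x^n via n-1 successive multiplications
    r = x
    for _ in range(n - 1):
        r *= x
    return r

def pow_explog(x, n):
    # x^n via exp(n*log(x)); valid only for x > 0
    return math.exp(n * math.log(x))

x, n = 1.000000123, 50
exact = float(Fraction(x) ** n)  # exact rational power, rounded once at the end
err_mul = abs(pow_repeated(x, n) - exact)
err_log = abs(pow_explog(x, n) - exact)
```

Which route is more accurate depends on x and n; the point is simply that the two methods round differently and need not agree to the last bit.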

Subject: Powers are slow, multiplies fast; optimized badly?

From: James Tursa

Date: 19 Jul, 2010 20:24:04

Message: 3 of 11

Walter Roberson <roberson@hushmail.com> wrote in message <i225i6$anh$1@canopus.cc.umanitoba.ca>...
> Irl wrote:
> > MATLAB seems to implement x^2 as a multiply but x^3, or higher power,
> > via something much slower (exp of number times log?). For large enough
> > n, exp n log will be faster than multiplying n times, but the internal
> > code generator seems to err wildly on the side of using the slow method,
> > whatever it is, rather than just repeated multiplies. But it does do
> > x^2, apparently, as x*x. Is there a way to make it use repeated
> > multiplies for higher exponents?
>
> What you say about the speed agrees with my tests of 2007a. When I noticed
> this, I implemented a routine that tested the power and some conditions on the
> base value and used the most efficient routine (e.g., sometimes
> multiplication, sometimes realpow(), falling back to .^ for cases uncommon
> enough not to be worth special handling in my usage.)
>
> However, after a while of using that, I encountered some subtle accuracy
> issues that took a while to track down (partly because I "knew" the "right
> answer"). In time I abandoned this approach and went back to .^ when I realized:
>
> For floating point values, repeated multiplication to form x^n is not the same
> as exp(n*log(x)) due to differences in the propagation of errors (with
> repeated multiplication being less accurate.)

For low powers, e.g., 2 or 3, this is likely not an issue. The direct multiplication will be faster and just as accurate. For higher powers, the propagation of errors could probably be controlled with a smarter multiplication scheme (e.g., binary decomposition of the power), but I don't know at what point the speed becomes a wash; I would have to run some tests. E.g., a modified version of this FEX submission, changed to use times instead of mtimes, could do it somewhat optimally:
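The binary decomposition James mentions is the classic square-and-multiply scheme. A minimal sketch in Python (the name binpow is my own):

```python
def binpow(x, n):
    """x**n for integer n >= 0 via binary (square-and-multiply)
    decomposition: about 2*log2(n) multiplies instead of n-1."""
    result = 1
    base = x
    while n > 0:
        if n & 1:          # this bit of n is set: fold current square in
            result *= base
        base *= base       # square for the next bit
        n >>= 1
    return result
```

For n = 10 this performs 4 squarings plus 2 multiplies rather than 9 multiplies; for arrays the same structure applies with elementwise operations.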

http://www.mathworks.com/matlabcentral/fileexchange/25782-mpower2-a-faster-matrix-power-function

James Tursa

Subject: Powers are slow, multiplies fast; optimized badly?

From: Bruno Luong

Date: 19 Jul, 2010 21:56:04

Message: 4 of 11

"Irl " <irl_smith@raytheonRemoveTheseWords.com> wrote in message <i2236g$oco$1@fred.mathworks.com>...
> MATLAB seems to implement x^2 as a multiply but x^3, or higher power, via something much slower (exp of number times log?). For large enough n, exp n log will be faster than multiplying n times, but the internal code generator seems to err wildly on the side of using the slow method, whatever it is, rather than just repeated multiplies. But it does do x^2, apparently, as x*x. Is there a way to make it use repeated multiplies for higher exponents? I wrote a Zernike polynomial program and it's very tiresome to go change all the powers to multiply strings.

Usually when using Zernike polynomials, not just one specific power is needed, but all powers in a range. In that case you might compute them all at once (for each variable). The function below accomplishes that, using the same binary-decomposition idea mentioned by James. Put the following in the same mfile, then call this example at the command line:

p = allpow(1.1, 100);
1.1^100 / p(end)

function powers = allpow(x, pmax)
% function powers = allpow(x, pmax)
% compute all powers of order(0,1,...pmax) of scalar x, return the
% result in powers (having length=pmax+1)

powers = zeros(1, pmax+1);
powers(1) = 1;  % x^0 = 1
powers(2) = x;
for k=2:pmax
    [p2 r] = bindecomp(k);
    powers(k+1) = powers(p2+1)*powers(r+1);
end
end %% allpow

function [p2 r] = bindecomp(n)
% [p2 r] = bindecomp(n)
% Given integer n>1, find p2 and r such that:
% p2 is a power of two
% p2+r=n
% p2<n and r<n
previouspower2 = floorlog2(n);
if previouspower2==n
    p2 = n/2;
else
    p2 = previouspower2;
end
r = n-p2;
end % bindecomp

function previouspower2 = floorlog2(n)
% function previouspower2 = floorlog2(n)
% Given integer n >= 1,
% find the largest power of two less than or equal to n
previouspower2 = 1;
while n>1
    n = floor(n/2);
    previouspower2 = 2*previouspower2;
end
end

% Bruno
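For readers outside MATLAB, the same build-each-power-from-smaller-powers idea can be sketched in Python (this variant splits k as k//2 + (k - k//2) rather than using Bruno's power-of-two decomposition; the name allpow mirrors his function but the code is my own):

```python
def allpow(x, pmax):
    # powers[k] == x**k for k = 0..pmax, each new power built from
    # two previously computed ones: one multiply per new exponent
    powers = [1.0] * (pmax + 1)
    if pmax >= 1:
        powers[1] = x
    for k in range(2, pmax + 1):
        half = k // 2
        powers[k] = powers[half] * powers[k - half]
    return powers
```

For a Zernike-style evaluation where every power up to pmax is needed anyway, this costs only one multiply per exponent, so no individual power is ever recomputed.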

Subject: Powers are slow, multiplies fast; optimized badly?

From: Steve Amphlett

Date: 20 Jul, 2010 06:46:05

Message: 5 of 11

"Irl " <irl_smith@raytheonRemoveTheseWords.com> wrote in message <i2236g$oco$1@fred.mathworks.com>...
> MATLAB seems to implement x^2 as a multiply but x^3, or higher power, via something much slower (exp of number times log?). For large enough n, exp n log will be faster than multiplying n times, but the internal code generator seems to err wildly on the side of using the slow method, whatever it is, rather than just repeated multiplies. But it does do x^2, apparently, as x*x. Is there a way to make it use repeated multiplies for higher exponents? I wrote a Zernike polynomial program and it's very tiresome to go change all the powers to multiply strings.

Related...

If you are writing MEX files (or just normal programs/functions) that use powers, consider switching to C++ and using std::pow() in place of C's very restrictive pow() and powf() functions. It has overloads for integral exponents, which can choose the optimum method (repeated multiplication vs. logarithms vs. ???).

I cannot comment on what FORTRAN and its modern incarnations do for integral exponents.

I'm guessing that Matlab _probably_ uses C's pow() and powf() internally.

Subject: Powers are slow, multiplies fast; optimized badly?

From: Bruno Luong

Date: 20 Jul, 2010 06:49:04

Message: 6 of 11

I dug out the old thread where the binary-decomposition power algorithm is provided (you need to change '*' to '.*' to adapt it to this context):

http://www.mathworks.com/matlabcentral/newsreader/view_thread/264887#692226

Bruno

Subject: Powers are slow, multiplies fast; optimized badly?

From: Bruno Luong

Date: 20 Jul, 2010 11:10:05

Message: 7 of 11

Here is a benchmark of the binary method for integer powers (MATLAB R2010a 64-bit, Vista):
 
function TestIntOpt(NumEl)
if nargin<1; NumEl = 1e7; end

tt=rand(1,NumEl);
k = 10;

tic
x = tt.^k; %takes 1159 msec
T1=toc;fprintf('Tenth power: %.0f ms\n',1000*T1);

tic
x = exp(k*log(tt)); %takes 560 msec
T1=toc;fprintf('Tenth power via exp 10 log: %.0f ms\n',1000*T1);

tic
x = intpow(tt,k); %takes 265 msec
T1=toc;fprintf('Tenth power via intpow: %.0f ms\n',1000*T1);

end % TestIntOpt

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function d = intpow(a, n)
% d = intpow(a, n)
% power with integer exponent
% return the array d = a.^n

% binary decomposition
nb = dec2bin(n)-'0';
ak = a;
if nb(end)
    d = a;
else
    d = 1;
end
for nbk = nb(end-1:-1:1)
    ak = ak.*ak;
    if nbk
        d = d.*ak;
    end
end

end % intpow

% Bruno

Subject: Powers are slow, multiplies fast; optimized badly?

From: dpb

Date: 20 Jul, 2010 12:38:12

Message: 8 of 11

Steve Amphlett wrote:
...

> Related...
>
> If you are writing MEX files (or just normal programs/functions) that
> use powers, consider switching to C++ and use std::pow() in place of C's
> very restrictive pow() and powf() functions. It has overloads for
> integral exponents, which can choose the optimum method (multiple
> multiplications vs logarithms vs ???).
>
> I cannot comment on what FORTRAN and its modern incarnations do for
> intergral exponents.
...

All combinations of intrinsic numeric types are permitted with the **
operator but you cannot raise a negative real value to a real power.

However, the Standard does not prescribe the implementation so the
actual algorithm/optimization is "processor dependent" (where
"processor" in the Standard means in generic terms the compiler).

For example, some compilers may recognize the case of (-1.0)**2.0 and
replace the real exponent with the integer; others may not. Similarly,
what is done here by way of speed optimization is a "quality of
implementation" issue, not mandated by the Standard.

--

Subject: Powers are slow, multiplies fast; optimized badly?

From: Bruno Luong

Date: 20 Jul, 2010 14:35:05

Message: 9 of 11

This is a more compact version of the function allpow() in post #4:

function powers = allpow(x, pmax)
% function powers = allpow(x, pmax)
%
% compute all powers of exponent P = (0,1,...,PMAX) of vector X.
% return the result in 2d array POWERS having size [length(X) x (PMAX+1)]

x = x(:);                         % force column vector
powers = ones(length(x), pmax+1); % column 1 is x.^0 = 1

if pmax>0
    powers(:,2) = x;
    p2 = 1;
    r = 1;
    for k=2:pmax
        powers(:,k+1) = powers(:,p2+1).*powers(:,r+1);
        if (p2==r)
            p2 = 2*p2;
            r = 1;
        else
            r = r+1;
        end
    end % for-loop
end

end %% allpow

Subject: Powers are slow, multiplies fast; optimized badly?

From: Irl

Date: 20 Jul, 2010 15:13:06

Message: 10 of 11

Dear Bruno, Steve, Walter, et al. --
Thanks for the interesting and useful info. My main question was whether there was any way to control how the internal vectorized power function deals with small integer powers, and so far it appears that there is not. For my application, the powers are no higher than ten or so, and my little test program suggests that for that regime the repeated-multiply method, possibly augmented with a binary decomposition, will be the fastest. I can do the binary decomposition manually for this case, although the possibility of doing it automatically is a useful trick as well.
Again, thanks for the help.
Yours,
Irl
"Irl " <irl_smith@raytheonRemoveTheseWords.com> wrote in message <i2236g$oco$1@fred.mathworks.com>...
> MATLAB seems to implement x^2 as a multiply but x^3, or higher power, via something much slower (exp of number times log?). For large enough n, exp n log will be faster than multiplying n times, but the internal code generator seems to err wildly on the side of using the slow method, whatever it is, rather than just repeated multiplies. But it does do x^2, apparently, as x*x. Is there a way to make it use repeated multiplies for higher exponents? I wrote a Zernike polynomial program and it's very tiresome to go change all the powers to multiply strings.
> Example program, including comments giving the execution times:
> function TestIntOpt(NumEl)
> if nargin<1; NumEl = 1e7; end
> % Commented results are for 1e7 as input argument
> tt=rand(1,NumEl);
> tic
> x = tt; %zero ms
> T1=toc;fprintf('Simple assign, %.0f elements: %.0f ms\n',NumEl,1000*T1);
> tic
> x = tt.^2; %takes 70 msec
> T1=toc;fprintf('Second power (square): %.0f ms\n',1000*T1);
> tic
> x = tt.^3; %takes 1200 msec
> T1=toc;fprintf('Third power (cube): %.0f ms\n',1000*T1);
> tic
> x = tt.^2.*tt; %takes 100 msec
> T1=toc;fprintf('Cube via square times first power: %.0f ms\n',1000*T1);
> tic
> x = tt.^10; %takes 1200 msec
> T1=toc;fprintf('Tenth power: %.0f ms\n',1000*T1);
> tic
> x = tt.*tt.*tt.*tt.*tt.*tt.*tt.*tt.*tt.*tt; %takes 170 msec
> T1=toc;fprintf('Tenth power via nine multiplies: %.0f ms\n',1000*T1);
> tic
> x = exp(10*log(tt)); %takes 660 msec
> T1=toc;fprintf('Tenth power via exp 10 log: %.0f ms\n',1000*T1);

Subject: Powers are slow, multiplies fast; optimized badly?

From: Dan Scholnik

Date: 31 Dec, 2012 01:15:10

Message: 11 of 11

An update to this thread: I found that even intpow() above is not as well optimized as a direct command-line series of repeated multiplies, even when those multiplies are grouped identically to the ones done inside intpow. In the script output below, I compare using .^, intpow, a series of ungrouped multiplies, and a series of multiplies grouped to match intpow. You can see that grouping a series of multiplications does affect the result, but does not seem to affect the time, and that repeated multiplies are much faster than intpow even though intpow appears to do the same number of multiplies or fewer. I've also shown peak and rms errors between the various methods, to give some idea of how the errors grow with the exponent. From this sample, ungrouped multiplies appear slightly more accurate with respect to the .^ operator than intpow().

This was done on both R2012b and R2013a_pre on 64 bit linux with a dual-core processor. I also did this for X=randn(1e8,1) with similar results, to rule out overhead issues.


X=randn(1e7,1);

tic; X2=X.^2; toc
Elapsed time is 0.036954 seconds.
tic; X3=X.^3; toc
Elapsed time is 0.619918 seconds.
tic; X7=X.^7; toc
Elapsed time is 0.621204 seconds.
tic; X8=X.^8; toc
Elapsed time is 0.614439 seconds.

tic; X2a=intpow(X,2);toc
Elapsed time is 0.072780 seconds.
tic; X3a=intpow(X,3);toc
Elapsed time is 0.080734 seconds.
tic; X7a=intpow(X,7);toc
Elapsed time is 0.122126 seconds.
tic; X8a=intpow(X,8);toc
Elapsed time is 0.106735 seconds.

tic; X2b=X.*X;toc
Elapsed time is 0.037896 seconds.
tic; X3b=X.*X.*X;toc
Elapsed time is 0.037768 seconds.
tic; X7b=X.*X.*X.*X.*X.*X.*X;toc
Elapsed time is 0.043093 seconds.
tic; X8b=X.*X.*X.*X.*X.*X.*X.*X;toc
Elapsed time is 0.046745 seconds.

tic; X2c=X.*X;toc
Elapsed time is 0.037059 seconds.
tic; X3c=X.*(X.*X);toc
Elapsed time is 0.036769 seconds.
tic; X7c=X.*(X.*X).*((X.*X).*(X.*X));toc
Elapsed time is 0.044292 seconds.
tic; X8c=((X.*X).*(X.*X)).*((X.*X).*(X.*X));toc
Elapsed time is 0.048338 seconds.

max(abs(X7-X7a))
ans = 2.9104e-11
max(abs(X7-X7b))
ans = 2.9104e-11
max(abs(X7a-X7b))
ans = 1.4552e-11
max(abs(X7a-X7c))
ans = 0

rms(X7-X7a)
ans = 6.6114e-14
rms(X7-X7b)
ans = 4.8226e-14
rms(X7a-X7b)
ans = 5.4687e-14

max(abs(X8-X8a))
ans = 2.3283e-10
max(abs(X8-X8b))
ans = 1.1642e-10
max(abs(X8a-X8b))
ans = 1.1642e-10
max(abs(X8a-X8c))
ans = 0

rms(X8-X8a)
ans = 3.3424e-13
rms(X8-X8b)
ans = 1.9447e-13
rms(X8a-X8b)
ans = 2.9954e-13
