Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
Vectorizing data lookback?

Subject: Vectorizing data lookback?

From: Nicholas

Date: 31 Aug, 2008 05:44:02

Message: 1 of 13

I'm trying to develop a function that traverses a vector and
outputs a '1' if the current value is greater than the
previous N-values, '0' if it's not.

Programming this into a "for" loop is trivial:

for ii = n:length(data)

     if data(ii) > max( data(ii-n:ii-1) );
          output(ii) = 1;
     else
          output(ii) = 0;

end;

The problem is this approach takes forever and a day to
process. It takes ~2 orders of magnitude longer to run than
similar (but vectorized) functions I've written that operate
on the same exact data.

I've been trying to vectorize this function as well to
improve the speed, but I can't wrap my brain around it. Any
suggestions?

Subject: Vectorizing data lookback?

From: heiko_marx@hotmail.com

Date: 1 Sep, 2008 09:26:39

Message: 2 of 13

On 31 Aug., 07:44, "Nicholas "
<OtherPeopleTookAllTheGoodNa...@deleteme.gmail.com> wrote:
> I'm trying to develop a function that traverses a vector and
> outputs a '1' if the current value is greater than the
> previous N-values, '0' if it's not.
>
> Programming this into a "for" loop is trivial:
>
> for ii =3D n:length(data)
>
> =A0 =A0 =A0if data(ii) > max( =A0data(ii-n:ii-1) =A0);
> =A0 =A0 =A0 =A0 =A0 output(ii) =3D 1;
> =A0 =A0 =A0else
> =A0 =A0 =A0 =A0 =A0 output(ii) =3D 0;
>
> end;
>
> The problem is this approach takes forever and a day to
> process. It takes ~2 orders of magnitude longer to run than
> similar (but vectorized) functions I've written that operate
> on the same exact data.
>
> I've been trying to vectorize this function as well to
> improve the speed, but I can't wrap my brain around it. Any
> suggestions?

Hi.

I found that avoiding the max function in each loop reduces the time
by a good amount.

>> tic; for ii =3D n:length(data), if data(ii) > max(data(1:ii-1)); output(=
ii) =3D 1; else output(ii) =3D 0;end, end, toc
Elapsed time is 0.010647 seconds.
>> tic; m =3D max(data(1:n-1)); for ii =3D n:length(data), if data(ii) > m,=
 output2(ii) =3D 1; m =3D data(ii); else output2(ii) =3D 0;end, end, toc
Elapsed time is 0.001932 seconds.
>> isequal(output, output2)
ans =3D
     1

The times refer to a data set of 1000 random values, aand are of
course machine-dependent.

Hope this helps,
Heiko

Subject: Vectorizing data lookback?

From: Titus

Date: 1 Sep, 2008 14:03:21

Message: 3 of 13


"Nicholas " <OtherPeopleTookAllTheGoodNames@deleteme.gmail.com> schrieb im
Newsbeitrag news:g9db32$oq6$1@fred.mathworks.com...
> I'm trying to develop a function that traverses a vector and
> outputs a '1' if the current value is greater than the
> previous N-values, '0' if it's not.
>
> Programming this into a "for" loop is trivial:
>
> for ii = n:length(data)
>
> if data(ii) > max( data(ii-n:ii-1) );
> output(ii) = 1;
> else
> output(ii) = 0;
>
> end;
>
> The problem is this approach takes forever and a day to
> process. It takes ~2 orders of magnitude longer to run than
> similar (but vectorized) functions I've written that operate
> on the same exact data.
>
> I've been trying to vectorize this function as well to
> improve the speed, but I can't wrap my brain around it. Any
> suggestions?

Hi,

without really answering the question: did you preallocate your output?
Something like
output = zeros(size(data));
before entering the loop?

Titus

Subject: Vectorizing data lookback?

From: Nicholas

Date: 3 Sep, 2008 15:28:02

Message: 4 of 13

heiko_marx@hotmail.com wrote in message
<d3e32ef1-ba0b-4789-a5b1-220c64ee361c@25g2000hsx.googlegroups.com>...

> Hi.
>
> I found that avoiding the max function in each loop
reduces the time
> by a good amount.
>
> >> tic; for ii =3D n:length(data), if data(ii) >
max(data(1:ii-1)); output(=
> ii) =3D 1; else output(ii) =3D 0;end, end, toc
> Elapsed time is 0.010647 seconds.
> >> tic; m =3D max(data(1:n-1)); for ii =3D n:length(data),
if data(ii) > m,=
> output2(ii) =3D 1; m =3D data(ii); else output2(ii) =3D
0;end, end, toc
> Elapsed time is 0.001932 seconds.
> >> isequal(output, output2)
> ans =3D
> 1
>
> The times refer to a data set of 1000 random values, aand
are of
> course machine-dependent.
>
> Hope this helps,
> Heiko


Heiko-

If I'm reading your code correctly, it wouldn't work for my
application. It computes a single max value for the vector
m, and compares this value to data on a bar by bar basis.

In my application, I need to compare each data point to the
previous 'n' data points where n < length(data). In other
words, I have vectors that have ~750K data points. If I
specify n = 25, the function would take data point 26 from
that vector and see if it was greater than the previous 25
data points (1-25). It would then move to data point 27 and
see if was greater than the previous 25 data points (2-26).
Then it moves to point 28 and compares it to point 3-27...

Subject: Vectorizing data lookback?

From: Nicholas

Date: 3 Sep, 2008 15:30:19

Message: 5 of 13

"Titus" <titus.edelhofer@mathworks.de> wrote in message
<g9gsna$qll$1@fred.mathworks.com>...

> Hi,
>
> without really answering the question: did you preallocate
your output?
> Something like
> output = zeros(size(data));
> before entering the loop?
>
> Titus
>
>

Titus-

Yes I do preallocate the vectors. I've tried to do
everything I can to make the loop JIT compatible, but it's
still too slow.

Nick

Subject: Vectorizing data lookback?

From: Lars Barring

Date: 3 Sep, 2008 17:04:03

Message: 6 of 13

"Nicholas "
<OtherPeopleTookAllTheGoodNames@deleteme.gmail.com> wrote in
message <g9maib$9di$1@fred.mathworks.com>...

>
> Yes I do preallocate the vectors. I've tried to do
> everything I can to make the loop JIT compatible, but it's
> still too slow.

Maybe this FEX function for running extrema (max or min or
both)

http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=18551&objectType=file

will help you speed up the code. Probably you will have to
create a vector of the running max-values of the same length
as your input data.

hth
Lars

Subject: Vectorizing data lookback?

From: Alan B

Date: 3 Sep, 2008 18:40:19

Message: 7 of 13

"Lars Barring" <lars.barring@myworkplace.se> wrote in
message <g9mg23$ed4$1@fred.mathworks.com>...
> "Nicholas "
> <OtherPeopleTookAllTheGoodNames@deleteme.gmail.com> wrote in
> message <g9maib$9di$1@fred.mathworks.com>...
>
> >
> > Yes I do preallocate the vectors. I've tried to do
> > everything I can to make the loop JIT compatible, but it's
> > still too slow.
>
> Maybe this FEX function for running extrema (max or min or
> both)
>
>
http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=18551&objectType=file
>
> will help you speed up the code. Probably you will have to
> create a vector of the running max-values of the same length
> as your input data.
>
> hth
> Lars

If you have the signal processing toolbox, you can use the
buffer function to transform a 1D signal into a (window
size)*(total number of windows) 2D array, then take the max
columnwise, and compare with the appropriately shifted 1D
signal. I'm not sure if that would improve the speed any,
and I'm pretty sure you'd have to break down the full vector
beforehand too... But its something to try.

If you don't have buffer() available, you can do something
similar with reshape, using a for loop over (window size)
iterations rather than (number of windows) iterations, which
may help things some.

Subject: Vectorizing data lookback?

From: Bruno Luong

Date: 3 Sep, 2008 19:11:02

Message: 8 of 13


>
> I've been trying to vectorize this function as well to
> improve the speed, but I can't wrap my brain around it. Any
> suggestions?

ndata=750e3;
nwin=25;
upper=10;

data=round(upper*rand(1,ndata));

% for loop
tic
output1=zeros(size(data));
for ii = nwin+1:length(data)
     if data(ii) > max( data(ii-nwin:ii-1) );
          output1(ii) = 1;
     else
          output1(ii) = 0;
     end
end
toc % Elapsed time is 9.436772 seconds.

% expanding vectorized
tic
[I J]=ndgrid(2:nwin+1,1:ndata);
temp = data(max(J-I+1,1));
temp(J<I)=Inf;
output2=all(bsxfun(@gt,data,temp),1);
toc % Elapsed time is 2.461195 seconds.

% check the results are identical
all(output1==output2)

% Bruno

Subject: Vectorizing data lookback?

From: Bruno Luong

Date: 3 Sep, 2008 19:50:18

Message: 9 of 13

Faster still:

ndata=750e3;
nwin=25;
upper=10;

data=upper*rand(1,ndata);

tic
output1=zeros(size(data));
for ii = nwin+1:length(data)
     if data(ii) > max( data(ii-nwin:ii-1) );
          output1(ii) = 1;
     else
          output1(ii) = 0;
     end
end
toc % Elapsed time is 9.467494 seconds.

tic
temp = data(max(bsxfun(@minus, (1:ndata),(1:nwin)'),1));
for i=1:size(temp,1)
    temp(i,1:i)=Inf;
end
output2=all(bsxfun(@gt,data,temp),1);
toc % Elapsed time is 1.530637 seconds.

all(output1==output2)

% Bruno

Subject: Vectorizing data lookback?

From: Lars Barring

Date: 3 Sep, 2008 21:38:02

Message: 10 of 13

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in
message <g9mppq$f7c$1@fred.mathworks.com>...
> Faster still:
> .....

and here is a third alternative that clocks in favourably

% clean up and start test from scratch
clear all
pack

%some test data
ndata=10e6; %%% some more data, may need to reduced
nwin=5;
nwin2=floor(nwin/2);
upper=10;
data=round(upper*rand(1,ndata));

% Bruno 1
tic
output1=zeros(size(data));
for ii = nwin+1:length(data)
     if data(ii) > max( data(ii-nwin:ii-1) );
          output1(ii) = 1;
     else
          output1(ii) = 0;
     end
end
toc


% Bruno 2 runs out of memory
%tic
%temp = data(max(bsxfun(@minus, (1:ndata),(1:nwin)'),1));
%for i=1:size(temp,1)
% temp(i,1:i)=Inf;
%end
%output2=all(bsxfun(@gt,data,temp),1);
%toc


% running Extreme: FEX #18551 by Dan Kominsky
tic
nwin2=floor(nwin/2); % for convenience
maxd=runningExtreme(data,nwin,'max');
% adjustments to handle centred running window
maxd(1:nwin2)=NaN;
maxd=[repmat(NaN,1,nwin2+1) maxd(1:end-nwin2-1)];
output3=data>maxd;
toc

%all(output1==output2)

all(output1==output3)

%%%%%


hth
Lars

Subject: Vectorizing data lookback?

From: Bruno Luong

Date: 3 Sep, 2008 22:03:01

Message: 11 of 13

"Lars Barring" <lars.barring@myworkplace.se> wrote in
message <g9n03q$3le$1@fred.mathworks.com>...

>
> % running Extreme: FEX #18551 by Dan Kominsky
>

... Ah very neat this van Herk's algorithm

Here is timings on my laptop with 1e6 data:

Elapsed time is 12.691225 seconds. % for loop
Elapsed time is 2.264592 seconds. % "Bruno 2"
Elapsed time is 0.796989 seconds. % Lars/Dan

Bruno

Subject: Vectorizing data lookback?

From: heiko_marx@hotmail.com

Date: 5 Sep, 2008 06:03:12

Message: 12 of 13

On 3 Sep., 17:28, "Nicholas "
<OtherPeopleTookAllTheGoodNa...@deleteme.gmail.com> wrote:
> heiko_m...@hotmail.com wrote in message
>
> <d3e32ef1-ba0b-4789-a5b1-220c64ee3...@25g2000hsx.googlegroups.com>...
>
> > Hi.
>
> > I found that avoiding the max function in each loop
> reduces the time
> > by a good amount.
>
> > >> tic; for ii =3D3D n:length(data), if data(ii) >
>
> max(data(1:ii-1)); output(=3D
>
>
>
> > ii) =3D3D 1; else output(ii) =3D3D 0;end, end, toc
> > Elapsed time is 0.010647 seconds.
> > >> tic; m =3D3D max(data(1:n-1)); for ii =3D3D n:length(data),
> if data(ii) > m,=3D
> > =A0output2(ii) =3D3D 1; m =3D3D data(ii); else output2(ii) =3D3D
> 0;end, end, toc
> > Elapsed time is 0.001932 seconds.
> > >> isequal(output, output2)
> > ans =3D3D
> > =A0 =A0 =A01
>
> > The times refer to a data set of 1000 random values, aand
> are of
> > course machine-dependent.
>
> > Hope this helps,
> >Heiko
>
> Heiko-
>
> If I'm reading your code correctly, it wouldn't work for my
> application. It computes a single max value for the vector
> m, and compares this value to data on a bar by bar basis.
>
> In my application, I need to compare each data point to the
> previous 'n' data points where n < length(data). In other
> words, I have vectors that have ~750K data points. If I
> specify n =3D 25, the function would take data point 26 from
> that vector and see if it was greater than the previous 25
> data points (1-25). It would then move to data point 27 and
> see if was greater than the previous 25 data points (2-26).
> Then it moves to point 28 and compares it to point 3-27...

Hi again.

What my algorithm does is assume that if the value for n =3D 25 is
greater than max(v(1:24)), then max(v(1:25)) =3D v(25). In that case you
don't have to find the max value from 1:n, but only from the last n
where the max was exceeded. Its some kind of maxhold, and it's more
efficient for large vectors. Think of the unnecessary steps you can
save if you know that of your 750k values the largest is 749998, and
compare max(v(1:750000)) to max(v(749998), v(750000)). And it's
equivalent to your code, as can be seen by the isequal result.

Try it! Do! :-)
Heiko

Subject: Vectorizing data lookback?

From: Bruno Luong

Date: 5 Sep, 2008 07:40:20

Message: 13 of 13

heiko_marx@hotmail.com wrote in message
<74aae18b-e091-4486-8e7c-d8a6e912952b@m44g2000hsc.googlegroups.com>...

>
> Try it! Do! :-)

Your algorithm does not work as specified:

n=25;
data=[10 zeros(1,98) 1];

% original algo
for ii =n+1:length(data)
    if data(ii) > max(data(ii-n:ii-1));
        output(ii) = 1;
    else
        output(ii) = 0;
    end
end

% Heiko's algo
m = max(data(1:n-1));
for ii = n:length(data)
    if data(ii) > m
        output2(ii) = 1;
        m = data(ii);
    else
        output2(ii) = 0;
    end
end

isequal(output, output2) % not equal

% Bruno

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us