Skip to Main Content Skip to Search
Login
File Exchange
MATLAB Newsgroup
Link Exchange
  Blogs  
 Contest 
MathWorks.com

Thread Subject: finding consecutive numbers (runs)

Subject: finding consecutive numbers (runs)

From: William Dampier

Date: 12 Dec, 2007 18:37:39

Message: 1 of 8

I am trying to find a speedy way to find the number of
consecutive 1's in a vector of logicals in with the
following type of output.

input = [1 0 1 1 1 0 1 1 0 0 0 1 0 1]
desired1 = [1 0 3 2 1 0 2 1 0 0 0 1 0 1]
desired2 = [1 0 1 2 3 0 1 2 0 0 0 1 0 1] %the reverse direction

desired1=zeros(size(input));
spots=find(input);
    desired1(1:spots(1))=(spots(1)-1):-1:0;
    for k=2:length(spots)-1
       
desired1(spots(k-1):spots(k))=(spots(k)-spots(k-1)):-1:0;
    end


I understand to avoid for-loops but I didn't see a way
around it in this case. The real input is actually a matrix
of logicals in which I want the to do apply this to each row
individually (so if you see a vectorized solution I would
certainly be appreciative). Currently I find desired2 using
the same algorithm on fliplr(input).

real_input = [1 0 1 1 1 0 0 1 0 0 1 1 0;
                0 1 0 0 1 1 1 0 1 1 0 0 0];
real_desired = [1 0 3 2 1 0 0 1 0 0 2 1 0;
                0 1 0 0 3 2 1 0 2 1 0 0 0];

The real data is also a ~6000 X 9000 matrix so I'd like to
avoid creating too many temporary arrays.

Subject: Re: finding consecutive numbers (runs)

From: Roger Stafford

Date: 12 Dec, 2007 21:06:58

Message: 2 of 8

"William Dampier" <walldo2@gmail.com> wrote in message <fjp9pj$p8g
$1@fred.mathworks.com>...
> I am trying to find a speedy way to find the number of
> consecutive 1's in a vector of logicals in with the
> following type of output.
>
> input = [1 0 1 1 1 0 1 1 0 0 0 1 0 1]
> desired1 = [1 0 3 2 1 0 2 1 0 0 0 1 0 1]
> desired2 = [1 0 1 2 3 0 1 2 0 0 0 1 0 1] %the reverse direction
>
> desired1=zeros(size(input));
> spots=find(input);
> desired1(1:spots(1))=(spots(1)-1):-1:0;
> for k=2:length(spots)-1
>
> desired1(spots(k-1):spots(k))=(spots(k)-spots(k-1)):-1:0;
> end
>
>
> I understand to avoid for-loops but I didn't see a way
> around it in this case. The real input is actually a matrix
> of logicals in which I want the to do apply this to each row
> individually (so if you see a vectorized solution I would
> certainly be appreciative). Currently I find desired2 using
> the same algorithm on fliplr(input).
>
> real_input = [1 0 1 1 1 0 0 1 0 0 1 1 0;
> 0 1 0 0 1 1 1 0 1 1 0 0 0];
> real_desired = [1 0 3 2 1 0 0 1 0 0 2 1 0;
> 0 1 0 0 3 2 1 0 2 1 0 0 0];
>
> The real data is also a ~6000 X 9000 matrix so I'd like to
> avoid creating too many temporary arrays.
-----------------
  If x is an m by n matrix of your inputs, then the following y matrix below
should be the 'desired2' matrix you requested. The 'desired1' matrix can be
obtained by repeating the same process using fliplr(x) in place of x (the
reverse direction of what you mentioned.) I was too lazy to try for a
simultaneous solution of both matrices.

 [m,n] = size(x);
 y = [zeros(1,m);x.'];
 y = y(:);
 p = find(~y);
 y(p) = [0;1-diff(p)];
 y = reshape(cumsum(y),[],m).';
 y(:,1) = [];

Roger Stafford

Subject: Re: finding consecutive numbers (runs)

From: William Dampier

Date: 12 Dec, 2007 21:59:30

Message: 3 of 8

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid>
wrote in message <fjpihi$p2j$1@fred.mathworks.com>...
> "William Dampier" <walldo2@gmail.com> wrote in message
<fjp9pj$p8g
> $1@fred.mathworks.com>...
> > I am trying to find a speedy way to find the number of
> > consecutive 1's in a vector of logicals in with the
> > following type of output.
> >
> > input = [1 0 1 1 1 0 1 1 0 0 0 1 0 1]
> > desired1 = [1 0 3 2 1 0 2 1 0 0 0 1 0 1]
> > desired2 = [1 0 1 2 3 0 1 2 0 0 0 1 0 1] %the reverse
direction
> >
> > desired1=zeros(size(input));
> > spots=find(input);
> > desired1(1:spots(1))=(spots(1)-1):-1:0;
> > for k=2:length(spots)-1
> >
> > desired1(spots(k-1):spots(k))=(spots(k)-spots(k-1)):-1:0;
> > end
> >
> >
> > I understand to avoid for-loops but I didn't see a way
> > around it in this case. The real input is actually a matrix
> > of logicals in which I want the to do apply this to each row
> > individually (so if you see a vectorized solution I would
> > certainly be appreciative). Currently I find desired2 using
> > the same algorithm on fliplr(input).
> >
> > real_input = [1 0 1 1 1 0 0 1 0 0 1 1 0;
> > 0 1 0 0 1 1 1 0 1 1 0 0 0];
> > real_desired = [1 0 3 2 1 0 0 1 0 0 2 1 0;
> > 0 1 0 0 3 2 1 0 2 1 0 0 0];
> >
> > The real data is also a ~6000 X 9000 matrix so I'd like to
> > avoid creating too many temporary arrays.
> -----------------
> If x is an m by n matrix of your inputs, then the
following y matrix below
> should be the 'desired2' matrix you requested. The
'desired1' matrix can be
> obtained by repeating the same process using fliplr(x) in
place of x (the
> reverse direction of what you mentioned.) I was too lazy
to try for a
> simultaneous solution of both matrices.
>
> [m,n] = size(x);
> y = [zeros(1,m);x.'];
> y = y(:);
> p = find(~y);
> y(p) = [0;1-diff(p)];
> y = reshape(cumsum(y),[],m).';
> y(:,1) = [];
>
> Roger Stafford
>

Thanks a whole lot.

It took me a few minutes to realize that if I fliplr(input)
then I'll also have to fliplr the output as well. like this:

input=[1 0 1 1 1 0 1 1 0 0 0 1 0 1];


x=input;
[m,n] = size(x);
y = [zeros(1,m);x.'];
y = y(:);
p = find(~y);
y(p) = [0;1-diff(p)];
y = reshape(cumsum(y),[],m).';
y(:,1) = [];

x=fliplr(input);
[m,n] = size(x);
y2 = [zeros(1,m);x.'];
y2 = y2(:);
p = find(~y2);
y2(p) = [0;1-diff(p)];
y2 = reshape(cumsum(y2),[],m).';
y2(:,1) = [];

desired=[input; y; fliplr(y2)]

This shaves more than 40 seconds off of each of ~3000
iterations (dividing training/testing dataset) so the final
speed up is very large!

Thanks again






Subject: Re: finding consecutive numbers (runs)

From: Roger Stafford

Date: 13 Dec, 2007 05:17:55

Message: 4 of 8

"William Dampier" <walldo2@gmail.com> wrote in message <fjplk2$fgm
$1@fred.mathworks.com>...
> "Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid>
> wrote in message <fjpihi$p2j$1@fred.mathworks.com>...
> > .......
> > [m,n] = size(x);
> > y = [zeros(1,m);x.'];
> > y = y(:);
> > p = find(~y);
> > y(p) = [0;1-diff(p)];
> > y = reshape(cumsum(y),[],m).';
> > y(:,1) = [];
>
> Thanks a whole lot.
> It took me a few minutes to realize that if I fliplr(input)
> then I'll also have to fliplr the output as well. like this:
> .......
--------
  William, it has occurred to me that the 'fliplr' operations are not really
needed at all in the above processing if the appropriate steps are taken.
Besides these two 'fliplr' operations, one 'diff' and one 'find' operation can
also be eliminated. Consequently you may find that, hopefully, a little
execution time will be saved.

  Again, if x is the m by n array whose rows are to be processed in the
specified manner, then do this:

 m = size(x,1);
 y = [reshape([zeros(1,m);x.'],[],1);0];
 z = y;
 p = find(~y);
 d = 1-diff(p);
 y(p) = [0;d];
 y = reshape(cumsum(y(1:end-1)),[],m).';
 y(:,1) = [];
 z(p) = [d;0];
 z = reshape(cumsum(-z(1:end-1)),[],m).';
 z(:,end) = [];

The y array will then be the above-mentioned 'desired2' array, and z will be
the 'desired1' array, both of size m by n.

Roger Stafford

Subject: Re: finding consecutive numbers (runs)

From: William Dampier

Date: 13 Dec, 2007 06:53:40

Message: 5 of 8

> William, it has occurred to me that the 'fliplr'
operations are not really
> needed at all in the above processing if the appropriate
steps are taken.
> Besides these two 'fliplr' operations, one 'diff' and one
'find' operation can
> also be eliminated. Consequently you may find that,
hopefully, a little
> execution time will be saved.
>
> Again, if x is the m by n array whose rows are to be
processed in the
> specified manner, then do this:
>
> m = size(x,1);
> y = [reshape([zeros(1,m);x.'],[],1);0];
> z = y;
> p = find(~y);
> d = 1-diff(p);
> y(p) = [0;d];
> y = reshape(cumsum(y(1:end-1)),[],m).';
> y(:,1) = [];
> z(p) = [d;0];
> z = reshape(cumsum(-z(1:end-1)),[],m).';
> z(:,end) = [];
>
> The y array will then be the above-mentioned 'desired2'
array, and z will be
> the 'desired1' array, both of size m by n.
>
> Roger Stafford
>


Yup that works:
Times for processing the whole matrix:

for-loop (with pre-allocation): ~50 seconds do both
vectorized with fliplr: ~15 seconds
new vectorized: ~10 seconds

A 5-second improvement is still significant since this will
be calculated thousands of times.

Since the input matrix is 5188x9110 and it is ~50-60% full
which makes the temp array p is ~27,100,000 of doubles.

The 'reshape' statement is the only bottleneck. I separated
out the cumsum statement just to check which was the
offending statement.

I've been checking the 'bsxfun' since it might let me avoid
the temp arrays.

On a similar note I've noticed drastically different
performance based on whether the function is implemented as
a stand-alone m-file as opposed to a sub-function.

Stand-alone times:
for-loop (with pre-allocation): ~50 seconds do both
vectorized with fliplr: ~15 seconds
new vectorized: ~10 seconds

subfunction times:
for-loop (with pre-allocation): ~55 seconds do both
vectorized with fliplr: ~120 seconds
new vectorized: ~150 seconds

The choice of implementation is obvious but I'm not sure of
the reason. I would have expected the opposite results. If
you have any idea why this would happen it would help
satisfy my intellectual curiosity.

Thanks
Will


Subject: Re: finding consecutive numbers (runs)

From: William Dampier

Date: 13 Dec, 2007 06:53:44

Message: 6 of 8

> William, it has occurred to me that the 'fliplr'
operations are not really
> needed at all in the above processing if the appropriate
steps are taken.
> Besides these two 'fliplr' operations, one 'diff' and one
'find' operation can
> also be eliminated. Consequently you may find that,
hopefully, a little
> execution time will be saved.
>
> Again, if x is the m by n array whose rows are to be
processed in the
> specified manner, then do this:
>
> m = size(x,1);
> y = [reshape([zeros(1,m);x.'],[],1);0];
> z = y;
> p = find(~y);
> d = 1-diff(p);
> y(p) = [0;d];
> y = reshape(cumsum(y(1:end-1)),[],m).';
> y(:,1) = [];
> z(p) = [d;0];
> z = reshape(cumsum(-z(1:end-1)),[],m).';
> z(:,end) = [];
>
> The y array will then be the above-mentioned 'desired2'
array, and z will be
> the 'desired1' array, both of size m by n.
>
> Roger Stafford
>


Yup that works:
Times for processing the whole matrix:

for-loop (with pre-allocation): ~50 seconds do both
vectorized with fliplr: ~15 seconds
new vectorized: ~10 seconds

A 5-second improvement is still significant since this will
be calculated thousands of times.

Since the input matrix is 5188x9110 and it is ~50-60% full
which makes the temp array p is ~27,100,000 of doubles.

The 'reshape' statement is the only bottleneck. I separated
out the cumsum statement just to check which was the
offending statement.

I've been checking the 'bsxfun' since it might let me avoid
the temp arrays.

On a similar note I've noticed drastically different
performance based on whether the function is implemented as
a stand-alone m-file as opposed to a sub-function.

Stand-alone times:
for-loop (with pre-allocation): ~50 seconds do both
vectorized with fliplr: ~15 seconds
new vectorized: ~10 seconds

subfunction times:
for-loop (with pre-allocation): ~55 seconds do both
vectorized with fliplr: ~120 seconds
new vectorized: ~150 seconds

The choice of implementation is obvious but I'm not sure of
the reason. I would have expected the opposite results. If
you have any idea why this would happen it would help
satisfy my intellectual curiosity.

Thanks
Will


Subject: Re: finding consecutive numbers (runs)

From: Bruno Luong

Date: 13 Dec, 2007 07:09:54

Message: 7 of 8

"William Dampier" <walldo2@gmail.com> wrote in message
<fjqkto$9mc$1@fred.mathworks.com>...

>
> subfunction times:
> for-loop (with pre-allocation): ~55 seconds do both
> vectorized with fliplr: ~120 seconds
> new vectorized: ~150 seconds
>
> The choice of implementation is obvious but I'm not sure of
> the reason. I would have expected the opposite results. If
> you have any idea why this would happen it would help
> satisfy my intellectual curiosity.
>

There is a recent thread that reports slowness of nested
function when handling non-persistence local variable. This
seems to be related to your finding.

Bruno

Subject: Re: finding consecutive numbers (runs)

From: Roger Stafford

Date: 13 Dec, 2007 09:53:18

Message: 8 of 8

"William Dampier" <walldo2@gmail.com> wrote in message <fjqkto$9mc
$1@fred.mathworks.com>...
> ........
> The 'reshape' statement is the only bottleneck. I separated
> out the cumsum statement just to check which was the
> offending statement.
> .......
------------
  Will, I am surprised that you found the three 'reshape' instructions to be
significant bottlenecks. As I understand it, this instruction by itself does not
require any relative location shifts or alterations in memory, but rather an
altered element-indexing scheme that ought to be comparatively rapid.

  In the case of the first 'reshape', I would think that computing the array,
[zeros(1,m);x.'], would have been much the greater computational task. This
requires first, a transpose of x involving as it does a complete rearrangement
in memory, and then followed by an expansion of it by inserting m zeros at
equally-spaced intervals throughout its length which means further extensive
displacements. Perhaps I ought to have written the equivalent expression,
[zeros(m,1),x].', instead, which would allow inserting the m zeros in one
contiguous initial block in memory and then transposing it all afterwards.
(You might try that.) Of course, I am only guessing as to the coding tricks
used by MathWorks' programmers.

  For the second two reshapes it seems to me the final transpose in each case
would cause the greatest delay, again because of the necessary
rearrangement in relative memory locations. The cumsums in each do not
involve rearrangement but simply a cumulative serial addition of successive
elements, which ought to be relatively fast with no fancy indexing schemes
involved.

  I am afraid I cannot claim to be an expert on matlab's underlying array
manipulation algorithms, however, since my own matlab system is seriously
outdated (v4a - 1994.)

Roger Stafford

Tags for this Thread

Everyone's Tags:

Add a New Tag:

Separated by commas
Ex.: root locus, bode

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Tag Activity for This Thread
Tag Applied By Date/Time
subfunctions William Dampier 13 Dec, 2007 01:55:43
bsxfun William Dampier 13 Dec, 2007 01:55:43
runs William Dampier 12 Dec, 2007 13:40:19
vectorize William Dampier 12 Dec, 2007 13:40:19
rssFeed for this Thread

envelope graphic E-mail this page to a colleague

Public Submission Policy
NOTICE: Any content you submit to MATLAB Central, including personal information, is not subject to the protections which may be afforded information collected under other sections of The MathWorks, Inc. Web site. You are entirely responsible for all content that you upload, post, e-mail, transmit or otherwise make available via MATLAB Central. The MathWorks does not control the content posted by visitors to MATLAB Central and, does not guarantee the accuracy, integrity, or quality of such content. Under no circumstances will The MathWorks be liable in any way for any content not authored by The MathWorks, or any loss or damage of any kind incurred as a result of the use of any content posted, e-mailed, transmitted or otherwise made available via MATLAB Central. Read the complete Disclaimer prior to use.
Related Topics