Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
Vectorising/avoiding complicated loops

Subject: Vectorising/avoiding complicated loops

From: Charles Dillon

Date: 23 Jun, 2011 12:14:04

Message: 1 of 9

As somebody who is only getting used to using MATLAB, I've been reading a lot about how one should try to avoid loops where possible, and that the majority of loops can be avoided by using existing MATLAB functions. However, the examples accompanying such advice are generally quite simple, and I was wondering if there were ways of avoiding more complicated loops, such as ones which loop through various different things using while and for loops, or nested loops with variable numbers of iterations. The following are two examples of the sort of thing I am talking about:

Example 1
train_1_size = numel(train1);
train_2_size = numel(train2);
altcor=0;
place_in_1=train_1_size;
for i= train_2_size:-1:1
    while place_in_1>0&& train1(place_in_1)> train2(i)
        place_in_1=place_in_1-1;
    end
    if place_in_1>0 && train1(place_in_1)<train2(i)
        altcor=altcor+ exp(-(train2(i)-train1(place_in_1))/tau)*f1(place_in_1);
    end
end

This is a loop that runs backwards over the elements of train2, for each element finding the greatest element of train1 which is less than it (which is train1(place_in_1)) and then adding exp(-(train2(i)-train1(place_in_1))/tau)*f1(place_in_1) to the altcor value. train1, train2 and f1 are all vectors of doubles. How one would go about vectorising something like that, I've no idea.

Example 2
for i = 1 : trains_size
    for j = (i+1):trains_size
        this_d =sqrt(sq(i) + sq(j) - 2 * altcor(trains{i},trains{j},fs{i},fs{j},tau));
        d_matrix(i, j) = this_d;
        d_matrix(j, i) = this_d;
    end
end
In this function, I'm calculating the off diagonal entries of a symmetric matrix d_matrix, which are found via the formula this_d =sqrt(sq(i) + sq(j) - 2 * altcor(trains{i},trains{j},fs{i},fs{j},tau)) where this_d represents each entry, and all the values used in the calculation are doubles. Again, I don't know how to approach vectorising it.

Any help, with either loop, would earn unending gratitude. Thank you!
PS: I know it would be more efficient/faster in another language like C++, but I have to write the code in MATLAB.

Subject: Vectorising/avoiding complicated loops

From: ImageAnalyst

Date: 23 Jun, 2011 13:16:43

Message: 2 of 9

I didn't go over all of your code, so there may be a way to optimize
it, but in general it's not possible to avoid all for loops,
particularly those where you have lots of other things going on inside
them, such as other if/while statements, calls to other functions you
wrote, calls to MATLAB functions (dir, image, plot, set, imwrite,
sprintf, etc.), communications with other devices (cameras, ADCs
etc.), etc..

Subject: Vectorising/avoiding complicated loops

From: someone

Date: 23 Jun, 2011 14:10:24

Message: 3 of 9

ImageAnalyst <imageanalyst@mailinator.com> wrote in message <e02fc3c5-f89a-4e5a-8536-329a0c95578b@x38g2000pri.googlegroups.com>...
> I didn't go over all of your code, so there may be a way to optimize
> it, but in general it's not possible to avoid all for loops,
> particularly those where you have lots of other things going on inside
> them, such as other if/while statements, calls to other functions you
> wrote, calls to MATLAB functions (dir, image, plot, set, imwrite,
> sprintf, etc.), communications with other devices (cameras, ADCs
> etc.), etc..

% Also, for loops are not NECESSARILY slower than vectorized code.
% In some cases they are actually faster.
% It depends on size of arrays, available memory, and many other factors.
% The main thing is to make sure all the arrays you "create" within a loop
% are preallocated, if possible.
% In your above code for Example 2, be sure to add something like:

d_matrix = zeros(train_size);

% before the for loops.

Subject: Vectorising/avoiding complicated loops

From: arich82

Date: 23 Jun, 2011 21:03:04

Message: 4 of 9

A lot of the old advice you find online is truly antiquated, as Matlab has become much better with loops since version 6 and the advent of the JIT (Just In Time) accelerator. The most recent version (2011a) has even gotten better about growing vectors inside of loops, although (as mentioned) it's still best to preallocate whenever possible.

That said, vectorizing can really speed things up (in your case, ~15x). There are three incredibly useful functions for vectorizing (aside from standard matrix algebra):
-- histc
-- bsxfun
-- accumarray

(Others, like cellfun and arrayfun, can be useful, but have some performance issues in older versions.) In general, don't optimize unless runtime (or memory) is actually important; clear, readable code is far more important that a 10% speedup. (And if you haven't found it yet, look into the 'profiler' under 'Tools'.)

Here's what I could quickly come up with for you, after making some assumptions about your data (if the assert statements invalid, you'll need to modify the scripts). Both are essentially two-line solutions. Copy, paste, and use the help files.

%%%%%%%
%% Problem 1

% Generate dummy data
tic;
n = 3e2;
rand('seed', 1);
t1 = sort(rand(n, 1)); % assert issorted(t1);
t2 = sort(rand(n, 1)); % assert issorted(t2);
train1 = t1(t1 < t2(end)); % assert max(train1) < max(train2);
train2 = t2(t2 > t1(1)); % assert min(train2) > min(train1);
f1 = rand(size(t1));
tau = rand(1, 1);
toc;


% loopless soln
tic;
[dummy, k] = histc(train2, [train1; inf]); % use train1 as LHS of bins in histogram
altcor_loopless = sum(exp(-(train2 - train1(k))/tau).*f1(k)); % use '.*', not '*'
toc;

% looped soln
tic;
train_1_size = numel(train1);
train_2_size = numel(train2);
altcor=0;
place_in_1=train_1_size;
for i= train_2_size:-1:1
    while place_in_1>0&& train1(place_in_1)> train2(i)
        place_in_1=place_in_1-1;
    end
    if place_in_1>0 && train1(place_in_1)<train2(i)
        altcor=altcor+ exp(-(train2(i)-train1(place_in_1))/tau)*f1(place_in_1);
    end
end
toc;

% verify same answer
abs(altcor - altcor_loopless)


%% Problem 2

% Generate dummy data
tic;
n = 3e2;
rand('seed', 1);
v = rand(n, 1);
A = rand(n, n);
A = 0.5*(A + A.'); % assert issymmetric(A);
toc;

% loopless soln
tic;
V = v(:, ones(1, n)); % note: 'Tony's trick' is not memory efficient, but it's very quick...
D_loopless = sqrt(V + V.' - 2*A);
D_loopless(1:n+1:n*n) = 0; % need to reset diagonal to zero
toc;

% looped soln
% I've modified your original loop since I didn't have your data
% or altcor function, but I think the structure is essentially the same
tic;
D = zeros(n, n);
for i = 1:n
    for j = (i+1):n
        d = sqrt(v(i) + v(j) - 2*A(i, j));
        D(i, j) = d;
        D(j, i) = d;
    end
end
toc;

% verify same answer
sum(abs(D(:) - D_loopless(:)))

% loopless soln 2
% this one's a little esoteric,
% but slightly faster and more memory efficient
tic;
D_loopless_2 = sqrt(bsxfun(@plus, v.', bsxfun(@plus, v, -2*A)));
D_loopless_2(1:n+1:n*n) = 0; % need to reset diagonal to zero
toc;
sum(abs(D(:) - D_loopless_2(:)))
%%%%%%

I might be a little slow at responding, but feel free to ask questions if you're stumped on anything. (Maybe others can help answer questions too.) Good luck.


--Andy

Subject: Vectorising/avoiding complicated loops

From: Rune Allnor

Date: 24 Jun, 2011 05:36:16

Message: 5 of 9

On Jun 23, 2:14 pm, "Charles Dillon" <charl...@live.ie> wrote:
> As somebody who is only getting used to using MATLAB, I've been reading a lot about how one should try to avoid loops where possible, and that the majority of loops can be avoided by using existing MATLAB functions.

Don't follow that advice.

Loops are the nuts and bolst of programming. The remarks you
refer to are the remnants of a decades-old bug / design flaw
of matlab.

As a programmer (and particularly a novice one) your task
is to write programs that

1) Are understandable to a human reader.
2) Work.

In that order! Yes, I am serious. By far the most programs
evolve through stages when they *don't* work. Getting
non-working incomprehensable code to work is a mess,
compared to non-working *comprehensable* code.

So learn how to write *comprehensable* code first. Use those
loops. Only when / if you see that the code runs too slow
do you start messing with optimization.

And do remember that except for certain numerical tasks,
matlab is one of the slowest languages around. One doesn't
use matlab for its run-time efficiency. If you want *fast*
code, use one of the compiled languages.

Rune

Subject: Vectorising/avoiding complicated loops

From: Charles Dillon

Date: 24 Jun, 2011 08:39:04

Message: 6 of 9

Thanks everybody for the help and advice! I think I'm best leaving the code as it is, as arich82's code didn't work on first attempt, and my lack of understanding of exactly what is going on in those functions means I don't really know how to go about fixing it. Given that, and the fact that I need to explain how this code works to people (which means that I also have to understand how it works), I think I'm best leaving it as it is. Thank you everyone for all your help!

Subject: Vectorising/avoiding complicated loops

From: arich82

Date: 24 Jun, 2011 18:45:20

Message: 7 of 9

"Charles Dillon" wrote in message <iu1if8$ahb$1@newscl01ah.mathworks.com>...
> Thanks everybody for the help and advice! I think I'm best leaving the code as it is, as arich82's code didn't work on first attempt, and my lack of understanding of exactly what is going on in those functions means I don't really know how to go about fixing it. Given that, and the fact that I need to explain how this code works to people (which means that I also have to understand how it works), I think I'm best leaving it as it is. Thank you everyone for all your help!
---------

I probably should have explained things a bit better, but was short on time yesterday. However, if you're more comfortable with the loops, then you should certainly heed Rune's well-stated advice.

In case you're curious (or for posterity's sake for future readers), I can try to explain the two solutions.

Problem 1:
HISTC effectively bins things to make a histogram. For
[~, k] = histc(train2, train1);
the second output 'k' has the same length as train2, and tells you which bin each element of train2 got sorted into.

The left-hand-side of each bin was specified by train1, s.t.
train1(i) <= i-th bin members < train1(i+1)
so we know that, if the j-th element train2(j) belonged to the i-th bin, i.e.
k(j) == i
then the max element in train1 that's less than train2(j) is train1(i).

'k' now tells us which element in train1 satisfies your criterion for each element of train2, so a vector of all the arguments for the exp is simply
dt = train2 - train1(k).
and the full solution is
altcor = sum(exp(-dt/tau).*f1(k))

(There is a slight subtlety in that I padded the bins with inf to make sure train2 didn't go out of range.) If this didn't work for you, my guess is one of two things happened:
-- min(train1) ~< min(train2)
-- or I got burned by the '<=' (your loop was strictly 'less than').
(-- or possibly your data was stored in a row vec, not column, but that's fixed below by sanitizing with the colon operator...)

If min(train2) is too small, then there are no elements in train1 which meet your criteria, and k could contain zeros (which can't be used for indexing). However, you can simply throw these train2 values away before calling histc since your looped solution would have ignored them. To do this, you can use logical indexing to only keep the elements which satisfy the min criteria:
train2 = train2(train2 > train1(1));
% or
train2(train2 > train1(1)) = []; % equivalent
% or, using find
trim = find(train2 > train1(1), 1, 'first');
train2(1:trim-1) = [];
% or slower, but more intuitive
trim = find(train2 < train1(1), 1, 'last');
train2(1:trim) = [];

If it's the second case where members of train2 fell exactly on top of train1 (unlikely with real data, but possible with integer numbers), then it would be a little trickier to use histc. In hindsight, your loop actually has a hole in the logic statements for the case train1(place_in_1) == train2(i) since it would fail both the 'while' and 'if' criteria. My solution would keep these cases where dt == 0, but since exp(0) ~= 0, so my answer could differ from yours. Assuming this was your intent, the more robust and obviated solution would be:

%% Example 1
train2(train2 < train1(1)) = []; % trim train2 so min(train1) < min(train2)
[dummy, k] = histc(train2, [train1(:); inf]); % sort train2 into bins according to train1
dt = (train2 - train1(k)); % compute argument for exp
mask = (dt ~= 0); % create mask to ignore cases where dt ==0
altcor = sum(exp(-dt(mask)/tau).*f1(k(mask))); % compute altcor
%%%


Problem 2:
For a vector, the operation U(i, j) = v(i) + v(j) can be performed many ways, including using a loop like you did. [Note that U is obviously symmetric, as is your looped output matrix D.] Another method is to use Tony's trick. Examine the following outputs:

v = [1:5].'; % column vec
v(:, 1)
v(:, [1, 1])
v(:, [1, 1, 1])
v(:, ones(1, 3)]

This simply replicates the 1st column for each '1' in the second argument. We can then set:
V = v(:, ones(1, n)]
U = V + V.' % equivalent to U(i, j) = v(i) + v(j)

An alternative is
U = bsxfun(@plus, v, v.');

You can essentially think of this as the the "dyadic addition" of two vectors, in that it's analogous to the dyadic product:
v*v.' == bsxfun(@times, v, v.') == kron(v, v.')

For bsxfun(@plus, v, v.'), you get every combination of 'v(i) + v(j)' stored as U(i, j). You can now write the argument of your sqrt as 'U - 2*A', i.e.
D = sqrt(bsxfun(@plus, v, v.') - 2*A); % this is how I should have written it before

Since we've performed the additions on the full matrices but you only wanted the off the diagonal terms, we must reset the diagonal to zeros. The easiest way to do this is using linear indices. Rather than accessing D(i, j), we can treat D(k) as a column-wise vector.
Examine the following:
D = [1, 2, 3; 4, 5, 6; 7, 8, 9]
D(:)
D(1:4:end) = 0
or in general,
D(1:n+1:end) = 0;

The final code is then:
%% Example 2
U = bsxfun(@plus, v(:), v(:).'); % compute U(i, j) = v(i) + v(j)
D = sqrt(U - 2*A); % compute D(i, j) = sqrt(v(i) + v(j) - 2*A(i, j))
D(1:n+1:end) = 0; % set diagonal of D to zeros
%%%

Short of a typo, I'm not sure there are too many ways for this method to fail with your data, assuming A is indeed symmetric. (Both of these examples work with my dummy data, but I did make some hefty assumptions about the nature of your real data. I'd be interested in knowing what errors you received if you take the time to review all this.)

I hope this helps somebody someday, but I'd agree that if you need to explain your code to others, it's probably best to leave it as is...


--Andy

Subject: Vectorising/avoiding complicated loops

From: Charles Dillon

Date: 27 Jun, 2011 10:56:05

Message: 8 of 9

"arich82" wrote in message <iu2m00$ndf$1@newscl01ah.mathworks.com>...
> "Charles Dillon" wrote in message <iu1if8$ahb$1@newscl01ah.mathworks.com>...
> > Thanks everybody for the help and advice! I think I'm best leaving the code as it is, as arich82's code didn't work on first attempt, and my lack of understanding of exactly what is going on in those functions means I don't really know how to go about fixing it. Given that, and the fact that I need to explain how this code works to people (which means that I also have to understand how it works), I think I'm best leaving it as it is. Thank you everyone for all your help!
> ---------
>
> I probably should have explained things a bit better, but was short on time yesterday. However, if you're more comfortable with the loops, then you should certainly heed Rune's well-stated advice.
>
> In case you're curious (or for posterity's sake for future readers), I can try to explain the two solutions.
>
> Problem 1:
> HISTC effectively bins things to make a histogram. For
> [~, k] = histc(train2, train1);
> the second output 'k' has the same length as train2, and tells you which bin each element of train2 got sorted into.
>
> The left-hand-side of each bin was specified by train1, s.t.
> train1(i) <= i-th bin members < train1(i+1)
> so we know that, if the j-th element train2(j) belonged to the i-th bin, i.e.
> k(j) == i
> then the max element in train1 that's less than train2(j) is train1(i).
>
> 'k' now tells us which element in train1 satisfies your criterion for each element of train2, so a vector of all the arguments for the exp is simply
> dt = train2 - train1(k).
> and the full solution is
> altcor = sum(exp(-dt/tau).*f1(k))
>
> (There is a slight subtlety in that I padded the bins with inf to make sure train2 didn't go out of range.) If this didn't work for you, my guess is one of two things happened:
> -- min(train1) ~< min(train2)
> -- or I got burned by the '<=' (your loop was strictly 'less than').
> (-- or possibly your data was stored in a row vec, not column, but that's fixed below by sanitizing with the colon operator...)
>
> If min(train2) is too small, then there are no elements in train1 which meet your criteria, and k could contain zeros (which can't be used for indexing). However, you can simply throw these train2 values away before calling histc since your looped solution would have ignored them. To do this, you can use logical indexing to only keep the elements which satisfy the min criteria:
> train2 = train2(train2 > train1(1));
> % or
> train2(train2 > train1(1)) = []; % equivalent
> % or, using find
> trim = find(train2 > train1(1), 1, 'first');
> train2(1:trim-1) = [];
> % or slower, but more intuitive
> trim = find(train2 < train1(1), 1, 'last');
> train2(1:trim) = [];
>
> If it's the second case where members of train2 fell exactly on top of train1 (unlikely with real data, but possible with integer numbers), then it would be a little trickier to use histc. In hindsight, your loop actually has a hole in the logic statements for the case train1(place_in_1) == train2(i) since it would fail both the 'while' and 'if' criteria. My solution would keep these cases where dt == 0, but since exp(0) ~= 0, so my answer could differ from yours. Assuming this was your intent, the more robust and obviated solution would be:
>
> %% Example 1
> train2(train2 < train1(1)) = []; % trim train2 so min(train1) < min(train2)
> [dummy, k] = histc(train2, [train1(:); inf]); % sort train2 into bins according to train1
> dt = (train2 - train1(k)); % compute argument for exp
> mask = (dt ~= 0); % create mask to ignore cases where dt ==0
> altcor = sum(exp(-dt(mask)/tau).*f1(k(mask))); % compute altcor
> %%%
>
>
> Problem 2:
> For a vector, the operation U(i, j) = v(i) + v(j) can be performed many ways, including using a loop like you did. [Note that U is obviously symmetric, as is your looped output matrix D.] Another method is to use Tony's trick. Examine the following outputs:
>
> v = [1:5].'; % column vec
> v(:, 1)
> v(:, [1, 1])
> v(:, [1, 1, 1])
> v(:, ones(1, 3)]
>
> This simply replicates the 1st column for each '1' in the second argument. We can then set:
> V = v(:, ones(1, n)]
> U = V + V.' % equivalent to U(i, j) = v(i) + v(j)
>
> An alternative is
> U = bsxfun(@plus, v, v.');
>
> You can essentially think of this as the the "dyadic addition" of two vectors, in that it's analogous to the dyadic product:
> v*v.' == bsxfun(@times, v, v.') == kron(v, v.')
>
> For bsxfun(@plus, v, v.'), you get every combination of 'v(i) + v(j)' stored as U(i, j). You can now write the argument of your sqrt as 'U - 2*A', i.e.
> D = sqrt(bsxfun(@plus, v, v.') - 2*A); % this is how I should have written it before
>
> Since we've performed the additions on the full matrices but you only wanted the off the diagonal terms, we must reset the diagonal to zeros. The easiest way to do this is using linear indices. Rather than accessing D(i, j), we can treat D(k) as a column-wise vector.
> Examine the following:
> D = [1, 2, 3; 4, 5, 6; 7, 8, 9]
> D(:)
> D(1:4:end) = 0
> or in general,
> D(1:n+1:end) = 0;
>
> The final code is then:
> %% Example 2
> U = bsxfun(@plus, v(:), v(:).'); % compute U(i, j) = v(i) + v(j)
> D = sqrt(U - 2*A); % compute D(i, j) = sqrt(v(i) + v(j) - 2*A(i, j))
> D(1:n+1:end) = 0; % set diagonal of D to zeros
> %%%
>
> Short of a typo, I'm not sure there are too many ways for this method to fail with your data, assuming A is indeed symmetric. (Both of these examples work with my dummy data, but I did make some hefty assumptions about the nature of your real data. I'd be interested in knowing what errors you received if you take the time to review all this.)
>
> I hope this helps somebody someday, but I'd agree that if you need to explain your code to others, it's probably best to leave it as is...
>
>
> --Andy
Thank you! I understood and implemented your solution to the second example, but because I still needed to run a pair of loops to get A (which is the matrix of values returned by the altcor function) the increase in speed was negligible (over 200 runs of the programme (which includes many other parts, but this one is the time consuming part, accounting for ~85% of the runtime), the speed went from 76.7 seconds to 76.1 seconds). Nevertheless, that's a very cool way of replacing the loop.Thanks!
As regards the first one, the problem lies with instances where min(train1) ~< min(train2). When you fixed that, it worked perfectly, but unfortunately much more slowly than with the loops. It took 307 seconds instead of 76.

Subject: Vectorising/avoiding complicated loops

From: arich82

Date: 28 Jun, 2011 02:01:08

Message: 9 of 9

"Charles Dillon" wrote in message <iu9nk5$m48$1@newscl01ah.mathworks.com>...
> Thank you! I understood and implemented your solution to the second example, but because I still needed to run a pair of loops to get A (which is the matrix of values returned by the altcor function) the increase in speed was negligible (over 200 runs of the programme (which includes many other parts, but this one is the time consuming part, accounting for ~85% of the runtime), the speed went from 76.7 seconds to 76.1 seconds). Nevertheless, that's a very cool way of replacing the loop.Thanks!
> As regards the first one, the problem lies with instances where min(train1) ~< min(train2). When you fixed that, it worked perfectly, but unfortunately much more slowly than with the loops. It took 307 seconds instead of 76.
-------

Further evidence that vectorizing isn't always faster. For the toy data I showed, my code ran 15x faster on my MacBook 2.1, 2Gb RAM, and Matlab 2007a. Depending on whether you're using a newer version of Matlab, and whether or not your data fits in memory, I could easily believe that the looped version runs 4x faster.

I'm glad you were able to understand and test both versions.


--Andy

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us