
Thread Subject:
Failure of Tony's Trick

Subject: Failure of Tony's Trick

From: Bronson

Date: 26 Sep, 2009 16:28:04

Message: 1 of 10

Last week, I endeavored to see how much of an improvement "Tony's trick" offered over the more brute force calculation.

For those in the know, Tony's trick amounts to replacing code such as

N=1e6;
h=4; % this is just a scalar
V=h*ones(N,1) ; % this fills the V vector

with the supposedly faster:

N=1e6;
h=4; % this is just a scalar
V=h(ones(N,1)) ; % this fills the V vector


The subtle difference is that V is built via indexing instead of performing the multiplication h*1, N times. Surprisingly, I've noted a stark failure of Tony's trick as the dimension gets larger. This examination was done in R14. I'm looking for an explanation (an error in my code is a perfectly acceptable explanation) for why Tony's trick is no longer working.
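
(A small illustration of the indexing mechanics, not part of the original post: because h is a scalar, every element of ones(N,1) indexes h(1), so the result simply takes the shape of the index array.)

h = 4;
h(ones(2,3))                         % 2-by-3 matrix of 4s; every element is h(1)
isequal(h(ones(5,1)), h*ones(5,1))   % true: both build a 5-by-1 vector of 4s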

Further, Drea Thomas (a MathWorks employee) posted the following:

So, why use Tony's trick? It is usually faster. The second method
performs unnecessary matrix multiplication. This technique is most
useful when you have large matrices and the operation is inside
your inner-most loop (where speed is critical). Let's test this
hypothesis.
a = rand(1,100);
I = ones(100,1);
tic;V=I*a;toc
elapsed_time = 0.0315
tic;V=a(I,:);toc
elapsed_time = 0.0115

There are two issues with her code. One is that she doesn't empty V before the second fill, so there's an obvious memory allocation bias. The second problem, which is considerably more damning, is that these results do not hold for R14 on two different local machines, i.e., the tic-toc time is FASTER for the matrix multiply.

This is the code for both tests, one for the vector fill and one for Drea's matrix construction. This code compares the two, clears V after every fill, and outputs a histogram of the results.

The vector comparison code:
%This code compares the speed of replicating a vector via h*ones(m,1) versus
% h(ones(m,1)). The algorithm fills and clears a vector b, N times, and
% plots the histogram results.

close all
clear

N = 1000; % the number of trials
m = 1e7; % the vector length
h = 4; % assignment of a test number

time=zeros(N,2); %preallocation of memory for the resulting toc times

disp('-------h*ones(m,1) Analysis-------')
for ncnt=1:N
    clear b %clear the b vector at the beginning of every loop
    tic
    b = h*ones(m,1); %populate b
    time(ncnt,1)=toc; %store the toc time
end
disp('done.')

disp('-------h(ones(m,1)) Analysis-------')
for ncnt=1:N
    clear b
    tic
    b=h(ones(m,1));
    time(ncnt,2)=toc;
end
disp('done.')

%plotting normalized histograms
subplot(2,1,1)
[n,r]=hist(time(:,1),40); %histogram
bar(r, 1./(r(2)-r(1))*n/sum(n)); %normalized histogram
title('using h*ones(m,1)')
xlabel('tic-toc time')

subplot(2,1,2)
[n,r]=hist(time(:,2),40);
bar(r, 1./(r(2)-r(1))*n/sum(n));
title('using h(ones(m,1))')
xlabel('tic-toc time')
%end of file

And the matrix code:
%this code is directly from drea thomas at mathworks, I just repeat the
%calculation and create a histogram.
close all
clear
clc

N = 1000; % number of trials
m = 1e3; % size of vector

time=zeros(N,2); % preallocate memory

for p=1:N   % this is the comparison loop
    a = rand(1,m);   % generate a random row vector a
    I = ones(m,1);   % generate a ones column vector I
    clear V % clear the V variable to avoid a memory allocation bias
    tic;V=I*a; time(p,1)=toc;    %tic-toc the time to create V
    clear V % clear the V variable to avoid a memory allocation bias
    tic;V=a(I,:); time(p,2)=toc; %tic-toc the time to create V
end

%%plotting normalized histograms
subplot(2,1,1)
[n,r]=hist(time(:,1),40); %histogram
bar(r, 1./(r(2)-r(1))*n/sum(n)); %normalized histogram
title('using V=I*a')
xlabel('tic-toc time')
xlim([0 mean(time(:,1))*10])
grid on

subplot(2,1,2)
[n,r]=hist(time(:,2),40);
bar(r, 1./(r(2)-r(1))*n/sum(n));
title('using V=a(I,:)')
xlabel('tic-toc time')
xlim([0 mean(time(:,1))*10])
grid on
%end of file


In the vector version I see a HUGE speed advantage for the matrix multiply. The matrix version shows Tony's trick offering a "slight" advantage for very small dimensions (m<100), but for anything bigger, the matrix multiply starts to pull away.

I encourage readers to run the code themselves (10-15 minutes) and come to their own explanation. Maybe it's an issue with the tic-toc timing? I don't know, but this is certainly something that needs to be addressed.
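
(One rough way to probe the tic-toc concern, sketched here rather than taken from the original post: repeat each expression many times and compare medians instead of single calls.)

reps = 50;  m = 1e6;  h = 4;          % modest sizes so this runs quickly
t = zeros(reps,2);
for k = 1:reps
    tic, V = h*ones(m,1);  t(k,1) = toc;  clear V
    tic, V = h(ones(m,1)); t(k,2) = toc;  clear V
end
median(t)   % median seconds per call for the multiplication and indexing fills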

Thanks all,

Bronson

Subject: Failure of Tony's Trick

From: TideMan

Date: 27 Sep, 2009 09:41:22

Message: 2 of 10

On Sep 27, 5:28 am, "Bronson " <Bargyle13REMOVET...@gsb.columbia.edu>
wrote:
> [Bronson's original post quoted in full; snipped]

Please explain why you think this is so important.
And why you say "this is certainly something that needs to be
addressed".

I'm afraid I'm a rather prosaic sort of chap.
I tend to use the best/fastest method I can find and move on, without
worrying about the details of why it's the best. I've found this way
makes money, whereas trying to figure out how many angels can dance on
the head of a pin is time-wasting and unproductive.

Subject: Failure of Tony's Trick

From: Bruno Luong

Date: 27 Sep, 2009 10:41:04

Message: 3 of 10

Who is Tony?

Here is something you might play with and include in your analysis (call it Bruno's trick if you like).

V = zeros(N,1,'double');
V(:) = h;
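
(For reference, a minimal tic-toc sketch, not part of the original message, that puts this assignment approach next to the two methods discussed above; timings will vary by machine and release.)

N = 1e6;  h = 4;
tic, V = h*ones(N,1);           toc   % multiplication
clear V
tic, V = h(ones(N,1));          toc   % Tony's trick (indexing)
clear V
tic, V = zeros(N,1); V(:) = h;  toc   % preallocate, then assign (Bruno's trick)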

I must admit I don't have the courage to go over your code. All I know is that a computer is a more or less deterministic machine. If we find something "strange" that doesn't fit our expectation, we have most likely made a wrong assumption that led to that expectation. As my colleague often reminds me: the problem is usually somewhere between the keyboard and the chair.

Bruno

Subject: Failure of Tony's Trick

From: Matt Fig

Date: 27 Sep, 2009 15:54:01

Message: 4 of 10

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <h9nfg0$418$1@fred.mathworks.com>...
> Who is Tony?



This has been around a long time. See these for a background:

http://blogs.mathworks.com/loren/2006/03/01/more-on-expansion-arrayfun/
http://blogs.mathworks.com/loren/2006/02/22/scalar-expansion-and-more-take-2/

I also have not run the OP's code, though I will probably do it later (with the method Bruno showed, as this was my first thought also). But, given that Tony's trick is very old (pre-MATLAB 5?), I am surprised that Bronson is surprised that it may not be the current fastest approach. When Tony's trick was discovered, FOR loops were unquestionably slower than equivalent vectorized code, for example. A lot has changed since then, and this may just be one of many things that no longer holds from the early days. Only a careful analysis will tell.

Subject: Failure of Tony's Trick

From: Steve Eddins

Date: 28 Sep, 2009 11:49:34

Message: 5 of 10

Bronson wrote:
> Last week, I endeavored to see how much of an improvement "Tony's
> trick" offered over the more brute force calculation.
>
> For those in the know, Tony's trick amounts to replacing code such
> as
>
> N=1e6;
> h=4; % this is just a scalar
> V=h*ones(N,1) ; % this fills the V vector
>
> with the supposedly faster:
>
> N=1e6;
> h=4; % this is just a scalar
> V=h(ones(N,1)) ; % this fills the V vector
>
>
> The subtle difference being that V is built via indexing instead of
> performing the multiplication of h*1, N times. Surprisingly, I've
> noted a stark failure in tony's trick as the dimension gets higher.
> This examination was completed via R14. I'm looking for an
> explanation (an error in my code is a perfectly acceptable
> explanation) for why tony's trick is no longer working.
>
> Further Drea Thomas (mathworks employee) posted the following:
>
> So, why use Tony's trick? It is usually faster. The second method
> performs unnecessary matrix multiplication. This technique is most
> useful when you have large matrices and the operation is inside your
> inner-most loop (where speed is critical). Let's test this
> hypothesis.
> a = rand(1,100);
> I = ones(100,1);
> tic;V=I*a;toc
> elapsed_time = 0.0315
> tic;V=a(I,:);toc
> elapsed_time = 0.0115
>
> There are two issues with her code. One is that she doesn't empty V
> before the second fill, so there's an obvious memory allocation bias.
> The second problem, which is extensively more damning, is that these
> results have not held for R14 on two different local machines, i.e.
> the tic-toc time is FASTER for the matrix multiply.
 >
 > [rest of explanation and code samples snipped]

Hi Bronson,

Drea (a guy, by the way) hasn't worked at MathWorks for a very long
time - 13 years or so? I don't know how you came across this post, but
it's VERY old. MATLAB has undergone several generations of substantial
changes since then.

---
Steve Eddins
http://blogs.mathworks.com/steve/

Subject: Failure of Tony's Trick

From: Matt Fig

Date: 28 Sep, 2009 22:11:02

Message: 6 of 10

Ok, so I finally got around to looking this over. Here is a link to the OP's reading matter, dating to 1995:

http://www.ee.columbia.edu/~marios/matlab/Vectorization.pdf

This is a good study in how things change. Drea gives 3 main tips for performance throughout the document, and it is interesting to look at how they work out today.

1. Drea discusses negation of certain elements of a matrix, subject to a certain criterion. He offers a double FOR loop and use of the FIND function. I tested on two of my machines, both running R2007b, and found the FOR loop to be faster with a 3000-by-3000 matrix. In addition, I wonder why Drea used FIND instead of logical indexing? Maybe it didn't work back then. Here is the code, with a typical rel_times = [1, 2.44, 1.25]

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [] = dreas_desk1()
% Indexing example.
S = 3000; % Size of matrix.
rand('twister',5340); % Compare the same matrices.
M = round(rand(S)*100); % Different results for, say *10: For loop is even better.
T = zeros(1,3); % for timings
tic
for ii=1:S,
    for jj=1:S,
        if (M(ii,jj) > 4)
            M(ii,jj) = -M(ii,jj);
        end
    end
end
T(1) = toc;
rand('twister',5340);
M = round(rand(S)*100);
tic
ind = find(M > 4);
M(ind)=-M(ind);
T(2) = toc;
rand('twister',5340);
M = round(rand(S)*100);
tic
ind = M > 4;
M(ind) = -M(ind);
T(3) = toc;
rel_times = T./min(T) % display the relative times.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%




2. Drea then discusses the removal of whitespace from a string. He offers a WHILE loop, FINDSTR, and the use of the FILTER function. His conclusion is that FINDSTR is slower than FILTER. Not on my machine. Here is the code, with typical rel_times = [1663.6 3.75 1]

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [] = dreas_desk2()
% Remove extra whitespace
V = 'I really hate extra white space. ';
V = repmat(V,1,500);
T = zeros(1,3);
tic
len = length(V);
ii = 1;
while (ii<len),
    if (V(ii) == ' ' && V(ii+1) == ' ')
        for jj = ii:(len-1),
            V(jj) = V(jj+1);
        end
        V(len)=0; % Or something that will shorten the vector
        len=len-1;
    else
        ii=ii+1;
    end
end
V = char(V); % The modern setstr
T(1) = toc;
V = 'I really hate extra white space.';
V = repmat(V,1,500);
tic
ind = find(filter([1 1],2,V==' ')==1);
V(ind)= [];
T(2) = toc;
V = 'I really hate extra white space.';
V = repmat(V,1,500);
tic
ind = findstr(V,' ');
V(ind)= [];
T(3) = toc;
rel_times = T./min(T)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%



3. Finally, Drea discusses subtracting the column mean from each column of a matrix. This is of course tailor-made for bsxfun, which I know didn't exist in 1995. Again he concludes that indexing is faster than a FOR loop. Not on my machine. Even bsxfun barely beats Drea's FOR loop solution. Here is the code with typical rel_times = [1.05 7.39 1]

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [] = dreas_desk3()
% Subtract the column mean from a matrix.
S = 2000;
M = rand(S);
V = mean(M);
T = zeros(1,3);
tic
for ii=1:S,
    M(:,ii) = M(:,ii) - V(ii);
end
T(1) = toc;
M = rand(S);
V = mean(M);
tic
M = M - V(ones(S,1,'int8'),:);
T(2) = toc;
M = rand(S);
V = mean(M);
tic
M = bsxfun(@minus,M,V);
T(3) = toc;
rel_times = T./min(T)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%



Which at last brings me to Tony's trick. I found that Tony's trick is indeed slower for scalar expansion, such as:

h = 3;
V = h(ones(3))

But the story is more complex when expanding a vector. For instance, when expanding a column vector, Tony's trick is faster than the multiplication method shown by Drea. When expanding a row vector, as Bronson did, Tony's trick is not as fast as the multiplication method. So I tested whether permuting the row vector, coupled with Tony's trick on a colon-indexed row vector, could beat the multiplication method. The result is that this is faster than the multiplication method up to a size of about 1200, then the multiplication method starts to pull away on my machine. Here is the code with typical
rel_times =
       1.9636 1
       1.1837 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [] = dreas_desk4()
% REPMAT a vector
S = 1000;
a = rand(S,1); % A column vector
T = zeros(2,2);
tic
V = a*ones(1,S);
T(1,1) = toc;
clear V
tic
V = a(:,ones(S,1,'int8'));
T(1,2) = toc;
a = rand(1,S); % A row vector.
tic
V = ones(S,1)*a;
T(2,1) = toc;
clear V
tic
V = a(:); % Make a column vector, since we know this is fast.
V = permute(V(:,ones(S,1,'int8')),[2 1]); % take the transpose.
T(2,2) = toc;
rel_times = bsxfun(@rdivide,T,min(T,[],2))
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%



So if Drea were writing this document today, some things would definitely need to be changed. As for folks like Bronson, beware of reading 14-year-old MATLAB performance tips!

Subject: Failure of Tony's Trick

From: Bronson

Date: 22 Oct, 2009 21:07:03

Message: 7 of 10

Dear all,
    The point is rather simple, and it appears that most readers have missed it. 14-year-old documentation or not, unless you've been living in a barn, Tony's trick is touted as ALWAYS being faster for this sort of situation. I'm looking for an explanation for why it apparently fails, not examples of it failing (though I appreciate the solidarity of those results with my own analysis). I really don't see the relevance of a memory preallocation game. As for folks like Matt, I haven't said anything about fastest; it's an issue of faster. And though I do appreciate an annotated summary of Drea's previous work, responses that actually respond to my question would be much appreciated.

Subject: Failure of Tony's Trick

From: Matt

Date: 23 Oct, 2009 00:47:18

Message: 8 of 10

"Bronson " <Bargyle13REMOVETHIS@gsb.columbia.edu> wrote in message <hbqhhn$1vj$1@fred.mathworks.com>...
> Dear all,
> The point is rather simple, and it appears that most of the readers have missed it. 14 year old documentation or not, unless you've been living in a barn, Tony's trick is touted as ALWAYS being faster for this sort of situation. I'm looking for an explanation for why this apparently fails, not examples of it failing (though, I appreciate the solidarity of results with my own analysis).
==============

Well, I for one am finding that Tony's trick always fails, both for small N (100) and large N (5e7), with straight multiplication being about twice as fast. So this could indeed be a matter of architecture or of upgrades to the MATLAB code over time.

To my mind, though, a smartly implemented matrix multiplication should be faster than Tony's trick because indexing operations always require that you do bounds checking and type checking to make sure that each index into h is legitimate.

Conversely, when faced with an expression like h*ones(N,1), the code simply knows to loop sequentially through the vector ones(N,1). Furthermore, if the code is smart enough to pre-check whether one of the operands in the multiplication is 1 (the code 14 years ago may not have done so), it could bypass the usual labor required in a multiplication.
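
(A tiny illustration, not from the original message, of the per-element index validation that the multiplication never has to perform: every read through an index must pass a bounds check, and the check fails as soon as one index falls outside h.)

h = 4;
ok = h(ones(5,1))       % all indices are 1, so every element passes the check
try
    bad = h([1 1 2]);   % index 2 exceeds numel(h), so indexing raises an error
catch err
    disp(err.message)
end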

Subject: Failure of Tony's Trick

From: Steven Lord

Date: 23 Oct, 2009 14:40:46

Message: 9 of 10


"Bronson " <Bargyle13REMOVETHIS@gsb.columbia.edu> wrote in message
news:hbqhhn$1vj$1@fred.mathworks.com...
> Dear all,
> The point is rather simple, and it appears that most of the readers
> have missed it. 14 year old documentation or not, unless you've been
> living in a barn, Tony's trick is touted as ALWAYS being faster for this
> sort of situation.

By whom? When I search the entire MathWorks website, I find 12 references
to "Tony's trick" -- 10 in postings on Loren's blog (which doesn't discuss
performance of Tony's trick), 1 in a technical support document (which also
doesn't mention performance) and this thread in the MATLAB Central
newsreader. If there's some reference that says that ANY routine is ALWAYS
faster than another, please let us know as it's probably not true and we
shouldn't be saying that (unless the former routine is a subroutine to the
latter.)

> I'm looking for an explanation for why this apparently fails, not
> examples of it failing (though, I appreciate the solidarity of results
> with my own analysis). I really don't see the relevancy of a memory
> preallocation game. As for folks like Matt, I haven't said anything about
> fastest; it's an issue of faster. And though I do appreciate an annotated
> summary of Drea's previous work, responses that actually respond to my
> question would be much appreciated.

As Steve Eddins said, MATLAB has gone through many changes since Drea wrote
the document to which you referred. Off the top of my head, some probably
didn't impact the performance of Tony's trick, or if they did, the impact was
probably minor:

* the introduction of the MATLAB Desktop
* the introduction of both the old object-oriented class system and the new
OO class system
* maybe even the introduction of Handle Graphics, if that happened after
1995
* etc.

Some of those changes, though, probably did affect how much of an impact
Tony's trick had. A few examples of those changes could be:

* the introduction of N-D and non-double arrays and the modifications to the
source code to support them (It predates my tenure at The MathWorks but I
believe they were introduced in MATLAB 5 which was released in 1997.)
* the switch to use LAPACK instead of LINPACK (MATLAB 6.0, release R12)
http://www.mathworks.com/company/newsletters/news_notes/clevescorner/winter2000.cleve.html
* the change (in MATLAB 6.5, release R13) of logical from an attribute to a
class and sparse from a class to an attribute
* the introduction of the JIT accelerator (again, I believe, in R13)
* years and years of bug fixes and new features
* etc.

Any or all of those could have affected the relative performance of Tony's
trick, the matrix multiplication alternative, etc. and I think it would be
difficult or impossible to determine exactly which change had the most
impact.

--
Steve Lord
slord@mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlabwiki.mathworks.com/MATLAB_FAQ

Subject: Failure of Tony's Trick

From: Abhinav Gaur

Date: 13 Dec, 2010 07:41:04

Message: 10 of 10

Thanks, Bronson.
This thread you started was a very interesting read.

Abhinav

"Bronson " <Bargyle13REMOVETHIS@gsb.columbia.edu> wrote in message <h9lfek$5uf$1@fred.mathworks.com>...
> [Bronson's original post quoted in full; snipped]
