From: roger <northsolomonsea@gmail.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Faster and vectorized
Date: Thu, 29 Oct 2009 05:36:46 -0700 (PDT)
Message-ID: <618a4964-ef9e-44c8-8e1d-c9c4b73d7a14@c3g2000yqd.googlegroups.com>
References: <hcbr1h$ne0$1@fred.mathworks.com>


On Oct 29, 11:33 am, "Oleg Komarov" <oleg.koma...@hotmail.it> wrote:
> Dear all,
> I've been using a user-defined function, written more than a year ago, without any problems on small datasets. Since I started working on a bigger dataset, the computation time has risen significantly, so I'm asking for your help in optimizing this function.
>
> I need to compute weighted means (by row) grouped into classes
>
> Ssize = 100;
> % Example Inputs:
> % Generate Values
> Values = 10.*rand(Ssize); Values(ceil(100.*rand(Ssize/2))) = NaN;
> % Generate Weights
> Weights = 50000.*rand(Ssize); Weights(ceil(100.*rand(Ssize/2))) = NaN;
> % Generate  Classes to which values belong
> Class = floor(Ssize/10.*rand(Ssize));
>
> tic;
>     Out = WeMean(Values, Weights, Class);
> toc;
>
> % Elapsed time is 0.011368 seconds for Ssize = 100;
> % Elapsed time is 7.054131 seconds for Ssize = 1000;
>
> The number of elements rose by a factor of 100, but the time rose by a factor of 7.054131/0.011368 > 600.
> I'll let you imagine what happens if Ssize = 10000.
>
> % FUNCTION HERE
> function Out =  WeMean(Values, Weights, Class)
>
> % All three inputs might come from different datasets/computations, so I need to "align" Class
> Class(isnan(Values) | isnan(Weights)) = 0; % Note that class doesn't have NaN by construction
>
> % Unique classes (0 is not a class)
> unC = setdiff(unique(Class), 0)';
>
> % Preallocate OUT
> Out = NaN(size(Values,1)+1, length(unC));
> % header row
> Out(1,:) = unC;
>
> % LOOP by class
> for i = 1 : length(unC)    
>    % Index the specific class
>     IDX = Class == unC(1,i);
>
>    % Filter the values and the weights which belong to the class
>     TempV = zeros(size(Values));
>     TempV(IDX) = Values(IDX);
>     TempW = zeros(size(Weights));
>     TempW(IDX) = Weights(IDX);
>     % Denominator by row
>     Den = sum(TempW, 2);
>     % Numerator by row
>     Num = sum(TempV.*TempW,2);
>     % weighted mean by row
>     Out(2:end,i) = Num./Den;
> end
>
> I tried to vectorize the summations (for the Den and Num computations) using accumarray. I avoided the loop with a subfunction that computed vectorized grouped sums. That cut the computation time in half, but rounding error accumulated (the discrepancy is still far above 1e-15).
>
> Thanks in advance!
>
> Oleg

Dropping the temp arrays may make it a bit faster. Other than that,
you'll just have to live with the fact that MATLAB is slow with loops.

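% Zero out invalid entries once, outside the loop (NaN.*0 is still NaN)
Bad = isnan(Values) | isnan(Weights);
W  = Weights;          W(Bad)  = 0;
VW = Values.*Weights;  VW(Bad) = 0;

for i = 1 : length(unC)
    % Index the specific class
    IDX = Class == unC(1,i);

    % Weighted mean by row. Values(IDX) would flatten to a column vector,
    % so mask with IDX instead to keep the matrix shape for the row sums.
    Out(2:end,i) = sum(VW.*IDX,2)./sum(W.*IDX,2);
end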
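
If the loop over classes is still too slow, a grouped sum with accumarray is one way to drop it altogether. Note that in the example the number of classes grows with Ssize as well (Class = floor(Ssize/10.*rand(Ssize))), so the loop does roughly ten times as many passes over a hundred times as many elements. What follows is only a sketch: the name WeMeanAcc is made up, it assumes the three inputs are matrices of the same size, and it treats class 0 as "no class" like the original. Its output may differ from the loop version in the last few bits if the summation order differs.

```matlab
function Out = WeMeanAcc(Values, Weights, Class)
% WeMeanAcc  Hypothetical accumarray-based variant of WeMean (a sketch).

% Entries with a NaN value or weight, or with class 0, are ignored
Valid = ~isnan(Values) & ~isnan(Weights) & Class ~= 0;

% Class labels for the header row, and the output column of each valid entry
[unC, ~, col] = unique(Class(Valid));
unC = unC(:)';                 % force a row vector
col = col(:);

% Row subscript of each valid entry, in the same order as Class(Valid)
[row, ~] = find(Valid);
row = row(:);

% Grouped sums by (row, class); cells that receive no data stay 0
sz  = [size(Values,1), numel(unC)];
Num = accumarray([row, col], Values(Valid).*Weights(Valid), sz);
Den = accumarray([row, col], Weights(Valid), sz);

% 0/0 gives NaN where a row has no entries for a class
Out = [unC; Num./Den];
end
```

It is called the same way as the original, e.g. Out = WeMeanAcc(Values, Weights, Class); the first row of Out holds the class labels and the remaining rows hold the weighted means by row.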