Path: news.mathworks.com!not-for-mail
From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Faster and vectorized
Date: Thu, 29 Oct 2009 10:33:21 +0000 (UTC)
Organization: The MathWorks, Inc.
Lines: 61
Message-ID: <hcbr1h$ne0$1@fred.mathworks.com>
Reply-To: <HIDDEN>


Dear all,
I have been using a user-defined function, written more than a year ago, without any problems on small datasets. Since I started working on a bigger dataset, the computation time has risen significantly, so I am asking for your help to optimize this function.

I need to compute row-wise weighted means, grouped by class.
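
To make the target quantity concrete, here is a minimal hand-checkable sketch for a single row. The vectors v, w and c are made-up illustration values (assumptions, not taken from the real dataset):

```matlab
% Illustrative single row: values, weights and class labels (made-up numbers)
v = [1 2 4];
w = [1 3 2];
c = [1 1 2];

% Weighted mean of the entries belonging to class 1
in1 = (c == 1);
m1 = sum(v(in1) .* w(in1)) / sum(w(in1));   % (1*1 + 2*3) / (1 + 3) = 1.75

% Weighted mean of the entries belonging to class 2
in2 = (c == 2);
m2 = sum(v(in2) .* w(in2)) / sum(w(in2));   % (4*2) / 2 = 4
```

The function further down computes this quantity for every row and every class at once.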

Ssize = 100;
% Example Inputs:
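% Generate Values (NaN positions are drawn over the whole matrix via numel,
% not only over the first 100 linear indices)
Values = 10.*rand(Ssize); Values(ceil(numel(Values).*rand(Ssize/2))) = NaN;
% Generate Weights
Weights = 50000.*rand(Ssize); Weights(ceil(numel(Weights).*rand(Ssize/2))) = NaN;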
% Generate  Classes to which values belong
Class = floor(Ssize/10.*rand(Ssize));

tic;
    Out = WeMean(Values, Weights, Class); 
toc;

% Elapsed time is 0.011368 seconds for Ssize = 100;
% Elapsed time is 7.054131 seconds for Ssize = 1000;

The number of elements grew by a factor of 100, but the elapsed time grew by a factor of 7.054131/0.011368 > 600.
You can imagine what happens when Ssize = 10000.

% FUNCTION HERE
function Out =  WeMean(Values, Weights, Class) 

% All three inputs might come from different datasets/computations, so I need to "align" Class:
% entries with a NaN value or a NaN weight are moved to class 0 and thereby ignored
Class(isnan(Values) | isnan(Weights)) = 0; % Note that Class doesn't have NaN by construction

% Unique classes (0 is not a class)
unC = setdiff(unique(Class), 0)';

% Preallocate OUT
Out = NaN(size(Values,1)+1, length(unC));
% header row
Out(1,:) = unC;

% LOOP by class
for i = 1 : length(unC)    
    % Index the specific class
    IDX = Class == unC(1,i);

    % Filter the values and the weights which belong to the class
    TempV = zeros(size(Values));
    TempV(IDX) = Values(IDX);
    TempW = zeros(size(Weights));
    TempW(IDX) = Weights(IDX);
    % Denominator by row
    Den = sum(TempW, 2);
    % Numerator by row
    Num = sum(TempV.*TempW,2);
    % weighted mean by row
    Out(2:end,i) = Num./Den;
end

I tried to vectorize the summations (for the denominator and numerator) using accumarray, replacing the loop with a subfunction that computes vectorized grouped sums. This cut the computation time in half, but the results accumulate round-off error (the approximation error is still well above 1e-15).
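
For reference, one way the accumarray idea could look is sketched below. This is only an illustration under stated assumptions, not the subfunction mentioned above (which is not shown in this post): the name WeMeanAcc is hypothetical, and class 0 is taken to mean "no class", as in the loop version.

```matlab
function Out = WeMeanAcc(Values, Weights, Class)
% WeMeanAcc (hypothetical name): loop-free sketch of the grouped,
% row-wise weighted mean using accumarray.

% Entries with a NaN value or weight go to class 0, which is ignored
Class(isnan(Values) | isnan(Weights)) = 0;

[nR, nC] = size(Values);
valid = (Class ~= 0);

% Map the class labels of the valid entries to group numbers 1..nG
% (the ~ placeholder needs R2009b or later)
[unC, ~, grp] = unique(Class(valid));
nG = numel(unC);

% Row number of every valid entry
rowIdx = repmat((1:nR)', 1, nC);
r = rowIdx(valid);
subs = [r(:), grp(:)];

% Weights and values of the valid entries as column vectors
w = Weights(valid);  w = w(:);
v = Values(valid);   v = v(:);

% Grouped sums: one cell per (row, class) pair, 0 where nothing falls in
Den = accumarray(subs, w,      [nR, nG]);
Num = accumarray(subs, v .* w, [nR, nG]);

% Header row with the class labels, then the weighted means
% (0/0 gives NaN for empty row/class pairs, as in the loop version)
Out = [unC(:)'; Num ./ Den];
end
```

Floating-point addition is not associative, so if accumarray adds the terms in a different order than sum does, the two versions can differ in the last few bits. For results of magnitude around 10, eps(10) is about 1.8e-15, so comparing against the loop version with a relative tolerance is probably more meaningful than an absolute 1e-15 threshold.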

Thanks in advance!

Oleg