Help with optimising my code?

Rene Higuita on 11 Feb 2018
Edited: Rene Higuita on 12 Feb 2018
Hello everyone,
Apologies for the vague title; I think I'm missing exactly the knowledge that would let me name the topic precisely...
I am trying to group the values in a large matrix yMat into a single vector yOut, sorting them by binning values from a secondary matrix xMat. Starting with
[N,bin] = histc(xMat,edges); *
Then
for k = 1:numel(edges)-1
    yOut(k) = mean(yMat(bin==k));
end
xMat/yMat vary in size, but their numel can go up to a couple of tens of millions, and I need to go through hundreds of them. I picked a relatively large example: the histc line takes 0.3s, while the loop takes over 4s when numel(edges)=100. Ideally I would like a much finer grain, too, with numel(edges) closer to 1000...
I am pretty confident I do this the very wrong way, for loops never being a good sign, but I can't think of a faster way to do this. Any help appreciated!
*I know histc is a legacy function, but I'm stuck with R2011b right now, and I don't think this impacts the issue.

Accepted Answer

per isakson on 11 Feb 2018
Edited: per isakson on 11 Feb 2018
xMat  = randn( 12, 1 );
edges = ( -4 : 1 : 4 );
yMat  = ( 1 : 12 )';
[ N, bin ] = histc( xMat, edges );
y1 = accumarray( bin, yMat, [], @mean );   % vectorised grouping
for k = 1:numel(edges)-1                   % original loop, for comparison
    y2(k) = mean(yMat(bin==k));
end
Comparison:
>> y1'
ans =
0 11.0000 8.5000 6.6667 4.0000 7.0000
>> y2
y2 =
NaN 11.0000 8.5000 6.6667 4.0000 7.0000 NaN NaN
btw: y2 should have been preallocated
In my example, elements of bin can be zero (histc assigns 0 to values outside the edges), and accumarray requires positive integer subscripts. Hence, use bin+1 in accumarray.
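A minimal sketch of that workaround (trimming off the out-of-range slot at the end is my own choice, not part of the code above):
[ ~, bin ] = histc( xMat, edges );           % bin == 0 for values outside edges
y1 = accumarray( bin+1, yMat, [], @mean );   % slot 1 collects the out-of-range values
y1 = y1(2:end);                              % drop the out-of-range slot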
However, the increase in speed on R2016a is definitely not impressive.
  4 Comments
Rene Higuita on 11 Feb 2018
Edited: Rene Higuita on 12 Feb 2018
Thanks for your edits. Since I'm discovering these functions too, here are quick comments on them:
You avoid zeros in bin by adding -Inf and +Inf to edges (histc yields zero when values are out of bounds).
Here's the more challenging code I used:
tic;
xMat  = randn( 1e8, 1 );
edges = [ -Inf (-4 : .01 : 4) Inf ];   % -Inf/+Inf catch out-of-range values, so no zeros in bin
yMat  = randn(size(xMat));
allocTime = toc;
tic;
[ N, bin ] = histc( xMat, edges );
histcTime = toc;
tic;
y1 = accumarray( bin, yMat, [], @mean );   % y1: accumarray with a @mean handle
y1time = toc;
tic;
y3 = accumarray( bin, yMat );              % y3: default sum, then divide by bin counts
y3 = y3 ./ N(1:end-1);
y3time = toc;
tic;
y2 = zeros( numel(edges)-1, 1 );           % y2: the original loop, now preallocated
for k = 1:numel(edges)-1
    y2(k) = mean(yMat(bin==k));
end
y2time = toc;
% checking relative error levels
maxErr21 = max(abs((y2-y1)./y1));
maxErr32 = max(abs((y3-y2)./y2));
maxErr13 = max(abs((y1-y3)./y3));
fprintf(['--- Numel(xMat)=%g, numel(edges)=%d ---\n'...
    'Creating vectors took %0.1fs, histc %0.1fs\n'...
    'Method y1: %0.1fs  y2: %0.1fs  y3: %0.1fs\n\n'...
    'Largest relative differences between:\n y1 and y2 : %0.2g'...
    ' -- y2 and y3 : %0.2g -- y3 and y1 :%0.2g\n\n'],...
    numel(xMat), numel(edges), allocTime, histcTime,...
    y1time, y2time, y3time, maxErr21, maxErr32, maxErr13);
And the fprintf output at the end there, on a 5-year-old laptop running R2011b:
--- Numel(xMat)=1e+008, numel(edges)=803 ---
Creating vectors took 5.1s, histc 9.5s
Method y1: 18.8s y2: 292.5s y3: 1.5s
Largest relative differences between:
y1 and y2 : 1.6e-012 -- y2 and y3 : 1e-012 -- y3 and y1 :2.4e-012
On a more recent desktop and R2017a:
--- Numel(xMat)=1e+08, numel(edges)=803 ---
Creating vectors took 2.2s, histc 4.7s
Method y1: 10.4s y2: 117.6s y3: 0.7s
Largest relative differences between:
y1 and y2 : 2.1e-12 -- y2 and y3 : 1.8e-12 -- y3 and y1 :2.4e-12
So no dramatic changes in the ratios between R2011b and R2017a, and accumarray is the clear winner in both.
Regarding the Undocumented Matlab post: I can't really compare, because accumarray only takes function handles as input for that argument, unlike cellfun, which also accepts a list of built-in function names that it handles much more efficiently than a handle (the main point of that post). @sum is supposed to be the default, but I suspect it is special-cased internally rather than actually dispatched through a handle to sum.
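A quick, machine-dependent way to test that suspicion (the sizes here are arbitrary):
bin = randi( 1000, 1e7, 1 );                      % fake bin indices
val = rand( 1e7, 1 );
tic; s1 = accumarray( bin, val );           toc   % default sum
tic; s2 = accumarray( bin, val, [], @sum ); toc   % explicit @sum handle; slower if the default is special-cased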
Thanks for your help and interest! This definitely will take down my running times to acceptable levels.
per isakson on 12 Feb 2018
Edited: per isakson on 12 Feb 2018
Back in 2009, Yair Altman described a similar case in which function handles cause a significant performance penalty: cellfun – undocumented performance boost. This has not changed in recent Matlab releases.
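For reference, a minimal reproduction of that effect (timings are machine dependent):
c = num2cell( rand( 1e6, 1 ) );
tic; n1 = cellfun( 'length', c ); toc   % built-in name: fast path
tic; n2 = cellfun( @length, c );  toc   % function handle: noticeably slower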


More Answers (1)

Walter Roberson on 11 Feb 2018
Try accumarray. Use the bin number as subs, use yMat in the values slot, use the empty array as the size, and use @mean as the function.
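In code, that suggestion reads roughly as follows (assuming bin from the histc call in the question, and that all values fall inside edges so that bin > 0):
yOut = accumarray( bin(:), yMat(:), [], @mean );   % (:) flattens matrices into columns, as accumarray expects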
  1 Comment
Rene Higuita on 11 Feb 2018
Edited: Rene Higuita on 11 Feb 2018
You got the answer as well. I tagged Per's because he gives more details, and it'll probably help others more to have the examples he gives.
But thanks a lot!

