## How to vectorize this for maximum efficiency?

### tensorisation

on 31 Aug 2019
Latest activity: commented on by Guillaume on 1 Sep 2019
In my calculation I have to iterate over the non-zero elements of a three-dimensional array (label). Instead of writing three nested loops over the array's indices (which would be inefficient), I ended up doing something like this:
```matlab
highest_label = max(max(max(label)));
clusters_size = zeros(1, highest_label);
label_lin = nonzeros(label(:));
for q = 1:length(label_lin)
    clusters_size(label_lin(q)) = clusters_size(label_lin(q)) + 1;
end
```
Is there a way to get rid of this loop as well, by vectorizing, and make this even more efficient? I tried something like:

```matlab
clusters_size(label_lin) = clusters_size(label_lin) + 1;
```

but that doesn't give the correct result.
To clarify further, here is an example. Say:

```matlab
label_lin = [1;1;2;3;4;5;5;5];
clusters_size = zeros(1,5);
```

Now I want to count the occurrences of each value of label_lin in clusters_size (this is what the for loop does), so that:

```matlab
clusters_size = [2,1,1,1,3]
```

But if I do

```matlab
clusters_size(label_lin) = clusters_size(label_lin) + 1;
```

the result is

```matlab
clusters_size = [1,1,1,1,1]
```

which is obviously incorrect: when an index appears more than once on the left-hand side, the assignment is applied only once rather than accumulated.
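For reference, this kind of tally can be done in a single call with accumarray, which sums a 1 for every occurrence of each subscript. A minimal sketch using the example data above (the size argument [5 1] plays the role of highest_label, so label values that never occur still get a zero count):

```matlab
% Count occurrences of each label value without a loop.
% accumarray adds a 1 for every row of label_lin that maps to the same index.
label_lin = [1;1;2;3;4;5;5;5];
clusters_size = accumarray(label_lin, 1, [5 1]).';
% clusters_size is now [2 1 1 1 3]
```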

R2018b

### dpb

on 31 Aug 2019

```matlab
>> histc(label_lin, unique(label_lin))
ans =
     2
     1
     1
     1
     3
>>
```
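One caveat with unique(label_lin) as the bin argument: label values that never occur get no slot, so the counts would no longer line up with the clusters_size indices if some labels were absent. A sketch of the same count with histcounts (the function MathWorks recommends over the older histc), using explicit bin edges 1:highest_label+1 so every label keeps its own slot:

```matlab
% histcounts with integer bin edges 1:highest_label+1 gives one bin per
% label value, including zeros for labels absent from label_lin.
label_lin = [1;1;2;3;4;5;5;5];
highest_label = 5;
clusters_size = histcounts(label_lin, 1:highest_label+1);
% clusters_size is [2 1 1 1 3]
```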


### tensorisation

on 1 Sep 2019
@Guillaume
This pretty much agrees with what I tested, so I'll probably stay with the loop for now. After timing the different parts of my calculation, I'm now trying to make the part that takes the most run time (especially for large L) more efficient; see my previous comment above.

### dpb

on 1 Sep 2019
Yeah, it has often been noted that histc is not as efficient as it could be. I just wish TMW had chosen to work on it instead of, or at least had not changed the binning behavior when, introducing histcounts. This incessant introduction of overlapping functionality with inconsistent syntax and behavior is a real pain and a source of both confusion and error.
As for "large": a double array of 512^3 elements is 1 GB. Once upon a time that would have taxed the largest machines; now "entry-level" is 4 GB of physical memory. The key point is that the operation is inside a looping construct. However, as you note above, it's still not a significant fraction of the time spent in the loop, so even if it were cut to nil it wouldn't make all that much difference. The need to optimize brings with it the need to find out which areas offer real gains overall, not just "peephole" optimization.
Looking at CC2periodic, you could also profile it. I'm not sure which output form you're using, but note that one branch uses both accumarray and sort, so there might be some opportunity there.
Whether or not it would suit your needs, I noticed a C source on GitHub that might be of some use if you wanted to write a mex file.

### Guillaume

on 1 Sep 2019
It's not too hard to write your own version of bwconncomp or bwlabel so that it wraps around the edges. Whether an m-file implementation will gain you anything over what you're using now remains to be seen. Certainly, if it were coded as a mex file you could see a gain.