How to vectorize this for maximum efficiency?

Question

tensorisation on 31 Aug 2019

https://www.mathworks.com/matlabcentral/answers/478397-how-to-vectorize-this-for-maximum-efficiency

Commented: Guillaume on 1 Sep 2019

In my calculation I have to iterate over the nonzero elements of a 3-dimensional array (label). Instead of writing three nested loops over the array indices (which would be inefficient), I ended up with something like this:

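        highest_label=max(max(max(label)));     % largest label value
        clusters_size=zeros(1,highest_label);   % one counter per label

        label_lin=nonzeros(label(:));           % nonzero labels as a column vector
        for q=1:length(label_lin)
            % increment the counter of the label found at position q
            clusters_size(label_lin(q))=clusters_size(label_lin(q))+1;
        end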

Is there a way to get rid of this loop as well by vectorizing, and make this even more efficient? I tried something like:

clusters_size(label_lin)=clusters_size(label_lin)+1;

But that doesn't give a correct result.

To further clarify what I need, I will give an example:

Say:

        label_lin=[1;1;2;3;4;5;5;5];
        clusters_size=zeros(1,5)

Now I want clusters_size to count how many times each value occurs in label_lin (this is what the for loop does), so that:

clusters_size=[2,1,1,1,3]

If I do something like:

clusters_size(label_lin)=clusters_size(label_lin)+1;

I will instead get:

clusters_size=[1,1,1,1,1]

Which is obviously incorrect.
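The reason the one-liner fails: MATLAB evaluates the right-hand side first, so clusters_size(label_lin) reads the current value at every listed index (all zeros here) and adds 1; the assignment then writes those values back, and a repeated index is simply written several times with the same value, so repeats never accumulate. A minimal sketch contrasting the two forms on the example data, using accumarray (which sums over repeated subscripts) as one vectorized replacement for the loop:

```matlab
% Example data from the question
label_lin = [1;1;2;3;4;5;5;5];

% Indexed assignment: duplicate indices are overwritten, not accumulated
naive = zeros(1,5);
naive(label_lin) = naive(label_lin) + 1;   % gives [1 1 1 1 1]

% accumarray adds the value 1 once for every occurrence of each subscript
counts = accumarray(label_lin, 1)';        % transposed to a row: [2 1 1 1 3]
```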

Answer 1

dpb on 31 Aug 2019

https://www.mathworks.com/matlabcentral/answers/478397-how-to-vectorize-this-for-maximum-efficiency#answer_389922

>> histc(label_lin,unique(label_lin))
ans =
     2
     1
     1
     1
     3
>> 
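As a sanity check, a short sketch (using the example label_lin from the question) confirming that histc with the sorted unique values as edges reproduces the loop's result. The two agree here only because the labels 1 to 5 are contiguous: histc with these edges returns one count per distinct value that occurs, not one per index 1:max.

```matlab
label_lin = [1;1;2;3;4;5;5;5];

% Original loop from the question
clusters_size = zeros(1, max(label_lin));
for q = 1:numel(label_lin)
    clusters_size(label_lin(q)) = clusters_size(label_lin(q)) + 1;
end

% histc with edges = unique values: bin k counts edges(k) <= x < edges(k+1),
% and the last bin counts values exactly equal to the last edge
n = histc(label_lin, unique(label_lin));

same = isequal(n(:)', clusters_size);   % true for this contiguous example
```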

14 Comments

dpb on 1 Sep 2019

Edited: dpb on 1 Sep 2019

"...clusters_size=histcounts(label_lin,unique(label_lin)) ... is incorrect. Since histc is not recommended for usage (which is why I used histcounts)..."

Note that in his example Guillaume used [unique(label_lin); Inf] as the binning argument to histcounts, which is why I used the venerable (yet reliable and easily understood) histc. With histcounts you have to remember to append that artificial last edge to keep it from lumping the last two values into the same bin, which is entirely unintuitive and, if forgotten, gives the wrong result. Just because it's new doesn't mean it's improved over (or even as good as) the function it's supposed to replace. TMW can discourage histc all they want, but they can't realistically remove it because too much existing code uses it, so I'd not worry about it and would use whichever is simpler.
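To make the edge behavior concrete, a small sketch on the question's example data: histcounts' last bin is closed on both sides, so without an extra edge the last two distinct values share one bin.

```matlab
label_lin = [1;1;2;3;4;5;5;5];
edges = unique(label_lin);                       % [1;2;3;4;5]

% Without an extra edge: 4 bins, and the last bin [4,5] includes BOTH
% edges, so the single 4 and the three 5s are lumped together
wrong = histcounts(label_lin, edges);            % [2 1 1 4]

% Appending Inf gives 5 bins; the last one, [5,Inf], holds only the 5s
right = histcounts(label_lin, [edges; Inf]);     % [2 1 1 1 3]
```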

As for whether to bin using unique or 1:max(v), that depends entirely on which answer you want. The histc()/unique() combination guarantees one bin for each distinct value in the vector, with no zero counts; the 1:max(v) alternatives give one bin per integer in that range, so they contain zeros for any value in the range that doesn't occur in the vector. Neither is right or wrong; they answer different questions. Your original post indicated you wanted the count of repetitions of the elements in the vector, not a count over 1:max(v) that includes values that might not occur in the vector.
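The difference only shows up when some value in 1:max(v) is missing. A sketch with a hypothetical vector in which the value 2 never occurs:

```matlab
% Hypothetical labels with a gap: the value 2 never occurs
v = [1;1;3;3;3];

% One bin per value that actually occurs, so no zero counts:
% the counts for the values 1 and 3 are 2 and 3
per_value = histc(v, unique(v));

% One bin per integer 1:max(v), so the missing 2 gets a zero:
% the counts for 1, 2, 3 are 2, 0, 3
per_index = accumarray(v, 1);
```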

If you're that interested in performance, although it probably isn't the answer you're really after, writing

highest_label=max(label(:));

will outperform the nested max() for large arrays. unique does introduce some overhead, that's true, but it probably gives the binning you're actually looking for.

accumarray is a great tool, but it is essentially a loop in one line and is quite frequently outperformed by an explicit loop that the JIT compiler can optimize.

Only real-case testing on your specific datasets, over a large enough sample and size, can accurately judge whether one way has any real performance advantage over another. In general, write the most straightforward, easy-to-understand-and-debug code you possibly can, and only if it is then shown to be too slow, worry about optimizing; and even then, only after profiling has shown that the code you're trying to optimize actually is a bottleneck.
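In that spirit, a minimal timing-harness sketch. It builds a hypothetical random label volume (values 0 to 1000, with 0 as background; substitute your own data), checks that the loop, accumarray, and histc produce identical counts, and only then times them: tic/toc for the loop, timeit for the one-liners. The timings depend on the MATLAB release and machine, so no particular ranking is implied.

```matlab
% Hypothetical test volume; replace with a real label matrix
rng(0);
label = randi([0 1000], 128, 128, 128);

% Method 1: the loop from the question (timed with tic/toc)
tic;
label_lin = nonzeros(label);
c_loop = zeros(1, max(label(:)));
for q = 1:numel(label_lin)
    c_loop(label_lin(q)) = c_loop(label_lin(q)) + 1;
end
t_loop = toc;

% Method 2: accumarray on the nonzero labels
f_acc = @() accumarray(nonzeros(label), 1);
t_acc = timeit(f_acc);

% Method 3: histc with integer edges 1:max (last bin counts x == max)
f_histc = @() histc(nonzeros(label), 1:max(label(:)));
t_histc = timeit(f_histc);

% The methods must agree before their timings mean anything
c_acc = f_acc();
c_histc = f_histc();
assert(isequal(c_loop(:), c_acc(:), c_histc(:)));

fprintf('loop %.4f s, accumarray %.4f s, histc %.4f s\n', t_loop, t_acc, t_histc);
```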

Guillaume on 1 Sep 2019

Yes, I did write the correct syntax for histcounts. The documentation does say that the last bin includes both its left and right edges (unlike histc, whose last bin counts only values exactly equal to the last edge).

You never specified what the label matrix contained, so we used unique to replicate what your loop did. If that matrix is a typical labeling matrix, with integers from 1 to the number of labels and 0 as background, then yes, you don't need unique. In that case, you can also simplify the histcounts call to:

clusters_size = histcounts(label, 'BinMethod', 'integers');
clusters_size = clusters_size(2:end);  % first bin counts 0 = background (assumes label contains at least one 0)
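A sketch of that call on a tiny hypothetical label array. Two caveats worth checking against the histcounts documentation: the 'integers' bins start at min(label(:)), so dropping the first bin is only correct if a 0 is actually present; and the documentation describes a cap of 65536 bins for this rule, beyond which it switches to wider bins, which matters if a large volume has more clusters than that.

```matlab
% Hypothetical 2x2x2 label array; 0 is background
label = cat(3, [0 1; 2 2], [0 0; 3 1]);

% One bin per integer from min(label(:)) to max(label(:)): here 0, 1, 2, 3
counts = histcounts(label, 'BinMethod', 'integers');   % [3 2 2 1]

% Drop the first bin, which counts the background value 0
clusters_size = counts(2:end);                         % [2 2 1]
```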

As dpb said, max(label(:)) will be faster than the triple max, but since you're on R2018b, you can also use:

highest_label = max(label, [], 'all');

which is probably even (marginally) faster.
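A quick sketch, on a hypothetical random label array, checking that the three ways of taking the overall maximum agree. Note that the 'all' flag goes in the third position, after an empty [] placeholder for the second input.

```matlab
label = randi([0 50], 20, 20, 20);   % hypothetical label array

m1 = max(max(max(label)));   % three successive reductions, one per dimension
m2 = max(label(:));          % a single reduction over all elements
m3 = max(label, [], 'all');  % R2018b and later

agree = isequal(m1, m2, m3);
```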

As for getting the most speed out of the code, you'll have to test it yourself, as this will depend on the MATLAB version. I would have expected accumarray to be the fastest, but possibly histc or histcounts would be. However, MathWorks has improved the speed of loops in recent versions, so the loop may be just as fast. Since you don't need unique, the accumarray call could be:

clusters_size = accumarray(nonzeros(label), 1);

This may be faster:

clusters_size = accumarray(label(:) + 1, 1);   % add 1 so that background 0 becomes a valid index
clusters_size = clusters_size(2:end);          % remove background count
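A sketch comparing the two accumarray variants on a tiny hypothetical array. It assumes the label matrix might be stored in an integer class (labelmatrix, for instance, picks the smallest integer type that fits), so both variants convert to double first: MATLAB integer arithmetic saturates, so with uint8 data label 255 plus 1 would stay 255 and be counted together with label 254. Unlike the 'integers' histcounts form, the +1 shift does not depend on a 0 being present.

```matlab
% Hypothetical label array stored as uint8
label = uint8(cat(3, [0 1; 2 2], [0 0; 3 1]));

% Integer arithmetic saturates: this is 255, not 256
sat = uint8(255) + 1;

% Variant 1: count only the nonzero labels
c1 = accumarray(double(nonzeros(label)), 1);     % [2; 2; 1]

% Variant 2: shift so background 0 maps to index 1, then drop that bin
c2 = accumarray(double(label(:)) + 1, 1);
c2 = c2(2:end);                                  % [2; 2; 1]
```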

I doubt you can beat accumarray, even with a mex file, since it's already implemented in compiled code.

dpb on 1 Sep 2019

For such tiny arrays, most of the run time is in the function overhead and differences are highly unlikely to be "statistically significant" in proving anything about overall performance.

In essence, the 'all' argument to max does the (:) operation on the input array internally; (:) is just a logical change of shape and does not reallocate any data. I don't have the inclination to spend time testing, but my gut feeling is that max(label(:)) may well beat it, since then the function doesn't have to process the extra argument, although the newer version still has the overhead of parsing its arguments, so it may be a wash...

As Guillaume notes, we used unique because that's what your question implied you wanted, and it is a general solution independent of the specific data in the example. If your data are such that every label from 1 to the maximum is present, then granted, there's no reason to use it.

There's a price to be paid for TMW's move to these new functions (more capable in some ways, sometimes) built on OO invocations rather than traditional MATLAB procedural code: higher overhead in memory, and often in performance as well. Convenience and/or conformance to a coding style may come at a price: "there is no free lunch!"

tensorisation on 1 Sep 2019

Edited: tensorisation on 1 Sep 2019

@dpb

An array of size [512 512 512] doesn't seem small to me. What size would you consider "large", especially given that further calculations are done with it and that everything is eventually repeated many times?

For large L (L=512) in my calculation, it seems that almost all of the run time goes to other things that came before. To be more specific:

system=(p>=rand(L,L,L));   % logical mask, 1 byte per site; bwconncomp accepts logical input directly
label=CC2periodic(bwconncomp(system,6),[0,1,1],'LabelMatrix');

I'm using MATLAB's bwconncomp(...) function in place of a Hoshen–Kopelman algorithm, and then the function CC2periodic to apply periodic boundary conditions (sadly, MATLAB's bwconncomp doesn't seem to have that option). The function CC2periodic is taken from here.

I have now tested run times for L=256 and L=512, and CC2periodic takes up almost all of the run time. bwconncomp takes more time than everything else except CC2periodic, but CC2periodic takes a lot more than bwconncomp (about 16 times more for L=256, and about 57 times more for L=512).

I'm not exactly sure what the theoretical efficiency of applying periodic boundary conditions to a Hoshen–Kopelman algorithm in 3D should be (should it be of the same order of magnitude as bwconncomp?), but it seems that CC2periodic is not as efficient as it should be, and I'm unsure what to do about that. Maybe I should open a new question about it.

dpb on 1 Sep 2019

Yeah, it has often been noted that histc is not as efficient as it could be; I just wish TMW had chosen to work on it instead, or at least not changed the binning behavior when introducing histcounts. This incessant introduction of overlapping functionality with inconsistent syntax and behavior is a real pain and a source of both confusion and error.

As far as "large" goes, a double array of 512^3 elements takes 1 GB (134,217,728 elements at 8 bytes each; it would be 128 MB only as a 1-byte logical or uint8 array). Once upon a time that would have taxed the largest machines; now "entry-level" is 4 GB of physical memory. The key item is that the operation is inside a looping construct. However, as you note above, it's still not a significant fraction of the time in the loop, so even if it were cut to nil it wouldn't make much difference. Optimizing means first finding out which areas offer real overall gains, not just doing "peephole" optimization.
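For reference, a short sketch of the memory arithmetic for an L^3 array at L = 512, using whos on a small probe array to confirm the 8 bytes per double element:

```matlab
% Bytes per element of a double, measured on a small probe array
probe = zeros(2, 2, 2);
info = whos('probe');
bytes_per_double = info.bytes / numel(probe);   % 8

n = 512^3;                                      % 134,217,728 elements
bytes_double  = n * bytes_per_double;           % 2^30 bytes
bytes_logical = n * 1;                          % logical/uint8: 2^27 bytes
fprintf('double: %.0f MiB, logical: %.0f MiB\n', bytes_double/2^20, bytes_logical/2^20);
```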

You could also profile CC2periodic. I'm not sure which output form you're using; notice that one branch uses both accumarray and sort, so there might be some opportunity there.

Whether or not it would suit your needs, I noticed a C source on GitHub that might be of some use if you wanted to write a mex file.

Guillaume on 1 Sep 2019

It's not too hard to write your own version of bwconncomp or bwlabel so that it wraps around the edges. Whether an m-file implementation would gain you anything over what you're using now remains to be seen. Certainly, if it were coded as a mex file, you could see a gain.
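An alternative to rewriting the labeling itself, sketched here under stated assumptions: label the volume without wrapping (for example with bwlabeln(system, 6)), then merge any two labels whose voxels face each other across a periodic face, using MATLAB's graph and conncomp for the union step. This is not how CC2periodic works internally (its code isn't reproduced here). The sketch assumes 6-connectivity (only face neighbors across the wrap; 18- or 26-connectivity would also need diagonal pairs) and a hypothetical periodic flag vector that marks which dimensions wrap. It has been reasoned through only on the tiny hand-made array below.

```matlab
% Hypothetical 2x4x3 label matrix from a NON-periodic 6-connected labeling
% (0 = background); in practice it could come from bwlabeln(system, 6)
label = zeros(2, 4, 3);
label(1, 1, 1) = 1;     % on the dim-2 face at column 1
label(1, 4, 1) = 2;     % faces label 1 across the dim-2 wrap
label(2, 2, 1) = 3;     % on the dim-3 face at slice 1
label(2, 2, 3) = 4;     % faces label 3 across the dim-3 wrap
label(1, 2, 2) = 5;  label(1, 3, 2) = 5;  label(2, 3, 2) = 5;   % interior

periodic = [false true true];   % assumption: dims 2 and 3 wrap around

lab = double(label);
nLabels = max(lab(:));

% 1) Collect pairs of labels that meet across a periodic face
pairs = zeros(0, 2);
for d = find(periodic)
    lo = repmat({':'}, 1, numel(periodic));
    hi = lo;
    lo{d} = 1;                   % first slice along dimension d
    hi{d} = size(lab, d);        % last slice along dimension d
    a = lab(lo{:});  a = a(:);
    b = lab(hi{:});  b = b(:);
    touch = a > 0 & b > 0;       % occupied on both sides of the wrap
    pairs = [pairs; a(touch), b(touch)]; %#ok<AGROW>
end
pairs = pairs(pairs(:, 1) ~= pairs(:, 2), :);   % drop labels facing themselves
pairs = unique(sort(pairs, 2), 'rows');         % drop duplicate pairs

% 2) Merge via connected components of a graph whose nodes are the labels;
%    self-loops make every label a node even if it touches no periodic face
G = graph([(1:nLabels)'; pairs(:, 1)], [(1:nLabels)'; pairs(:, 2)]);
root = conncomp(G);              % root(k) = merged cluster id of label k

% 3) Relabel the volume and count cluster sizes
merged = zeros(size(lab));
fg = lab > 0;
merged(fg) = root(lab(fg));
cluster_size = accumarray(merged(fg), 1);
```

Here labels 1 and 2 merge across the dim-2 wrap, and labels 3 and 4 merge across the dim-3 wrap, leaving three clusters of sizes 2, 2 and 3 (in whatever order conncomp numbers them).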
