Are there speedier alternatives to the mode function?

6 views (last 30 days)
I'm trying to sort through a very large matrix (12x4^12). It's actually a matrix generated by combinator of all possible permutations with repetitions of choosing 12 from 4 , though that's not critical to this question. What I need to do is identify all columns in which each number is repeated no more than x times (say x = 3 here). Here's simplified sample code:
x = 3;
poss = randi(4,12,4^12); % in my code: poss = combinator(4,12,'p','r')';
[M,F] = mode(poss,1);
evenposs = poss(:, F<=x );
My problem is that the third line above (with mode) is very slow, over 400 seconds on my computer, so it seems that doing it through the mode function is highly inefficient. Obviously I could do this with a for loop, but I've been trained to avoid that in Matlab - I thought this mode function might have been a crafty way to avoid the loop, but now I'm wondering if there are still craftier ways of achieving this.
Bonus points (i.e. my eternal gratitude) if anybody can suggest a way of using combinator or another package to generate permutations with limited repetitions, so that I can avoid this filtering step entirely.

Accepted Answer

Roger Stafford
Roger Stafford on 9 Jul 2014
You could attempt to take advantage of the fact that elements in 'poss' are positive integers and can be used as indices. However, I tend to doubt that in a span of only 12 elements in each column of 'poss' the following code could outperform Mathworks' 'mode', but you can try it and see.
subs = [poss(:),reshape(repmat(1:4^12,12,1),[],1)];
F = max(accumarray(subs,1,[4,4^12]),[],1);
evenposs = poss(:,F==x); % <-- Or do you mean F<=x?
  7 Comments
Shane
Shane on 10 Jul 2014
Brilliant - thanks Roger. I may add further comments later describing timings once I have more time to implement this next week.
Shane
Shane on 29 Jul 2014
Edited: Shane on 5 Aug 2014
A quick update - I spoke with Matt Fig (the guy who wrote combinator.m on MEX) to see if he had any ideas of how to do permutations with limited reputations more generally, and he said he'd think about it. He also, however, proposed an alternative to my use of the mode function (which stemmed this question) and Roger's suggestion using accumarray (see above). It's actually quite simple:
x = 3;
poss = combinator(4,12,'p','r')';
% MY METHOD
tic
[M,F] = mode(poss,1);
evenposs = poss(:,F==x);
toc
% ROGER'S METHOD
tic
subs = [poss(:),reshape(repmat(1:4^12,12,1),[],1)];
F = max(accumarray(subs,1,[4,4^12]),[],1);
evenposs = poss(:,F==x);
toc
% MATT'S METHOD
tic
idx = true(1,size(poss,2));
for ii = 1:4
idx = idx & sum(poss==ii)<=x;
end
evenposs = poss(:,idx);
toc
All three methods achieve the same result. Timings are:
  • My Method: 406 seconds
  • Roger's Method: 114 seconds
  • Matt's Method: 7 seconds
As you can see, while implementing Roger's replacement saved me a lot of time, implementing Matt's saved me even more. Since this filtering of the combinator output now doesn't take too long, it's less important for me to directly generate the subset of combinator's output with limited repetitions, though obviously that would still save further time. Roger's suggestion above gives me a pretty good start on how to do that if I want to speed up the code further, but I think it's fast enough for now. Thanks again to both Roger and Matt.

Sign in to comment.

More Answers (0)

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!