How to return 'X' number of unique subsets (combinations) of 'N' numbers taken 'K' at a time

I need to return X number of unique combinations of N numbers (i.e., vector V of length N) taken K at a time.
I can't use 'nchoosek' because I don't want ALL unique combinations. I just want X number of them and 'nchoosek' will crash if I enter the actual values for V and K because V is too large.
Here's an example, with more descriptive variable names…
origSet = rand(1,500); %the full original (example) set of numbers
desNumComb = 10000; %the number of unique combinations/subsets that I want to end up with
subsetSize = 10; %the desired size for each combination/subset
allCombos = nchoosek(1:length(origSet), subsetSize); %will return ALL possible combinations (if it ran)
subsetInds = allCombos(1:desNumComb,:); %the indices for each of the desNumComb subsets (first desNumComb rows)
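For a sense of scale, the number of rows that nchoosek would have to build for these example sizes can be estimated without building them. This is only a rough sketch: it works in logarithms via gammaln so nothing overflows, and the names n, k, approxCount and approxBytes are made up for illustration.

```matlab
n = 500;  % length of the original set (example value from above)
k = 10;   % subset size (example value from above)

% log of n-choose-k, computed with gammaln to avoid huge factorials
logCount = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1);
approxCount = exp(logCount);        % approximate number of combinations
approxBytes = approxCount * k * 8;  % storage as a double matrix, 8 bytes per entry

fprintf('about %.3g combinations, about %.3g bytes\n', approxCount, approxBytes);
```

The count comes out on the order of 1e20 rows, so the full matrix could never fit in memory.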
Worth mentioning is that the size of the original set of numbers [i.e., length(origSet) ], the desired subset size [i.e., subsetSize], and the desired number of unique combinations [i.e., desNumComb] will possibly vary every time I loop through, which will be many times.
Thanks in advance to all.
Cheers, John
  2 Comments
Walter Roberson on 23 Jul 2015
Which X subsets? The "first" X subsets under some specific ordering? X random subsets? Are you using this to iterate through all the possibilities in batches?
John Trimper on 24 Jul 2015
Hi Walter,
It doesn't matter which X subsets out of the full range of unique possibilities. What matters is that they're all unique.
Here's what I'm doing: I need to compare two groups but they have really different numbers of samples. One group has up to several hundred, while the other group might have as few as 5. The metric I'm using is biased so I need to equate the number of samples in each group. So what I want to do is repeatedly subsample the larger group down to match the number of samples in the smaller group, up to 10,000 times (but not more) and then average over the measurements taken across those 10,000 subsamples. Since the total number of unique combinations is WAY more than I need (incomputable by nchoosek), I need to find a way to only get a reduced chosen number of unique combinations.
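The subsample-and-average procedure described above could look roughly like the sketch below. The data, the variable names, and especially metricFcn are placeholders (the actual metric is not given in this thread; the range is used here only because it is an obviously sample-size-dependent statistic). Note that this sketch does not check the subsamples for uniqueness.

```matlab
bigGroup   = randn(1, 300);         % example data standing in for the larger group
smallGroup = randn(1, 5);           % example data standing in for the smaller group
metricFcn  = @(x) max(x) - min(x);  % hypothetical placeholder for the biased metric

k = numel(smallGroup);              % subsample the larger group down to this size
numSub = 10000;                     % number of subsamples to average over

subVals = zeros(numSub, 1);         % preallocate one metric value per subsample
for s = 1:numSub
    idx = randperm(numel(bigGroup), k);    % k distinct indices into bigGroup
    subVals(s) = metricFcn(bigGroup(idx)); % metric on this subsample
end

bigGroupEstimate = mean(subVals);          % average over all subsamples
smallGroupValue  = metricFcn(smallGroup);  % metric on the smaller group as-is
```

With the two groups equated in sample size this way, bigGroupEstimate and smallGroupValue are computed from the same number of samples.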
I hope that helps to clarify. Thank you for your time.


Accepted Answer

John Trimper on 27 Jul 2015
Edited: Walter Roberson on 27 Jul 2015
Answer provided by Star Strider & Walter Roberson above, worked out in comments, summarized here:
Use randperm to generate more vectors than necessary, then use unique(A, 'rows', 'stable') to select only unique combinations.
Example code for those interested:
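biggerGroup = rand(1,100); %the larger group (example data)
subsetSize = 10; %size of each subsample
numShuffles = 20000; %more shuffles than I actually need
mixer = false(1, length(biggerGroup));
mixer(1:subsetSize) = true; %logical mask with subsetSize true entries
allCombs = zeros(numShuffles, subsetSize); %preallocate instead of growing in the loop
for s = 1:numShuffles
    mixer = mixer(randperm(length(mixer))); %shuffle the mask
    allCombs(s,:) = biggerGroup(mixer); %selected values, kept in their original order
end
%rows are compared by value, so this assumes the values in biggerGroup are distinct
uniqueShufs = unique(allCombs, 'rows', 'stable');
desNumUniShuf = 10000; %actual desired # of unique shuffles
%take at most desNumUniShuf rows; fewer may be available if duplicates were dropped
myUniShufs = uniqueShufs(1:min(desNumUniShuf, size(uniqueShufs,1)),:);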
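A variation on the same idea, offered only as a sketch: work with indices rather than values (so repeated values in the data cannot be mistaken for duplicate subsets), and keep drawing until the desired number of unique subsets is actually reached rather than guessing how many extra shuffles to make. The names n, k, numWanted and subsetInds are made up for illustration, and the cap on numWanted is an added safeguard for small groups that was not part of the answer above.

```matlab
n = 100;            % size of the larger group (example value)
k = 10;             % subset size (example value)
numWanted = 10000;  % desired number of unique subsets

% Cap the request at the number of subsets that exist (matters when n is small).
logTotal = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1);  % log of n-choose-k
if logTotal < log(numWanted) + 1
    numWanted = min(numWanted, round(exp(logTotal)));
end

subsetInds = zeros(0, k);  % unique index subsets collected so far, one per row
while size(subsetInds, 1) < numWanted
    numMissing = numWanted - size(subsetInds, 1);
    batch = zeros(numMissing, k);
    for r = 1:numMissing
        % sorting makes the same subset always produce the same row
        batch(r,:) = sort(randperm(n, k));
    end
    % append the new draws and drop any repeats, keeping first occurrences
    subsetInds = unique([subsetInds; batch], 'rows', 'stable');
end
```

Each row of subsetInds can then be used as biggerGroup(subsetInds(r,:)). When numWanted is close to the total number of possible subsets, the loop can take many passes to find the last few, so in that situation enumerating with nchoosek and picking rows at random is probably the better route.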
