File Exchange

## VChooseK

version 1.0.0.0 (13.7 KB) by Jan

### Jan (view profile)

Choose K elements from a vector - MEX: 100 times faster than NCHOOSEK

Updated 23 Dec 2009

VChooseK(V, K) creates a matrix, which rows are all combinations of choosing K elements of the vector V without order and without repititions.

INPUT:
V: Array of class DOUBLE, SINGLE, (U)INT8/16/32/64, LOGICAL, CHAR.
K: Number of elements to choose.

OUTPUT:
Y: Matrix of size [N!/K!(N-K)!, K] with N is the number of elements of V.
Y has the same class as the input V.

The output equals Matlab's NCHOOSEK, except that NCHOOSEK replies the number of combinations for a scalar V:
VChooseK(-1, 1) replies [-1]: One element taken out of a set of length one.
NCHOOSEK(-1, 1) fails at calculating N!/K!(N-K)!.

EXAMPLES:
Choose 2 elements from [1,2,3,4]:
VChooseK(1:4, 2)
==> [1,2; 1,3; 1,4; 2,3; 2,4; 3,4]
For speed cast the input to integer types if possible:
Y = double(VChooseK(int16(1:1000), 2);
is faster than:
Y = VChooseK(1:1000, 2);
To get the combinations of cell arrays, use the combinations of the index:
C = {'a', 'b', 'c', 'd'};
C2 = C(VChooseK(1:4, 2))
==> C2 = {'a', 'b'; 'a', 'c'; 'a', 'd'; 'b', 'c'; 'b', 'd'; 'c', 'd'}

INSPIRATION:
Jos van der Geest has published an efficient NCHOOSE2, which is much faster than Matlab's NCHOOSEK: http://www.mathworks.com/matlabcentral/fileexchange/20144
I was curious, if a MEX version is much faster. At first I've created NChoose2q, which was 5 times faster than Jos' Matlab implementation. But the algorithm was so simple in C and did not consume temporary memory, that I've expanded it to K=3,4,5 soon. The algorithm for free K was more challenging, but it needs just 3*K pointers for temporary memory. Therefore it is much faster than Matlab's NCHOOSEK (Matlab 7.8, 1.5GHz Pentium-M, 512 MB):
NCHOOSEK(1:50, 5): 45.30 sec
VChooseK(1:50, 5): 0.32 sec => 141 times faster
NCHOOSEK(1:20, 16): 0.52 sec
VChooseK(1:20, 16) 0.0024 sec => 216 times faster
NCHOOSEK(int8(1:40), 4): 0.64 sec
VChooseK(int8(1:40), 4): 0.001562 sec => 410 times faster

The powerful COMBINATOR of Matt Fig handles repititions and ordered sequences also, but for its specific job VChooseK is remarkably faster (see screen shot).

Tested: Matlab 6.5, 7.7, 7.8, WinXP
Compilers: BCC5.5, LCC2.4/3.8, Open Watcom 1.8
Please run the unit-test TestVChooseK after compiling and for speed comparisons.

Mattj.d.

### Mattj.d. (view profile)

Excuse my mistake, I am mixing up NCHOOSEK and nsumk

Mattj.d.

### Mattj.d. (view profile)

Wonderful function but why is this recommended as a replacement to NCHOOSEK when it does not perform integer partitioning? Yes, it does set partitioning faster but what about the integer partitioning? Do you have a file for integer partitioning?

Jan

### Jan (view profile)

@Hugh Carson: This is the wanted behavior as explained the the help section of VChooseK:

NOTE: The output equals Matlab's NCHOOSEK, except that NCHOOSEK replies the
number of combinations for a scalar V:
VChooseK(-1, 1) replies [-1]: One element taken out of a set of length one.
NCHOOSEK(-1, 1) fails at calculating N!/K!(N-K)!.

Matlab's nchoosek replies two different things depending on the size of the input. For scalar inputs N!/K!(N-K)! is calculated, for vectors the combinations are replied. This ambiguity is avoided on purpose in VChooseK, but only the combinations are replied in all cases. And then [] is the correct answer, because there are no combinations with 3 elements from a scalar.

Hugh Carson

### Hugh Carson (view profile)

VChooseK_M(21,3) returns []; when nchoosek(21,3) returns 1330 (the correct answer)

Anton Semechko

Jan

### Jan (view profile)

@Yupin Yang: The message "Maximum variable size allowed by the function is exceeded" is not created by this function. It should be: "Output would be too large".

If you select 5 elements out of 265, you get a [10484943678 x 5] matrix, see: nchoosek(265,5). If the elements are INT8, this needs 52.4 GB, for the type DOUBLE 8 times more. You need not only this amount of free RAM, but it must be available in a contiguous block in addition. As a rule of thumb the computer should have at least the double RAM size. If you really work on such a huge machine, you can modify the source code of VChooseK.c accordingly:

// Limits for warning and error (Type DOUBLE needed instead of LONG):
#define WARN_LIMIT 10000000000.0 // *must* be smaller than ERROR_LIMIT
#define ERROR_LIMIT 60000000000.0 // 60 GB (for 5 out of 265 in type INT8)

I will improve the error message such that the required amount of RAM is displayed explicitly.

Yupin Yang

### Yupin Yang (view profile)

Hello,
If I get error message "Maximum variable size allowed by the function is exceeded.". Can I change the variable to get the output? I need to select 5 elements from a vector size 265. Thanks.

Torquato Garulli

### Torquato Garulli (view profile)

Stefano Massaroli

Mohamed

zoe cao

Hi Jan,

Best regards

Jan

### Jan (view profile)

@zoe cao: This is not a problem of VChooseK, but you have to install a C-compiler at first. Follow the suggestion on the link, which is included in the error message. You can search in the Answers forum for a bunch of corresponding questions.

zoe cao

### zoe cao (view profile)

This code is definitely useful, but so far I can not even run it on my 2015a.

mex -O VChooseK.c
Error using mex
No supported compiler or SDK was found. For options, visit http://www.mathworks.com/support/compilers/R2015a/win64.html.

I visited the suggested website and downloaded/installed the required SDK (but always not successfully installed). Is there anyway I could use this function without the SDK problem?

Jan

### Jan (view profile)

@Mazsyed: This cannot be done efficiently with this function.

Mazsyed

### Mazsyed (view profile)

Hi Jan,

is it possible to obtain combinations from a number of vectors (or arrays)?
for example if :
a=[1:100];
b=[1:100];
c=[1:100];
d=[1:100];
e=[1:100];
all combinations from the above 5 arrays that would sum to 400.

Dvir

Jan

### Jan (view profile)

@Mikhail Konnik: Thanks.

NCHOOSEK mixes 2 different functions, unfortunately: It replies the *matrix* of combinations, if the input is a vector, and the *number* of combinations, if the input is a scalar:
nchoosek(0,0) % 1
nchoosek(1:2, 0) % empty matrix 1-by-0 (R2011b)

I wanted to avoid this ambiguity and reply the matrix of combinations in every case, such that the user does not have to care about the exception for vectors with a single element. In consequence requesting all combinations with 0 elements must reply the empty matrix, because this is the only set, which contains zero elements of the input vektor.

Mikhail Konnik

### Mikhail Konnik (view profile)

Jan, fantastic submission, thank you. Works in Linux & Matlab 2015a x64 like a charm with gcc.

My only suggestion is the following: can you please add the case of VChooseK(0,0) and (X,0) in general - in my algorithms it returns empty vector [], and nchoosek returns 1 (as it should). Trivial fix, but useful.

Jan

### Jan (view profile)

@Nitin Agrawal: No, there is no pure C-code. I meant, that you can use the MSVC2010 version to compile the C-mex function. Sorry.

Nitin Agrawal

### Nitin Agrawal (view profile)

@Jan. Thanks for your response. In your earlier reply dated 16 feb 2014, you said that the c-code can be compiled using MSV 2010 without any modifications. Is there a pure C version available with you? Could you upload it.

Jan

### Jan (view profile)

@Nitin Agrawal: mex.h belongs to Matlab. If you want to use the function in a C-environment, you neither need mex.h nor the gateway function mexFunction(). Some modifications are required, e.g. the functions for the errors and warnings. I'd suggest to start from the inc-file and implement one data type required for your problem only.

Nitin Agrawal

### Nitin Agrawal (view profile)

Hi Jan

I wanted the use this code as a C function in one of my C programs, however when I try to compile this code in C (using gcc), headers file "MEX.h" is found to be missing by the compiler. It in turn required me to have "Pair.h" and "matrix.h", <vector> etc when I tried using a "MEX.h" header file which I found on the internet. Please advice.

Timo Eckhard

### Timo Eckhard (view profile)

@Jan: great function!

For those using Linux, you need to compile using:
'mex -O CFLAGS="\\$CFLAGS -std=c99" VChooseK.c'

There is a small mistake in the description in 'VChooseK.m', indicating to compile as:
mex CFLAGS="\\$CFLAGS -std=C99" -O VChooseK.c

For those who run into an error because attempting to get a too large number of combinations:

-> Edit Line 94 in 'VChooseK.c':
'#define ERROR_LIMIT 1000000000L // 1 GB (recommended for 32&64 bit systems)'
to something bigger. Keep in mind to have enough memory available

-> recompile the source code

Works great, works fast. Just fine for me...

Timo Eckhard

### Timo Eckhard (view profile)

For those who run into an error because attempting to get a too large number of combinations:

-> Edit Line 94 in 'VChooseK.c':
'#define ERROR_LIMIT 1000000000L // 1 GB (recommended for 32&64 bit systems)'
to something bigger. Keep in mind to have enough memory available

-> recompile the source code

Works great, works fast. Just fine for me...

Jan

### Jan (view profile)

@Deepak Gawali: You can compile this C-code with MSVC2010 without any modifications.

Deepak Gawali

### Deepak Gawali (view profile)

If I want to use vchoosek.c in visual studio 2010, will it work or do you have other code to use in vs2010.

Alexandros Iliopoulos

Gabriel Bruneau

### Gabriel Bruneau (view profile)

Fantastic submission!

For the job I had to do:
nchoosek: 405 seconds
VChooseK: 6.19 seconds

Thanks a lot Jan!

Hassan

### Hassan (view profile)

Error in compiler
R2012a
No supported SDK or compiler was found on this computer.
For a list of supported compilers, see
http://www.mathworks.com/support/compilers/R2012a/win64.html

Error using mex (line 206)
Unable to complete successfully.

Kristofer Kusano

### Kristofer Kusano (view profile)

This works great, much faster than the built in Matlab function, as advertised. Thanks for sharing your work!

Jan

### Jan (view profile)

Dear Mark, before you can use the function, you have to compile it as explained in the documentation:
mex -O VChooseK.c
An explicite message will appear in a future version.

Mark Whirdy

### Mark Whirdy (view profile)

Hi Jan

I went to use this but came across a snag at the first hurdle, have you an idea of where the problem may lie?

==== Test VChooseK: 19-Dec-2011 18:15:35
Version: J:\Financial_Engineering\QUANT FIXED INCOME\...\VChooseK.m

== DOUBLE, K = 1:
Error in ==> VChooseK at 1
function Y = VChooseK(X, K)

??? Output argument "Y" (and maybe others) not assigned during call to "J:\Financial_Engineering\...\VChooseK.m>VChooseK".

Error in ==> TestVChooseK at 66
S = VChooseK(my_cast([], aClass), K);

Jan

Jan

### Jan (view profile)

I've deleted the file NChoose2q (limited to K==2), because it had no advantages compared with the more general VChooseK. Please mail me, if this causes problems for anybody.

##### MATLAB Release Compatibility
Created with R2009a
Compatible with any release
##### Platform Compatibility
Windows macOS Linux