File Exchange

image thumbnail

VChooseK

version 1.0 (13.7 KB) by

Choose K elements from a vector - MEX: 100 times faster than NCHOOSEK

4.88889
10 Ratings

19 Downloads

Updated

View License

VChooseK(V, K) creates a matrix, which rows are all combinations of choosing K elements of the vector V without order and without repititions.

INPUT:
V: Array of class DOUBLE, SINGLE, (U)INT8/16/32/64, LOGICAL, CHAR.
K: Number of elements to choose.

OUTPUT:
Y: Matrix of size [N!/K!(N-K)!, K] with N is the number of elements of V.
Y has the same class as the input V.

The output equals Matlab's NCHOOSEK, except that NCHOOSEK replies the number of combinations for a scalar V:
VChooseK(-1, 1) replies [-1]: One element taken out of a set of length one.
NCHOOSEK(-1, 1) fails at calculating N!/K!(N-K)!.

EXAMPLES:
Choose 2 elements from [1,2,3,4]:
VChooseK(1:4, 2)
==> [1,2; 1,3; 1,4; 2,3; 2,4; 3,4]
For speed cast the input to integer types if possible:
Y = double(VChooseK(int16(1:1000), 2);
is faster than:
Y = VChooseK(1:1000, 2);
To get the combinations of cell arrays, use the combinations of the index:
C = {'a', 'b', 'c', 'd'};
C2 = C(VChooseK(1:4, 2))
==> C2 = {'a', 'b'; 'a', 'c'; 'a', 'd'; 'b', 'c'; 'b', 'd'; 'c', 'd'}

INSPIRATION:
Jos van der Geest has published an efficient NCHOOSE2, which is much faster than Matlab's NCHOOSEK: http://www.mathworks.com/matlabcentral/fileexchange/20144
I was curious, if a MEX version is much faster. At first I've created NChoose2q, which was 5 times faster than Jos' Matlab implementation. But the algorithm was so simple in C and did not consume temporary memory, that I've expanded it to K=3,4,5 soon. The algorithm for free K was more challenging, but it needs just 3*K pointers for temporary memory. Therefore it is much faster than Matlab's NCHOOSEK (Matlab 7.8, 1.5GHz Pentium-M, 512 MB):
NCHOOSEK(1:50, 5): 45.30 sec
VChooseK(1:50, 5): 0.32 sec => 141 times faster
NCHOOSEK(1:20, 16): 0.52 sec
VChooseK(1:20, 16) 0.0024 sec => 216 times faster
NCHOOSEK(int8(1:40), 4): 0.64 sec
VChooseK(int8(1:40), 4): 0.001562 sec => 410 times faster

The powerful COMBINATOR of Matt Fig handles repititions and ordered sequences also, but for its specific job VChooseK is remarkably faster (see screen shot).

Tested: Matlab 6.5, 7.7, 7.8, WinXP
Compilers: BCC5.5, LCC2.4/3.8, Open Watcom 1.8
Please run the unit-test TestVChooseK after compiling and for speed comparisons.

Comments and Ratings (29)

Jan Simon

Jan Simon (view profile)

@Yupin Yang: The message "Maximum variable size allowed by the function is exceeded" is not created by this function. It should be: "Output would be too large".

If you select 5 elements out of 265, you get a [10484943678 x 5] matrix, see: nchoosek(265,5). If the elements are INT8, this needs 52.4 GB, for the type DOUBLE 8 times more. You need not only this amount of free RAM, but it must be available in a contiguous block in addition. As a rule of thumb the computer should have at least the double RAM size. If you really work on such a huge machine, you can modify the source code of VChooseK.c accordingly:

// Limits for warning and error (Type DOUBLE needed instead of LONG):
#define WARN_LIMIT 10000000000.0 // *must* be smaller than ERROR_LIMIT
#define ERROR_LIMIT 60000000000.0 // 60 GB (for 5 out of 265 in type INT8)

I will improve the error message such that the required amount of RAM is displayed explicitly.

Yupin Yang

Hello,
If I get error message "Maximum variable size allowed by the function is exceeded.". Can I change the variable to get the output? I need to select 5 elements from a vector size 265. Thanks.

Mohamed

zoe cao

Hi Jan,
Thanks a lot for your reply. If possible, can you please send me the link for the pre-cmopiled binaries at your website?

Best regards

Jan Simon

Jan Simon (view profile)

@zoe cao: This is not a problem of VChooseK, but you have to install a C-compiler at first. Follow the suggestion on the link, which is included in the error message. You can search in the Answers forum for a bunch of corresponding questions.

I'm going to add pre-cmopiled binaries on my home page soon.

zoe cao

This code is definitely useful, but so far I can not even run it on my 2015a.

mex -O VChooseK.c
Error using mex
No supported compiler or SDK was found. For options, visit http://www.mathworks.com/support/compilers/R2015a/win64.html.

I visited the suggested website and downloaded/installed the required SDK (but always not successfully installed). Is there anyway I could use this function without the SDK problem?

Thanks a lot for your answer in advance.

Jan Simon

Jan Simon (view profile)

@Mazsyed: This cannot be done efficiently with this function.

Mazsyed

Hi Jan,

is it possible to obtain combinations from a number of vectors (or arrays)?
for example if :
a=[1:100];
b=[1:100];
c=[1:100];
d=[1:100];
e=[1:100];
all combinations from the above 5 arrays that would sum to 400.

Thanks in advance.

Dvir

Dvir (view profile)

Jan Simon

Jan Simon (view profile)

@Mikhail Konnik: Thanks.

NCHOOSEK mixes 2 different functions, unfortunately: It replies the *matrix* of combinations, if the input is a vector, and the *number* of combinations, if the input is a scalar:
nchoosek(0,0) % 1
nchoosek(1:2, 0) % empty matrix 1-by-0 (R2011b)

I wanted to avoid this ambiguity and reply the matrix of combinations in every case, such that the user does not have to care about the exception for vectors with a single element. In consequence requesting all combinations with 0 elements must reply the empty matrix, because this is the only set, which contains zero elements of the input vektor.

Jan, fantastic submission, thank you. Works in Linux & Matlab 2015a x64 like a charm with gcc.

My only suggestion is the following: can you please add the case of VChooseK(0,0) and (X,0) in general - in my algorithms it returns empty vector [], and nchoosek returns 1 (as it should). Trivial fix, but useful.

Jan Simon

Jan Simon (view profile)

@Nitin Agrawal: No, there is no pure C-code. I meant, that you can use the MSVC2010 version to compile the C-mex function. Sorry.

Nitin Agrawal

@Jan. Thanks for your response. In your earlier reply dated 16 feb 2014, you said that the c-code can be compiled using MSV 2010 without any modifications. Is there a pure C version available with you? Could you upload it.

Jan Simon

Jan Simon (view profile)

@Nitin Agrawal: mex.h belongs to Matlab. If you want to use the function in a C-environment, you neither need mex.h nor the gateway function mexFunction(). Some modifications are required, e.g. the functions for the errors and warnings. I'd suggest to start from the inc-file and implement one data type required for your problem only.

Nitin Agrawal

Hi Jan

I wanted the use this code as a C function in one of my C programs, however when I try to compile this code in C (using gcc), headers file "MEX.h" is found to be missing by the compiler. It in turn required me to have "Pair.h" and "matrix.h", <vector> etc when I tried using a "MEX.h" header file which I found on the internet. Please advice.

Timo Eckhard

@Jan: great function!

For those using Linux, you need to compile using:
'mex -O CFLAGS="\$CFLAGS -std=c99" VChooseK.c'

There is a small mistake in the description in 'VChooseK.m', indicating to compile as:
mex CFLAGS="\$CFLAGS -std=C99" -O VChooseK.c

For those who run into an error because attempting to get a too large number of combinations:

-> Edit Line 94 in 'VChooseK.c':
'#define ERROR_LIMIT 1000000000L // 1 GB (recommended for 32&64 bit systems)'
to something bigger. Keep in mind to have enough memory available

-> recompile the source code

Works great, works fast. Just fine for me...

Timo Eckhard

For those who run into an error because attempting to get a too large number of combinations:

-> Edit Line 94 in 'VChooseK.c':
'#define ERROR_LIMIT 1000000000L // 1 GB (recommended for 32&64 bit systems)'
to something bigger. Keep in mind to have enough memory available

-> recompile the source code

Works great, works fast. Just fine for me...

Jan Simon

Jan Simon (view profile)

@Deepak Gawali: You can compile this C-code with MSVC2010 without any modifications.

If I want to use vchoosek.c in visual studio 2010, will it work or do you have other code to use in vs2010.

Fantastic submission!

For the job I had to do:
nchoosek: 405 seconds
VChooseK: 6.19 seconds

Thanks a lot Jan!

Hassan

Hassan (view profile)

Error in compiler
R2012a
No supported SDK or compiler was found on this computer.
For a list of supported compilers, see
http://www.mathworks.com/support/compilers/R2012a/win64.html

Error using mex (line 206)
Unable to complete successfully.

This works great, much faster than the built in Matlab function, as advertised. Thanks for sharing your work!

Jan Simon

Jan Simon (view profile)

Dear Mark, before you can use the function, you have to compile it as explained in the documentation:
mex -O VChooseK.c
An explicite message will appear in a future version.

Mark Whirdy

Mark Whirdy (view profile)

Hi Jan

I went to use this but came across a snag at the first hurdle, have you an idea of where the problem may lie?

==== Test VChooseK: 19-Dec-2011 18:15:35
Version: J:\Financial_Engineering\QUANT FIXED INCOME\...\VChooseK.m

== DOUBLE, K = 1:
Error in ==> VChooseK at 1
function Y = VChooseK(X, K)

??? Output argument "Y" (and maybe others) not assigned during call to "J:\Financial_Engineering\...\VChooseK.m>VChooseK".

Error in ==> TestVChooseK at 66
S = VChooseK(my_cast([], aClass), K);

Jan Simon

Jan Simon (view profile)

I've deleted the file NChoose2q (limited to K==2), because it had no advantages compared with the more general VChooseK. Please mail me, if this causes problems for anybody.

MATLAB Release
MATLAB 7.8 (R2009a)

Download apps, toolboxes, and other File Exchange content using Add-On Explorer in MATLAB.

» Watch video