View License

Download apps, toolboxes, and other File Exchange content using Add-On Explorer in MATLAB.

» Watch video

Highlights from

5.0 | 7 ratings Rate this file 39 Downloads (last 30 days) File Size: 19.3 KB File ID: #27076 Version: 1.4
image thumbnail



Jan Simon (view profile)


24 Mar 2010 (Updated )

Random permutation of array elements, C-Mex: much faster than RANDPERM

| Watch this File

File Information

Shuffle - Random permutation of array elements

This function is equivalent to X(RANDPERM(LENGTH(X)), but 50% to 85% faster. It uses D.E. Knuth's shuffle algorithm (also called Fisher-Yates) and the cute KISS random number generator (G. Marsaglia). While RANDPERM needs 2*LENGTH(X)*8 bytes as temporary memory, SHUFFLE needs just a fixed small number of bytes.

1. Inplace shuffling: Y = Shuffle(X, Dim)
  X: DOUBLE, SINGLE, CHAR, LOGICAL, (U)INT64/32/16/8 array.
  Dim: Dimension to operate on. Optional, default: 1st non-singleton dimension.
  Y: Array of same type and size as X with shuffled elements.

2. Create a shuffle index: Index = Shuffle(N, 'index', NOut)
This is equivalent to Matlab's RANDPERM, but much faster, if N is large and NOut is small.
  N: Integer number.
  NOut: The number of output elements. Optional, default: N.
  Index: [1:NOut] elements of shuffled [1:N] vector in the smallest possible integer type.

3. Derangement index:
Index = Shuffle(N, 'derange', NOut)
Equivalent to the index method, but all Index[i] ~= i. A rejection method is used: Create an index vector until a derangement is gained.

  R = Shuffle(1:8) % [8, 1, 2, 6, 4, 3, 5, 7]
  R = Shuffle('abcdefg') % 'efbadcg'
  R = Shuffle([1:4; 5:8], 2) % [3, 2, 1, 4; 6, 8, 7, 5]
  I = Shuffle(8, 'index'); % UINT8([1, 5, 7, 6, 2, 3, 4, 8])
Choose 10 different rows from a 1000 x 100 matrix:
  X = rand(1000, 100); Y = X(Shuffle(1000, 'index', 10), :);
Operate on cells or complex arrays:
  C = {9, 's', 1:5}; SC = C(Shuffle(numel(C), 'index'));
  M = rand(3) + i * rand(3); SM = M(:, Shuffle(size(C, 2), 'index'))

NOTES: There are several other shuffle functions in the FEX. Some use Knuth's method also, some call RANDPERM. This implementation is faster due to calling a compiled MEX file and it has a smaller memory footprint. The KISS random numbers are much better than the RAND() of the C-standard libs.

Run the unit-test TestShuffle to test validity and speed (see screenshot).
Tested: Matlab 6.5, 7.7, 7.8, 32bit, WinXP,
Compiler: LCC 2.4/3.8, BCC 5.5, Open Watcom 1.8, MSVC 2008.
Compatibility to 64 bit, Linux and Mac is assumed.
Pre-compiled Mex:

MATLAB release MATLAB 7.8 (R2009a)
Tags for This File   Please login to tag files.
Please login to add a comment or rating.
Comments and Ratings (32)
07 Feb 2016 Jan Simon

Jan Simon (view profile)

@Stephen: The statement about X(RANDPERM(LENGTH(X)) concerns vectors only. For arrays this function Shuffle() operates on a specific dimension and there is no comparable one-line command containing RANDPERM. So for arrays Shuffle() does *not* perform X(randperm(numel(X))) .

Comment only
05 Feb 2016 Stephen Cobeldick

The description is incorrect, because |length| does not do what the author seems to think it does. For arrays |length| is not |numel|!. The description states that it does this (using |length|):

>> X = randi(9,3)
X =
9 4 3
6 3 7
4 6 1
>> X(randperm(length(X)))
ans =
4 6 9

Whereas the function actually returns _all_ elements, as per |numel|:

>> X(randperm(numel(X)))
ans =

1 7 6 3 4 6 9 4 3

Comment only
25 Sep 2012 Ephedyn

08 Aug 2011 Willem-Jan de Goeij

Dear Jan,

I've tested calling your function from inside a parfor loop. The results show that repetitions occur.

>> matlabpool local
Starting matlabpool using the 'local' configuration ... connected to 3 labs.
>> shuffleparfor
output =

8 1 3 7 5 2 6 4
2 3 8 7 1 5 6 4
8 1 3 7 5 2 6 4
2 3 8 7 1 5 6 4
8 1 3 7 5 2 6 4
2 3 8 7 1 5 6 4
4 8 7 3 6 2 5 1
4 8 7 3 6 2 5 1
4 8 7 3 6 2 5 1

output = zeros(9,8);
Shuffle(1234567890, 'seed');

parfor ii=1:9
output(ii, :) = Shuffle(8, 'index');

Comment only
03 Aug 2011 Willem-Jan de Goeij

Hi Jan,

In reply to your first comment from 31/07:
Setting the seed to randi([0, 2^32-1], 1, 4) in every parfor loop would be advisible, I guess?
On my computer 2^32-1+d becomes zero when casted to uint32_T, with d >= 0.999999762.

I'm setting seeds because I also want to be able to reproduce the results.

Comment only
01 Aug 2011 Willem-Jan de Goeij

I use parfor loops on my 64 bit computer with Matlab 64 bit.

I see there are only 32 bit mex compilations on your website.

Does anyone have a 64 bit mex compilation of this function?

Comment only
31 Jul 2011 Jan Simon

Jan Simon (view profile)

@Willem-Jan: Sorry, I'm afraid that my advice was wrong. I do not have the Parallel Computing Toolbox, such that such that I cannot test it. Please check, if using the *same* seed in all PARFOR branchs lead to the *same* permutations.

Comment only
31 Jul 2011 Jan Simon

Jan Simon (view profile)

@Willem-Jan: The problem for multiple seeding is to get independent seeds. Choosing a seed based on the current time is fair on a single thread, but likely to fail in a PARFOR loop. But if the seeds are guaranteed to be very different, the random numbers are independent.
I suggest to use 4*32 bit values with high entropy to seed Shuffle for each PARFOR thread. A cheap source for such seeds is (n is number of PARFOR threads) "S=rand(4, n)*2^32". Then seed the iteration with S(:, n).

Comment only
31 Jul 2011 Willem-Jan de Goeij

Hi, can this version be used for parallel computing (parfor loops)?

I've read that by setting the seed multiple times, let's say once in every parfor loop iteration, it is not guaranteed that the draws are independent (for example rand function).

That's why I'm using the mrg32k3a stream which supports substreams which are guaranteed to be independent.

Would it be difficult to create a version that supports substreams?

Comment only
21 Jul 2011 Jan Simon

Jan Simon (view profile)

@Hessam: Use the index-method to shuffle the rows and keep the columns:
X = rand(3, 4800);
S = X(:, Shuffle(4800, 'index'));

Comment only
11 Jul 2011 Hessam

Hessam (view profile)

I have the a data Matrix in the format:
the first row is T, the Second M and the 3rd one D.
[2.91000E+02 2.91000E+02 2.91000E+02;
7.47867E-43 7.47867E-43 7.47867E-43;
4.77566E-82 5.30311E-83 0.00000E+00]

It's just a part of my 3*4800 matrix.
Each column should be the same after shuiffelling. I mean just the place of the columns should be changed not the rows

Your help is mostly appreciated

Comment only
06 Jul 2011 Jan Simon

Jan Simon (view profile)

Does anybody need a faster multi-threaded version? It is not worth to improve the program, if nobody needs it.

Comment only
13 Feb 2011 Jan Simon

Jan Simon (view profile)

@Daniel: This and another bug will be fixed in the next update: Shuffle(>2^32, 'index') relies bad values with a tiny chance.

Comment only
10 Feb 2011 Daniel Shub

I think there is a bug with the seeding. If I start up MATLAB fresh

a = Shuffle(1:100);
Shuffle([], 'seed');
b = Shuffle(1:100);
all(a == b)

does not yield true.

Looking at the c code on line 174 kc is set equal to 7654321. I think this is part of the default seed when Shuffle first launches. On line 1062 kc is set equal to 698769069, which is part of the seed when the seed is reset.

Comment only
07 Feb 2011 Bruno Luong

Bruno Luong (view profile)

Useful addition to FEX

23 Apr 2010 James Tursa

James Tursa (view profile)

My apologies. I knew about the mixed integer arithmetic bug with lcc 2.4 and had accounted for it in the code I sent you but neglected to point this out (sorry about that). You managed to discover this on your own. Good job!

23 Apr 2010 Jan Simon

Jan Simon (view profile)

@James: Now KISS works with LCC 2.4 also, after casting the 32 bit integers to 64 bit explicitely to avoid mixed integer arithmetics. Your code was very helpful to find this solution! Thanks again!

I've added a compiler flag to choose fast&tiny bias or slow&unbiased random integers.

Comment only
17 Apr 2010 James Tursa

James Tursa (view profile)

P.S. Did you notice I changed your 698769069UL constant to 698769069ULL? That makes a difference.

Comment only
17 Apr 2010 Jan Simon

Jan Simon (view profile)

@James: Thank you! I've tried KISS with LCC 2.4 and signed long long also, but the replies differ from other compilers for some seeds (seems to depend on the highest bit). Marsaglia has published a KISS generator with pure 32 bit processing also, but I decided not to adjust the C-source to an obviously buggy compiler. I assume, TMW has the power to include an improved version of LCC :-)

Comment only
17 Apr 2010 James Tursa

James Tursa (view profile)

As you have discovered, the lcc C compiler that ships with MATLAB has bugs for 64-bit integer arithmetic, particularly the unsigned long long type. Fortunately in some cases you can work around this by using signed 64-bit types. Basically you use unspecified behavior and rely on 2's complement representation for the signed integer types and you also rely on signed integer overflow doing modulo arithmetic. This is the case for the PC, so the technique works on that platform (but is not guaranteed to be portable). I sent you a modified version of your Shuffle.c source code that has modified KISS() and KISS_d() routines that *seem* to work for lcc. Basically, instead of doing the unsigned 64-bit arithmetic I do it in 32-bit pieces as part of 64-bit signed words, and then use a union to get the result back to unsigned 64-bit type. HTH.

Comment only
14 Apr 2010 Oliver Woodford

Nice and fast. I welcome the ability to specify the output length.

13 Apr 2010 Sebastien PARIS

07 Apr 2010 Jan Simon

Jan Simon (view profile)

@Derek: I cannot see any unexpected memory occupation by SHUFFLE.MEX. Example: x = rand(1, 1e7); for i=1:100; x = Shuffle(x); end. I assume, it is a problem in your ShuffleMexPerm32. Please send me this function by email and let us discuss either by email or in the NG.
You can create int32(1:n) efficiently with int32(1):int32(n).

SHUFFLE will be updated soon with new features.

Comment only
07 Apr 2010 Derek O'Connor

Your Shuffle(Mex) is faster of course, but the memory overhead still exists. Here is the memory profile using Windows Task Manager:

>clear all Tot Mem : 3.5GB
>n=10^9;tic;p = ShuffleMexPerm32(n);toc Tot Mem : 10.8GB
Elapsed time is 109.555864 seconds. Tot Mem : 7.2GB
>clear all Tot Mem : 3.5GB

Apart from no int32 arithmetic, Matlab's memory manager seems to be using more than expected. I don't see any solution to this in the short term. R2020a, perhaps?

Derek O'Connor

Comment only
31 Mar 2010 Jan Simon

Jan Simon (view profile)

Currently SHUFFLE operates on the real part of an array only. If several users need to shuffle the imaginary part also, I could implement this also. So please mail me on demand - thanks. Jan

@Derek: While your M-code ShuffleF3264 suffers from using 32bit variables, Shuffle.c does not as far as I can see. So I'd simply suggest to use the MEX function. The same for the memory overhead.

@Derec and Rob: I agree that returning N values would be useful. But internally the Knuth-shuffle has to process the full array even for this case and it is not faster than creating the whole array at first and cutting it afterwards! SHUFFLE is designed to operate on large arrays. To get e.g. 100 randomly out of 1e7, a completely different algorithm should be used - in a different function!
It mightbe more important to implement a "derangement", which forces each element to have a different location after sorting.

I've found Ragaar Ashnod' Fisher-Yates-Shuffle as M-code today:

Comment only
30 Mar 2010 Derek O'Connor

There is no real speed difference between i = 2:n and Knuth's i = n:-1:2. I call this ShuffleF because it uses r = floor(rand*i)+1. I call your version ShuffleC because it uses r = ceil(rand*i).

Memory "Overhead"

You are right about p =int32(1:n) creating a temporary DOUBLE. However, I noticed that p = zeros(n,1,'int32') creates no temporary double, but now I have to initialize p=I with for i = 1:n, p(i)=i, end; This is a bit slow but allowed me to create an int32(p) of length n = 3.5x10^9 (45 secs) of size 14GB, without disk swapping. Wonderful!

I was about to ShuffleF this large p vector but caution intruded and I tested it with n = 10^8, with p using just 0.4GB. It took 244 secs, about 10 times what I expected. Here's the reason: int32 'arithmetic' is slow, as this function shows.

> function p = ShuffleF3264(p,bits);
> % Shuffle a vector p using Knuth's Algorithm P,
> % Section 3.4.2, TAOCP, Vol 2, 2nd Ed.
> % This shows the difference between Matlab's
> % normal 64-bit double integers and 32-bit int32 integers.
> %
> % Derek O'Connor, 30 Mar 2010
> %
> n=length(p);
> if bits == 32 % This type propagates below and
> n = int32(n); % causes a big slowdown
> end;
> for i = n:-1:2
> r = floor(rand*i)+1; % random integer between 1 and i
> t = p(r);
> p(r) = p(i); % Swap(p(r),p(i))
> p(i) = t;
> end;
> return; % ShuffleF3264

>> n = 10^8; p = 1:n; % 'normal' Matlab integers (doubles)
>> tic; p = ShuffleF3264(p,64);toc
>> Elapsed time is 30.057698 seconds.
>> tic; p = ShuffleF3264(p,32);toc
>> Elapsed time is 243.815251 seconds.

So, I can generate huge permutations, but I can't do anything with them, except wait for a 1000GHz, liquid-helium-cooled PC.

As Henry Ford said : You can have any type you want, as long as it's a double.

Derek O'Connor

Comment only
30 Mar 2010 Rob Campbell

Rob Campbell (view profile)

The Mex-based shuffle is much faster than randperm although the m-based shuffle is slower in some cases. I can see myself using the Mex-based shuffle in some of my work.

Author was very helpful in getting the Mex code to compile on my Linux box.

I agree with Oliver, it would be useful to have the ability to return only N values.

30 Mar 2010 Oliver Woodford

A parameter allowing only N values to be returned would be good, as this would speed up many sampling-based methods, e.g. RANSAC, which require only a subset of the data each iteration. Also useful would be to work on matrices and return the rows shuffled, and even potentially to name the dimension to shuffle along.

Comment only
29 Mar 2010 Derek O'Connor

Dear Jan,

You are right, the (Knuth) loop should be << for i = n:-1:2 >>. I was trying to follow Knuth exactly and have got it wrong, twice. I'm not sure why Knuth uses a backward loop, but I seem to remember (1970s) that decrementing and testing a loop counter was faster than incrementing it. I don't think it makes any difference, but I'll check.

On memory overhead, my table was mangled and is hard to read, so let me give one example. For n = 5x10^8, p = int32(1:n) occupies 4x5x10^8 = 2GB. When I ran << p = int32(randperm(n))>> I used Windows Task Manager (WTM) to monitor cpu and memory use and it showed memory use climbing from 2.5GB up to 13GB. Memory overhead = 13-2-2.5 = 8.5GB. When I ran << p = ShufflePermF32(n) >> WTM shows memory use going from 2.5GB to 7.3GB Memory overhead = 7.3-2-2.5 = 2.5GB.

What this means is that with 16GB RAM the largest << p = int32(randperm(n))>> is 2GB or n = 5x10^8, while the largest << p = ShufflePermF32(n) >> is 4GB or n = 10^9. In this case WTM showed memory use rising from 2.5 to 12 for about 10secs, and dropping to 5.5GB. Where or why is this memory overhead used?

Derek O'Connor

Comment only
29 Mar 2010 Jan Simon

Jan Simon (view profile)

Dear Derek: Thanks for the comments! How do you measure "memory overhead"? ShufflePerm32F creates a temporary [1 x n] object of type DOUBLE in: p = int32(1:n). Does this influence the "memory overhead"? I assume that the FOR loop does *not* create another 1:n vector explicitely.

Do you find statistical difference between looping from n=2:n or n:-1:2 (n==1 can be omitted)?

Comment only
29 Mar 2010 Derek O'Connor


The loop % for i = 1:n % should be % for i = n:-1:1 %

Derek O'Connor

Comment only
28 Mar 2010 Derek O'Connor

This is a great idea for generating random permutations because, as you say, "While RANDPERM needs 2*LENGTH(X)*8 bytes as temporary memory, SHUFFLE needs just a fixed small number of bytes."

Here are some timing tests for my machine (Trp = time for randperm, Trs = time for Shuffle)

times = TestRandPShufP32('mex')
% n Trp Trs Trp/Trs
% 1e+005 1.68e-002 1.48e-003 1.13e+001
% 1e+006 2.06e-001 1.86e-002 1.10e+001
% 1e+007 2.40e+000 5.01e-001 4.78e+000
% 1e+008 2.71e+001 6.44e+000 4.20e+000
% times = TestRandPShufP32('m')
% n Trp Trs Trp/Trs
% 1e+005 1.64e-002 1.48e-002 1.11e+000
% 1e+006 2.06e-001 1.58e-001 1.30e+000
% 1e+007 2.40e+000 2.40e+000 1.00e+000
% 1e+008 2.73e+001 2.82e+001 9.68e-001
% MATLAB Version (R2008a) 64-bit on a Dell Precision
% 690 2 x Quadcore Xeon E5345 2.33GHz, Windows7 64-bit, MSVC 2008.

The mex version of your Shuffle is much faster than randperm. The non-mexed version is about the same as randperm.

Memory use is lower for Shuttle but there is still a lot of memory 'overhead'.

% p = int32(randperm(n); or p = ShufflePerm32F(n);

% n 10^8 5x10^8 10^9
% size(p) GB 0.4 2.0 4.0
% randperm GB 4.5 13.0 >16
% ShufflePerm32F GB 3.0 7.3 12-->5.5

% ----------------------------------------------------------
% function p = ShufflePerm32F(n)
% % Generate a random permutation using
% % Knuth's Algorithm P, Section 3.4.2
% % TAOCP, Vol 2, 2nd Ed., 1981.
% % Derek O'Connor, 28 Mar 2010
% p = int32(1:n); % n <= 2^32-1 = 2,147,483,647 = 2G
% % 4Bytes x 2G = 8GB
% for i = 1:n
% r = floor(rand*i)+1; % random integer between 1 and i
% t = p(r);
% p(r) = p(i); % Swap(p(r),p(i))
% p(i) = t;
% end
% return; % ShufflePerm32F
% ---------------------------------------------------------

Each test started from a base of about 2GB with a 'clear all' between tests.

Any ideas why there is so much memory overhead?

Derek O'Connor.

12 Apr 2010 1.1

Create index vector with optionally reduced length, dimension to operate on can be specified. *New interface* !

23 Apr 2010 1.2

Can be compiled with LCC 2.4 (shipped with Matlab) now. Thanks to James Tursa for the helpful ideas!

15 Jul 2010 1.3

Bugfix for [1 x 4] seeds.
Shuffle([], 'seed') to initialize the RNG with the default seed.
Mex is locked in the memory to prevent the RNG status from being reinitialiized by CLEAR ALL (thanks to Ulrik Nash!).

06 Mar 2011 1.4

Some percent faster, bugfix for 64-bit index vectors, derangements - Thanks to Derek O'Connor.

Contact us