Path: news.mathworks.com!not-for-mail
From: "Jan Simon" <matlab.THIS_YEAR@nMINUSsimon.de>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Randperm, Randi, and Shuffle
Date: Thu, 4 Nov 2010 02:05:04 +0000 (UTC)
Organization: Universit&#228;t Heidelberg
Lines: 47
Message-ID: <iat4cg$cpi$1@fred.mathworks.com>
References: <iaq1os$sak$1@fred.mathworks.com> <iarp4o$d46$1@fred.mathworks.com>
Reply-To: "Jan Simon" <matlab.THIS_YEAR@nMINUSsimon.de>
NNTP-Posting-Host: webapp-05-blr.mathworks.com
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: fred.mathworks.com 1288836304 13106 172.30.248.35 (4 Nov 2010 02:05:04 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Thu, 4 Nov 2010 02:05:04 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 869888
Xref: news.mathworks.com comp.soft-sys.matlab:683722

Dear Bruno,

You are right: The sorting in UNIQUE belongs to the smaller data set of 125e3 values, while RANDPERM would have to operate on [1:prod(sampbox)].

Therefore Bruno's code is faster than SHUFFLE inspite of the sorting:

> a = randi(myrandistr, prod(sampbox), round(SPtemp*1.1), 1);
> a = unique(a);
> a(SPtemp +1:end) = [];

But it is biased also, because the reply of UNIQUE is sorted. But lukily SHUFFLE works efficient here:
  a = randi(myrandistr, prod(sampbox), round(SPtemp*1.1), 1);
  a = Shuffle(unique(a));
  a(SPtemp +1:end) = [];

How safe is 1.1*SPtemp? At least you should include a test of UNIQUE(a) has enough elements.

I have not implemented an equivalent method in SHUFFLE, because the computing time grows extremly, if the number of chosen values is gets near to the number of all elements. With other words: It is very hard to draw N unique random numbers from [1:(N+tiny)] using the RANDI approach.
But for [1:N*10] the number of collision would be small. Then this approach might be fast:
% Pseudocode, to be implemented in C:
% Choose N values from [1:M]:
  x = zeros(1, N);
  r = zeros(1, N);
  for i = 1:N
     ready = false;
     while ~ready
        a = randi(M);   % Draw
        if isempty(find(x(1:i) == a))
          r(i) = a;   % Unsorted list as output
          x(1:i) = sort([x(1:i-1), a]);  % Sorted list for FIND
          ready = true;
       end
  end

This could use a binary search to find [a] in the list [x] of already drawn numbers, and reuse the found index for sorting.
This competes with the simple method of creating Y=ONES(1,M,'uint8') at first, draw a number X, if Y(X) is 1, accept X and set Y(X) to zero.
Finally there are three methods to draw N out of [1:M]:
1. Fisher-Yates Shuffle: efficient for M=N+(tiny value)
2. The dull draw and check method: efficient for M=N+(fair value)
3. Draw and sort (the pseudocode above): efficient for M=N+(giantic value)
Implementing these 3 methods is easy, but finding a stable method to choose the best method seems to be cruel: It depends on M, N, the available RAM and the size or the processor cache.
I will not start to work on this before 2011, so use Bruno's suggestion.

BTW: Did you say, that RANDI is biased?
This would be worth a new thread! 

Going to bed now, expecting N*M*tiny+giantic nightmares, Jan