Path: news.mathworks.com!not-for-mail From: "Jan Simon" <matlab.THIS_YEAR@nMINUSsimon.de> Newsgroups: comp.soft-sys.matlab Subject: Re: Randperm, Randi, and Shuffle Date: Thu, 4 Nov 2010 02:05:04 +0000 (UTC) Organization: Universität Heidelberg Lines: 47 Message-ID: <iat4cg$cpi$1@fred.mathworks.com> References: <iaq1os$sak$1@fred.mathworks.com> <iarp4o$d46$1@fred.mathworks.com> Reply-To: "Jan Simon" <matlab.THIS_YEAR@nMINUSsimon.de> NNTP-Posting-Host: webapp-05-blr.mathworks.com Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Trace: fred.mathworks.com 1288836304 13106 172.30.248.35 (4 Nov 2010 02:05:04 GMT) X-Complaints-To: news@mathworks.com NNTP-Posting-Date: Thu, 4 Nov 2010 02:05:04 +0000 (UTC) X-Newsreader: MATLAB Central Newsreader 869888 Xref: news.mathworks.com comp.soft-sys.matlab:683722 Dear Bruno, You are right: The sorting in UNIQUE belongs to the smaller data set of 125e3 values, while RANDPERM would have to operate on [1:prod(sampbox)]. Therefore Bruno's code is faster than SHUFFLE inspite of the sorting: > a = randi(myrandistr, prod(sampbox), round(SPtemp*1.1), 1); > a = unique(a); > a(SPtemp +1:end) = []; But it is biased also, because the reply of UNIQUE is sorted. But lukily SHUFFLE works efficient here: a = randi(myrandistr, prod(sampbox), round(SPtemp*1.1), 1); a = Shuffle(unique(a)); a(SPtemp +1:end) = []; How safe is 1.1*SPtemp? At least you should include a test of UNIQUE(a) has enough elements. I have not implemented an equivalent method in SHUFFLE, because the computing time grows extremly, if the number of chosen values is gets near to the number of all elements. With other words: It is very hard to draw N unique random numbers from [1:(N+tiny)] using the RANDI approach. But for [1:N*10] the number of collision would be small. Then this approach might be fast: % Pseudocode, to be implemented in C: % Choose N values from [1:M]: x = zeros(1, N); r = zeros(1, N); for i = 1:N ready = false; while ~ready a = randi(M); % Draw if isempty(find(x(1:i) == a)) r(i) = a; % Unsorted list as output x(1:i) = sort([x(1:i-1), a]); % Sorted list for FIND ready = true; end end This could use a binary search to find [a] in the list [x] of already drawn numbers, and reuse the found index for sorting. This competes with the simple method of creating Y=ONES(1,M,'uint8') at first, draw a number X, if Y(X) is 1, accept X and set Y(X) to zero. Finally there are three methods to draw N out of [1:M]: 1. Fisher-Yates Shuffle: efficient for M=N+(tiny value) 2. The dull draw and check method: efficient for M=N+(fair value) 3. Draw and sort (the pseudocode above): efficient for M=N+(giantic value) Implementing these 3 methods is easy, but finding a stable method to choose the best method seems to be cruel: It depends on M, N, the available RAM and the size or the processor cache. I will not start to work on this before 2011, so use Bruno's suggestion. BTW: Did you say, that RANDI is biased? This would be worth a new thread! Going to bed now, expecting N*M*tiny+giantic nightmares, Jan