Thanks a lot for your help.
Below is the way I finally implemented it (quite similar to your
ideas). It runs quite fast, and though it's hard to verify that the
matrices are indeed generated UAR, at least the plots show that all
the entries in the generated matrices are identically distributed.
Here's the simple code. Maybe I'll also upload it later as a shared
file.
% Simple algorithm for generating doublystochastic matrices
% (matrices, where the sum of each column and each row is exactly 1).
% The algorithm:
% 1. set an NxN matrix TM s.t TM[i,j] = 1/N for each 1<=i,j<=N.
% 2. for X number of iterations:
% 3. Draw i1, j1, i2, j2 UAR on [1,...,N].
% 4. Draw d UAR on (0, min {TM[i1, j1], TM[i2, j2]}).
% 5. M[i1,j1] <= M[i1,j1]  d;
% 6. M[i2,j2] <= M[i2,j2]  d;
% 7. M[i1,j2] <= M[i1,j2] + d;
% 8. M[i2,j1] <= M[i2,j1] + d;
% For a "good" distribution, X should be a (polynomial, probably)
function of N.
% I guess that X=N^4 is enough; for large N, maybe even a smaller X
is
% enough.
% The algorithm's code
% For make the generation more efficient, the code actually generates
many
% doublystochastic matrices "almost" at once.
% Constants
N = 4;
NUM_OF_SMPLS = 1000;
NUM_OF_BINS = 10;
BIN_INTERVAL = 1 / NUM_OF_BINS;
ar_of_TMs = repmat (ones (N) / N, [1 1 NUM_OF_SMPLS]);
for iter = 1: N^4
indices = unidrnd (N, [NUM_OF_SMPLS 4]); % random indices i1, j1,
i2, j2 for all the sampled TMs
i1 = indices (:, 1); j1 = indices (:, 2); i2 = indices (:, 3); j2 =
indices (:, 4);
for smp_num = 1:NUM_OF_SMPLS
% ar_of_max_delta (i) will hold min {TM(i1,j1), TM(i2,j2)} for
the ith TM in the array; this is
% the maximum allowed value for DELTA for this TM in the current
% iteration
ar_of_max_delta (smp_num) = min (ar_of_TMs (i1 (smp_num) ,
j1(smp_num), smp_num), ...
ar_of_TMs (i2 (smp_num) , j2(smp_num), smp_num));
end; % for smp_num
ar_of_delta = unifrnd (0, ar_of_max_delta); % Generate DELTA values
for all the TMs
clear ar_of_max_delta; % not needed anymore  save memory space
% add / substruct DELTA from the required entries in each TM
for smp_num = 1:NUM_OF_SMPLS
ar_of_TMs (i1 (smp_num), j1 (smp_num), smp_num) = ...
ar_of_TMs (i1 (smp_num), j1 (smp_num), smp_num)  ar_of_delta
(smp_num);
ar_of_TMs (i2 (smp_num), j2 (smp_num), smp_num) = ...
ar_of_TMs (i2 (smp_num), j2 (smp_num), smp_num)  ar_of_delta
(smp_num);
ar_of_TMs (i1 (smp_num), j2 (smp_num), smp_num) = ...
ar_of_TMs (i1 (smp_num), j2 (smp_num), smp_num) + ar_of_delta
(smp_num);
ar_of_TMs (i2 (smp_num), j1 (smp_num), smp_num) = ...
ar_of_TMs (i2 (smp_num), j1 (smp_num), smp_num) + ar_of_delta
(smp_num);
end; % for smp_num
clear ar_of_delta;
if (sum (sum (ar_of_TMs (:, :, smp_num)>0))~=N^2)
error ('rgrgr');
end;
end; % for iter
% An example of a plot for the PDFs: plot the PDFs of the entries in
the
% jth col.
j = 1;
for i=1:N
ar_of_TMij = zeros (NUM_OF_SMPLS, 1);
ar_of_TMij (:) = ar_of_TMs (i, j, 1:NUM_OF_SMPLS);
ar_of_bins = floor ( ar_of_TMij / BIN_INTERVAL + 1);
for bin = 1:NUM_OF_BINS
pdfer (bin, i) = size (find (ar_of_bins==bin), 1);
end; % for bin
end; % for i
bar (pdfer);
legend ('Surface: bar PDF of the entries in a single row');
