File Exchange

## BinaryMatrixUniform​Rnd.m

version 2.0 (16.7 KB) by

approximate uniform generation of binary matrices with specified margins

Updated

Given the sequence of row and column sums (the margins) of a zero-one valued matrix, there may be many other binary matrices with the same margins. Efficient algorithms for randomly sampling from the uniform distribution over this space of matrices are not known, except for small or very sparse matrices (see http://arxiv.org/abs/1301.6635). In many (but not all) cases, however, the algorithm in this m-file comes very close to uniform. It can be used for an importance sampling proposal distribution when the target distribution is uniform. An unusual feature is that the importance sampling variability tends to improve as the size of the matrix gets large, especially if the margins are not too variable. (See http://arxiv.org/abs/0906.1004 for details.)
Detailed usage:
function [logQ,alist,B] = BinaryMatrixUniformRnd(N,r,c,Flag1,Val1,Flag2,Val2,...)

r is mx1 nonnegative integers
c is 1xn nonnegative integers
N is a positive integer

Approximate sampling from the uniform distribution over mxn binary matrices
with row sums r and column sums c. Generates N independent samples.

An error is generated if no binary matrix agrees with r and c.

alist stores the locations of the ones in the samples.
If d = sum(r) = sum(c), then alist is 2 x d x N.

The 1-entries of the kth matrix are stored as alist(:,:,k). The
(row,column) indices are (alist(1,t,k),alist(2,t,k)) for t=1:d.

If the third output B is requested, then the matrices are explicitly
created via:
B = false(m,n,N); for k = 1:N, for t = 1:d, B(alist(1,t,k),alist(2,t,k),k) = true; end, end

so that B(:,:,k) is the kth random matrix.

logQ(k)=log(probability that algorithm generates B(:,:,k))

OPTIONS: The optional arguments are Flag,Value pairs and can be provided
in any order. The flags are strings (not case sensitive).
In the description below, the flags are listed first, followed by a
description of the allowable values.

'Binput': The value for Binput is a mxn binary matrix.
If it is provided, then alist(:,:,1) agrees with Binput and logQ(1)
computes the corresponding log probability.
Omit this flag or use Binput = [] to skip this option.

'orient': {'rows','cols','fast' (default),'slow'} determines
whether the algorithm loops over rows ('rows') or columns ('cols').
'fast' loops over the largest dimension (and is fastest). 'slow' does
the opposite, but often has less variable importance weights.

'sort': {'same','descend' (default)} determines the selection
order for the entries of the dimension that is looped over. 'same'
leaves them in the same order and 'descend' sorts descending, which
tends to work best.

'prune': {'none','rows','cols','both' (default)} determines whether
zero margins are pruned before using the algorithm.

'approx': {'canfield' (default), 'greenhill', 'cdhl', 'oneil'} determines
which asymptotic enumeration formula is used in the approximation ...
'greenhill' can be better for sparse and highly irregular margins.
'cdhl' and 'oneil' are useful for comparing to Chen et al (2005).

'log': {1 (true), 0 (false) (default)} determines whether to use a
logarithmic transformation inside the algorithm to avoid underflow.
Setting 'log' to true will allow the algorithm to scale to larger and
more imbalanced matrices, at the expense of slower execution time.
If a certain matrix gives highly variable importance weights, it can
be a good idea to try setting 'log' to true.

'mex': {1 (true) (default), 0 (false)}. Setting 'mex' to false forces the
algorithm to use mfile code even if a mex implementation is
available. This is only useful for debugging.

To provide a mex implementation (often 5-10 times faster), type
mex BinaryMatrixUniformRndC.cpp
at the Matlab prompt. This will create a mex file that will be called
automatically when you use this m-file. Both implementations should
give nearly identical results (as long as the random number generator is
in the same state), but the two implementations may not leave the random
number generator in the same state upon exiting. Any differences in output
of the two implementations are the result of slight differences in
floating point arithmetic. It is possible that these slight differences
could interact with the random number generator to create substantial
changes in output, but this should be incredibly rare.

If you use this software, please reference:

M Harrison (2009) A dynamic programming approach for approximate uniform
generation of binary matrices with specified margins. arXiv:0906.1004
(or see if there is an updated published reference)

Michal Kvasnicka

### Michal Kvasnicka (view profile)

Very nice piece of code. Perfect description and effective implementation (MATLAB and C-MEX version).

Well done ... thanks!!!