I'm looking for a better way to compute the possible length-n sequences of a random variable whose value at time k is given by:
x(k) = 1 with probability p(1), 2 with probability p(2), ..., np with probability p(np).
However, since the number of possible sequences (np^n) grows exponentially with n, I only want to compute the most probable sequences, taken in order of decreasing probability for as long as their combined probability does not exceed a threshold p_cut.
I've written the function below to do this, but it has two major drawbacks:
- It doesn't generalize to sequences longer than 3.
- It is quite memory-inefficient, since it always enumerates all np^n possible sequences even when only a small subset is needed.
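    function S = most_probable_sequences(p, n, p_cut)
        % Compare with a tolerance: sum(p) == 1 can fail due to floating-point rounding.
        assert(abs(sum(p) - 1) < 1e-12);
        np = numel(p);
        % Enumerate all np^n index sequences, one per row.
        switch n
            case 1
                all_perms = (1:np)';
            case 2
                [X,Y] = ndgrid(1:np,1:np);
                all_perms = [X(:) Y(:)];
            case 3
                [X,Y,Z] = ndgrid(1:np,1:np,1:np);
                all_perms = [X(:) Y(:) Z(:)];
            otherwise
                error("n > 3 not implemented")
        end
        % The probability of a sequence is the product of its element probabilities.
        probs = prod(reshape(p(all_perms), [], n), 2);
        [probs_sorted, order] = sort(probs, 'descend');
        % Keep sequences while the cumulative probability stays <= p_cut.
        most_prob = cumsum(probs_sorted) <= p_cut;
        S = all_perms(order(most_prob), :);
    end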
Examples of correct output of this function:
    >> S = most_probable_sequences([0.95 0.04 0.01], 1, 0.99)
    S =
         1
         2

    >> S = most_probable_sequences([0.01 0.04 0.95], 2, 0.99)
    S =
         3     3
         3     2
         2     3
         3     1
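
For reference, here is a minimal sketch of how the first drawback (the limit of n = 3) might be lifted, by collecting the outputs of ndgrid in a cell array so that the number of dimensions is not hard-coded. The function name most_probable_sequences_n is hypothetical, and the 1e-12 tolerance in the assert is an assumption. This does not address the second drawback: it still enumerates all np^n sequences before sorting.

```matlab
function S = most_probable_sequences_n(p, n, p_cut)
% Hypothetical generalisation of most_probable_sequences to any n >= 1.
% NOTE: still builds all np^n sequences, so memory use is unchanged.
    assert(abs(sum(p) - 1) < 1e-12);   % tolerance is an assumption
    np = numel(p);
    if n == 1
        all_perms = (1:np)';
    else
        % One grid vector per sequence position, expanded as a
        % comma-separated list into ndgrid's inputs and outputs.
        args = repmat({1:np}, 1, n);
        grids = cell(1, n);
        [grids{:}] = ndgrid(args{:});
        % Flatten each n-dimensional grid into one column of all_perms.
        all_perms = zeros(np^n, n);
        for k = 1:n
            all_perms(:, k) = grids{k}(:);
        end
    end
    % Same selection logic as the original function.
    probs = prod(reshape(p(all_perms), [], n), 2);
    [probs_sorted, order] = sort(probs, 'descend');
    most_prob = cumsum(probs_sorted) <= p_cut;
    S = all_perms(order(most_prob), :);
end
```

Because the columns are built in the same order as [X(:) Y(:) Z(:)], a call such as most_probable_sequences_n([0.01 0.04 0.95], 2, 0.99) should return the same rows as the second example above, and n = 4 or larger is accepted as long as np^n rows fit in memory.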