Hidden Markov model parameter estimates from emissions and states

`[TRANS,EMIS] = hmmestimate(seq,states)`

`hmmestimate(...,'Symbols',SYMBOLS)`

`hmmestimate(...,'Statenames',STATENAMES)`

`hmmestimate(...,'Pseudoemissions',PSEUDOE)`

`hmmestimate(...,'Pseudotransitions',PSEUDOTR)`

`[TRANS,EMIS] = hmmestimate(seq,states)` calculates the maximum likelihood estimates of the transition probabilities, `TRANS`, and the emission probabilities, `EMIS`, of a hidden Markov model for the sequence `seq` with known states `states`. Here `states(t)` is the state that emitted `seq(t)`, so the two sequences have the same length.
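When the states are known, the maximum likelihood estimate reduces to normalized counts: `TRANS(i,j)` is the number of $$i\to j$$ transitions in `states` divided by the number of transitions out of state *i*, and `EMIS(i,k)` is the number of times state *i* emits symbol *k* divided by the number of time steps spent in state *i*. The following base-MATLAB sketch computes these counts directly with `accumarray` on a tiny hand-made sequence. It illustrates the estimator only and is not the toolbox's implementation; leaving rows with no observations as all zeros is an assumption of this sketch.

```matlab
% Tiny hand-made example: emissions and the states that produced them
seq    = [2 1 1 1 2];   % emitted symbols, integers 1..N
states = [1 1 1 2 2];   % states(t) is the state that emitted seq(t)
M = max(states);        % number of states
N = max(seq);           % number of symbols

% C(i,j) = number of times state i is followed by state j
C = accumarray([states(1:end-1)', states(2:end)'], 1, [M M]);
% D(i,k) = number of times state i emits symbol k
D = accumarray([states', seq'], 1, [M N]);

% Normalize each row by its total (assumption: empty rows stay all zeros)
trRows = sum(C, 2);  trRows(trRows == 0) = 1;
emRows = sum(D, 2);  emRows(emRows == 0) = 1;
TRANS = C ./ trRows;    % implicit expansion, R2016b or later
EMIS  = D ./ emRows;
```

For this input, `TRANS` is `[2/3 1/3; 0 1]` and `EMIS` is `[2/3 1/3; 1/2 1/2]`; calling `hmmestimate(seq,states)` on the same vectors should give the same matrices.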

`hmmestimate(...,'Symbols',SYMBOLS)` specifies the symbols that are emitted. `SYMBOLS` can be a numeric array, a string array, or a cell array of the names of the symbols. The default symbols are the integers 1 through `N`, where `N` is the number of possible emissions.
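A minimal sketch, assuming the Statistics and Machine Learning Toolbox is installed. The emissions are recorded as the names `'H'` and `'T'` (a hypothetical coin-flip example), so `seq` holds symbol names rather than integers, and `'Symbols'` tells `hmmestimate` how to map them to columns of `EMIS`. The assumption here is that column *k* of `EMIS` corresponds to `SYMBOLS(k)`.

```matlab
% Hypothetical coin-flip emissions, given by symbol name
symbols = {'H','T'};               % order of SYMBOLS sets the column order of EMIS
seq     = {'H','H','T','T','T'};   % emissions as symbol names
states  = [1 1 2 2 2];             % known states

[TRANS, EMIS] = hmmestimate(seq, states, 'Symbols', symbols);
% Column 1 of EMIS refers to 'H', column 2 to 'T'
```

Here state 1 only emits `'H'` and state 2 only emits `'T'`, so each row of `EMIS` puts all of its probability on one column.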

`hmmestimate(...,'Statenames',STATENAMES)` specifies the names of the states. `STATENAMES` can be a numeric array, a string array, or a cell array of the names of the states. The default state names are 1 through `M`, where `M` is the number of states.
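The sketch below, which assumes the Statistics and Machine Learning Toolbox, records the state path by name (the names `'fair'` and `'loaded'` are hypothetical). The assumption is that row *i* of `TRANS` and `EMIS` corresponds to `STATENAMES(i)`.

```matlab
% States recorded by name; the order of STATENAMES sets the row order
statenames = {'fair','loaded'};
states     = {'fair','fair','fair','loaded','loaded'};   % known state path
seq        = [3 5 2 6 6];                                % symbols emitted in those states

[TRANS, EMIS] = hmmestimate(seq, states, 'Statenames', statenames);
% Row 1 describes 'fair', row 2 describes 'loaded'
```

The path contains two fair-to-fair transitions, one fair-to-loaded transition, and one loaded-to-loaded transition, so the first row of `TRANS` is `[2/3 1/3]` and the second is `[0 1]`.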

`hmmestimate(...,'Pseudoemissions',PSEUDOE)` specifies pseudocount emission values in the matrix `PSEUDOE`. Use this argument to avoid zero probability estimates for emissions with very low probability that might not be represented in the sample sequence. `PSEUDOE` should be a matrix of size *m*-by-*n*, where *m* is the number of states in the hidden Markov model and *n* is the number of possible emissions. If the $$i\to k$$ emission (state *i* emitting symbol *k*) does not occur in `seq`, you can set `PSEUDOE(i,k)` to a positive number representing an estimate of the expected number of such emissions in the sequence `seq`.
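A minimal sketch, assuming the Statistics and Machine Learning Toolbox: in this short hand-made sequence each state emits only one of the two symbols, so without pseudocounts the unseen emissions get probability 0. Adding one pseudocount to every entry of `PSEUDOE` is just one simple choice, used here for illustration.

```matlab
seq    = [1 1 2 2];   % state 1 never emits symbol 2; state 2 never emits symbol 1
states = [1 1 2 2];

% Without pseudocounts, unseen emissions are estimated as 0
[~, EMIS0] = hmmestimate(seq, states);

% One pseudocount per (state, symbol) pair: PSEUDOE is m-by-n = 2-by-2
PSEUDOE = ones(2, 2);
[~, EMIS1] = hmmestimate(seq, states, 'Pseudoemissions', PSEUDOE);
```

With the pseudocounts, `EMIS1(1,2)` and `EMIS1(2,1)` become positive while each row still sums to 1.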

`hmmestimate(...,'Pseudotransitions',PSEUDOTR)` specifies pseudocount transition values in the matrix `PSEUDOTR`. Use this argument to avoid zero probability estimates for transitions with very low probability that might not be represented in the sample sequence. `PSEUDOTR` should be a matrix of size *m*-by-*m*, where *m* is the number of states in the hidden Markov model. If the $$i\to j$$ transition does not occur in `states`, you can set `PSEUDOTR(i,j)` to a positive number representing an estimate of the expected number of such transitions in the sequence `states`.
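The following sketch, which assumes the Statistics and Machine Learning Toolbox, uses a state path that never returns from state 2 to state 1. A uniform `PSEUDOTR` of ones (add-one smoothing, chosen here only as a simple illustration) makes that transition's estimate positive.

```matlab
states = [1 1 1 2 2 2];   % the path never moves from state 2 back to state 1
seq    = [1 2 1 2 2 2];   % emissions; they do not affect TRANS

% Without pseudocounts, TRANS0(2,1) is 0
TRANS0 = hmmestimate(seq, states);

% PSEUDOTR is m-by-m, here m = 2 states
PSEUDOTR = ones(2, 2);
TRANS1 = hmmestimate(seq, states, 'Pseudotransitions', PSEUDOTR);
```

Note that adding a pseudocount to every entry also shifts the estimates of transitions that were observed, not just the missing one.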

If the probability of a specific transition or emission is very low, the transition might never occur in the sequence `states`, or the emission might never occur in the sequence `seq`. In either case, the algorithm returns a probability of 0 for the given transition or emission in `TRANS` or `EMIS`. You can compensate for the absence of a transition or emission with the `'Pseudotransitions'` and `'Pseudoemissions'` arguments. The simplest way to do this is to set the corresponding entry of `PSEUDOTR` or `PSEUDOE` to `1`.
For example, if the transition $$i\to j$$ does not occur in `states`, set `PSEUDOTR(i,j) = 1`. This forces `TRANS(i,j)` to be positive.
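To apply the "set it to 1" rule to every missing transition at once, one option is to count the observed transitions first and put a pseudocount of 1 only where the count is 0. The sketch below assumes the Statistics and Machine Learning Toolbox; using `accumarray` to find the unobserved transitions is an illustrative choice, not part of `hmmestimate`.

```matlab
states = [1 1 2 2 3 3 1];   % known state path with M = 3 states
seq    = [1 1 2 2 2 1 1];   % emissions, same length as states
M = 3;

% Observed i->j transition counts in the state path
C = accumarray([states(1:end-1)', states(2:end)'], 1, [M M]);

% Pseudocount 1 where a transition was never observed, 0 elsewhere
PSEUDOTR = double(C == 0);
TRANS = hmmestimate(seq, states, 'Pseudotransitions', PSEUDOTR);
```

Every entry of `TRANS` is then positive, while observed transitions receive no extra pseudocount.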
If you have an estimate for the expected number of transitions $$i\to j$$ in a sequence of the same length as `states`, and the actual number of transitions $$i\to j$$ that occur in `states` is substantially less than you expect, you can set `PSEUDOTR(i,j)` to the expected number. This increases the value of `TRANS(i,j)`. For transitions that occur in `states` with the frequency you expect, set the corresponding entry of `PSEUDOTR` to `0`, which does not increase the corresponding entry of `TRANS`.
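A minimal sketch of the expected-count approach, assuming the Statistics and Machine Learning Toolbox. The prior belief of about 3 switches from state 1 to state 2 is a hypothetical value chosen for illustration; the observed path contains only one.

```matlab
states = [1 1 1 1 1 1 1 1 2 2];   % only one 1->2 switch is observed
seq    = ones(1, 10);             % emissions; they do not affect TRANS

% Hypothetical prior: about 3 switches 1->2 expected in a path of this length
expectedSwitches = 3;
PSEUDOTR = zeros(2);               % 0 where observed frequencies look right
PSEUDOTR(1,2) = expectedSwitches;  % boost the under-observed 1->2 transition

TRANSraw   = hmmestimate(seq, states);
TRANSprior = hmmestimate(seq, states, 'Pseudotransitions', PSEUDOTR);
```

`TRANSprior(1,2)` is larger than `TRANSraw(1,2)`, and the second row, whose transitions were left at a pseudocount of 0, is unchanged.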

If you do not know the sequence of states, use `hmmtrain` to estimate the model parameters.
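For comparison, here is a sketch of the unknown-states case, assuming the Statistics and Machine Learning Toolbox. `hmmtrain` needs only the emissions plus initial guesses for the two matrices; the guesses below are hypothetical values, and `hmmtrain` uses the Baum-Welch algorithm by default.

```matlab
rng('default')   % reproducible simulation

% Model used only to simulate data; in practice it is unknown
trans = [0.95 0.05; 0.10 0.90];
emis  = [ones(1,6)/6; 0.1 0.1 0.1 0.1 0.1 0.5];
seq   = hmmgenerate(1000, trans, emis);   % keep only the emissions

% Hypothetical initial guesses; each row must sum to 1
TRGUESS   = [0.9 0.1; 0.2 0.8];
EMITGUESS = [ones(1,6)/6; 0.2 0.2 0.2 0.2 0.1 0.1];
[ESTTR, ESTEMIT] = hmmtrain(seq, TRGUESS, EMITGUESS);
```

Because Baum-Welch is iterative, the result depends on the initial guesses and can converge to a local maximum. The estimated states are also not guaranteed to appear in the same order as in the simulating model.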

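The following example generates 1000 emissions and the corresponding states from a known two-state model, and then estimates the model parameters from them with `hmmestimate`:

```matlab
% Known model: state 1 emits 1-6 uniformly, state 2 emits 6 with probability 1/2
trans = [0.95,0.05; 0.10,0.90];
emis = [1/6 1/6 1/6 1/6 1/6 1/6;
        1/10 1/10 1/10 1/10 1/10 1/2];
% Simulate 1000 emissions together with their states
[seq,states] = hmmgenerate(1000,trans,emis);
% Estimate TRANS and EMIS from the emissions and known states
[estimateTR,estimateE] = hmmestimate(seq,states);
```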
