Implementation details of (i)-(iii) can be found in Reynolds et al. (2000), listed below.
The fourth function (gmm2sv.m) concatenates the means (i.e., centers) of a GMM. The concatenated mean vector of an adapted GMM is known as a GMM supervector (GSV) and is used in GMM-SVM based speaker recognition systems. Details of such systems can be found in Campbell et al. (2006), listed below.
This code requires the Netlab toolbox.
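As a rough illustration of the concatenation step, the sketch below stacks the component means of a toy GMM into one column vector. It is not the code of gmm2sv.m: the variable names, the toy values, and the component-by-component ordering are assumptions, and no weighting or covariance scaling is applied. The means are stored as an ncentres-by-dim matrix, which is assumed to match the layout of Netlab's mix.centres.

```matlab
% Hypothetical toy model: 3 mixture components, 2-dimensional features.
% Rows are component means (assumed layout: ncentres-by-dim).
centres = [1 2; 3 4; 5 6];

% Transpose so each column holds one component mean, then stack the
% columns into a single (ncentres*dim)-by-1 supervector.
sv = reshape(centres', [], 1);

disp(sv');
```

For an adapted model with M components and D-dimensional features, the resulting vector has M*D entries, which is the fixed-length representation passed to the SVM.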
D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, "Speaker verification using adapted Gaussian mixture models," Digital Signal Processing, vol. 10, pp. 19-41, 2000.
W. M. Campbell, D. E. Sturim, and D. A. Reynolds, "Support vector machines using GMM supervectors for speaker verification," IEEE Signal Processing Letters, vol. 13, no. 5, pp. 308-311, May 2006.
Md Sahidullah (2020). Useful Matlab Functions for Speaker Recognition Using Adapted Gaussian Mixture Model (https://www.mathworks.com/matlabcentral/fileexchange/31678-useful-matlab-functions-for-speaker-recognition-using-adapted-gaussian-mixture-model), MATLAB Central File Exchange. Retrieved .
@sara chellali: This script supports covariance type 'diag' only. "covars 12x12x64" indicates that you have used a full-covariance GMM.
When I run this code, I get these errors:
Matrix dimensions must agree.
Error in gmmactiv (line 46)
a(:, j) = exp(-0.5*sum((diffs.*diffs)./(ones(ndata, 1) * mix.covars(j,:)), 2)) ./ (normal*s(j));
Error in gmmpost (line 25)
a = gmmactiv(mix, x);
Error in gmmmap (line 28)
[post, act] = gmmpost(mix, data); %Computes class posterior probabilities
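The dimension mismatch in this trace is consistent with a full-covariance model being passed to code that expects diagonal covariances. The sketch below shows one way to check the layout before calling the adaptation code. The struct is a hand-built stand-in rather than a real Netlab object. The field names (covar_type, ncentres, nin, covars) and the ncentres-by-nin layout for 'diag' models are assumed from Netlab's conventions.

```matlab
% Stand-in for a Netlab-style GMM struct (field names are assumptions).
% This one mimics the reported case: 12-dim features, 64 components, full covariances.
mix = struct('covar_type', 'full', 'ncentres', 64, 'nin', 12);
mix.covars = zeros(12, 12, 64);

% A diagonal-covariance model is expected to store one row of variances per component.
expectedSize = [mix.ncentres, mix.nin];
isDiagLayout = strcmp(mix.covar_type, 'diag') && ...
               isequal(size(mix.covars), expectedSize);

if ~isDiagLayout
    fprintf('covars is %s; expected %d-by-%d for a diag GMM.\n', ...
            mat2str(size(mix.covars)), expectedSize(1), expectedSize(2));
end
```

With Netlab on the path, creating the model as gmm(12, 64, 'diag') should give the diagonal layout. The dimensions here are taken from the reported covars size.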
@Sandeep: The choice of relevance factor is ad hoc. It is usually chosen between 8 and 14 for NIST SREs, where the speech utterances are longer (typically more than 60 seconds). However, for text-dependent speaker recognition corpora with short segments, a lower relevance factor (between 1 and 4) is used.
I suggest setting this value based on performance on the development set.
What is the value of the relevance factor for MAP adaptation?
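To make the role of the relevance factor concrete, here is a minimal sketch of mean-only MAP adaptation following the update in Reynolds et al. (2000): each adapted mean is a weighted average of the data mean assigned to that component and the original mean, with weight alpha = n / (n + r). The toy data, the hard 0/1 posteriors, and all variable names are assumptions for illustration. This is not the code of gmmmap.m.

```matlab
% Hypothetical toy inputs: 4 frames, 2 components, 1-dimensional features.
data    = [0; 1; 9; 10];          % T-by-D feature matrix
post    = [1 0; 1 0; 0 1; 0 1];   % T-by-M component posteriors (stand-in values)
centres = [2; 6];                 % M-by-D means of the background model
r       = 2;                      % relevance factor

n     = sum(post, 1)';                              % M-by-1 soft counts per component
Ex    = bsxfun(@rdivide, post' * data, max(n, eps)); % M-by-D posterior-weighted data means
alpha = n ./ (n + r);                               % M-by-1 adaptation coefficients

% Interpolate between the data means and the original means.
adapted = bsxfun(@times, alpha, Ex) + bsxfun(@times, 1 - alpha, centres);

disp(adapted');
```

A larger r pulls alpha toward 0, so the adapted means stay closer to the background model. A smaller r lets even a few frames move the means, which fits the advice to use lower values for short segments.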
I think you can. But I would try some implementation of PLDA or deep learning.
This one is intended to be used on speaker ID, but you could adapt it easily, since the important functions take features as arguments:
Can this be used for image recognition?
Description is updated.
Inspired by: Netlab