Implementation details of (i)-(iii) can be found in Reynolds et al. (2000).
The fourth function (gmm2sv.m) concatenates the means (i.e. centers) of a GMM. The concatenated mean vector of an adapted GMM is known as a GMM supervector (GSV), and it is used in GMM-SVM based speaker recognition systems. Details of GMM-SVM based speaker recognition can be found in Campbell et al. (2006).
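As a rough illustration of the concatenation step, here is a minimal sketch in plain Python. It is not the gmm2sv.m code itself: the function name gmm_to_supervector, the list-of-lists input, and the component-by-component ordering of the output are all assumptions made for this example, and the actual MATLAB function may order the entries differently.

```python
from typing import List, Sequence


def gmm_to_supervector(means: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate M component means (each of dimension D) into one M*D vector.

    Hypothetical helper for illustration only; the ordering (component 1 first,
    then component 2, ...) is an assumption, not taken from gmm2sv.m.
    """
    if not means:
        raise ValueError("at least one component mean is required")
    dim = len(means[0])
    supervector: List[float] = []
    for mean in means:
        if len(mean) != dim:
            raise ValueError("all component means must have the same dimension")
        supervector.extend(float(x) for x in mean)
    return supervector


# Toy adapted GMM: 3 components in a 2-dimensional feature space.
adapted_means = [[0.0, 1.0], [2.5, -1.0], [4.0, 3.0]]
sv = gmm_to_supervector(adapted_means)
print(len(sv))  # one entry per (component, dimension) pair
```

A supervector built this way has a fixed length (number of components times feature dimension) regardless of utterance length, which is what lets it serve as an input vector to an SVM.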
This code requires the Netlab toolbox.
D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, "Speaker verification using adapted Gaussian mixture models," Digital Signal Processing, vol. 10, pp. 19-41, 2000.
W. M. Campbell, D. E. Sturim, and D. A. Reynolds, "Support vector machines using GMM supervectors for speaker verification," IEEE Signal Processing Letters, vol. 13, no. 5, pp. 308-311, May 2006.
I think you can. But I would try some implementation of PLDA or deep learning.
This one is intended to be used on speaker ID, but you could adapt it easily, since the important functions take features as arguments:
Can this be used for image recognition?
Description is updated.