Estimation of LDA with Collapsed Gibbs Sampling and marginal data density

Hi,
I'm using the Text Analytics Toolbox to estimate an LDA model with the fitlda function and the collapsed Gibbs sampling solver ("cgs", as in [3] Griffiths, Thomas L., and Mark Steyvers. "Finding scientific topics." Proceedings of the National Academy of Sciences 101, no. suppl 1 (2004): 5228–5235). After estimating the model with a fixed number of topics, I'm not sure how to get the marginal data density, that is, the measure used in Figure 3 of the Griffiths and Steyvers paper. In other words, I want the in-sample marginal likelihood that corresponds to Probability(w|Topics). To get it, I'm currently using the logp function (https://www.mathworks.com/help/textanalytics/ref/ldamodel.logp.html). In particular, I take the logProb output of logp, called with the fitted ldaModel and the same documents I used to fit the model, because I want the in-sample marginal data density.
Does this logProb correspond to the marginal data density, that is, the Probability(w|Topics) shown in Figure 3 of the Griffiths and Steyvers paper?
I opened the source of the logp function and noticed that logProb is calculated by another function:
logprob = clustering.ldautils.isip(alphas,Phi,documents,nsample,Pzw,rsh);
but I cannot access that function to see exactly what it does. I can see that the perplexity is just a transformation of logProb: ppl = exp(-nansum(logprob)/num_words).
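Written out, this is how I read that line. The notation is my own, not from the toolbox documentation: \ell_d is the logprob entry for document d, N is num_words (which I assume is the total number of words across the documents), and the sum skips NaN entries as nansum does.

```latex
\mathrm{ppl} = \exp\!\left(-\frac{1}{N}\sum_{d} \ell_d\right)
\quad\Longleftrightarrow\quad
\sum_{d} \ell_d = -N \,\log(\mathrm{ppl})
```

So the perplexity and the summed logprob carry the same information. Whether that sum is the log of the Probability(w|Topics) in Figure 3 is the part I cannot verify without seeing what isip computes.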
Thanks a lot for your support.

Answers (0)


Asked: 9 May 2020
Edited: 10 May 2020
