Visualize LDA Topic Correlations
This example shows how to analyze correlations between topics in a Latent Dirichlet Allocation (LDA) topic model.
A latent Dirichlet allocation (LDA) model is a topic model which discovers underlying topics in a collection of documents and infers word probabilities in topics. The vectors of per-topic word probabilities characterize the topics. Using the per-topic word probabilities, you can identify correlations between the topics.
Load LDA Model
Load the LDA model
factoryReportsLDAModel which is trained using a data set of factory reports detailing different failure events. For an example showing how to fit an LDA model to a collection of text data, see Analyze Text Data Using Topic Models.
load factoryReportsLDAModel mdl
mdl = ldaModel with properties: NumTopics: 7 WordConcentration: 1 TopicConcentration: 0.5755 CorpusTopicProbabilities: [0.1587 0.1573 0.1551 0.1534 0.1340 ... ] DocumentTopicProbabilities: [480x7 double] TopicWordProbabilities: [158x7 double] Vocabulary: ["item" "occasionally" "get" ... ] TopicOrder: 'initial-fit-probability' FitInfo: [1x1 struct]
Visualize the topics using word clouds.
numTopics = mdl.NumTopics; figure t = tiledlayout("flow"); title(t,"LDA Topics") for i = 1:numTopics nexttile wordcloud(mdl,i); title("Topic " + i) end
Visualize Topic Correlations
Calculate the correlations between the topics using the
corrcoef function with the LDA model topic word probabilities as input.
correlation = corrcoef(mdl.TopicWordProbabilities);
View the correlations in a heat map and label each topic with its top three words. To prevent the heat map from highlighting the trivial correlations between topics each and itself, subtract the identity matrix from the correlations.
For each topic, find the top three words.
numTopics = mdl.NumTopics; for i = 1:numTopics top = topkwords(mdl,3,i); topWords(i) = join(top.Word,", "); end
Plot the correlations using the
figure heatmap(correlation - eye(numTopics), ... XDisplayLabels=topWords, ... YDisplayLabels=topWords) title("LDA Topic Correlations") xlabel("Topic") ylabel("Topic")
For each topic, find the topic with the strongest correlation and display the pairs in a table with the corresponding correlation coefficient.
[topCorrelations,topCorrelatedTopics] = max(correlation - eye(numTopics)); tbl = table; tbl.TopicIndex = (1:numTopics)'; tbl.Topic = topWords'; tbl.TopCorrelatedTopicIndex = topCorrelatedTopics'; tbl.TopCorrelatedTopic = topWords(topCorrelatedTopics)'; tbl.CorrelationCoefficient = topCorrelations'
tbl=7×5 table TopicIndex Topic TopCorrelatedTopicIndex TopCorrelatedTopic CorrelationCoefficient __________ ______________________________ _______________________ ______________________________ ______________________ 1 "mixer, sound, assembler" 5 "mixer, fuse, coolant" 0.34304 2 "scanner, agent, stuck" 4 "scanner, appear, spool" 0.34526 3 "sound, agent, hear" 1 "mixer, sound, assembler" 0.26909 4 "scanner, appear, spool" 2 "scanner, agent, stuck" 0.34526 5 "mixer, fuse, coolant" 1 "mixer, sound, assembler" 0.34304 6 "arm, robot, smoke" 1 "mixer, sound, assembler" 0.0042125 7 "software, sorter, controller" 7 "software, sorter, controller" 0