- This project analyses the text of the original Sherlock Holmes novels and short stories, that is, the works written by Sir Arthur Conan Doyle, mainly by fitting a Latent Dirichlet Allocation (LDA) topic model to discover the topics of the Sherlock Holmes books.
- Only the original Sherlock Holmes works were included. The two data sets contain 203,936 words (the novels) and 454,214 words (the short story collections), so a total of 658,150 words were processed and analysed.
- Three classic romance novels by three different authors ("Sense and Sensibility" by Jane Austen, "Wuthering Heights" by Emily Brontë and "Jane Eyre" by Charlotte Brontë) were mixed with the Sherlock Holmes short story collections (i.e. the second set of documents in Part 1) to train another LDA model.
- The three additional classic romance novels introduced in Part 2 ("Sense and Sensibility", "Wuthering Heights" and "Jane Eyre") contain 186,302 words, 116,537 words and 119,580 words respectively, summing to 422,419 words. The total number of words processed in Part 2 of this project therefore amounts to 1,080,569 (658,150 + 422,419). This substantially larger volume of text is believed to have produced a more accurate result.
- This newly trained model was then used to examine the topic mixtures of the Sherlock Holmes novels (i.e. the first set of documents in Part 1).
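
The Part 2 workflow described above (train an LDA model on one set of documents, then examine the topic mixtures of a different set) can be illustrated with a minimal sketch. This is not the code from the submission, which is written in MATLAB. It is a small pure-Python collapsed Gibbs sampler, chosen only so that the example runs without extra packages. The function names `train_lda` and `topic_mixture`, the hyperparameters `alpha` and `beta`, the number of topics, and the toy documents are all assumptions made for illustration.

```python
import random

def _sample_topic(rng, word, doc_counts, topic_word, topic_total, vocab_size, alpha, beta):
    # Collapsed Gibbs conditional: (doc-topic count + alpha) * smoothed P(word | topic).
    weights = [
        (doc_counts[t] + alpha) * (topic_word[t].get(word, 0) + beta)
        / (topic_total[t] + vocab_size * beta)
        for t in range(len(doc_counts))
    ]
    return rng.choices(range(len(doc_counts)), weights=weights)[0]

def train_lda(docs, num_topics, alpha=0.1, beta=0.01, sweeps=100, seed=0):
    """Fit LDA on tokenized docs; return (topic-word counts, topic totals, vocab size)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topic_word = [{} for _ in range(num_topics)]
    topic_total = [0] * num_topics
    doc_counts = [[0] * num_topics for _ in docs]
    # Start from a random topic assignment for every word occurrence.
    assign = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, assign[d]):
            topic_word[k][w] = topic_word[k].get(w, 0) + 1
            topic_total[k] += 1
            doc_counts[d][k] += 1
    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the current assignment before resampling it.
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                doc_counts[d][k] -= 1
                k = _sample_topic(rng, w, doc_counts[d], topic_word,
                                  topic_total, vocab_size, alpha, beta)
                assign[d][i] = k
                topic_word[k][w] = topic_word[k].get(w, 0) + 1
                topic_total[k] += 1
                doc_counts[d][k] += 1
    return topic_word, topic_total, vocab_size

def topic_mixture(doc, model, alpha=0.1, beta=0.01, sweeps=50, seed=0):
    """Estimate the topic mixture of an unseen doc, keeping the trained topics fixed."""
    topic_word, topic_total, vocab_size = model
    num_topics = len(topic_total)
    rng = random.Random(seed)
    assign = [rng.randrange(num_topics) for _ in doc]
    counts = [0] * num_topics
    for k in assign:
        counts[k] += 1
    for _ in range(sweeps):
        for i, w in enumerate(doc):
            counts[assign[i]] -= 1
            assign[i] = _sample_topic(rng, w, counts, topic_word,
                                      topic_total, vocab_size, alpha, beta)
            counts[assign[i]] += 1
    total = len(doc) + num_topics * alpha
    return [(c + alpha) / total for c in counts]

# Toy stand-ins for the training mix (detective stories plus romance novels).
training_docs = [
    "detective clue murder police crime detective clue".split(),
    "murder police inspector clue crime detective".split(),
    "love marriage heart sister love heart".split(),
    "heart marriage love sister cottage love".split(),
]
model = train_lda(training_docs, num_topics=2)

# Toy stand-in for a document that was not used for training.
held_out = "detective murder clue crime inspector".split()
mixture = topic_mixture(held_out, model)
print(mixture)
```

The returned list holds one proportion per topic and sums to 1. Which index corresponds to which theme depends on the random seed, so topics have to be interpreted by inspecting their most frequent words.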
Carrie Ching (2023). Text Analytics of Sherlock Holmes Books Using Topic Model (https://www.mathworks.com/matlabcentral/fileexchange/114410-text-analytics-of-sherlock-holmes-books-using-topic-model), MATLAB Central File Exchange. Retrieved .
Platform Compatibility: Windows, macOS, Linux