Text Analytics of Sherlock Holmes Books Using Topic Model

This project used the LDA topic model in MATLAB to analyze the text of the original Sherlock Holmes novels and short stories.

Updated 18 Dec 2022

This is a course project in which students were free to choose the topic, the objective and which of the modeling/simulation techniques introduced in the course to apply.
Project Objective:
  • Analyze the text of the original Sherlock Holmes novels and short stories (that is, the works written by Sir Arthur Conan Doyle), mainly by fitting a Latent Dirichlet Allocation (LDA) topic model to discover the topics in the books.
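To illustrate what fitting an LDA topic model involves, here is a minimal sketch of a collapsed Gibbs sampler in plain Python. It is not the project's code (the project uses MATLAB, and its .mlx files are not reproduced here); the function name `fit_lda`, the hyperparameters `alpha` and `beta`, and the toy documents are all assumptions made for illustration.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA on a list of token lists (illustrative)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    # Give every token a random initial topic.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed topic-word (phi) and document-topic (theta) distributions.
    phi = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in vocab}
        for k in range(num_topics)
    ]
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    return phi, theta


# Toy corpus (made up for illustration, not taken from the books).
docs = [
    "holmes watson clue murder detective clue".split(),
    "detective murder holmes clue watson".split(),
    "love marriage sister heart love".split(),
    "heart marriage love sister".split(),
]
phi, theta = fit_lda(docs, num_topics=2)
for k, dist in enumerate(phi):
    top_words = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {k}: {top_words}")
```

Each entry of `phi` is one topic's probability distribution over the vocabulary, and each row of `theta` is one document's mixture over topics. A real analysis would also need preprocessing (for example stop-word removal) and far more text than this toy corpus.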
Background, Methodology, Results and Discussion:
This project was divided into 2 parts.
Part 1:
  • Only the original Sherlock Holmes works were involved. The 2 data sets contain 203,936 words (the novels) and 454,214 words (the short story collections), so a total of 658,150 words were processed and analysed.
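As a rough illustration of how such word counts could be obtained and totalled, the sketch below counts word tokens with a regular expression and then adds up the two Part 1 figures. The tokenization rule in `count_words` is an assumption; the project's own counts may have been produced with different rules, so counts from this function would not necessarily match the reported numbers.

```python
import re


def count_words(text):
    """Count word tokens, taken here as runs of letters, digits or apostrophes."""
    return len(re.findall(r"[A-Za-z0-9']+", text))


# A short made-up sentence to show the counting rule.
sample = "Elementary, my dear Watson."
print(count_words(sample))  # 4

# Word counts reported for the two Part 1 data sets.
novels_words = 203_936
short_stories_words = 454_214
total_part1 = novels_words + short_stories_words
print(total_part1)  # 658150
```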
Part 2:
  • 3 classic romance novels by 3 different authors ("Sense and Sensibility" by Jane Austen, "Wuthering Heights" by Emily Brontë and "Jane Eyre" by Charlotte Brontë) were mixed with the Sherlock Holmes short story collections (i.e. the second set of documents in Part 1) to train another LDA model.
  • The 3 romance novels introduced in Part 2 ("Sense and Sensibility", "Wuthering Heights" and "Jane Eyre") contain 186,302, 116,537 and 119,580 words respectively, 422,419 words in total. Together with the 658,150 words from Part 1, the total number of words processed in Part 2 amounts to 1,080,569, about 1.6 times as many as in Part 1. The larger corpus was expected to give a more accurate result.
  • The newly trained model was then used to examine the topic mixtures of the Sherlock Holmes novels (i.e. the first set of documents in Part 1).
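Examining the topic mixture of a document that was not used for training can be pictured as a "fold-in" step: the trained topic-word probabilities stay fixed and only the document's topic proportions are estimated. The sketch below does this in plain Python with a simple EM iteration. It is a stand-in for illustration, not necessarily the inference performed by the project's MATLAB code; the function `infer_topic_mixture`, the two hand-written topics and the sample document are all hypothetical.

```python
def infer_topic_mixture(tokens, phi, iters=50):
    """Estimate one document's topic proportions with fixed topics, via EM."""
    num_topics = len(phi)
    # Ignore words to which no trained topic gives positive probability.
    tokens = [w for w in tokens if any(dist.get(w, 0.0) > 0 for dist in phi)]
    theta = [1.0 / num_topics] * num_topics
    if not tokens:
        return theta
    for _ in range(iters):
        expected = [0.0] * num_topics
        for w in tokens:
            # E-step: responsibility of each topic for this token.
            scores = [theta[k] * phi[k].get(w, 0.0) for k in range(num_topics)]
            norm = sum(scores)
            for k in range(num_topics):
                expected[k] += scores[k] / norm
        # M-step: new proportions are the normalized expected counts.
        theta = [e / len(tokens) for e in expected]
    return theta


# Hypothetical topic-word probabilities standing in for a trained model.
phi = [
    {"holmes": 0.4, "clue": 0.4, "love": 0.1, "heart": 0.1},     # "detective"
    {"holmes": 0.05, "clue": 0.05, "love": 0.5, "heart": 0.4},   # "romance"
]
held_out_doc = "holmes clue clue holmes heart love".split()
mixture = infer_topic_mixture(held_out_doc, phi)
print([round(p, 3) for p in mixture])
```

The returned list sums to 1 and gives the estimated share of each topic in the held-out document; for this sample, the hypothetical "detective" topic receives the larger share.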
A brief account of the project can be found in the presentation file <SH_TopicModelling_comb.pdf>.
The code for the modeling can be found in the two .mlx files.

Cite As

Carrie Ching (2023). Text Analytics of Sherlock Holmes Books Using Topic Model (https://www.mathworks.com/matlabcentral/fileexchange/114410-text-analytics-of-sherlock-holmes-books-using-topic-model), MATLAB Central File Exchange. Retrieved .

MATLAB Release Compatibility
Created with R2022a
Compatible with any release
Platform Compatibility
Windows macOS Linux

Version and Release Notes
  • 1.0.2: the presentation is included now
  • 1.0.1: code files should be downloadable
  • 1.0.0