This example shows how to visualize text data using word clouds.
Text Analytics Toolbox extends the functionality of the
wordcloud (MATLAB) function. It adds support for creating word clouds directly from string arrays, and creating word clouds from bag-of-words models and LDA topics.
Load the example data. The file
weatherReports.csv contains weather reports, including a text description and categorical labels for each event.
filename = "weatherReports.csv";
T = readtable(filename,'TextType','string');
Extract the text data from the event_narrative column and view the first 10 reports.
textData = T.event_narrative;
textData(1:10)
ans = 10x1 string array
    "Large tree down between Plantersville and Nettleton."
    "One to two feet of deep standing water developed on a street on the Winthrop University campus after more than an inch of rain fell in less than an hour. One vehicle was stalled in the water."
    "NWS Columbia relayed a report of trees blown down along Tom Hall St."
    "Media reported two trees blown down along I-40 in the Old Fort area."
    ""
    "A few tree limbs greater than 6 inches down on HWY 18 in Roseland."
    "Awning blown off a building on Lamar Avenue. Multiple trees down near the intersection of Winchester and Perkins."
    "Quarter size hail near Rosemark."
    "Tin roof ripped off house on Old Memphis Road near Billings Drive. Several large trees down in the area."
    "Powerlines down at Walnut Grove and Cherry Lane roads."
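The fifth entry in the output above is an empty string, so some events have no narrative. The following is a minimal sketch of how such entries could be counted and filtered out. The variable names `sample`, `isEmptyReport`, `numEmpty`, and `nonEmptySample` are illustrative, and the three-element array stands in for the full textData.

```matlab
% Small stand-in for textData, using entries shown in the output above.
sample = ["Quarter size hail near Rosemark."; ...
          ""; ...
          "Powerlines down at Walnut Grove and Cherry Lane roads."];

% strlength returns the number of characters in each string element.
isEmptyReport = strlength(sample) == 0;

% Count the empty reports and keep only the non-empty ones.
numEmpty = nnz(isEmptyReport);
nonEmptySample = sample(~isEmptyReport);
```

Applying the same logical index to textData would give the non-empty reports from the full data set.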
Create a word cloud from all the weather reports.
figure
wordcloud(textData);
title("Weather Reports")
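As mentioned at the start of this example, word clouds can also be created from bag-of-words models. The following is a rough sketch of that route rather than a definitive workflow. It assumes that weatherReports.csv is on the path, and the preprocessing steps (erasing punctuation and removing stop words) are one possible choice, not something this example prescribes.

```matlab
% Reload the data so that this sketch runs on its own.
T = readtable("weatherReports.csv",'TextType','string');
textData = T.event_narrative;

% Tokenize the reports and apply some simple, optional preprocessing.
documents = tokenizedDocument(textData);
documents = erasePunctuation(documents);
documents = removeStopWords(documents);

% Count word frequencies in a bag-of-words model.
bag = bagOfWords(documents);

% Create the word cloud from the model instead of from the raw strings.
figure
wordcloud(bag);
title("Weather Reports (Bag-of-Words Model)")
```

Working from a model keeps the preprocessing explicit, which can help when the same word counts are reused for other analyses.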
Compare the words in the reports with labels "Hail" and "Thunderstorm Wind". Create a word cloud of the reports for each of these labels. Specify the word colors to be blue and magenta, respectively.
figure
labels = T.event_type;
subplot(1,2,1)
idx = labels == "Hail";
wordcloud(textData(idx),'Color','blue');
title("Hail")
subplot(1,2,2)
idx = labels == "Thunderstorm Wind";
wordcloud(textData(idx),'Color','magenta');
title("Thunderstorm Wind")
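When more than two labels are compared, repeating the subplot code for each one becomes tedious. One way to generalize the comparison above is a loop over the label names, sketched here. The variables `labelNames` and `colors` are illustrative and not part of the original example, and the sketch assumes weatherReports.csv is on the path.

```matlab
% Reload the data so that this sketch runs on its own.
T = readtable("weatherReports.csv",'TextType','string');
textData = T.event_narrative;
labels = T.event_type;

% Labels to compare and the word color to use for each one.
labelNames = ["Hail" "Thunderstorm Wind"];
colors = {'blue','magenta'};

figure
for i = 1:numel(labelNames)
    % One subplot per label, arranged in a single row.
    subplot(1,numel(labelNames),i)

    % Select the reports with the current label.
    idx = labels == labelNames(i);

    wordcloud(textData(idx),'Color',colors{i});
    title(labelNames(i))
end
```

Adding a label then only requires extending `labelNames` and `colors`.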
Compare the words in the reports from the states Florida, Kansas, and Alaska. Create word clouds of the reports for each of these states in rectangles and draw a border around each word cloud.
figure
state = T.state;
subplot(1,3,1)
idx = state == "FLORIDA";
wordcloud(textData(idx),'Shape','rectangle','Box','on');
title("Florida")
subplot(1,3,2)
idx = state == "KANSAS";
wordcloud(textData(idx),'Shape','rectangle','Box','on');
title("Kansas")
subplot(1,3,3)
idx = state == "ALASKA";
wordcloud(textData(idx),'Shape','rectangle','Box','on');
title("Alaska")
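The == comparisons above match the state names exactly, including letter case. If the capitalization of the names were not known in advance, a case-insensitive comparison could be used instead. The sketch below illustrates this on a small, hypothetical array of state names; the mixed capitalization is invented for illustration and does not reflect the actual data.

```matlab
% Hypothetical sample of state names with mixed capitalization.
state = ["FLORIDA"; "Kansas"; "alaska"; "FLORIDA"];

% strcmpi compares text while ignoring letter case.
idxFlorida = strcmpi(state,"Florida");

% Alternatively, normalize the case first and then compare with ==.
idxKansas = upper(state) == "KANSAS";
```

Either logical index can be used to select rows of textData in the same way as idx in the code above.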
Compare the words in the reports with property damage reported in thousands of dollars to the words in the reports with damage reported in millions of dollars. Create a word cloud of the reports for each of these amounts, with highlight colors blue and red, respectively.
cost = T.damage_property;
idx = endsWith(cost,"K");
figure
wordcloud(textData(idx),'HighlightColor','blue');
title("Damage Reported in Thousands")
idx = endsWith(cost,"M");
figure
wordcloud(textData(idx),'HighlightColor','red');
title("Damage Reported in Millions")
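The code above only tests the suffix of each damage value. If numeric amounts were needed, for example to group reports by a dollar threshold, the suffix could be converted to a multiplier. The following is a sketch under the assumption that the values are a number followed by "K" or "M"; the sample values and the names `multiplier` and `value` are made up for illustration and are not taken from the data set.

```matlab
% Hypothetical damage strings in the assumed "<number><K or M>" format.
cost = ["10.00K"; "1.50M"; "0.00K"; ""];

% Map each suffix to its multiplier; values with no suffix keep 1.
multiplier = ones(size(cost));
multiplier(endsWith(cost,"K")) = 1e3;
multiplier(endsWith(cost,"M")) = 1e6;

% Remove the suffix letters and convert the remaining text to numbers.
% Empty strings convert to NaN.
value = str2double(erase(cost,["K" "M"])) .* multiplier;
```

A logical index such as value >= 1e6 could then select the reports for a word cloud in place of the endsWith test.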