Conduct Sentiment Analysis Using Historical Tweets

This example shows how to search for and retrieve all available Tweets from the last 7 days and import them into MATLAB®. After importing the data, you can conduct sentiment analysis. This analysis enables you to determine subjective information, such as moods, opinions, or emotional reactions, from text data. This example searches for positive and negative moods regarding the financial services industry.

To run this example, you need Twitter® credentials. To obtain these credentials, you must first log in to your Twitter account. Then, fill out the form in Create an application.

To access the example code, enter edit TwitterExample.m at the command line.

Connect to Twitter

Create a Twitter connection using your credentials. (The values in this example do not represent real Twitter credentials.)

consumerkey = 'abcdefghijklmnop123456789';
consumersecret = 'qrstuvwxyz123456789';
accesstoken = '123456789abcdefghijklmnop';
accesstokensecret = '123456789qrstuvwxyz';

c = twitter(consumerkey,consumersecret,accesstoken,accesstokensecret);

Check the Twitter connection. If the StatusCode property has the value OK, the connection is successful.

c.StatusCode
ans = 

    OK

Retrieve Latest Tweets

Search for the latest 100 Tweets about the financial services industry using the Twitter connection object. Use the search term financial services. Import Tweet® data into the MATLAB workspace.

tweetquery = 'financial services';
s = search(c,tweetquery,'count',100);
statuses = s.Body.Data.statuses;
pause(2)

statuses contains the Tweet data as a cell array of 100 structures. Each structure contains a field for the Tweet text, and the remaining fields contain other information about the Tweet.

Search and retrieve the next 100 Tweets that have occurred since the previous request.

sRefresh = search(c,tweetquery,'count',100, ...
    'since_id',s.Body.Data.search_metadata.max_id_str);
statuses = [statuses;sRefresh.Body.Data.statuses];

statuses now contains any Tweets posted since the first request (up to 100) in addition to the first 100 Tweets.

Retrieve All Available Tweets

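Retrieve all available Tweets about the financial services industry using a while loop. The loop continues as long as the search metadata contains the field next_results, which you check using the isfield function. In each iteration, extract the max_id value from next_results and pass it to the next search.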

while isfield(s.Body.Data.search_metadata,'next_results')
    % Convert the next_results query string to a string scalar
    nextresults = string(s.Body.Data.search_metadata.next_results);
    % Extract the maximum Tweet identifier
    max_id = extractBetween(nextresults,"max_id=","&");
    % Convert the maximum Tweet identifier to a character vector
    cmax_id = char(max_id);
    % Search for the next page of Tweets
    s = search(c,tweetquery,'count',100,'max_id',cmax_id);
    % Append the retrieved Tweet data to the existing results
    statuses = [statuses;s.Body.Data.statuses];
end
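
To make the paging step concrete, the following minimal sketch applies the same extraction to a made-up next_results value. The sample query string and the Tweet identifier in it are assumptions for illustration only; real values come from s.Body.Data.search_metadata.next_results.

```matlab
% Hypothetical next_results value (made up for illustration)
sampleNextResults = "?max_id=1234567890123456789&q=financial%20services&count=100";

% Extract the text between "max_id=" and the next "&"
sampleMaxId = extractBetween(sampleNextResults,"max_id=","&");

% Convert the string scalar to a character vector, as the loop does
sampleCharMaxId = char(sampleMaxId);

% A 19-digit identifier exceeds flintmax, so a double cannot hold it exactly
exceedsDoublePrecision = str2double(sampleMaxId) > flintmax;

disp(sampleCharMaxId)
```

Keeping the identifier as text avoids the rounding that would occur if an identifier of this size were converted to a double.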

Retrieve the creation time and text of each Tweet. The way you access these fields depends on how statuses is stored. If statuses is a cell array of structures (unstructured data), loop over the cells and read the fields of each structure. If statuses is a structure array (structured data), collect each field into a cell array and transpose it into a column.

if iscell(statuses)
    % Unstructured data
    numTweets = length(statuses);               % Determine total number of Tweets
    tweetTimes = cell(numTweets,1);             % Allocate space for Tweet times
    tweetTexts = cell(numTweets,1);             % Allocate space for Tweet text
    for i = 1:numTweets
        tweetTimes{i} = statuses{i}.created_at; % Retrieve the time each Tweet was created
        tweetTexts{i} = statuses{i}.text;       % Retrieve the text of each Tweet
    end
else
    % Structured data
    tweetTimes = {statuses.created_at}';
    tweetTexts = {statuses.text}';
end

tweetTimes contains the creation time for each Tweet. tweetTexts contains the text for each Tweet.
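
The two access patterns can be compared on mock data without a Twitter connection. In this sketch, the structures, their field values, and the extra lang field are made up for illustration. It also assumes that a cell array arises when the returned Tweets do not all share the same fields; the example itself does not state when each form occurs.

```matlab
% Mock Tweet structures (made up for illustration)
a = struct('created_at','Mon Mar 02 14:05:09 +0000 2020', ...
    'text','RT @user: great service');
b = struct('created_at','Tue Mar 03 09:30:00 +0000 2020', ...
    'text','slow response','lang','en');

% Cell array of structures: the elements may have different fields
cellStatuses = {a; b};

% Structure array: every element must have the same fields
structStatuses = [a; rmfield(b,'lang')];

% Cell array access, written with cellfun instead of an explicit loop
timesFromCell = cellfun(@(s) s.created_at,cellStatuses,'UniformOutput',false);
textsFromCell = cellfun(@(s) s.text,cellStatuses,'UniformOutput',false);

% Structure array access: collect each field and transpose to a column
timesFromStruct = {structStatuses.created_at}';
textsFromStruct = {structStatuses.text}';

% Both approaches produce the same column cell arrays
sameTimes = isequal(timesFromCell,timesFromStruct);
sameTexts = isequal(textsFromCell,textsFromStruct);
```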

Create the timetable tweets for all Tweets by using the text and creation time of each Tweet.

tweets = timetable(tweetTexts,'RowTimes', ...
    datetime(tweetTimes,'Format','eee MMM dd HH:mm:ss +SSSS yyyy'));
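
As an alternative sketch, not the method used in this example, the UTC offset in the creation time can be parsed with the Z format identifier and an explicit time zone, rather than being matched as a plus sign followed by digits. The sample creation times and text are made up, and the sketch assumes that creation times follow the pattern shown and use English day and month names.

```matlab
% Mock creation times and text (made up for illustration)
sampleTimes = {'Mon Mar 02 14:05:09 +0000 2020'; ...
    'Tue Mar 03 09:30:00 +0000 2020'};
sampleTexts = {'RT @user: great service'; 'slow response'};

% Parse the offset with Z and store the times in UTC. Locale is set so
% that English day and month names parse regardless of the system locale.
sampleRowTimes = datetime(sampleTimes, ...
    'InputFormat','eee MMM dd HH:mm:ss Z yyyy', ...
    'TimeZone','UTC','Locale','en_US');

% Build a timetable in the same way as the example
sampleTweets = timetable(sampleTexts,'RowTimes',sampleRowTimes);
```

The resulting row times carry a time zone, which can help if you later compare them with local times.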

Conduct Sentiment Analysis on Tweets

Create a glossary of words that are associated with positive sentiment.

poskeywords = {'happy','great','good', ...
    'fast','optimized','nice','interesting','amazing','top','award', ...
    'winner','wins','cool','thanks','useful'};

poskeywords is a cell array of character vectors. Each character vector is a word that represents an instance of positive sentiment.

Search each Tweet for words in the positive sentiment glossary. Determine the total number of Tweets that contain a positive sentiment. Out of the total number of positive Tweets, determine the total number of Retweets.

% Determine the total number of Tweets
numTweets = height(tweets);            

% Determine the positive Tweets
numPosTweets = 0;
numPosRTs = 0;
for i = 1:numTweets
    % Compare Tweet to positive sentiment glossary
    isPositive = contains(tweets.tweetTexts{i},poskeywords,'IgnoreCase',true);
    if isPositive
        % Increase total count of Tweets with positive sentiment by one
        numPosTweets = numPosTweets + 1;
        % Determine if positive Tweet is a Retweet
        isRetweet = strncmp('RT @',tweets.tweetTexts{i},4);
        if isRetweet
            % Increase total count of positive Retweets by one
            numPosRTs = numPosRTs + 1;
        end
    end
end

numPosTweets contains the total number of Tweets with positive sentiment.

numPosRTs contains the total number of Retweets with positive sentiment.
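
The same counts can also be computed without a loop, because contains and startsWith accept a cell array of character vectors. The following sketch uses made-up sample text and a shortened glossary; it is one possible vectorized form, not the code in TwitterExample.m.

```matlab
% Mock Tweet text and a shortened glossary (made up for illustration)
sampleTexts = {'RT @bank: Great quarter for lenders'; ...
    'Thanks for the fast service'; ...
    'Markets closed today'; ...
    'RT @news: rates unchanged'};
samplePosKeywords = {'great','fast','thanks'};

% Logical column: true where a Tweet contains any glossary word
samplePositive = contains(sampleTexts,samplePosKeywords,'IgnoreCase',true);

% Logical column: true where a Tweet starts with the Retweet prefix
sampleRetweet = startsWith(sampleTexts,'RT @');

% Count positive Tweets and positive Retweets
samplePosCount = nnz(samplePositive);
samplePosRTCount = nnz(samplePositive & sampleRetweet);
```

For this sample, two Tweets match the glossary and one of them is a Retweet.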

Create a glossary of words that are associated with negative sentiment.

negkeywords = {'sad','poor','bad','slow','weaken','mean','boring', ...
    'ordinary','bottom','loss','loser','loses','uncool', ...
    'criticism','useless'};

negkeywords is a cell array of character vectors. Each character vector is a word that represents an instance of negative sentiment.

Search each Tweet for words in the negative sentiment glossary. Determine the total number of Tweets that contain a negative sentiment. Out of the total number of negative Tweets, determine the total number of Retweets.

% Determine the negative Tweets
numNegTweets = 0;
numNegRTs = 0;
for i = 1:numTweets
    % Compare Tweet to negative sentiment glossary
    isNegative = contains(tweets.tweetTexts{i},negkeywords,'IgnoreCase',true);
    if isNegative
        % Increase total count of Tweets with negative sentiment by one
        numNegTweets = numNegTweets + 1;
        % Determine if negative Tweet is a Retweet
        isRetweet = strncmp('RT @',tweets.tweetTexts{i},4);
        if isRetweet
            % Increase total count of negative Retweets by one
            numNegRTs = numNegRTs + 1;
        end
    end
end

numNegTweets contains the total number of Tweets with negative sentiment.

numNegRTs contains the total number of Retweets with negative sentiment.
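
Because contains matches substrings, a glossary word such as mean also matches meaningful. If whole-word matching is preferred, one option is a regular expression with word boundaries. The following sketch is an assumption about how you might tighten the matching, not part of the example; the sample text and shortened glossary are made up.

```matlab
% Mock Tweet text and a shortened glossary (made up for illustration)
sampleTexts = {'A meaningful upgrade to the app'; ...
    'That fee is just mean'; ...
    'Badge of honor for the team'};
sampleNegKeywords = {'mean','bad'};

% Substring matching, as in the example: every sample Tweet matches
substringHits = contains(sampleTexts,sampleNegKeywords,'IgnoreCase',true);

% Whole-word matching: \< and \> mark the start and end of a word
wordPattern = ['\<(' strjoin(sampleNegKeywords,'|') ')\>'];
firstMatch = regexp(sampleTexts,wordPattern,'once','ignorecase');
wordHits = ~cellfun(@isempty,firstMatch);
```

With whole-word matching, only the second sample Tweet counts as negative. Glossary words that contain regular expression metacharacters would need escaping, for example with regexptranslate.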

Display Sentiment Analysis Results

Create a table with columns that contain:

  • Number of Tweets

  • Number of Tweets with positive sentiment

  • Number of positive Retweets

  • Number of Tweets with negative sentiment

  • Number of negative Retweets

matlabTweetTable = table(numTweets,numPosTweets,numPosRTs,numNegTweets,numNegRTs, ...
    'VariableNames',{'Number_of_Tweets','Positive_Tweets','Positive_Retweets', ...
    'Negative_Tweets','Negative_Retweets'});

Display the table of Tweet data.

matlabTweetTable
matlabTweetTable =

  1×5 table

    Number_of_Tweets    Positive_Tweets    Positive_Retweets    Negative_Tweets    Negative_Retweets
    ________________    _______________    _________________    _______________    _________________

    11465               688                238                  201                96               

Out of 11,465 total Tweets about the financial services industry in the last 7 days, 688 Tweets have positive sentiment and 201 Tweets have negative sentiment. Out of the positive Tweets, 238 Tweets are Retweets. Out of the negative Tweets, 96 are Retweets.
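
To put the counts in proportion, the following sketch converts them to percentages. The numbers are copied from the table output above; the variable names are introduced here for illustration.

```matlab
% Counts copied from the table output shown above
totalTweets = 11465;
posTweets = 688;
posRetweets = 238;
negTweets = 201;
negRetweets = 96;

% Share of all Tweets that match each glossary, in percent
posShare = 100*posTweets/totalTweets;
negShare = 100*negTweets/totalTweets;

% Share of matching Tweets that are Retweets, in percent
posRTShare = 100*posRetweets/posTweets;
negRTShare = 100*negRetweets/negTweets;

fprintf('Positive: %.1f%% of Tweets, %.1f%% of them Retweets\n', ...
    posShare,posRTShare);
fprintf('Negative: %.1f%% of Tweets, %.1f%% of them Retweets\n', ...
    negShare,negRTShare);
```

With these counts, roughly 6.0% of Tweets match the positive glossary and about 1.8% match the negative glossary, so most Tweets match neither.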
