How do I count and save Twitter hashtags?

Asked by Abim on 14 Dec 2012

I am writing a script that analyzes the hashtags from tweets that I saved in a text file. So far I have managed to count the number of hashtags in the file:

fid = fopen('Tweets.txt');
% Read the file line by line; fgetl returns -1 (not a char) at end of file
twitterStuff = {};
tline = fgetl(fid);
while ischar(tline)
    twitterStuff{end+1} = tline;
    tline = fgetl(fid);
end
fclose(fid);
numberOfTweets = numel(twitterStuff);
% Count every '#' character in every tweet
numberOfHash = 0;
for i = 1:numberOfTweets
    numberOfHash = numberOfHash + numel(strfind(twitterStuff{i}, '#'));
end

Now, I want to find out what the specific hashtags are and save them into a cell array, but I don't really know how to do that.

2 Comments

Walter Roberson on 14 Dec 2012

Is # by itself a hashtag? Is #this#that with no spaces two hashtags? Is #35 a valid hashtag? Is #? a valid hashtag?

Abim on 14 Dec 2012

When I said I was counting the number of hashtags, I just counted the number of # characters. But when I say I want to save the hashtags, I mean the words contained within them. Technically, #this#that would be two hashtags, but for now I just want to focus on the basic #this hashtag.


3 Answers

Answer by Jonathan Epperl on 14 Dec 2012
Edited by Jonathan Epperl on 14 Dec 2012
Accepted answer

You should use regular expressions for that; you can do pretty much anything with them. This should do what you want, and if not, it should point you in the right direction:

s = '#Matlab#2012b rocks my #sox # off!'
% Match a '#' followed by zero or more characters that aren't a space or '#'
T = regexp(s,'(#[^ #]*)','tokens')
T{:}
% Match a '#' followed by one or more characters that aren't a space or '#'
T = regexp(s,'(#[^ #]+)','tokens')
T{:}
% Match a '#' followed by one or more characters that aren't a space or '#',
% but don't capture the '#' itself
T = regexp(s,'#([^ #]+)','tokens')
T{:}
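
To get from there to one flat cell array of hashtags for a whole file, something along these lines might work. This is only a sketch: the `tweets` cell array is made-up sample data standing in for the lines read from Tweets.txt, the name `hashtags` is my own, and I have swapped the space in the character class for `\s` so that tabs also end a tag.

```matlab
% Hypothetical sample data standing in for the lines read from Tweets.txt
tweets = {'#Matlab#2012b rocks my #sox # off!', ...
          'no tags here', ...
          'Loving #Simulink'};

hashtags = {};                      % will hold every tag found, without the '#'
for k = 1:numel(tweets)
    % 'tokens' returns a cell array with one 1-by-1 cell per match
    tok = regexp(tweets{k}, '#([^\s#]+)', 'tokens');
    for j = 1:numel(tok)
        hashtags{end+1} = tok{j}{1};    % unwrap the inner cell and append
    end
end

disp(hashtags)      % 'Matlab'  '2012b'  'sox'  'Simulink'
```

The lone '#' in the first tweet is skipped because the pattern requires at least one character after it, and tweets without any tags simply contribute nothing.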

Answer by Sean de Wolski on 14 Dec 2012
Edited by Sean de Wolski on 14 Dec 2012

Using regular expressions:

str = '#MATLAB is an awesome product by #MathWorks';
[matchstart,matchend,~,hashtag] = regexp(str,'(\#(\w*))')
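
If only the words are needed, without the leading '#', a lookbehind together with the 'match' option is one possible variation. Again just a sketch: the variable names `tags`, `cases` and `counts` are mine, and the second half simply checks how a `\w+` pattern treats the edge cases raised in the comments on the question.

```matlab
str = '#MATLAB is an awesome product by #MathWorks';

% (?<=#) requires a '#' immediately before the match but does not include it
tags = regexp(str, '(?<=#)\w+', 'match');
disp(tags)          % 'MATLAB'  'MathWorks'

% How the same pattern treats the edge cases from the comments
cases  = {'#', '#this#that', '#35', '#?'};
counts = cellfun(@(c) numel(regexp(c, '(?<=#)\w+', 'match')), cases);
disp(counts)        % 0  2  1  0
```

Since `\w` only covers letters, digits and underscore, a bare '#' and '#?' give no tags, '#35' counts as one, and '#this#that' is split into two. Whether that is the right behaviour depends on what you decide a valid hashtag is.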

Answer by Abim on 14 Dec 2012

Thanks
