How do I create a term-frequency matrix that runs fast?
Hello everyone! I am trying to create a term-frequency matrix for a TF-IDF program. I have the code written, but it runs extremely slowly. My code works by finding the unique words across all of the documents, for example
A = {'dog','cat','mouse'}
and two example documents, one per row,
D = {'dog','cat','cat'; 'cat','mouse','mouse'}.
The code I have is:
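n = size(D,1);          % number of documents (rows of D)
m = numel(A);           % number of unique terms
seq_sum = zeros(n,m);   % preallocate the count matrix
for k = 1:n
    for j = 1:m
        % count how often term A{j} occurs in document k
        seq_sum(k,j) = sum(ismember(D(k,:),A{j}));
    end
end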
The output of the above example would be a matrix that looks like
seq_sum = [1 2 0 ; 0 1 2];
where n is the number of rows of the cell array D (the documents) and m is the number of columns of A (the unique words). I also have this written in parallel, but I don't want to rely on parallel computing. Any help would be greatly appreciated! My question is: how can I make this run faster?
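One loop-free way to get the same counts is to map every word to its index in A with a single ismember call and then tally the (document, term) pairs with accumarray. This is only a minimal sketch, assuming D is a rectangular cell array of strings as in the example above; the names docIdx and termIdx are illustrative, and it has not been benchmarked against the loop.

```matlab
A = {'dog','cat','mouse'};
D = {'dog','cat','cat'; 'cat','mouse','mouse'};

[n, w] = size(D);                 % n documents, w words per document
[tf, termIdx] = ismember(D, A);   % termIdx(k,c) is the position of D{k,c} in A (0 if absent)
docIdx = repmat((1:n)', 1, w);    % docIdx(k,c) = k, the document each word belongs to

% Add 1 at (document, term) for every word that was found in A
seq_sum = accumarray([docIdx(tf), termIdx(tf)], 1, [n, numel(A)]);
% seq_sum is [1 2 0; 0 1 2] for the example above
```

Words in D that do not appear in A are skipped through the logical mask tf, so they do not contribute to any count.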
2 Comments
Muthu Annamalai
on 5 Sep 2013
In MATLAB R2013b, the new data type 'categorical' is designed to solve this problem.
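As a rough illustration of that suggestion, the sketch below converts D to a categorical array whose categories are fixed to A, then counts categories along each row with countcats. It assumes MATLAB R2013b or later and the same example data as in the question; it is one possible reading of the comment, not necessarily what the commenter had in mind.

```matlab
A = {'dog','cat','mouse'};
D = {'dog','cat','cat'; 'cat','mouse','mouse'};

% Fix the category set and its order to A; words not in A become <undefined>
C = categorical(D, A);

% Count each category along dimension 2, giving one row of counts per document
seq_sum = countcats(C, 2);
```

Passing A as the value set keeps the columns of seq_sum in the same order as A, rather than in the sorted order categorical would otherwise choose.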
Ryan
on 5 Sep 2013
Accepted Answer