How do I create a term-frequency matrix that runs fast?

Hello everyone! I am trying to create a term-frequency matrix for a TF-IDF program. I have the code written, but it runs extremely slowly. My code works by finding the unique words across all of the documents, for example
A = {'dog','cat','mouse'}
and for example two documents,
D = {'dog','cat','cat'; 'cat','mouse','mouse'}
The code I have is:
for k = 1:n
    for j = 1:m
        % D(k,:) keeps row k as a cell array; D{k,:} would expand into separate arguments
        seq_sum(k,j) = sum(ismember(D(k,:),A{j}));
    end
end
The output of the above example would be a matrix that looks like
seq_sum = [1 2 0 ; 0 1 2];
where n is the number of rows of the cell array D and m is the number of elements of A. I also have this written in parallel, but I don't want to have to rely on parallel computing. Any help would be greatly appreciated! My question is: how can I improve this to run faster?

2 Comments

In MATLAB R2013b, the new data type 'categorical' is designed to solve this problem.
Thank you, Muthu. I am, however, working with R2012b :(. I'm sure I can get a copy of R2013b, though. Thank you for your comment.
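
For readers who do have R2013b or later, here is a minimal sketch of how the categorical suggestion might look. It assumes that `categorical` accepts the cell array D together with A as its value set and that `countcats` accepts a dimension argument; it is an illustration of the comment above, not code posted in this thread.

```matlab
% Requires R2013b or later (categorical is not available in R2012b).
A = {'dog','cat','mouse'};
D = {'dog','cat','cat'; 'cat','mouse','mouse'};

% Use A as the value set so the category order matches the order of A.
C = categorical(D, A);

% Count each category along dimension 2: one row of counts per document.
seq_sum = countcats(C, 2);
disp(seq_sum)
```

With the example data this should reproduce the matrix from the question, [1 2 0 ; 0 1 2].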


 Accepted Answer

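A = {'dog','cat','mouse'};
D = {'dog','cat','cat'; 'cat','mouse','mouse'};
% One row per document, one column per term in A
out = zeros(size(D,1),numel(A));
for k = 1:numel(A)
    idx = ismember(D,A(k));   % logical mask of the cells in D that equal A{k}
    out(:,k) = sum(idx,2);    % count the matches in each row (document)
end
disp(out)
% or, without an explicit loop:
out = cell2mat(arrayfun(@(x) sum(ismember(D,A(x)),2),1:numel(A),'UniformOutput',false))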
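
Another possibility, not part of the answer above, is to call ismember once for the whole cell array and let accumarray do the counting. This is only a sketch: the variable names `tf`, `loc`, and `rows` are illustrative, and whether it is faster than the loop depends on the data and the MATLAB release.

```matlab
A = {'dog','cat','mouse'};
D = {'dog','cat','cat'; 'cat','mouse','mouse'};

% loc(i,j) is the index in A of the word D{i,j}, or 0 if the word is not in A.
[tf, loc] = ismember(D, A);

% Row (document) index of every cell in D.
rows = repmat((1:size(D,1))', 1, size(D,2));

% Add 1 to out(document, term) for every matched word.
out = accumarray([rows(tf), loc(tf)], 1, [size(D,1), numel(A)]);
disp(out)
```

Words in D that do not appear in A are skipped because of the `tf` mask.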

3 Comments

Thank you very much for your help. I also found that using strcmp creates the same matrix that I need and is also faster than ismember!
The arrayfun version also runs much faster than the for loop, in case anyone sees this and is interested. Thanks again!
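
The exact strcmp code was not posted, so the following is only a guess at what that variant might look like: the loop from the accepted answer with ismember swapped for strcmp.

```matlab
A = {'dog','cat','mouse'};
D = {'dog','cat','cat'; 'cat','mouse','mouse'};

out = zeros(size(D,1), numel(A));
for k = 1:numel(A)
    % strcmp compares every cell of D against the single string A{k}
    % and returns a logical array the same size as D.
    out(:,k) = sum(strcmp(D, A{k}), 2);
end
disp(out)
```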
Azzi, I have one more question. I forgot that in my code the rows of D may not all have the same number of columns, so it could look like:
D = {1x3 ; 1x5}
Is there still a way to use arrayfun in place of the double for loop? Or possibly a way to make D have the same column dimension so I can apply what you suggested?
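
This follow-up went unanswered in the thread. One possible approach, sketched here under the assumption that D is an n-by-1 cell array whose elements are themselves cell arrays of words (the sample data below is made up for illustration), is to count per document without padding D to a common width:

```matlab
A = {'dog','cat','mouse'};
% Hypothetical ragged input: one cell array of words per document.
D = {{'dog','cat','cat'}; {'cat','mouse','mouse','mouse','dog'}};

% Explicit double loop over documents and terms.
out = zeros(numel(D), numel(A));
for k = 1:numel(D)
    for j = 1:numel(A)
        out(k,j) = sum(strcmp(D{k}, A{j}));
    end
end

% The same computation with cellfun (over documents) wrapping arrayfun (over terms).
countDoc = @(doc) arrayfun(@(j) sum(strcmp(doc, A{j})), 1:numel(A));
out2 = cell2mat(cellfun(countDoc, D, 'UniformOutput', false));
disp(out2)
```

Both versions give one row per document and one column per term. The cellfun/arrayfun form is more compact but is not guaranteed to be faster than the plain loop.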
