Comparing subsets of bagOfWords and converting bagOfWords to a cell array

I have one big bagOfWords array and two smaller bagOfWords arrays. The vocabularies of the two smaller arrays are subsets of the big array's vocabulary.
Now I want to create an array whose first column is the big array's vocabulary, whose second column holds the first smaller array's Counts value for that row's word, and whose third column holds the other smaller array's Counts value for that row's word. If an array has no Counts value for a row's word, I want that entry to be 1.
How can I do this? I couldn't find a function that converts a bagOfWords to a cell array, and I'm not sure how to compare the arrays.
The arrays are in the attachment.

Answers (1)

Yash Sharma on 22 Sep 2023
Edited: Yash Sharma on 22 Sep 2023
I understand that you have three bagOfWords arrays: spam, nonspam, and allwords. The vocabularies of spam and nonspam are subsets of the allwords vocabulary. You will need to extract the individual words and their counts from each array and then merge them according to your logic.
Here is example code that does this.
allwords = load('allwords.mat'); % Replace with the path to the big bagOfWords file
spam = load('spam.mat'); % Replace with the path to the first bagOfWords file
nonspam = load('nonspam.mat'); % Replace with the path to the second bagOfWords file
% load returns a struct with one field per saved variable. The field name
% "ans" is assumed for all three files; check it with fieldnames(allwords).
bigBag = allwords.ans;
bag1 = spam.ans;
bag2 = nonspam.ans;
% Extract each vocabulary and its per-word counts, summed over all documents
bigVocab = bigBag.Vocabulary;
bigCounts = full(sum(bigBag.Counts, 1));
vocab1 = bag1.Vocabulary;
counts1 = full(sum(bag1.Counts, 1));
vocab2 = bag2.Vocabulary;
counts2 = full(sum(bag2.Counts, 1));
% Create the desired array: word, count in allwords, count in spam, count in nonspam
numWords = numel(bigVocab);
desiredArray = cell(numWords, 4);
for i = 1:numWords
    word = bigVocab{i};
    count0 = bigCounts(i);
    count1 = 1; % Default value if the word is not found in array 1
    count2 = 1; % Default value if the word is not found in array 2
    % Check if the word exists in array 1 and get its count value
    index1 = find(strcmp(vocab1, word), 1);
    if ~isempty(index1)
        count1 = counts1(index1);
    end
    % Check if the word exists in array 2 and get its count value
    index2 = find(strcmp(vocab2, word), 1);
    if ~isempty(index2)
        count2 = counts2(index2);
    end
    % Assign the values to the desired array
    desiredArray{i, 1} = word;
    desiredArray{i, 2} = count0;
    desiredArray{i, 3} = count1;
    desiredArray{i, 4} = count2;
end
% Display the desired array
disp(desiredArray);
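The per-word lookup in the loop above can also be done in one step with ismember, which reports for every word of the big vocabulary whether it occurs in a smaller vocabulary and at which position. The following is only a minimal sketch: the words and counts are made-up stand-ins for bag.Vocabulary and sum(bag.Counts, 1), and it builds the three-column layout asked for in the question (word, first count, second count) with 1 as the default.

```matlab
% Toy stand-ins for the real data (hypothetical values, not from the attachment)
bigVocab = ["free" "hello" "meeting" "offer"];
vocab1 = ["free" "offer"];
counts1 = [5 3];
vocab2 = ["hello" "meeting" "free"];
counts2 = [2 4 1];

% For each big-vocabulary word: is it in the smaller vocabulary, and where?
[found1, loc1] = ismember(bigVocab, vocab1);
[found2, loc2] = ismember(bigVocab, vocab2);

% Start from the default value 1 and overwrite the entries that were found
col2 = ones(numel(bigVocab), 1);
col2(found1) = counts1(loc1(found1));
col3 = ones(numel(bigVocab), 1);
col3(found2) = counts2(loc2(found2));

% Assemble the cell array: word, count in array 1, count in array 2
result = [cellstr(bigVocab(:)), num2cell(col2), num2cell(col3)];
disp(result);
```

With real data, bigVocab, vocab1, and vocab2 would come from the Vocabulary property of each bagOfWords object, and the count vectors from its Counts property as in the code above. For large vocabularies this avoids calling strcmp once per word.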
Please find below links to documentation that I believe will help as further reference.
