Finding cell array row indices based on numeric column values
I have a large cell array keystrokes of approximate size 20000x4. Columns 1 and 3 each contain a char, while columns 2 and 4 each contain a double. For example:
>> keystrokes(378:380,:)
ans =
3×4 cell array
{'l' } {[ 180]} {'e' } {[ 69]}
{'e' } {[300664]} {'|space|'} {[ 125]}
{'|space|'} {[ 62]} {'n' } {[2500]}
I want to find the row indices in keystrokes of occurrences of every unique combination of columns 1 and 3, where the value in column 2 is less than 100000 and the value in column 4 is less than 2000. My current code gives me the error "Undefined operator '<' for input arguments of type 'cell'.", and is shown below.
% Temporarily convert keystroke structure to a table due to unique() apparently not supporting combinations of cellarray columns.
uniqueDigraphsTable = unique(cell2table(keystrokes(:,[1 3])), 'rows');
uniqueDigraphs = table2cell(uniqueDigraphsTable);
for ii = 1:length(uniqueDigraphs)
% Find rows containing the current unique digraph
occurrenceIndices = find(strcmp(keystrokes(:,1), uniqueDigraphs{ii,1}) & ...
strcmp(keystrokes(:,3), uniqueDigraphs{ii,2}) & ...
keystrokes(:,2)<100000 & keystrokes(:,4)<2000);
...
end
Using keystrokes{:,4}<2000 gives me this error: "Error using <. Too many input arguments." Is there a simple (and perhaps prettier) way to find the indices?
1 Comment
Jan
on 9 Jan 2018
Please post the input data in a form that can be used by copy&paste. Is keystrokes a nested cell:
keystrokes = { ...
{'l' } {[ 180]} {'e' } {[ 69]}; ...
{'e' } {[300664]} {'|space|'} {[ 125]}; ...
{'|space|'} {[ 62]} {'n' } {[2500]}}
or a cell:
keystrokes = { ...
'l', 180, 'e', 69; ...
'e', 300664, '|space|', 125; ...
'|space|', 62, 'n', 2500}
? Even writing up this question takes a lot of typing.
Answers (2)
Guillaume
on 9 Jan 2018
find(strcmp(keystrokes(:,1), uniqueDigraphs{ii,1}) & ... split over several lines for readability
strcmp(keystrokes(:,3), uniqueDigraphs{ii,2}) & ...
[keystrokes{:,2}]' < 100000 & ... transposed: [keystrokes{:,2}] is a row vector, strcmp returns a column
[keystrokes{:,4}]' < 2000)
or
find(strcmp(keystrokes(:,1), uniqueDigraphs{ii,1}) & ... split over several lines for readability
strcmp(keystrokes(:,3), uniqueDigraphs{ii,2}) & ...
cell2mat(keystrokes(:,2)) < 100000 & ...
cell2mat(keystrokes(:,4)) < 2000)
In essence you have to transform your cell columns into numeric matrices.
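As a rough illustration of why the conversion is needed, here is a minimal sketch using the three sample rows from the question. It assumes keystrokes is a plain (non-nested) cell array; the variable names rowValues, colValues, mask and rowIndices are made up for this example. Parentheses return a cell array, and braces expand to a comma-separated list, so neither can be handed to < directly.

```matlab
% Sample rows from the question, assumed to be a plain (non-nested) cell array.
keystrokes = { ...
    'l',       180,    'e',       69; ...
    'e',       300664, '|space|', 125; ...
    '|space|', 62,     'n',       2500};

% Parentheses keep the cell container, so "<" is undefined for the result.
columnClass = class(keystrokes(:,2));      % 'cell'

% Braces give a comma-separated list; brackets concatenate it into a ROW vector.
rowValues = [keystrokes{:,2}];             % 1x3 double

% cell2mat keeps the orientation of the cell column.
colValues = cell2mat(keystrokes(:,2));     % 3x1 double

% Logical mask for rows satisfying both numeric conditions.
mask = cell2mat(keystrokes(:,2)) < 100000 & cell2mat(keystrokes(:,4)) < 2000;
rowIndices = find(mask);
```

Note the orientation: the bracket form yields a row vector, so it has to be transposed before being combined with the column vectors returned by strcmp, whereas cell2mat on a cell column already returns a column.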
Jan
on 9 Jan 2018
Edited: Jan
on 9 Jan 2018
A cell array is not useful for these comparisons, and converting it to a table only adds another level of indirection. Easier:
% Store strings in one cell string:
Strings = keystrokes(:, [1, 3]);
uStrings = unique(Strings, 'rows');
% Store numbers in a numerical array:
Values = cell2mat(keystrokes(:, [2, 4]));
% Move the check of the values out of the loop for performance:
match = (Values(:, 1) < 100000 & Values(:, 2) < 2000);
for ii = 1:length(uStrings)
occurrenceIndices = find(strcmp(Strings(:,1), uStrings{ii, 1}) & ...
strcmp(Strings(:,2), uStrings{ii, 2}) & ...
match);
...
end
This would be faster, if you use the 2nd and 3rd output of unique() also:
[uStrings, iString, iUniq] = unique(Strings, 'rows');
match = (Values(:, 1) < 100000 & Values(:, 2) < 2000);
for ii = 1:length(uStrings)
occurrenceIndices = find(iUniq == ii & match);
...
end
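To make the role of the extra outputs of unique() concrete, here is a minimal sketch on a small made-up numeric matrix. The matrix A, the match vector and the name iFirst are assumptions for illustration only; each row of A stands in for one pair of strings.

```matlab
% Hypothetical data: each row stands in for one (column 1, column 3) pair.
A = [1 2; 3 4; 1 2; 3 4; 1 2];
match = logical([1; 1; 0; 1; 1]);   % stands in for the numeric conditions

[uRows, iFirst, iUniq] = unique(A, 'rows');
% uRows  : the distinct rows of A, sorted
% iFirst : indices into A such that A(iFirst,:) equals uRows
% iUniq  : indices into uRows such that uRows(iUniq,:) equals A

for ii = 1:size(uRows, 1)
    % Rows of A that belong to group ii and also pass the numeric check
    occurrenceIndices = find(iUniq == ii & match);
    fprintf('row [%d %d] occurs at: %s\n', ...
        uRows(ii,1), uRows(ii,2), mat2str(occurrenceIndices'));
end
```

Because iUniq already assigns every row to its group, the loop needs only one integer comparison per group instead of two strcmp calls over the whole column.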
2 Comments
Guillaume
on 10 Jan 2018
Annoyingly, unique (and ismember) do not support the 'rows' option with cell arrays, even if it is a cell array of char arrays. If you have MATLAB R2016b or later, you can convert the cell array of char arrays into a string array, which can be used with unique and the 'rows' option:
unique(string(keystrokes(:, [1 3])), 'rows')
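Putting the pieces of this thread together, a possible end-to-end sketch could look like the following. It assumes R2016b or later and a plain (non-nested) cell array; the fourth row of the sample data is made up so that one digraph occurs twice, and the variable names digraphs, values and occurrenceIndices are illustrative only.

```matlab
% Sample rows from the question plus one made-up repeat of the first digraph.
keystrokes = { ...
    'l',       180,    'e',       69; ...
    'e',       300664, '|space|', 125; ...
    '|space|', 62,     'n',       2500; ...
    'l',       95,     'e',       140};

% String arrays (R2016b or later) can be used with the 'rows' option.
digraphs = string(keystrokes(:, [1 3]));
[uniqueDigraphs, ~, iUniq] = unique(digraphs, 'rows');

% Numeric columns as a matrix; evaluate the thresholds once, outside the loop.
values = cell2mat(keystrokes(:, [2 4]));
match = values(:,1) < 100000 & values(:,2) < 2000;

% Collect the matching row indices of every unique digraph.
occurrenceIndices = cell(size(uniqueDigraphs, 1), 1);
for ii = 1:size(uniqueDigraphs, 1)
    occurrenceIndices{ii} = find(iUniq == ii & match);
end
```

Here occurrenceIndices{ii} holds the row numbers in keystrokes for the digraph in uniqueDigraphs(ii,:); digraphs whose rows all fail the numeric checks end up with an empty entry.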