Finding cell array row indices based on numeric column values

I have a large cell array keystrokes of approximate size 20000x4. Columns 1 and 3 each contain a char, while columns 2 and 4 each contain a double. For example:
>> keystrokes(378:380,:)
ans =
3×4 cell array
{'l' } {[ 180]} {'e' } {[ 69]}
{'e' } {[300664]} {'|space|'} {[ 125]}
{'|space|'} {[ 62]} {'n' } {[2500]}
I want to find the row indices in keystrokes of occurrences of every unique combination of columns 1 and 3, where the value in column 2 is less than 100000 and the value in column 4 is less than 2000. My current code gives me the error "Undefined operator '<' for input arguments of type 'cell'.", and is shown below.
% Temporarily convert keystroke structure to a table due to unique() apparently not supporting combinations of cellarray columns.
uniqueDigraphsTable = unique(cell2table(keystrokes(:,[1 3])), 'rows');
uniqueDigraphs = table2cell(uniqueDigraphsTable);
for ii = 1:length(uniqueDigraphs)
% Find rows containing the current unique digraph
occurrenceIndices = find(strcmp(keystrokes(:,1), uniqueDigraphs{ii,1}) & ...
strcmp(keystrokes(:,3), uniqueDigraphs{ii,2}) & ...
keystrokes(:,2)<100000 & keystrokes(:,4)<2000);
...
end
Using keystrokes{:,4}<2000 gives me this error: "Error using <. Too many input arguments." Is there a simple (and perhaps prettier) way to find the indices?
  1 Comment
Jan on 9 Jan 2018
Please post the input data in a form that can be used by copy and paste. Is keystrokes a nested cell array:
keystrokes = { ...
{'l' } {[ 180]} {'e' } {[ 69]}; ...
{'e' } {[300664]} {'|space|'} {[ 125]}; ...
{'|space|'} {[ 62]} {'n' } {[2500]}}
or a plain cell array:
keystrokes = { ...
'l', 180, 'e', 69; ...
'e', 300664, '|space|', 125; ...
'|space|', 62, 'n', 2500}
? Even asking this clarifying question takes a lot of typing.


Answers (2)

Guillaume on 9 Jan 2018
find(strcmp(keystrokes(:,1), uniqueDigraphs{ii,1}) & ... split over several lines for readability
strcmp(keystrokes(:,3), uniqueDigraphs{ii,2}) & ...
[keystrokes{:,2}] < 100000 & ...
[keystrokes{:,4}] < 2000)
or
find(strcmp(keystrokes(:,1), uniqueDigraphs{ii,1}) & ... split over several lines for readability
strcmp(keystrokes(:,3), uniqueDigraphs{ii,2}) & ...
cell2mat(keystrokes(:,2)) < 100000 & ...
cell2mat(keystrokes(:,4)) < 2000)
In essence you have to transform your cell columns into numeric matrices.
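As a minimal sketch of why the conversion is needed, the snippet below uses a small made-up cell column (the variable names col, vals and mask are hypothetical, not from the question). It assumes each cell holds a scalar double, as in the example output of the question.

```matlab
% Toy data (hypothetical), same layout as column 2 of keystrokes
col = {180; 300664; 62};

% col{:} expands to the comma-separated list 180, 300664, 62, so
% col{:} < 100000 behaves like lt(180, 300664, 62, 100000), which is
% why MATLAB reports "Too many input arguments".

% Converting the cell column to a numeric column makes < well defined
vals = cell2mat(col);     % 3x1 double column vector
mask = vals < 100000;     % 3x1 logical, one entry per row
```

The resulting mask has the same orientation as the output of strcmp on a cell column, so the two can be combined with & element by element.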
  1 Comment
Piddy on 10 Jan 2018
Thanks a lot! Your cell2mat solution gives the results I'm looking for. The first solution seems to have some sort of looping problem, though. It produces a very large vector in which the first elements are the correct indices, but these are followed by indices that exceed the number of rows in the keystrokes array.
For example, when keystrokes is a 24894x4 cell array, part of its output for a specific row in uniqueDigraphs looks like this:
K>> length(occurrenceIndices)
ans =
158473
K>> occurrenceIndices(1:15)
ans =
591
677
1090
2247
2578
2912
3227
25485
25571
25984
27141
27472
27806
28121
50379
The first 7 values are correct, but the rest are too large. 24894 + 591 = 25485 though, and 24894 + 677 = 25571 etc.
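A likely explanation for the oversized indices, assuming MATLAB R2016b or later (where implicit expansion applies to &): [keystrokes{:,2}] concatenates horizontally and yields a 1xN row vector, while strcmp on a cell column yields an Nx1 column. Combining the two with & expands to an NxN logical matrix, and find then returns linear indices into that matrix, which is why the extra values are offset by multiples of the row count. The sketch below reproduces this on a 3-row toy array; the variable names are hypothetical, and vertcat is swapped in as one way to get a column instead.

```matlab
% Toy data in the same layout as keystrokes
keystrokes = { ...
    'l',       180,    'e',       69; ...
    'e',       300664, '|space|', 125; ...
    '|space|', 62,     'n',       2500};

isL = strcmp(keystrokes(:,1), 'l');        % 3x1 logical column

% Horizontal concatenation gives a ROW vector
rowVals  = [keystrokes{:,2}];              % 1x3 double
expanded = isL & (rowVals < 100000);       % 3x3 logical via implicit expansion
badIdx   = find(expanded);                 % linear indices into the 3x3 matrix

% Vertical concatenation keeps the column orientation
colVals = vertcat(keystrokes{:,2});        % 3x1 double
goodIdx = find(isL & (colVals < 100000));  % plain row indices
```

Here badIdx contains an index larger than the number of rows, while goodIdx only contains valid row indices. Transposing the bracketed expression, as in [keystrokes{:,2}].', should have the same effect as vertcat.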



Jan on 9 Jan 2018
Edited: Jan on 9 Jan 2018
A cell array is not useful for these comparisons, and converting it to a table only adds another level of indirection. Easier:
% Store strings in one cell string:
Strings = keystrokes(:, [1, 3]);
uStrings = unique(Strings, 'rows');
% Store numbers in a numerical array:
Values = cell2mat(keystrokes(:, [2, 4]));
% Move the check of the values out of the loop for performance:
match = (Values(:, 1) < 100000 & Values(:, 2) < 2000);
for ii = 1:size(uStrings, 1)
occurrenceIndices = find(strcmp(Strings(:,1), uStrings{ii, 1}) & ...
strcmp(Strings(:,2), uStrings{ii, 2}) & ...
match);
...
end
This would be faster if you also use the 2nd and 3rd outputs of unique():
[uStrings, iString, iUniq] = unique(Strings, 'rows');
match = (Values(:, 1) < 100000 & Values(:, 2) < 2000);
for ii = 1:size(uStrings, 1)
occurrenceIndices = find(iUniq == ii & match);
...
end
  2 Comments
Piddy on 10 Jan 2018
Thank you! There is still an issue though. The following line produces this warning: "The 'rows' input is not supported for cell array inputs."
[uStrings, iString, iUniq] = unique(Strings, 'rows');
Does this tie into your comment asking whether or not keystrokes is a nested cell? I didn't produce the keystrokes variable myself, but I'm fairly sure that it is not nested. I checked using class():
class(keystrokes{1,1})
ans = 'char'
I also think that if it were nested, the example command I showed in my original question would have produced an output like this:
>> keystrokes(378:380,:)
ans =
3×4 cell array
{1×1 cell} {1×1 cell} {1×1 cell} {1×1 cell}
{1×1 cell} {1×1 cell} {1×1 cell} {1×1 cell}
{1×1 cell} {1×1 cell} {1×1 cell} {1×1 cell}
I could of course be mistaken.
Guillaume on 10 Jan 2018
Annoyingly, unique (and ismember) do not support the 'rows' option with cell arrays, even if it is a cell array of char arrays. If you have MATLAB R2016b or later, you can convert the cell array of char arrays into a string array, which can be used with unique and the 'rows' option:
unique(string(keystrokes(:, [1 3])), 'rows')
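Putting the pieces of this thread together, a minimal sketch could look like the following. It assumes R2016b or later (for string), a non-nested keystrokes cell array with scalar doubles in columns 2 and 4, and uses a 4-row toy array; the names digraphs, uDigraphs, values and occurrences are hypothetical.

```matlab
% Toy data in the same layout as keystrokes
keystrokes = { ...
    'l',       180,    'e',       69; ...
    'e',       300664, '|space|', 125; ...
    '|space|', 62,     'n',       2500; ...
    'l',       90,     'e',       40};

% String arrays support unique(..., 'rows'); cell arrays do not
digraphs = string(keystrokes(:, [1 3]));           % 4x2 string array
[uDigraphs, ~, iUniq] = unique(digraphs, 'rows');  % iUniq(k) = row of uDigraphs matching row k

% Numeric columns as a matrix, thresholds checked once outside the loop
values = cell2mat(keystrokes(:, [2 4]));           % 4x2 double
match  = values(:,1) < 100000 & values(:,2) < 2000;

% Row indices per unique digraph, restricted to rows passing the thresholds
occurrences = cell(size(uDigraphs, 1), 1);
for ii = 1:size(uDigraphs, 1)
    occurrences{ii} = find(iUniq == ii & match);
end
```

Each occurrences{ii} then holds the row indices in keystrokes for the digraph in uDigraphs(ii,:). Digraphs whose rows all fail the thresholds get an empty index list rather than being dropped.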

