How to find elements in a 2D cell array in Matlab more efficiently?

I was wondering if there is an easy solution to the question asked here: StackOverflow
  1 Comment
Stephen23 on 27 Feb 2015
Edited: Stephen23 on 27 Feb 2015
The full question is:
How to find elements in a 2D cell array in Matlab more efficiently?
I have a 2D cell array as follows:
my_cells =
Columns 1 through 11
{1x6 cell} {1x8 cell} {1x2 cell} {1x7 cell} {1x7 cell} {1x6 cell} {1x7 cell} {1x7 cell} {1x8 cell} {1x5 cell} {1x7 cell}
Columns 12 through 22
{1x4 cell} {1x3 cell} {1x3 cell} {1x5 cell} {1x5 cell} {1x4 cell} {1x3 cell} {1x5 cell} {1x4 cell} {1x5 cell} {1x4 cell}
Columns 23 through 24
{1x6 cell} {1x1 cell}
Each one of these cells has a number of arrays as follows:
my_cells{1}= [1x3 double] [1x3 double] [1x3 double] [1x3 double] [2x3 double] [1x3 double]
and
my_cells{1}{1}= 977.0000 1.0000 0.9231
my_cells{1}{2}= 286.0000 7.0000 0.9789
my_cells{2}{1}= 977.0000 1.0000 0.9231
my_cells{3}{1}= 286.0000 7.0000 0.9789
I would like to find, for example, where else the number 977 might appear as a first element in my_cells. However, I would like to avoid nested for loops to increase the performance if it is possible. Is there an easy and fast way to do it?


Answers (2)

Stephen23 on 27 Feb 2015
Edited: Stephen23 on 12 Mar 2015
There is indeed a very easy and very fast solution: better data management and planning.
Just because something is possible in MATLAB (or any language) does not make it a good idea to actually do it. Nested variables make sense in a number of languages, but in MATLAB it is not an optimal way to store data. So really the original question can be re-phrased as:
"If I am unable to manage my data well and if I organize data inefficiently, will my life become difficult?" To which the answer is "Yes!". Note this applies to any language just as much as it does to MATLAB.
As one of the responses to the original question points out, this data actually has the structure:
my_cells = { {[977 1 2]} {[2 977 977]} {[977 2 1]} }
Why keep the data separate? These numeric arrays match quite nicely (both size and class), so why not a cell array of numeric arrays, or even the most obvious and simplest solution, a single numeric array? It could even be preallocated, which would likely speed things up even more, and padded if required. It is absurd to structure data in a way that removes the ability to use MATLAB's best features: MATLAB's indexing is fast and convenient, and the fact that many operations can be applied to complete arrays makes computations on arrays a breeze. Storing numeric arrays and some corresponding indices is often preferable to splitting data up into individual cell arrays.
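As a rough sketch of the single-numeric-array idea, assuming every inner cell holds an N-by-3 numeric array as in the question: the rows can be stacked into one matrix, with a separate index vector recording which outer cell each row came from. The names data, group and hits are illustrative only, and repelem requires R2015a or later.

```matlab
% Small stand-in for the nested cell array from the question
my_cells = {{[977 1 0.9231], [286 7 0.9789]}, {[977 1 0.9231]}, {[286 7 0.9789]}};

% Stack the arrays inside each outer cell into one N-by-3 matrix
mats = cellfun(@(c) vertcat(c{:}), my_cells, 'UniformOutput', false);

% One numeric array for all rows, plus the outer-cell index of each row
rowsPer = cellfun(@(m) size(m, 1), mats);
data    = vertcat(mats{:});
group   = repelem(1:numel(mats), rowsPer).';

% Logical index of the rows whose first element is 977
hits = data(:, 1) == 977;
group(hits)   % outer cells that contain 977 as a first element
```

The flattening still uses cellfun once, but it only has to happen a single time; every later search is a plain comparison on a numeric column.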
Presumably this data was collected together for more than just one single operation ("Find the value 977"), so whatever other operations follow on from this are always going to be fighting the same data structure. Let's consider a basic unary operation: how would one find the maximum of this data? And yet it would be such a simple operation if it were a simple numeric array!
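To illustrate the point about the maximum, here is a minimal comparison on a small made-up sample (the variable names are hypothetical). It assumes the value of interest is the first column of each inner array.

```matlab
% Nested layout: one cellfun call per level of nesting
my_cells = {{[977 1 0.9231], [286 7 0.9789]}, {[977 1 0.9231]}, {[286 7 0.9789]}};
maxNested = max(cellfun(@(c) max(cellfun(@(v) max(v(:, 1)), c)), my_cells));

% Flat layout: the same rows in a single numeric array
data = [977 1 0.9231; 286 7 0.9789; 977 1 0.9231; 286 7 0.9789];
maxFlat = max(data(:, 1));
```

Both give the same result, but only the second can be read at a glance.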
I guess it was written like this because whoever wrote this is a programmer who fails to realize that MATLAB (being a higher-level language) works under a different paradigm. All of the features of MATLAB that make it fast and convenient to use, such as logical indexing and vectorization, become useless with this data organization. So the programmers return to their dependable old friend, the loop, and then complain that it is slow and awkward to write.
It seems to be a common refrain from programmers who at some point are forced to learn MATLAB:
"I refuse to learn how to use MATLAB's tools properly and wish to use my favorite programming method from language XXXX... oh, why is it so slow?"
How should one answer this question?
  3 Comments
Stephen23 on 27 Feb 2015
Edited: Stephen23 on 12 Mar 2015
What you are describing is a poor way of storing data, and using MATLAB will always be a challenge if you continue to arrange data like this.
Here are two much simpler ways that you could store this data:
  • in a few separate numeric arrays, one for each kind of data, including any meta-data to keep track of which values belong together (groups, etc). Then you can use MATLAB's indexing tools to quickly locate the values that you want.
  • in a non-scalar structure, which would allow this to be achieved much faster and more neatly than with nested cell arrays. For every set of values corresponding to experiment n, you can define it like this:
data(n).time = 977
data(n).group = 1
data(n).value = [numeric value/s]
data(n).comment = 'great experiment!'
There is no restriction on sizes or data types, and we can easily access them individually, in groups or all together. And accessing the data is just the same, for example to get the second measurement value you can do this:
data(2).value
Even better we can access all of the data without any loops at all, for example to get all time values in a numeric array, you would use this:
[data.time]
or into a cell array like this:
{data.time}
and then you can immediately search, match or do whatever on that array. So if we want to match a group, then this would work to find all experiments in group 3:
[data.group]==3
And we can also use this knowledge to "solve" your original question:
[data.time]==977
This operation gives us a logical array of all experiments at that time. Remember that logical indexing is the fastest way of indexing into arrays in MATLAB. And without a single loop! I guarantee that it will be many many many times faster than trying to access the data in nested loops. If you assign this logical index to a variable, then you can also apply this directly to the structure, and you will get only the experiments that you are interested in:
idx = [data.time]==977;
[data(idx).group]
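Put together as something that can be run directly, a small sketch might look like this. The four experiments and their values are invented for illustration; only the field names time, group and value come from the example above.

```matlab
% Hypothetical non-scalar structure with four experiments
data = struct('time',  {977, 286, 977, 300}, ...
              'group', {1, 7, 3, 5}, ...
              'value', {0.9231, 0.9789, [0.8 0.7], 0.6});

% Logical index of all experiments at time 977
idx = [data.time] == 977;

% Groups of the matching experiments, as a numeric array
groups = [data(idx).group];

% Values of the matching experiments, as a cell array because sizes differ
values = {data(idx).value};
```

Here idx selects the first and third experiments, so groups contains 1 and 3.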
Alex on 13 Mar 2015
OK, I might change my whole implementation if I grasp the main idea here, if this will make things easier but, most importantly, faster. Let's forget the problem above, and let me introduce the general problem I am tackling as well as my solution. I have already started making the changes, so here it is. Imagine a struct like this:
timestep_struct(t).profile=profile;
timestep_struct(t).coordinates=points;
It is populated inside a for loop, which is fast enough. For each iteration t, profile is an array like: [1x3 double] [1x3 double] [4x3 double] [1x3 double] [1x3 double] [1x3 double], and "points" is a two-column matrix with N rows.
What I want to do is iterate through timestep_struct(t), selecting one row of coordinates at a time. This row will be used in, let's say, a black-box operation. Say t can take values from 1 to 120, and each coordinates matrix may have from 1 to 100 rows (each matrix can have a different number of rows). One of these timestep_struct(i).coordinates(j,:) will give me the maximum utility after the processing, so I keep it. Then I will run the same operation again to find the next best configuration.
The way I am doing all this now is by initializing a cell array having the dimensions of timestep_struct, with zero at each position. For each zero position I find the corresponding value from the struct and perform the black-box operation. When I find the coordinates that give me the maximum utility, I set that position in the cell array to 1. So the number of zero positions in the cell array has been reduced by one, and the position marked as 1 will no longer be used.
After that, I need to search the profile struct (timestep_struct(t).profile=profile;), looking within every array in it, to find an id (remember we have Mx3 arrays where the first column is the id and the second is a number to be reduced) in timesteps other than t, and reduce their number by 1.
My implementation sounds complicated, but what I really do is quite simple. Do you still think using structs is a good approach for that? Please ask me for clarifications if necessary.



Guillaume on 27 Feb 2015
I agree with Stephen that better data storage would help. Rather than a sub-cell array of vectors, I'd have a matrix. Aren't all your vectors the same length?
At the end of the day, with that data structure, you're going to need loops. You can hide them in cellfun, but it is still a loop:
c = {{[977 1 .9] [286 7 .9] [977 1 .8] [286 7 .6]} {[300 5 .6] [900 2 .1]} {[800 8 .5] [200 5 .3] [977 6 .2]}};
numbertofind = 977;
locations = cellfun(@(subc) find(cellfun(@(v) v(1), subc) == numbertofind), c, 'UniformOutput', false)
If you were using matrices:
c = {[977 1 .9; 286 7 .9; 977 1 .8; 286 7 .6] [300 5 .6; 900 2 .1] [800 8 .5; 200 5 .3; 977 6 .2]};
numbertofind = 977;
locations = cellfun(@(m) find(m(:, 1) == numbertofind), c, 'UniformOutput', false)
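If the data already exists in the nested form, one possible way to get to the matrix form is to concatenate each sub-cell once up front. This is only a sketch, assuming every inner array has three columns; the name m is illustrative.

```matlab
c = {{[977 1 .9] [286 7 .9] [977 1 .8] [286 7 .6]} {[300 5 .6] [900 2 .1]} {[800 8 .5] [200 5 .3] [977 6 .2]}};
numbertofind = 977;

% Convert each sub-cell of row vectors into one N-by-3 matrix
m = cellfun(@(subc) vertcat(subc{:}), c, 'UniformOutput', false);

% Row indices, per matrix, where the first column matches
locations = cellfun(@(x) find(x(:, 1) == numbertofind), m, 'UniformOutput', false);
```

Unlike v(1) in the nested version, comparing the whole first column also handles inner arrays with more than one row.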
  2 Comments
Guillaume on 27 Feb 2015
Edited: Guillaume on 2 Mar 2015
Well, in that case, use my first answer:
locations = cellfun(@(subc) find(cellfun(@(v) v(1), subc) == numbertofind), c, 'UniformOutput', false)
