Advice on accessing a cell array containing structures

I have a cell array "wrkspcs" containing the names of cell arrays, as shown below (some entries omitted for brevity):
wrkspcs =
13×1 cell array
{'ZLH_151210_WrkSpc'}
{'MXG_151210_WrkSpc'}
{'LF_151223_WrkSpc' }
Each entry is the name of a 6x6 cell array in the workspace. Using eval, I can display the referenced cell array:
>> eval(wrkspcs{1})
ZLH_151210_WrkSpc =
6×6 cell array
{1×1 struct} {[1]} {'16557'} {'1210'} {'zlh_1a'} {'ZLH_151210'}
{1×1 struct} {[2]} {'16557'} {'1213'} {'zlh_2a'} {'ZLH_151210'}
{1×1 struct} {[3]} {'16557'} {'1216'} {'zlh_3a'} {'ZLH_151210'}
{1×1 struct} {[1]} {'16676'} {'1210'} {'zlh_1b'} {'ZLH_151210'}
{1×1 struct} {[2]} {'16676'} {'1213'} {'zlh_2b'} {'ZLH_151210'}
{1×1 struct} {[3]} {'16676'} {'1216'} {'zlh_3b'} {'ZLH_151210'}
If I want to get the number of entries (rows) in the first workspace, "ZLH_151210_WrkSpc", I can do:
>> tmp=eval(wrkspcs{1});n = length(tmp(:,1))
n =
6
But if I try to eliminate the temporary variable "tmp" and access the length directly, I get the following error:
>> n = length(eval(wrkspcs{1})(:,1))
Error: ()-indexing must appear last in an index expression.
However, if I try the following, everything is fine:
>> eval(['n = length(',wrkspcs{iloop},'(:,1))'])
n =
6
So, I am trying to understand which syntax rule I am violating in the failing case (indexing directly into the output of eval), and what the "proper" way is to obtain the length without either creating the variable "tmp" or putting the assignment inside the eval statement (which the MATLAB documentation says I should try to avoid).
Any comments, insights, or suggested alternatives would be appreciated.

5 Comments

Rather than using this slow, complex, and buggy way of storing your data, it would be simpler to store all of the data in one structure. Then accessing the data is simple using its fieldnames: no ugly eval is required.
Note that the MATLAB documentation specifically advises against what you are doing: "A frequent use of the eval function is to create sets of variables such as A1, A2, ..., An, but this approach does not use the array processing power of MATLAB and is not recommended. The preferred method is to store related data in a single array". Note that either creating or accessing variable names dynamically suffers from the same disadvantages described in the documentation.
The important question is: how did you get all of those variables into your workspace? Usually beginners do this by calling load without an output argument, and spamming lots of variables into the workspace, which are then very difficult to process (and so they resort to writing ugly, slow, complex, buggy code using eval, and getting the variable names using slow who or whos). There is most likely an easy way around what you are doing, e.g. by simply calling load with an output argument:
S = load(...);
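A minimal sketch of this suggestion, assuming (as in the question) that each MAT-file holds a single variable named like the file. The temporary folder and the two demo files are hypothetical stand-ins, created only so the block runs on its own. load's output is a structure, and dynamic fieldnames take the place of eval, so everything ends up in one structure.

```matlab
% Hypothetical demo data: two MAT-files, each holding one 6x6 cell array
% whose variable name matches the file name (as in the question).
folder  = tempname;
mkdir(folder);
wrkspcs = {'ZLH_151210_WrkSpc'; 'MXG_151210_WrkSpc'};
for k = 1:numel(wrkspcs)
    demo = struct(wrkspcs{k}, {cell(6, 6)});   % outer braces: the field holds the 6x6 cell
    save(fullfile(folder, [wrkspcs{k}, '.mat']), '-struct', 'demo');
end

% Load every file into ONE structure, using dynamic fieldnames instead of eval.
allData = struct();
for k = 1:numel(wrkspcs)
    S  = load(fullfile(folder, [wrkspcs{k}, '.mat']));  % one field per saved variable
    fn = fieldnames(S);                                % works even if the name is unknown
    allData.(wrkspcs{k}) = S.(fn{1});
end

% The row count now needs no eval at all.
n = size(allData.(wrkspcs{1}), 1);
```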
You might also like to read this discussion of the topic:
OK, I'm taking your suggestions to heart...eliminate the "eval"s and "poofing" variables into my workspace in my code. But I have a question:
Each wrkspc is saved to its own *.mat file with a single variable (a 6x6 cell array) with the structure shown above. If I want to replace the code:
load([wrkspcs{iwrk},'.mat']) %creates variables in workspace matching the file name
(which "poofs" the variable 'ZLH_151210_WrkSpc' into the workspace) with the preferred "load into a structure" statement:
>> tmp=load([wrkspcs{1},'.mat'])
tmp =
struct with fields:
ZLH_151210_WrkSpc: {6×6 cell}
it works fine. I can then copy the cell array from the 'tmp' struct to myStruct:
>> myStruct.(wrkspcs{1})=tmp.(wrkspcs{1})
myStruct =
struct with fields:
ZLH_151210_WrkSpc: {6×6 cell}
But how can I do the load directly into 'myStruct' struct? When I try:
>> myStruct.(wrkspcs{1})=load([wrkspcs{1},'.mat'])
myStruct =
struct with fields:
ZLH_151210_WrkSpc: [1×1 struct]
>> myStruct.ZLH_151210_WrkSpc
ans =
struct with fields:
ZLH_151210_WrkSpc: {6×6 cell}
>> myStruct.ZLH_151210_WrkSpc.ZLH_151210_WrkSpc
ans =
6×6 cell array
{1×1 struct} {[1]} {'16557'} {'1210'} {'zlh_1a'} {'ZLH_151210'}
{1×1 struct} {[1]} {'16676'} {'1210'} {'zlh_1b'} {'ZLH_151210'}
{1×1 struct} {[2]} {'16557'} {'1213'} {'zlh_2a'} {'ZLH_151210'}
{1×1 struct} {[2]} {'16676'} {'1213'} {'zlh_2b'} {'ZLH_151210'}
{1×1 struct} {[3]} {'16557'} {'1216'} {'zlh_3a'} {'ZLH_151210'}
{1×1 struct} {[3]} {'16676'} {'1216'} {'zlh_3b'} {'ZLH_151210'}
and my 6x6 cell array gets buried two levels deep in 'myStruct'. I have no idea how to get the 6x6 cell array "loaded" directly as a first-level field of 'myStruct', which is the result I get when going through the 'tmp' struct as shown above. Specifying the variable name in the load statements above makes no difference.
Again, I need some help understanding this behavior, and any workaround that avoids loading into a temporary structure and then copying the desired field into my structure variable.
In the first case,
myStruct.(wrkspcs{1})=tmp.(wrkspcs{1})
the right-hand side, tmp.(wrkspcs{1}), is not a struct; it returns a cell array. Therefore the command above creates a new field in myStruct and assigns it a cell array.
Now look at the second case,
myStruct.(wrkspcs{1})=load([wrkspcs{1},'.mat'])
the right-hand side, load([wrkspcs{1},'.mat']), itself returns a struct. So in this case you are creating a new field in myStruct and assigning it another struct, which is why you get a struct nested two levels deep. This is equivalent to running
myStruct.(wrkspcs{1})=tmp
Both have the same effect.
Now to your second question: can this be avoided? No. The load function always returns a struct (when loading a MAT-file), and you cannot access its fields before it has been created. You will need a temporary variable between the two steps to do what you are trying to do.
"But how can I do the load directly into 'myStruct' struct?"
You can't, at least not in the way you are trying to do it, without a temporary variable. load returns a scalar structure, and you have to assign it to a temporary variable and then access its fields, as Ameer Hamza already explained. This is quite efficient and does not waste memory, so there is no reason to avoid it.
"...any workaround to avoid going thru loading to a temporary structure..."
As Ameer Hamza wrote, MATLAB does not allow arbitrary indexing or fieldname access to be appended to function calls, so it is quite normal in MATLAB to assign data to a temporary variable before doing some simple indexing or accessing fields. This is standard MATLAB practice, wastes no memory whatsoever, and you have not explained why you need to avoid it.
Alternative 1: a non-scalar structure also has advantages when you process or access the data. You might like to consider doing something like this:
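If a one-liner is wanted anyway, one option not mentioned in this thread is getfield, which receives the structure as an ordinary function argument, so nothing is indexed directly on load's output. This is only a sketch: MATLAB still builds the intermediate structure, it just never gets a name, so there is no memory or speed advantage over tmp. The demo file below is a hypothetical stand-in.

```matlab
% Hypothetical demo file: one 6x6 cell array stored under the file's own name.
name = 'ZLH_151210_WrkSpc';
file = fullfile(tempdir, [name, '.mat']);
demo = struct(name, {cell(6, 6)});
save(file, '-struct', 'demo');

% getfield(S, field) returns S.(field). Here load's output is passed as an
% argument, so the cell array lands as a first-level field with no nesting.
myStruct = struct();
myStruct.(name) = getfield(load(file), name);
```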
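for k = 1:numel(wrkspcs)
    tmp = load([wrkspcs{k},'.mat']);
    myStruct(k).data = tmp.(wrkspcs{k});   % the 6x6 cell array from file k
    myStruct(k).name = wrkspcs{k};          % keep the name as data, not as a fieldname
end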
The trick is to think of meta-data as data in their own right. Storing data in this way will make your code much simpler, more robust, and more generalized, which means that you can spend more time on actually processing your data rather than worrying about fieldnames and variable names and mat files and ...
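To illustrate why the meta-data-as-data layout helps, here is a small sketch that looks up one data set by name and collects all names and row counts, with no eval and no dynamic fieldnames. The contents of myStruct are hypothetical placeholders standing in for what the Alternative 1 code above would produce.

```matlab
% Hypothetical contents standing in for the loaded data.
clear myStruct                       % start from a fresh variable
names = {'ZLH_151210_WrkSpc', 'MXG_151210_WrkSpc', 'LF_151223_WrkSpc'};
for k = numel(names):-1:1            % backwards loop allocates myStruct in full at once
    myStruct(k).data = cell(6, 6);
    myStruct(k).name = names{k};
end

allNames = {myStruct.name};                           % all names in one cell array
idx      = strcmp(allNames, 'MXG_151210_WrkSpc');     % logical mask over the elements
mxgData  = myStruct(idx).data;                        % the selected 6x6 cell array
nRows    = arrayfun(@(s) size(s.data, 1), myStruct);  % row count of every data set
```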
Alternative 2: if each .mat file contains exactly one field/variable, then there is no real advantage to using a structure anyway, and you could easily use a cell array for all of your data. If the arrays are all the same size, it could even be a single 3D cell array, with no nesting of cell arrays:
out = cell(6,6,numel(wrkspcs));
for k = 1:numel(wrkspcs)
    tmp = load([wrkspcs{k},'.mat']);
    out(:,:,k) = tmp.(wrkspcs{k});   % page k holds the k-th 6x6 cell array
end
Alternative 3: note that most of the complication here comes from bad data design anyway: contrary to what some beginners think, it is much easier to process data when the variable names do not change (yes, even the ones inside .mat files). If each .mat file simply had exactly the same variables, e.g. data and name, then you really could import the files in exactly the way you requested, without any temporary variable:
for k = numel(wrkspcs):-1:1   % backwards, so S is allocated in full on the first pass
    S(k) = load([wrkspcs{k},'.mat']);
end
and you would get one non-scalar structure containing all of your data, without any nesting:
S(1).data
S(1).name
or all of the names in a cell array:
{S.name}
etc
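A self-contained sketch of Alternative 3, assuming every MAT-file stores the same two variables, data and name (the hypothetical layout proposed above, not the files' actual contents). Because each load returns a structure with identical fields, the result can be assigned straight into an element of S.

```matlab
% Hypothetical demo files that all use the SAME variable names: data and name.
folder  = tempname;
mkdir(folder);
wrkspcs = {'ZLH_151210_WrkSpc', 'MXG_151210_WrkSpc', 'LF_151223_WrkSpc'};
for k = 1:numel(wrkspcs)
    data = cell(6, 6);                 % placeholder for the real 6x6 cell array
    name = wrkspcs{k};
    save(fullfile(folder, [wrkspcs{k}, '.mat']), 'data', 'name');
end

% No temporary variable: each load returns a struct with fields data and name.
clear S                                % make sure S does not exist with other fields
for k = numel(wrkspcs):-1:1
    S(k) = load(fullfile(folder, [wrkspcs{k}, '.mat']));
end

allNames  = {S.name};                  % all names in one cell array
firstData = S(1).data;
```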
Thanks for your clear answer. (I reposted this as a new question, since it really is a different question from the original post... but you answered it.)


 Accepted Answer

Unlike C++ or Python, in MATLAB you can't directly index the output of a function. You first need to store the data in a separate variable and then index that variable to access the required data. So here is what is happening:
1) In the first case: eval() is a MATLAB function and you are trying to further index its output, which, as already stated, is not supported in MATLAB.
2) In the second case: you are effectively running the following command
n = length(ZLH_151210_WrkSpc(:,1))
i.e. indexing into a cell array. This is perfectly supported MATLAB syntax. Even in the first case, the following line will work:
n = length(eval([wrkspcs{1}, '(:,1)']))
As you can see, here I am again indexing into the cell array, not into the output of a function.
Note: Accessing variables using eval is a very bad idea. It makes your code slow and difficult to debug. For better coding practice, you should look into storing all the variables in a struct and then access the required data using field names.
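A short sketch of the structure-based alternative from the note above, using hypothetical data: once the cell arrays live in a structure, the ()-indexing is applied to a variable rather than to a function's output, so it can be chained in one expression without tmp or eval. For a row count specifically, size with a dimension argument avoids indexing altogether.

```matlab
% Hypothetical structure holding one 6x6 cell array under its original name.
myStruct = struct();
myStruct.ZLH_151210_WrkSpc = cell(6, 6);
wrkspcs = {'ZLH_151210_WrkSpc'};

% Dynamic fieldname followed by ()-indexing: allowed, because myStruct is a variable.
n1 = length(myStruct.(wrkspcs{1})(:, 1));

% size(X, 1) returns the number of rows directly, with no indexing at all.
n2 = size(myStruct.(wrkspcs{1}), 1);
```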

2 Comments

Ameer's answer was to the point and confirmed my testing results.
While I truly appreciate everyone's comments about avoiding "eval" and using structured input from "load" -- sometimes you need to deal with external data files, naming conventions, and existing code base. I need to deal with many data sets collected with a certain organization and file names, and write general code that works with, and tracks, an arbitrary number of files and names. I wish I were clever enough to do this without invoking "eval". Even if I were, there is existing code that I must insert my functions and results into without major rewrites...so it is a balance (manage the dangers and inefficiency of "eval" against ease of integration with existing data and code).
Thanks for your comments
"While I truly appreciate everyone's comments about avoiding "eval" and using structured input from "load" -- sometimes you need to deal with external data files, naming conventions, and existing code base."
The names of external files are irrelevant to this issue. The only topic that might be relevant is the "existing code base".
"...so it is a balance (manage the dangers and inefficiency of "eval" against ease of integration with existing data and code)."
You missed one of the other main points about eval: code that has to access variable names dynamically is code that wastes the programmer's time, because it makes code complex and hard to debug. Your question, and the days you have spent fighting the simple task of importing data, are an example of this.
Using lots of different names in the .mat files is really the design decision that has made this so complicated for you: if the .mat files used exactly the same field/variable names (e.g. data and name) then your code would be trivially simple (and yes, you could load them without any intermediate variable):
for k = ...
S(k) = load(...);
end
and that would be all! Better code through better data design: never underestimate the importance of designing your data well!



Asked: 4 May 2018
Edited: 8 May 2018
