How do I concatenate a variable number of cell arrays?

I want to combine a variable number of cell arrays into one large array. I was thinking of using this:
combined_data = {};
for k = 1 : length(raw_data)
    combined_data = [combined_data ; raw_data{k}.vdata(3:raw_data{k}.empties,:)];
end
I thought I read somewhere that creating an array by concatenating itself is bad form. I never see examples like this.
Is the above good or bad form?
If bad, is there an alternative?

8 Comments

Approximately what value is length(raw_data) ?
How often do you plan on calling this code?
I am not certain what you want.
One option is the cat function.
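A minimal sketch of the cat-based idea, assuming 'raw_data' has the layout described in this thread (a cell array of structs with fields 'vdata' and 'empties'). The sample data and the names s1, s2 and parts are made up for illustration. Each trimmed block goes into its own cell, then everything is joined in a single call, so the result is never grown inside the loop.

```matlab
% Hypothetical sample data: two header rows, some data rows, then empty padding.
s1 = struct();
s1.vdata = [{'h1','h2'; 'u1','u2'; 1, 2; 3, 4}; cell(3,2)];
s1.empties = 4;                      % last populated row
s2 = struct();
s2.vdata = [{'h1','h2'; 'u1','u2'; 5, 6}; cell(2,2)];
s2.empties = 3;
raw_data = {s1, s2};

% Collect the trimmed blocks first (parts is a hypothetical helper name).
parts = cell(numel(raw_data), 1);
for k = 1:numel(raw_data)
    parts{k} = raw_data{k}.vdata(3:raw_data{k}.empties, :);
end

% One concatenation along dimension 1 (rows).
combined_data = cat(1, parts{:});
```

This only works if every block has the same number of columns, which seems to be the case here.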
Well, it's not a crime in itself, but purists would tell you it's better to preallocate memory: create the combined_data array first with the correct dimensions, then paste the new data (RHS) into combined_data (using the correct start/end indices).
There could still be one case where this code makes sense: when the new data have unpredictable length. In that case it might be difficult to preallocate the right number of rows/columns (unless you overestimate it).
My 2 cents.
@Stephen23 - 'raw_data' is a cell array with 5 structs in it. Over time, the number of structs in 'raw_data' will grow. Each struct has two fields - 'vdata' and 'empties'. 'vdata' is the data to combine. 'empties' is the number of non-empty rows in 'vdata' (which was preallocated much larger than needed), stored for later use. I probably will not be calling this script much, but I hate hard-coding something and then revising it whenever another struct is added.
@Star Strider - What I want is combined_data = [ cellarray1 ; cellarray2 ; ... ; cellarrayn ]: one array of all of the data from the variable number of cell arrays, each of which has a different number of populated rows.
@Mathieu NOE - This also works. It seems less elegant and more prone to issues though.
Let me know any thoughts or recommendations.
%add up the data rows in all of the vdata cell arrays to be combined, then preallocate a receiving cell array
combined_length = 0;
for k = 1 : length(raw_data)
    combined_length = combined_length + [raw_data{k}.empties] - 2;
end
combined_data = cell(combined_length,61);
%loop through each vdata cell array and copy its data rows into the
%combined_data cell array
row = 1; %next free row in combined_data (avoids shadowing the built-in cd function)
for k = 1 : length(raw_data)
    for r = 3:raw_data{k}.empties
        combined_data(row,:) = raw_data{k}.vdata(r,:);
        row = row + 1;
    end
end
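For what it's worth, the row-by-row copy can probably be replaced by one block assignment per struct. A rough sketch, assuming the same 'raw_data' layout and two header rows; the sample data and the names n_header, n_rows, row_start and row_end are mine, and the column count is taken from the first 'vdata' rather than hard-coded.

```matlab
% Hypothetical sample data: two header rows, some data rows, then empty padding.
s1 = struct();
s1.vdata = [{'h1','h2'; 'u1','u2'; 1, 2; 3, 4}; cell(3,2)];
s1.empties = 4;
s2 = struct();
s2.vdata = [{'h1','h2'; 'u1','u2'; 5, 6}; cell(2,2)];
s2.empties = 3;
raw_data = {s1, s2};

n_header = 2;                                           % header rows to skip
n_rows = cellfun(@(s) s.empties - n_header, raw_data);  % data rows per struct

% Preallocate once, using the column count of the first vdata (an assumption).
combined_data = cell(sum(n_rows), size(raw_data{1}.vdata, 2));

% First and last destination row for each block.
row_end = cumsum(n_rows);
row_start = row_end - n_rows + 1;

for k = 1:numel(raw_data)
    src = raw_data{k}.vdata(n_header+1:raw_data{k}.empties, :);
    combined_data(row_start(k):row_end(k), :) = src;    % copy the whole block at once
end
```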
"'raw_data' is a cell array with 5 structs in it. "
Rather than a non-scalar cell array containing lots of scalar structures, you should probably just use one non-scalar structure array. That would simplify the task somewhat.
After that, you can probably achieve most of what you want with judicious use of a few comma-separated lists:
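As an illustration only (not necessarily the code the commenter had in mind), here is a small sketch of how comma-separated lists could do the job once the data sit in a non-scalar structure array. The structure array S, the sample data and the name trimmed are assumptions.

```matlab
% Hypothetical sample data: two header rows, some data rows, then empty padding.
v1 = [{'h1','h2'; 'u1','u2'; 1, 2; 3, 4}; cell(3,2)];
v2 = [{'h1','h2'; 'u1','u2'; 5, 6}; cell(2,2)];

% 1x2 structure array: one element per file.
S = struct('vdata', {v1, v2}, 'empties', {4, 3});

% S.empties expands to a comma-separated list; brackets collect it into a vector.
n = [S.empties];

% Trim each element to its data rows (skipping the two header rows).
trimmed = arrayfun(@(s) s.vdata(3:s.empties, :), S, 'UniformOutput', false);

% trimmed{:} is another comma-separated list, so this is a single vertcat call.
combined_data = vertcat(trimmed{:});
```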
Why does this loop start at 3?:
for r = 3:raw_data{k}.empties
I start at 3 because these arrays have two rows of text that describe each column (header rows, for lack of a better term). People view these arrays when they are output to Excel (and as I write this, I realize I could put those rows in the Excel dumps only).
Regarding the use of a cell array of structs of cell arrays: I am not that well versed in MATLAB data containers, and that structure resulted from this code:
%for each file in the folder, load the data and find the count of rows
for k = 1 : length(Files_struct)
    raw_data{k} = load([Files_struct(k).name]);
    temp_empties= find(cellfun( @isempty,raw_data{k}.vdata(:,1) ));
    raw_data{k}.empties = temp_empties(1)-1;
end
Each .mat file in the directory has the 'vdata' cell array.
I am open for simplifying, but not sure how.
I presume that FILES_STRUCT is the structure returned by DIR, in which case it is already a container array of exactly the right size, so you can just use that (rather than creating even more container arrays):
for k = 1:numel(Files_struct)
    S = load(Files_struct(k).name); % Got rid of the superfluous square brackets too.
    X = find(cellfun(@isempty,S.vdata(:,1)));
    Files_struct(k).rawdata = S;
    Files_struct(k).empties = X(1)-1;
end
You can access the elements of the structure using perfectly normal indexing, e.g. the 2nd file:
Files_struct(2).name % filename
Files_struct(2).rawdata % loaded data
Files_struct(2).empties % etc.
I would recommend not working with nested structures, so after the loop you might want to do this:
data = [Files_struct.rawdata] % assumes every MAT file has the same variable names.
to make it easier to access the loaded data.
I revised my script that processes the field data to produce a table instead of an array, and incorporated your suggestions with a couple of changes. The first is " ... = S.vdatat" instead of "... = S", which allows the access methods you suggest. The second takes advantage of table variable names: 'Site' instead of {:,1}.
Thanks for the help.
for k = 1:numel(Files_struct)
    S = load(Files_struct(k).name);
    X = find(cellfun( @isempty,S.vdatat.site )); %cellfun needs a cell array here; a datetime column would fail because its missing entries are NaT rather than empty
    Files_struct(k).vdata = S.vdatat; % ** ** **
    Files_struct(k).empties = X(1)-1;
end
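With tables, the final combining step might look something like the sketch below. It assumes every table has the same variable names, that the header rows are gone (the variable names replace them, so trimming starts at row 1), and that each table has at least one empty padding row. The sample tables, the structure array F (standing in for Files_struct) and the variable name 'value' are made up.

```matlab
% Hypothetical sample tables: populated rows followed by empty padding rows.
t1 = table({'A'; 'B'; ''; ''}, [1; 2; NaN; NaN], 'VariableNames', {'site', 'value'});
t2 = table({'C'; ''; ''}, [3; NaN; NaN], 'VariableNames', {'site', 'value'});

% F stands in for Files_struct after the loading loop.
F = struct('vdata', {t1, t2});
for k = 1:numel(F)
    % Index of the first empty 'site' entry, minus one = number of populated rows.
    F(k).empties = find(cellfun(@isempty, F(k).vdata.site), 1) - 1;
end

% Trim each table to its populated rows, then stack them in one vertcat call.
trimmed = arrayfun(@(f) f.vdata(1:f.empties, :), F, 'UniformOutput', false);
combined = vertcat(trimmed{:});
```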

Release

R2021b

Asked:

on 18 Nov 2022

Commented:

on 21 Nov 2022
