An alternative to cell array
I know that variables should not be named dynamically, so I have developed another solution for the following task:
I wrote a function (function M = structure(A,B,C,D)) which computes an array M of 6 x 8 x p integers from arrays A, B, C, D of 6 x 8 x p1, 6 x 8 x p2, ... integers respectively.
I need to call this function many times (about a thousand) with different variables A, B, C, D (previously computed the same way), which are easy to index but may be very large (p1, p2 can be as large as 10^7).
I tried to use cell arrays and it worked for some small instances, but I am afraid that storing a large number of growing arrays in the same cell array will make the program inefficient for larger instances. Is there a more promising way to handle this problem? At first, I ran the program manually for each value of (A,B,C,D), but there are hundreds of such quadruplets and it is quite tedious and inefficient.
4 Comments
@Daniel Gourion: how much memory do you have?
Have you calculated if all of those arrays would even fit into memory at once?
Using indexing is definitely the correct approach, but you might need e.g. to use tall arrays:
Bora Eryilmaz
on 8 Dec 2022
The size of your data will be the bottleneck regardless of whether you use a cell array to store them or another data type. If you don't need the whole cell array at once (say you only need one M at a time), you can save your individual M variables into MAT files and fetch them only when needed. This would be more memory efficient, but it requires disk I/O, which would be a bit slower.
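A minimal sketch of that save-as-you-go idea, under stated assumptions: computeM is a hypothetical stand-in for the poster's function (the real computation is not shown in the thread), the inputs are small random arrays of equal size rather than the real data, and the folder and file names are made up for illustration.

```matlab
% Hypothetical stand-in for structure(A,B,C,D); not the poster's code.
computeM = @(A, B, C, D) A + B + C + D;

outDir = tempname;          % scratch folder (assumed writable)
mkdir(outDir);

nCases = 3;                 % about a thousand in the real problem
for k = 1:nCases
    % Small random inputs stand in for the real 6 x 8 x p arrays.
    A = randi(5, 6, 8, 4, 'int8');
    B = randi(5, 6, 8, 4, 'int8');
    C = randi(5, 6, 8, 4, 'int8');
    D = randi(5, 6, 8, 4, 'int8');

    M = computeM(A, B, C, D);

    % The file name carries the index; the variable name stays "M".
    fileName = fullfile(outDir, sprintf('M_%04d.mat', k));
    save(fileName, 'M');

    clear M                 % release the memory before the next case
end
```

Only one M is held in memory at any time; the cost is one file write per case.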
Daniel Gourion
on 8 Dec 2022
Edited: Daniel Gourion
on 8 Dec 2022
Stephen23
on 8 Dec 2022
"I guess that the advatage is that the names of the files would be indexed but not the names of the variables, is it correct?"
Yes. You should name the files sequentially (or after test cases, or whatever makes sense for your data), but keep the variable names exactly the same in each file. Note that to make your code robust, you should LOAD into an output variable (which is a scalar structure) and access its fields:
S = load(..)
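As a rough illustration of that pattern (the file names and the tiny arrays are assumptions made up for this sketch, not taken from the thread), the following writes three sequentially named files that each contain a variable called M, then reads them back through the structure returned by load:

```matlab
outDir = tempname;                      % scratch folder (assumed writable)
mkdir(outDir);

% Create three small example files; each one stores a variable named M.
for k = 1:3
    M = k * ones(6, 8, 2, 'int8');      % placeholder data
    save(fullfile(outDir, sprintf('M_%04d.mat', k)), 'M');
end
clear M

% Read the files back by index. LOAD returns a scalar structure whose
% fields are the saved variables, so nothing appears in the workspace
% unexpectedly.
total = 0;
for k = 1:3
    S = load(fullfile(outDir, sprintf('M_%04d.mat', k)));
    M = S.M;                            % same field name in every file
    total = total + double(M(1));       % use M, then move on to the next file
end
```

Because the index lives in the file name, the loop body is identical for every file and no variable names need to be generated.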
Answers (1)
You say that you need to call this function many times, but do you need the M output from each of those calls to exist in memory simultaneously? If so and p is as large as you say it is, I'm not sure you'll be able to find a machine with that much memory. Let's look at how large one of your M arrays is, assuming it's stored as an 8-bit integer (1 byte per element) and that p is on the order of 1e7.
bytes = 6*8*1e7;
gb = bytes/(1024^3)
How much space do you need for all of them?
mem = 1000*gb
Roughly speaking, that is half a terabyte, in contiguous chunks of about half a gigabyte each. That's just considering the M arrays and assuming they're 8-bit integers; add in A, B, C, and D, any temporary arrays you need to create inside your function, or make M a double array (8 bytes per element) and your task doesn't seem feasible on one machine.
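To compare the two storage classes side by side, the same arithmetic can be repeated for int8 and double. This is only a back-of-the-envelope sketch: p = 1e7 and 1000 calls are the assumptions already used in this answer, and the variable names are illustrative.

```matlab
p      = 1e7;             % assumed size of the third dimension
nCalls = 1000;            % approximate number of calls from the question
nElem  = 6 * 8 * p;       % elements in one 6 x 8 x p array

bytesInt8   = nElem * 1;  % int8: 1 byte per element
bytesDouble = nElem * 8;  % double: 8 bytes per element

gbInt8   = bytesInt8   / 1024^3;
gbDouble = bytesDouble / 1024^3;

fprintf('One M: %.2f GB as int8, %.2f GB as double\n', gbInt8, gbDouble);
fprintf('All M: %.0f GB as int8, %.0f GB as double\n', ...
        nCalls * gbInt8, nCalls * gbDouble);
```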
In that case you're probably going to need to make use of the Big Data functionality in MATLAB and/or the parallel computing capabilities of Parallel Computing Toolbox.
One assumption I've made in this post is that p is on the order of 1e7. You said that p1 and p2 were, but not p. If p is much smaller and all the M arrays are the same size, you may be able to use a 4-dimensional array.
M = zeros(2, 3, 4, 5);
A = reshape(1:24, [2 3 4]);
for k = 1:5
M(:, :, :, k) = A*2^k;
end
Now let's spot check: we should expect M(:, :, :, 3) to be 2^3 = 8 times A. Is it?
M(:, :, :, 3)./A
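Applied to the 6 x 8 x p shape from the question, the same idea might look like the sketch below. The sizes are small assumed values, computeM is a hypothetical placeholder for the real function, and the int8 class is an assumption carried over from the memory estimate; it only works if every M has the same p.

```matlab
p = 4;                    % assumed common third dimension
n = 5;                    % assumed number of calls

% Hypothetical placeholder: returns a 6 x 8 x p int8 array filled with k.
computeM = @(k) int8(k) * ones(6, 8, p, 'int8');

% Preallocate once, in the integer class, instead of growing a cell array.
M = zeros(6, 8, p, n, 'int8');
for k = 1:n
    M(:, :, :, k) = computeM(k);   % result of call k goes into page k
end

% Result of the third call:
M3 = M(:, :, :, 3);
```

Preallocating with the class name avoids creating a double array that is eight times larger than needed.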
1 Comment
Daniel Gourion
on 8 Dec 2022
Edited: Daniel Gourion
on 8 Dec 2022