I have a tall array with about 1.7 billion rows and 14 columns. I want to process this data the same way several of the examples (the ones using the airline data) do. For now, I am just trying to extract one column and find its mean. My code is something like:
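ds = datastore('some-file.csv');   % datastore pointing at the CSV file
tt = tall(ds);                     % tall table backed by the datastore
a = tt.V;                          % extract the single column V
m = mean(a);                       % deferred: nothing is computed yet
gather_m = gather(m);              % triggers the actual pass over the data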
The gather step takes far too long; I have never seen it complete. In the examples I have seen, this step finishes in a few seconds. Eventually I want to do more calculations and make plots, but I want to get this simple step working first. Can anyone recognize the problem and recommend a solution? I have a parallel pool running with two workers.
Thank you very much.