Clear Filters
Clear Filters

Big data percentile calculation

1 view (last 30 days)
David Santos
David Santos on 8 Aug 2019
Commented: David Santos on 8 Aug 2019
Hi,
I have a large set (30.000) of mat files each one of them containing a 4x1 cell array of 1483x2824 double, 4 matrix for each file ~= 30-40 MB
These are timeseries files representing simulations over 3 months.
I want to calculate the percentile of all this time series files but is too much memory for my computer because I need to load all the files, any clue on how to solve this? I'm working on a server with 20cores/40 threads and 256GB of memory.
I heard about this algorithm (P-square) but I couldn't find something similar inside matlab.
All the best

Answers (1)

Steven Lord
Steven Lord on 8 Aug 2019
See some of the tools and techniques available in MATLAB for working with Big Data, data that's too big to fit in memory. Many functions are supported on tall arrays.
  2 Comments
David Santos
David Santos on 8 Aug 2019
Edited: David Santos on 8 Aug 2019
Thanks!
What would you recommend if I want to convert my 4xcell array files in just one?
David Santos
David Santos on 8 Aug 2019
Ok, I'm trying using a fileDatastore and tall arrays:
-After all definitions I have the tall array t:
function data=loadPrc(filename)
data=load(filename);
ind=strfind(filename,'/');
data=data.(strcat('l',filename(ind(end)+1:end-4-7)));
data=data{1};
end
ds=fileDatastore('matBorrame','ReadFcn',@loadPrc,'FileExtensions','.mat')
t=tall(ds)
t =
4×1 tall cell array
{1483×2824 double}
{1483×2824 double}
{1483×2824 double}
{1483×2824 double}
My problem is that now the prctile calculation gives a format error:
gather(prctile(t,90,3))
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: 0% complete
Evaluation 0% complete
Error using tall/prctile (line 48)
Argument 1 to PRCTILE must be one of the following data types: numeric.
Learn more about errors encountered during GATHER.
That's because t should be in the format (1483x2824x4) but I can't reshape or permute a tall array, any clue on how to solve this¿?
All the best

Sign in to comment.

Categories

Find more on Large Files and Big Data in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!