- Have your loadPrc return a 4 × 1483 × 2824 numeric matrix (rather than a cell array)
- Your corresponding tall array t will then be 25000 × 1483 × 2824
- Instead of the for loop, simply call prctile in dimension 1
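
The first point above, returning a numeric array instead of a cell, can be sketched as follows. This is only an illustration under assumptions: the small 3 × 5 maps stand in for the real 1483 × 2824 ones, and the variable names `c` and `data` are hypothetical, since the contents of loadPrc are not shown in the thread.

```matlab
% Stand-in for the 4x1 cell stored in each file (tiny maps for illustration).
c = {rand(3,5); rand(3,5); rand(3,5); rand(3,5)};

% Stack the four maps along a third dimension: 3x5x4.
stacked = cat(3, c{:});

% Move the map index to the front so the result is 4x3x5,
% i.e. data(k,:,:) holds map k.
data = permute(stacked, [3 1 2]);
```

With the map index in the first dimension, successive reads can be concatenated vertically, which is what a datastore with 'UniformRead' set to true expects.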

# big data 2d matrix percentile calculation using tall

I'm trying to calculate a percentile across a large number of files (25000 or even more), each containing a 4x1 cell that holds 4 maps, i.e. 1483x2824 matrices.

I'm using tall arrays, following the documentation example "Percentiles of Tall Matrix Along Different Dimensions":

tic

% start local pool for multithreading

c=parcluster('local');

c.NumWorkers=20;

parpool(c, c.NumWorkers);

folder='/home/temporal2/dsantos/mat/*.mat'; %more than 25000 files

A=ones(1483,2824,2); % auxiliary matrix to establish the prctile data type

y=tall(A);

% datastore of files containing a 4x1 cell of 1483x2824 maps

ds=fileDatastore(folder,'ReadFcn',@loadPrc,'FileExtensions','.mat','UniformRead', true)

t=tall(ds);

%fill the aux tall array with each map in the correct format

for i=1:25000

y(:,:,i)=t(1+(i-1)*1483:1483*i,:);

end

%calculate the percentile

p90_1=prctile(y,90,3)

P90_1=gather(p90_1);

save('/home/temporal2/dsantos/p90_1.mat','P90_1','-v7.3');

toc

But it seems that tall arrays won't work for this because I get the error:

Warning: Error encountered during preview of tall array 'p90_1'. Attempting to gather 'p90_1' will probably result in an error. The error encountered was:

Requested 500025x500025 (1862.8GB) array exceeds maximum array size preference. Creation of arrays greater than this limit may take a long time and cause MATLAB to become unresponsive. See array size limit or preference panel for more information.

> In tall/display (line 21)

p90_1 =

MxNx... tall array

? ? ? ...

? ? ? ...

? ? ? ...

: : :

: : :

>> Error using digraph/distances (line 72)

Internal problem while evaluating tall expression. The problem was:

Requested 500028x500028 (1862.9GB) array exceeds maximum array size preference. Creation of arrays greater than this limit may take a long time and cause MATLAB to become unresponsive. See array size limit or preference panel for more information.

Error in matlab.bigdata.internal.lazyeval.LazyPartitionedArray>iGenerateMetadata (line 756)

allDistances = distances(cg.Graph);

Error in matlab.bigdata.internal.lazyeval.LazyPartitionedArray>iGenerateMetadataFillingPartitionedArrays (line 739)

[metadatas, partitionedArrays] = iGenerateMetadata(inputArrays, executorToConsider);

Error in ...

Error in tall/gather (line 50)

[varargout{:}] = iGather(varargin{:});

Caused by:

Error using matlab.internal.graph.MLDigraph/bfsAllShortestPaths

Requested 500028x500028 (1862.9GB) array exceeds maximum array size preference. Creation of arrays greater than this limit may take a long time and cause MATLAB to become unresponsive. See array size limit or preference panel for more information.

Any clue on how to solve this problem?

All the best

### Answers (2)

Edric Ellis
on 13 Aug 2019

That particular error is an internal one, and it arises because your tall array expression is simply too large: it contains too many operations. Tall arrays work by building up a symbolic representation of all the expressions you've evaluated, and then running them all together when you call gather. Because you've got a for loop over 25000 elements, this symbolic representation becomes too large to be evaluated. Tall arrays are not designed to be looped over in this way. Instead, you need to express your program in terms of a smaller number of vectorised operations.
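
To illustrate the deferred evaluation described above, here is a minimal sketch on a small in-memory stand-in (the matrix `X` and the variable names are made up for illustration). Each vectorised call only records one operation, and a single gather evaluates them together:

```matlab
% Small in-memory stand-in: 1000x3 matrix with known column statistics.
X = (1:1000)' * [1 2 3];
t = tall(X);

% Deferred: these calls only record operations, nothing is computed yet.
colMean = mean(t, 1);
colMax  = max(t, [], 1);

% One gather evaluates both recorded operations together.
[colMean, colMax] = gather(colMean, colMax);
```

Two vectorised calls add only a couple of recorded operations, whereas an indexed assignment inside a 25000-iteration loop adds new operations on every iteration.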

I would proceed in the following manner (I can't be more specific since your problem statement isn't executable; see this page of tips on making a minimal reproduction):

ds = fileDatastore();

t = tall(ds);

p90_1=prctile(t,90,1);

P90_1=gather(p90_1);

% and then perhaps

P90_1 = shiftdim(P90_1, 1)
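
For a version that can be run end to end, here is a hedged sketch on tiny stand-in files. Several things in it are my assumptions rather than part of the answer: the variable name `maps` inside each file, the read function `readFirstMap`, and the choice to flatten one map per file into a row so that the tall array is an ordinary tall matrix. It also assumes that, for a tall matrix, prctile along dimension 1 needs 'Method','approximate', which is how I read the prctile documentation; check this against your release. Local functions in scripts need R2016b or later.

```matlab
% Write a few small stand-in files; each holds a 4x1 cell of 3x5 maps
% in a variable named 'maps' (hypothetical name).
folder = tempname;
mkdir(folder);
rng(0);
for k = 1:50
    maps = {rand(3,5); rand(3,5); rand(3,5); rand(3,5)};
    save(fullfile(folder, sprintf('f%03d.mat', k)), 'maps');
end

% One row per file, so t is a 50x15 tall matrix.
ds = fileDatastore(fullfile(folder, '*.mat'), 'ReadFcn', @readFirstMap, ...
    'FileExtensions', '.mat', 'UniformRead', true);
t = tall(ds);

% A single deferred call replaces the per-file loop.
p90 = prctile(t, 90, 1, 'Method', 'approximate');

% Evaluate, then restore the 3x5 map shape.
P90 = reshape(gather(p90), 3, 5);

function row = readFirstMap(filename)
% Hypothetical read function: returns map 1 flattened to a single row.
s = load(filename);
row = reshape(s.maps{1}, 1, []);
end
```

Flattening and then reshaping is consistent because both steps use MATLAB's column-major order. The other three maps could be handled the same way with one read function per map.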
