Internal problem while evaluating tall expression (requested 40.5 GB array)

Hi, I'm working with a large data set of approximately 500k rows and 6k columns. I'm using a datastore and a tall array to handle the loading. The file is comma-separated, and most of its values are coded as integers or strings. I have a dictionary for decoding these values. What I am trying to do is replace the codes with their actual meanings and save the decoded file locally.
Below is the structure of my program:
classdef myTable < handle
    % ...
    methods
        function this = myTable
        end
        % ...
    end
    methods
        function loadCsv(this)
            % ...
            ds = datastore(this.csvSource);
            ds.SelectedFormats = repmat({'%q'}, 1, length(ds.VariableNames));
            this.csvTable = tall(ds);
        end
        % ...
        function decoding(this)
            % ...
        end
        function export(this)
            % ...
            write([this.outputDir '/' this.csvTableName '_decoded_*.csv'], this.csvTable, 'WriteFcn', @myWriter);
        end
    end
end
%% helper
function myWriter(info, data)
    filename = info.SuggestedFilename;
    writetable(data, filename, 'FileType', 'text', 'Delimiter', ',')
end
The error occurred at this.export:
Error using digraph/distances
Internal problem while evaluating tall expression. The problem was:
Requested 73733x73733 (40.5GB) array exceeds maximum array size preference. Creation of arrays greater than this limit
may take a long time and cause MATLAB to become unresponsive.
Question: I expected the write function to partition the data while exporting. Is that not the case? Why does MATLAB still try to create such a large array?
I am using a Windows machine with 16 GB of RAM and MATLAB R2020a (I first tried R2019a and have just upgraded to R2020a).
Thank you!
  16 Comments
Peng Li on 26 Mar 2020
Here is the complete error message. For some reason it changes from time to time; it is now requesting an array of over 500 GB.
Error using digraph/distances (line 72)
Internal problem while evaluating tall expression. The problem was:
Requested 269757x269757 (542.2GB) array exceeds maximum array size preference. Creation of arrays greater than this
limit may take a long time and cause MATLAB to become unresponsive.
Error in matlab.bigdata.internal.lazyeval.LazyPartitionedArray>iGenerateMetadata (line 814)
allDistances = distances(cg.Graph);
Error in matlab.bigdata.internal.lazyeval.LazyPartitionedArray>iGenerateMetadataFillingPartitionedArrays (line 795)
[metadatas, partitionedArrays] = iGenerateMetadata(inputArrays, executorToConsider);
Error in ...
Error in tall/write (line 248)
iDoWrite(location, ta, writeFunction);
Error in myTable/export (line 94)
write([this.outputDir '/' this.csvTableName '_decoded_*.csv'], this.csvTable, 'WriteFcn', @myWriter);
Error in myTable/update (line 33)
this.export;
Error in myTest (line 19)
tab.update;
Caused by:
Error using matlab.internal.graph.MLDigraph/bfsAllShortestPaths
Requested 269757x269757 (542.2GB) array exceeds maximum array size preference. Creation of arrays greater than
this limit may take a long time and cause MATLAB to become unresponsive.
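The sizes in both messages are consistent with a dense n-by-n matrix of doubles (8 bytes per element), which is the shape distances returns for a graph with n nodes. A quick check of that arithmetic, assuming the "GB" in the message means 2^30 bytes:

```matlab
% Node counts taken from the two error messages above.
n = [73733 269757];
bytesPerDouble = 8;

% Memory needed for a full n-by-n double matrix, in units of 2^30 bytes.
sizeGiB = n.^2 * bytesPerDouble / 2^30;

% Prints one line per matrix: 40.5 and 542.2, matching the messages.
fprintf('%d x %d doubles: %.1f GiB\n', [n; n; sizeGiB]);
```

So the requested array grows with the square of the number of nodes in the internal graph, not with the size of any single partition of the data.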
Peng Li on 27 Mar 2020
An update:
Is it possible that, for a tall array whose elements are strings, most of them quite long, MATLAB cannot handle writing to disk with the default partitioning method? The error occurs every time in LazyPartitionedArray, which calls the distances function. That function creates a distance matrix, which in my case is always larger than 10k-by-10k and sometimes larger than 100k-by-100k.
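One way to avoid tall/write altogether, sketched here as a possible workaround rather than a confirmed fix, is to read the datastore chunk by chunk and write each chunk with writetable. This assumes every chunk can be decoded independently of the others. The paths, the stand-in CSV file, and the chunk size below are hypothetical and only there so the sketch runs on its own.

```matlab
% Hypothetical paths; replace with the real source file and output folder.
csvSource = fullfile(tempdir, 'example_source.csv');
outputDir = fullfile(tempdir, 'decoded_chunks');

% Small stand-in file so the sketch runs end to end.
T = table(["1"; "2"; "3"], ["a"; "b"; "c"], 'VariableNames', {'code', 'label'});
writetable(T, csvSource);
if ~exist(outputDir, 'dir')
    mkdir(outputDir);
end

% Read every column as text, as in loadCsv above.
ds = tabularTextDatastore(csvSource);
ds.SelectedFormats = repmat({'%q'}, 1, numel(ds.SelectedVariableNames));
ds.ReadSize = 2;   % rows per chunk; use a much larger value for real data

k = 0;
totalRows = 0;
while hasdata(ds)
    chunk = read(ds);          % an ordinary in-memory table
    % ... decode the codes in this chunk here ...
    k = k + 1;
    totalRows = totalRows + height(chunk);
    outFile = fullfile(outputDir, sprintf('decoded_%04d.csv', k));
    writetable(chunk, outFile, 'FileType', 'text', 'Delimiter', ',');
end
```

Each pass holds only one chunk in memory, so no lazy tall expression is built and nothing is evaluated across the whole data set at once.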


Answers (0)
