MATLAB Answers

How to calculate the contingency table of a large dataset without memory issues?

Asked by pietro on 13 Nov 2017
Latest activity: commented on by pietro on 13 Nov 2017
Hi all,
I am calculating a contingency table (using the crosstab function) from a large array, and the result will be a matrix of size [175473x175473]. I am running the script on a laptop with 32 GB of RAM, and unfortunately that is not enough. I have noticed that MATLAB allocates double variables by default, which require several times more memory per element than small integer types (8 bytes versus 2 bytes for int16, for example). The contingency table should be an integer variable, not double. Is there any way to force crosstab to compute the result as integer data so that I can save memory when running the calculation?

1 Answer

Answer by the cyclist on 13 Nov 2017
Accepted Answer

I don't think you can do that with the built-in crosstab function.
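For a sense of scale, here is a rough estimate of what a dense 175473-by-175473 table would need in a few numeric classes. This is only a back-of-the-envelope sketch: it assumes a single dense array and ignores any temporary copies, and the variable names are just illustrative.

```matlab
n = 175473;            % table is n-by-n, size taken from the question
numElements = n^2;     % number of entries in a dense table

% Bytes per element for a few candidate numeric classes
bytesPerElement = struct('double', 8, 'single', 4, 'uint16', 2, 'uint8', 1);
types = fieldnames(bytesPerElement);

for k = 1:numel(types)
    gb = numElements * bytesPerElement.(types{k}) / 1e9;   % decimal gigabytes
    fprintf('%-6s: %6.1f GB\n', types{k}, gb);
end
```

Under these assumptions a dense double table needs roughly 246 GB and a single-precision one roughly 123 GB, so a smaller numeric class alone is unlikely to fit in 32 GB for a table of this size.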
Depending on the number of data elements (specifically, how sparse the resulting cross table is), you might be able to "manually" fill in the cross table, either as a single-precision array or as a sparse matrix.
All three techniques are illustrated below:
x = single([1 1 2 3 1]);
y = single([1 2 5 3 1]);
% jx and jy map each element to its index among the sorted unique values
[unique_x,~,jx] = unique(x);
[unique_y,~,jy] = unique(y);
% Memory-intensive double-precision cross table
% (named tbl rather than table to avoid shadowing the built-in table function)
tbl = crosstab(x,y)
% "Manually" create single-precision cross table
% (single represents integer counts exactly only up to 2^24)
tbl_single = zeros(numel(unique_x),numel(unique_y),'single');
for ii = 1:numel(x)
    tbl_single(jx(ii),jy(ii)) = tbl_single(jx(ii),jy(ii)) + 1;
end
% "Manually" create sparse cross table (all-zero sparse matrix, filled in the loop)
tbl_sparse = sparse(numel(unique_x),numel(unique_y));
for ii = 1:numel(x)
    tbl_sparse(jx(ii),jy(ii)) = tbl_sparse(jx(ii),jy(ii)) + 1;
end
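
As a possible alternative to the loops, the same counts can probably be built in one call, since sparse sums the values at repeated (row, column) pairs and accumarray does the same for a dense result. This is a sketch on the same toy data rather than something tested on the full-size problem, and the names counts_sparse and counts_dense are only illustrative.

```matlab
x = [1 1 2 3 1];
y = [1 2 5 3 1];

% Index of each element among the sorted unique values
[unique_x, ~, jx] = unique(x);
[unique_y, ~, jy] = unique(y);
nx = numel(unique_x);
ny = numel(unique_y);

% sparse(i, j, v, m, n) adds up v wherever an (i, j) pair repeats,
% so passing v = 1 counts the occurrences of each pair
counts_sparse = sparse(jx, jy, 1, nx, ny);

% Dense equivalent; only sensible when nx*ny elements fit in memory
counts_dense = accumarray([jx(:) jy(:)], 1, [nx ny]);

disp(full(counts_sparse))
```

Building the sparse matrix in one call also avoids inserting nonzeros one element at a time, which tends to be slow for sparse storage.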

  1 Comment

Thanks a lot! Your suggestion of using a for loop in combination with unique is superbly clever.
