## How to calculate the contingency table of a large dataset without memory issues?

### pietro (view profile)

on 13 Nov 2017
Latest activity Commented on by pietro

on 13 Nov 2017

### the cyclist (view profile)

Hi all,
I am calculating a contingency table (using crosstab function) from a large array and the result will be a matrix of a size of [175473x175473]. I am running the script with a laptop with 32GB and unfortunately the RAM is not enough. I have noticed that Matlab by default, allocates memory for double variables which requice 4 times more memory than double. The contingency table should be an integer variable and not double. Is there any way to force crosstab to compute the result on integer data so that I can save memory for running the calculation?

### the cyclist (view profile)

on 13 Nov 2017

I don't think you can do that with the built-in crosstab function.
Depending on the number of data elements (specifically how sparse the resulting cross table is), you might just be able to "manually" fill in the crosstable, as either a single-precision array, or as a sparse matrix.
All three techniques illustrated below:
x = single([1 1 2 3 1]);
y = single([1 2 5 3 1]);
[unique_x,ix,jx] = unique(x);
[unique_y,iy,jy] = unique(y);
% Memory-intensive double-precision cross table
table = crosstab(x,y)
% "Manually" create single-precision cross table
table_single = single(zeros(numel(unique_x),numel(unique_y)));
for ii = 1:numel(x)
table_single(jx(ii),jy(ii)) = table_single(jx(ii),jy(ii)) + 1;
end
% "Manually" create sparse cross table
table_sparse = sparse(numel(unique_x),numel(unique_y));
for ii = 1:numel(x)
table_sparse(jx(ii),jy(ii)) = table_sparse(jx(ii),jy(ii)) + 1;
end

pietro

### pietro (view profile)

on 13 Nov 2017
Thanks a lot! Your suggestion of using the for in combination with unique is superbly clever.
Thanks a lot