Insufficient storage (ActiveX) - unable to complete operation - pulling in Excel data more quickly.

Hi All,
I have some reasonably large Excel files to process (200-800 MB). After a bit of dabbling, and on review of the numerous threads here on the forum, I have elected to use ActiveX controls.
For speed of execution I am using xlsread (as opposed to csvread / importdata). textscan is less than ideal due to the variation between sheets (i.e. it requires a format specifier).
To make things easier I have been reading from the Excel sheets in bite-sized chunks to 1) help speed up the read and 2) avoid (other) memory errors.
Presently I am facing a challenge with the read call, that is: eRange = get(e.ActiveSheet,'Range',sheetR).
Currently the script loops through a 'block' of 20 columns or so, reading in all rows. On the first call, that is:
eRange = get(e.ActiveSheet,'Range',sheetR);
raw = eRange.Value;
the program executes as expected. However, looping through to load the next block on the succeeding call, I receive the error
"Error: Not enough storage is available to complete this operation."
Below is a brief outline of the script and corresponding output.
-------------- Script ------------------
% Uses ActiveX controls.
% pathAndFilename = 'C:\Users\owner\Desktop\Convert GS logs\temp.xlsx'
%% Open Excel application
disp('Opening Excel application...')
e = actxserver('excel.application');
wb = e.Workbooks;
w = Open(wb, pathAndFilename);
%% Find range / break range of sheet into manageable chunks
% Define Excel column headers A..ZZ (NOT R1C1 reference style).
Alphabet = ('A':'Z').';
[I, J] = meshgrid(1:26, 1:26);
XX = [Alphabet(I(:)), Alphabet(J(:))];   % two-letter headers AA..ZZ
XX = cellstr(char(Alphabet, XX));        % single letters first, then pairs
clear I J Alphabet
% Range
disp('Extracting range from excel sheet...')
RangeInString = e.ActiveSheet.UsedRange.Address;   % e.g. '$A$1:$EV$530000'
range = strsplit(RangeInString, '$');
nrow = str2double(range(end));                     % last token: final row number
ncol = find(strcmp(range(end-1), XX));             % map final column letters to an index
% Reduce the scale of the succeeding reads (time / memory overheads).
% (lump = target columns per chunk; defined earlier in the full script.)
if ncol > 20
    splt = ceil(ncol/lump);              % number of chunks
    bite = floor(ncol/splt);             % columns per chunk
    columns = 1:bite:ncol;               % chunk boundaries
    surp_col = ncol - columns(end);      % fold surplus columns into the last chunk
    columns(end) = columns(end) + surp_col;
end
clear splt surp_col bite range
%% Do some work
% Loop through the sheet and extract the respective chunks
for i = 1:length(columns)-1
    disp(['processing columns ', num2str(columns(i)), ' through ', num2str(columns(i+1)), ' of ', num2str(ncol), ' columns...'])
    tic
    % Set the range for the current read / chunk, e.g. 'A1:T530000'
    sheetR = [XX{columns(i)}, '1:', XX{columns(i+1)}, num2str(nrow)];
    %
    % eRange = get(e.ActiveSheet,'Range','A1:B2');
    eRange = get(e.ActiveSheet, 'Range', sheetR);
    raw = eRange.Value;
    clear raw sheetR
end
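A note on where the memory may be going, with a cleanup sketch: each get(e.ActiveSheet,'Range',...) call returns a COM interface whose data lives inside the Excel process rather than the MATLAB workspace, which would be consistent with whos (below) showing so little. The following is a minimal sketch only, assuming the variable names from the script above; release is MATLAB's call for freeing a COM interface, and Quit/delete shut the server down:
% Inside the loop, drop the Range interface before the next read:
release(eRange);     % free the COM interface held against Excel
clear eRange
% After the loop, close the workbook and the server:
Close(w, false);     % false = do not save changes
Quit(e);
delete(e);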
What I wish to achieve is to read the full xlsx file and place the data in a .mat file. To reduce the overhead, on each loop the data 'raw' is written to a .mat using the -v7.3 format. All non-essential variables are removed using a 'clearvars -except' call.
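For reference, a minimal sketch of that per-chunk save, assuming the MatObj / matf names from the whos listing below; the per-chunk variable naming is hypothetical:
MatObj = matfile(matf, 'Writable', true);   % creates a v7.3 MAT-file if none exists
% ... inside the loop, after raw = eRange.Value:
MatObj.(sprintf('raw_cols_%d_to_%d', columns(i), columns(i+1))) = raw;
clearvars -except MatObj matf e w wb XX columns ncol nrow i lump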
The script runs fine on the first 'chunk'; however, I receive an error on the following iteration. What is odd to me is that each 'chunk' pulled in on the read call is precisely the same size (number of rows / columns).
The output of the script is as follows:
Loading log file: 20160719_0945.xlsx ...
Opening Excel application...
Extracting range from excel sheet...
processing columns 1 through 20 of 152 columns...
processing columns 20 through 39 of 152 columns...
Error: Not enough storage is available to complete this operation.
The script fails on the 'raw = eRange.Value' call.
A 'whos' placed immediately before the 'raw = eRange.Value' line suggests little memory overhead in the workspace:
Name               Size      Bytes   Class                                            Attributes
MatObj             1x1         112   matlab.io.MatFile
MetaData           1x1        3069   struct
RangeInString      1x15         30   char
UAVnav_Log_ID      1x1         138   cell
XX                 702x1     81380   cell
columns            1x8          64   double
e                  1x1                COM.excel_application
eRange             1x1                Interface.00020846_0000_0000_C000_000000000046
filename           1x18         36   char
i                  1x1           8   double
lump               1x1           8   double
matf               1x13         26   char
ncol               1x1           8   double
nrow               1x1           8   double
pathAndFilename    1x71        142   char
w                  1x1                Interface.000208DA_0000_0000_C000_000000000046
wb                 1x1                Interface.000208DB_0000_0000_C000_000000000046
I am presently running the script on a 64-bit machine with 8 GB of RAM. However, I would very much like to deploy this or similar scripts in the future on a smaller machine for those times when we are 'working in the field'. Any pointers would be greatly appreciated. A simple solution is to significantly reduce the number of columns read during any one call - however, surely there is a better way?
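For illustration, one way to bound the size of any single read, in the spirit of the 'simple solution' above but without shrinking the column blocks, is to chunk the rows as well. A sketch reusing the range-string construction from the script; rowChunk is a hypothetical tuning parameter:
rowChunk = 100000;                           % rows per read (tune to available RAM)
for r1 = 1:rowChunk:nrow
    r2 = min(r1 + rowChunk - 1, nrow);
    sheetR = [XX{columns(i)}, num2str(r1), ':', XX{columns(i+1)}, num2str(r2)];
    eRange = get(e.ActiveSheet, 'Range', sheetR);
    raw = eRange.Value;                      % at most rowChunk x ~20 cells per call
    % ... write raw to the .mat file, then clear before the next read
    clear raw eRange
end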
The example file I am running has 530,000 rows x 158 columns and generates the cell array
raw 528439x20 1268253582 cell
i.e. roughly 1.27 GB for 528,439 x 20 cells, or about 120 bytes per cell once MATLAB's per-cell overhead is included.
  2 Comments
per isakson on 6 Aug 2016 (edited 10 Aug 2016)
  • "I am using xlsread" &nbsp However, the code doesn't show that you use xlsread. Why is that?
  • How should I interpret &nbsp "raw 528439x20 1268253582 cell" ?
  • What does &nbsp whos raw &nbsp show?
Jamie on 17 Aug 2016
Hi Per, apologies for the tardy response - I was expecting an email if there was a post. Quite right! I had replaced xlsread with the call 'eRange = get(e.ActiveSheet,'Range',sheetR)' in the hope that it would solve the problem. Both the xlsread and ActiveX calls resulted in similar memory errors. 'raw' holds the output values of the 'get' call above.
If memory serves, the output of 'whos raw' was the "raw 528439x20 1268253582 cell" shown above, which details the size and data format (cell). I did not find a solution / better approach. I have left the above snippet in case it proves useful to others.


Answers (0)
