From: "Thomas " <thomas.seers@postgrad.manchester.ac.uk>
Newsgroups: comp.soft-sys.matlab
Subject: textscan uses VAST amounts of memory with some larger text files
Date: Mon, 27 Jan 2014 20:27:07 +0000 (UTC)
Organization: Univ of Manchester

Hi
I am currently using textscan to import non-rectangular text files into Matlab. The data has the following basic format (shown here at lower precision to aid readability):

NVM_V3  %header

3        % number of cameras followed by camera list (filename/extrinsics/intrinsics)
DSC05814.JPG 8774.7363 0.982 -0.099 -0.0984 -0.128 -0.174 0.008 -0.361 -0.312 0 
DSC05826.JPG 8719.6439 0.970 -0.039 -0.162 -0.170 -0.811 -0.668 -0.872 -0.289 0 
DSC05825.JPG 8718.2906 0.977 -0.059 -0.108 -0.176 -0.956 -0.083 -0.976 -0.286 0

10 % number of points followed by a list of row vectors (x/y/z/R/G/B/views/measurements)
3.706 0.009 5.521 147 116 87 2 1 4695 -10.072 829.875 2 4129 138.551 20.650
4.118 0.115 5.901 98 71 54 1 1 5698 1308.469 704.791 
3.680 -0.351 5.285 171 137 102 2 1 6613 -586.595 -81.978 3 4142 -489.869 -1032.766 
3.479 0.0586 5.469 49 30 21 1 2 6752 -574.997 1148.147 26 
3.417 -0.086 5.224 105 68 38 2 2 7826 -1111.384 885.410 3 4167 -979.546 -2.273 
3.964 0.059 5.749 120 88 65 2 1 7107 815.728 710.646 3 4171 959.160 -61.294 
4.032 0.139 5.837 51 33 22 2 2 5371 1090.961 839.350 3 4174 1242.225 89.978 
3.732 -0.132 5.410 195 165 141 2 1 5167 -153.592 457.226 3 4175 -16.533 -431.148 
3.68557024172 -0.126260038974 5.39401729277 109 76 51 1 1 5307 -282.079 513.668 
3.683 -0.094 5.410 90 58 42 2 2 5537 -247.375 598.106 3 4183 -106.569 -271.090

Here the first list holds the cameras with their intrinsic/extrinsic parameters, and the second list holds each point's xyz-rgb values followed by a list of measurements. The number of measurements varies from point to point (i.e. the data are non-rectangular), and this second list is several orders of magnitude longer than the camera list.
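A single point row can be split into its fixed part (xyz, rgb, measurement count) and its variable part with sscanf. This is a minimal sketch that assumes the NVM v3 layout used by VisualSFM, in which every measurement is four numbers (image index, feature index, x, y); that per-measurement layout is an assumption, not something stated above. The row is copied from the sample.

```matlab
% One point row copied from the sample above
row = '3.706 0.009 5.521 147 116 87 2 1 4695 -10.072 829.875 2 4129 138.551 20.650';
v = sscanf(row, '%f').';      % every token as a double, 1-by-15 here
xyz   = v(1:3);               % point position
rgb   = v(4:6);               % point colour
nMeas = v(7);                 % number of measurements that follow
% Assumed NVM v3 layout: 4 values per measurement (img, feature, x, y),
% so reshape the remainder into one row per measurement
meas  = reshape(v(8:end), 4, nMeas).';
```

Because nMeas is read from the row itself, rows of different lengths need no padding.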
I want to read the entire file into Matlab with each row stored in a separate cell as a character array (I leave the camera info as string data but convert the point list to numeric data for further processing). I can get this result using the following code:

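% Open the model's target .nvm file
[filename, pathname] = uigetfile('*.nvm', 'MultiSelect', 'off');
if isequal(filename, 0)
    return;                               % user cancelled the dialog
end
fullpath = fullfile(pathname, filename);  % portable path join
fid = fopen(fullpath, 'r');
if fid < 0
    error('Could not open %s', fullpath);
end
% Read every line after the header as one whole string
c = textscan(fid, '%s', 'Delimiter', '', 'Whitespace', '', ...
             'HeaderLines', 1, 'BufSize', 6500);
fclose(fid);

% Extract the cell array of lines
C = c{1};
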
However, while this works for text files from a few MB up to a few tens of MB, most of my data files are 500 MB or larger. With files of that size the code above eats memory at an alarming rate: today I tried it on a 500 MB file on a 64 GB workstation, and the entire physical memory was consumed within a couple of minutes!
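Part of the cost of one string per row is structural. MATLAB stores char data at 2 bytes per character, and every cell element carries its own array header (roughly 100 bytes on 64-bit builds), whereas a numeric matrix stores 8 bytes per double with a single header. As a rough estimate, a 500 MB file of ~80-character rows is about 6 million lines, i.e. on the order of 6e6 × (2×80 + ~100) bytes, or 1.5 to 2 GB as a cellstr. That alone is unlikely to explain 64 GB, but it shows why this representation scales poorly. The sketch below uses whos to measure both representations on your own installation; the row text is taken from the sample, and each string gets a row-number prefix so no two strings can share storage.

```matlab
nRows = 1e4;
rowText = '4.118 0.115 5.901 98 71 54 1 1 5698 1308.469 704.791';

% One char array per row, as in the textscan '%s' approach
asCells = arrayfun(@(k) sprintf('%d %s', k, rowText), (1:nRows).', ...
                   'UniformOutput', false);
% The same 11 values per row held in one nRows-by-11 double matrix
asNumeric = repmat(sscanf(rowText, '%f').', nRows, 1);

% Report the bytes whos attributes to each variable
info = whos('asCells', 'asNumeric');
for i = 1:numel(info)
    fprintf('%-10s %10d bytes\n', info(i).name, info(i).bytes);
end
cellBytes    = info(strcmp({info.name}, 'asCells')).bytes;
numericBytes = info(strcmp({info.name}, 'asNumeric')).bytes;
```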
I'm not sure what the best approach is here. Would it be better to bring in just the camera data as strings and then import the much larger point list as numeric data? If so, I'm not sure how best to do that, given that importdata() expects rectangular data as input.
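One way to do the split described above is to stream the file rather than hold every row as a string: read the small camera block with a single textscan call using a repeat count, then read each point row with fgetl and convert it straight to numbers with sscanf, keeping only numeric arrays. This is a minimal sketch under several assumptions: the % annotations in the sample are not actually in the file, blank lines may separate the sections, each camera row is a file name plus 10 numbers, and each measurement is 4 numbers (image index, feature index, x, y) as in the NVM v3 layout. The first lines write a tiny demo file from rows of the sample to a hypothetical temporary path; point fullpath at a real file instead. Names such as xyzrgb, measBuf and measOffset are illustrative.

```matlab
% --- Demo input built from sample rows (replace fullpath with your file) ---
fullpath = [tempname '.nvm'];
fw = fopen(fullpath, 'w');
fprintf(fw, 'NVM_V3\n\n3\n');
fprintf(fw, 'DSC05814.JPG 8774.7363 0.982 -0.099 -0.0984 -0.128 -0.174 0.008 -0.361 -0.312 0\n');
fprintf(fw, 'DSC05826.JPG 8719.6439 0.970 -0.039 -0.162 -0.170 -0.811 -0.668 -0.872 -0.289 0\n');
fprintf(fw, 'DSC05825.JPG 8718.2906 0.977 -0.059 -0.108 -0.176 -0.956 -0.083 -0.976 -0.286 0\n\n3\n');
fprintf(fw, '3.706 0.009 5.521 147 116 87 2 1 4695 -10.072 829.875 2 4129 138.551 20.650\n');
fprintf(fw, '4.118 0.115 5.901 98 71 54 1 1 5698 1308.469 704.791\n');
fprintf(fw, '3.680 -0.351 5.285 171 137 102 2 1 6613 -586.595 -81.978 3 4142 -489.869 -1032.766\n');
fclose(fw);

% --- Reader ---
fid = fopen(fullpath, 'r');
assert(fid >= 0, 'Could not open %s', fullpath);
fgetl(fid);                                   % skip the NVM_V3 header line

% Camera block: a count, then one rectangular row per camera
tline = fgetl(fid);
while ischar(tline) && isempty(strtrim(tline)), tline = fgetl(fid); end
nCams = sscanf(tline, '%d', 1);
camCols = textscan(fid, ['%s' repmat(' %f', 1, 10)], nCams);
camNames  = camCols{1};                       % nCams-by-1 cellstr (small)
camParams = [camCols{2:end}];                 % nCams-by-10 double

% Point block: a count, then one variable-length numeric row per point
tline = fgetl(fid);
while ischar(tline) && isempty(strtrim(tline)), tline = fgetl(fid); end
nPts = sscanf(tline, '%d', 1);
xyzrgb  = zeros(nPts, 6);                     % x y z R G B per point
nMeas   = zeros(nPts, 1);                     % measurements per point
measBuf = zeros(8 * nPts, 1);                 % flat measurement store
used = 0;                                     % entries of measBuf in use
for k = 1:nPts
    tline = fgetl(fid);
    while ischar(tline) && isempty(strtrim(tline)), tline = fgetl(fid); end
    if ~ischar(tline)
        error('File ended after %d of %d points.', k - 1, nPts);
    end
    v = sscanf(tline, '%f');                  % all tokens on this row
    xyzrgb(k, :) = v(1:6).';
    nMeas(k) = v(7);
    m = v(8:end);
    if numel(m) ~= 4 * nMeas(k)               % assumed 4 values per measurement
        error('Point %d: expected %d measurement values, got %d.', ...
              k, 4 * nMeas(k), numel(m));
    end
    if used + numel(m) > numel(measBuf)
        measBuf(2 * (used + numel(m))) = 0;   % grow geometrically
    end
    measBuf(used + (1:numel(m))) = m;
    used = used + numel(m);
end
fclose(fid);

% One row per measurement: [imageIndex featureIndex x y]
meas = reshape(measBuf(1:used), 4, []).';
% Rows measOffset(k)+1 : measOffset(k+1) of meas belong to point k
measOffset = [0; cumsum(nMeas)];
```

Reading row by row with fgetl is slower than a single textscan call, but peak memory stays close to the size of the numeric arrays themselves (8 bytes per value) instead of growing with one string per row.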
Any advice or solutions would be greatly appreciated.
Thanks
Thomas