How to increase bufsize for importdata

Asked by David on 12 Mar 2011

I am using the importdata function to import data from tab-separated and comma-separated text files. This works well for files up to at least 10 MB, but fails on identically formatted files in the 70 MB range with the following error:

Caused by:
    Error using ==> textscan
    Buffer overflow (bufsize = 1000005) while reading string from file (row 1, field 1). Use 'bufsize' option. See HELP TEXTSCAN.

Is there an easy way to increase bufsize directly in the importdata call, without mucking around in the textscan function? I understand that I could instead rewrite my code to use textscan directly, but my current M-file works with importdata for smaller imports, and I am looking for the simplest way to import larger data sets.

2 Answers

Answer by Oleg Komarov on 12 Mar 2011
Accepted answer

You can try to edit line 319 of importdata:

bufsize = min(1000000, max(numel(fileString),100)) + 5;

Raise the 1000000 cap (the first argument of min) to something higher.
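To see why the error reports bufsize = 1000005, the expression can be evaluated by hand. This is only a sketch: the length 1612061 is borrowed from the first-line length reported later in this thread and stands in for numel(fileString), and the raised cap of 2000000 is an arbitrary illustrative value.

```matlab
% Assumed input length; any value above the cap gives the same result.
n = 1612061;

% Original expression: the min() caps the buffer at 1000000, then 5 is added.
capped = min(1000000, max(n, 100)) + 5;   % 1000005, as in the error message

% Same expression with the cap raised (2000000 is a hypothetical choice).
raised = min(2000000, max(n, 100)) + 5;   % 1612066, large enough for n

fprintf('capped = %d, raised = %d\n', capped, raised);
```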

EDIT 14 March 02:14 GMT 00

You have to specify that "NA" should be treated as empty:

fid = fopen('C:\Users\Oleg\Desktop\ancestry-probs-par2.tsv');
% Column headers
colHead = fgetl(fid); 
colHead = textscan(colHead,'%s');
colHead = colHead{1};  
% get # data columns
numH = length(colHead);
% make fmt
fmt = ['%s', repmat('%f',1,numH)];
  • Import the file in bulk (if enough memory)
% Import file
data = textscan(fid,fmt,'HeaderLines',1,'TreatAsEmpty','NA');
fid = fclose(fid);
  • Import line by line (26 seconds on my PC; preallocation doesn't help much since there are only 191 lines)
% Import file
data = cell(0,2);
while ~feof(fid)
    data = [data; textscan(fid,fmt,1,'HeaderLines',1,'TreatAsEmpty','NA','CollectOutput',1)];
end
fid = fclose(fid);
rowHead = cat(1,data{:,1});
data = cat(1,data{:,2});

Oleg
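For readers who want to try the bulk approach end to end, here is a minimal self-contained sketch. It is not the code from the answer: the sample file, its contents, and the variable names are made up for illustration, the header line is assumed to hold one label per numeric column (no label for the row-name column), and 'Delimiter' and 'CollectOutput' are set explicitly so that the result comes back as one cell column of row names plus one numeric matrix.

```matlab
% Write a small tab-separated sample file (hypothetical data).
fname = [tempname, '.tsv'];
fid = fopen(fname, 'w');
fprintf(fid, 'c1\tc2\tc3\n');
fprintf(fid, 'r1\t0.1\tNA\t0.3\n');
fprintf(fid, 'r2\t0.4\t0.5\tNA\n');
fclose(fid);

% Read the header line and count the numeric columns.
fid = fopen(fname, 'r');
headerLine = fgetl(fid);
colHead = textscan(headerLine, '%s', 'Delimiter', '\t');
colHead = colHead{1};
numH = numel(colHead);

% One string field for the row name, then numH numeric fields.
fmt = ['%s', repmat('%f', 1, numH)];

% Read the remaining rows in bulk; "NA" becomes NaN in numeric fields.
C = textscan(fid, fmt, 'Delimiter', '\t', ...
    'TreatAsEmpty', 'NA', 'CollectOutput', 1);
fclose(fid);
delete(fname);

rowHead = C{1};   % cell array of row names
data = C{2};      % numeric matrix, one column per header
```

Because the header has already been consumed by fgetl, this sketch does not pass 'HeaderLines' to textscan.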

9 Comments

David on 14 Mar 2011

Perfect. This works with a slight modification.
data = cat(2,data{:,2:end});
instead of
data = cat(1,data{:,2});

Oleg Komarov on 14 Mar 2011

I forgot to put "'CollectOutput',1" in the bulk import with textscan.

Michael S on 15 Jun 2011

Thanks Oleg, this was very helpful. To others: if you have a CSV file, do not forget that white space is the default delimiter, so you need to add 'Delimiter',',' to the textscan arguments, e.g. textscan(fid,fmt,'HeaderLines',1,'Delimiter',',','CollectOutput',1)

Answer by Walter Roberson on 12 Mar 2011

It looks to me as if it is thinking that the first line is more than 1000000 characters.

How long is the first line?

1 Comment

David on 13 Mar 2011

head -n 1 test_file.tsv | wc -m
1612061

So, I tried bufsize = 1612061 + 100
Same error.
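The same check can be done from within MATLAB. The sketch below builds a throwaway file with a 2500-character first line (a made-up stand-in for the real data) and measures it with fgetl. Note that fgetl drops the line terminator, whereas wc -m counts it, so the two numbers can differ by one.

```matlab
% Create a temporary file whose first line has a known length (hypothetical data).
fname = [tempname, '.tsv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', repmat('x', 1, 2500));
fprintf(fid, 'r1\t1\t2\n');
fclose(fid);

% Read only the first line and count its characters.
fid = fopen(fname, 'r');
firstLine = fgetl(fid);   % newline is not included
fclose(fid);
delete(fname);

firstLineLength = numel(firstLine);
fprintf('First line: %d characters\n', firstLineLength);
```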