parfor (file reading)

Asked by AP on 10 Nov 2011

Hi all,

I am trying to use parfor to speed up the reading of 1000 ASCII files. Each file has the following format:

  • A 10-line header describing the data.
  • The remaining lines are in the format '%f %f %f %f' and contain the values of the x, y, z1, and z2 variables. Each file has up to 10000 such lines.

x and y represent the rectangular domain in which z1 and z2 have been measured, so the domain is the same in all 1000 files. I want to use parfor and store one 10000×1 vector for x, one 10000×1 vector for y, one 10000×1000 array for z1, and one 10000×1000 array for z2.

I used the following pseudocode:

parfor i = 1:1000
   % fname: name of the i-th file (construction not shown)
   fid = fopen(fname, 'r');
   data = textscan(fid, '%f %f %f %f', 'HeaderLines', 10);
   fclose(fid);
   x = data{1};
   y = data{2};
   z1(:, i) = data{3};
   z2(:, i) = data{4};
end

I get the error "The variable z1 in a parfor cannot be classified". The error may come from the restrictions that parfor places on how variables are indexed inside the loop.

Is there a better way for reading these 1000 files in parallel?

Thanks.

1 Comment

Edric Ellis on 10 Nov 2011

That code should work - in your real code, are you using 'z1' in some other way within the loop?
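For reference, here is a minimal self-contained sketch of the pattern in the question that parfor should be able to classify. The file names (data_0001.txt and so on), the temporary directory, and the small sizes are assumptions made only so the example runs on its own; they are not from the original post. z1 and z2 are indexed only as z1(:, i) and z2(:, i), which makes them sliced output variables, and x and y are read once outside the loop because values assigned to plain variables inside a parfor are temporaries that do not survive the loop.

```matlab
% Hypothetical setup: write a few small sample files so the sketch runs alone.
nFiles  = 4;     % 1000 in the question
nRows   = 5;     % up to 10000 in the question
nHeader = 10;    % header lines per file
dataDir = tempname;
mkdir(dataDir);
for k = 1:nFiles
    fid = fopen(fullfile(dataDir, sprintf('data_%04d.txt', k)), 'w');
    for h = 1:nHeader
        fprintf(fid, 'header line %d\n', h);
    end
    xy = [(1:nRows)', (1:nRows)' * 10];          % same x, y in every file
    rows = [xy, xy(:, 1) + k, xy(:, 2) - k];     % columns: x y z1 z2
    fprintf(fid, '%f %f %f %f\n', rows');        % one row per line
    fclose(fid);
end

% x and y are identical in all files, so read them once, outside the parfor.
fid = fopen(fullfile(dataDir, sprintf('data_%04d.txt', 1)), 'r');
first = textscan(fid, '%f %f %f %f', 'HeaderLines', nHeader);
fclose(fid);
x = first{1};
y = first{2};

% Preallocate the outputs; each iteration fills exactly one column.
z1 = zeros(nRows, nFiles);
z2 = zeros(nRows, nFiles);
parfor i = 1:nFiles
    fname = fullfile(dataDir, sprintf('data_%04d.txt', i));
    fid = fopen(fname, 'r');
    data = textscan(fid, '%f %f %f %f', 'HeaderLines', nHeader);
    fclose(fid);
    z1(:, i) = data{3};   % sliced output: indexed only by the loop variable
    z2(:, i) = data{4};
end
```

If the real code also touches z1 in some other way inside the loop (for example z1(1, 1) or size-dependent indexing), that extra use is a likely place to look for the classification error.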

1 Answer

Answer by Daniel on 10 Nov 2011

I am not sure exactly how MATLAB handles file reading or how hard drives handle multiple read requests, but my guess is that distributing an I/O-limited job across multiple processors will not speed it up.

1 Comment

Walter Roberson on 10 Nov 2011

Surprisingly, you can get better performance with parallel reads -- at least if you are using SCSI drives with ENQ (enqueue) turned on which allows the drive to re-order read requests according to which destination is "closest" to where it currently is. In common situations, the performance increases up to four parallel reads; in some data access patterns, the performance can continue to climb beyond four parallel reads, but the performance improvement past 4 is not wonderful (but if you have terabytes to get through, you'll take whatever performance increase you can get.)

It also helps if the file you are reading is not compressed and you use scatter/gather I/O.

I do not have any information on drive queue management in the newer PC drives.
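Since the outcome depends on the drive and the access pattern, one way to settle it for a particular machine is simply to time both loops. The sketch below does that on generated files; the file names, sizes, and temporary directory are assumptions for illustration only. Two caveats apply: the first parfor may include the time needed to start the worker pool, and the second pass over the files may benefit from the operating system's file cache, so repeated runs on the real data give a fairer picture.

```matlab
% Hypothetical test data: a handful of small files with a 10-line header.
nFiles = 8; nRows = 100; nHeader = 10;
dataDir = tempname;
mkdir(dataDir);
for k = 1:nFiles
    fid = fopen(fullfile(dataDir, sprintf('data_%04d.txt', k)), 'w');
    fprintf(fid, '%s', repmat(sprintf('header\n'), 1, nHeader));
    fprintf(fid, '%f %f %f %f\n', rand(4, nRows));   % columns: x y z1 z2
    fclose(fid);
end

% Serial read, timed with tic/toc.
zSerial = zeros(nRows, nFiles);
tic;
for i = 1:nFiles
    fid = fopen(fullfile(dataDir, sprintf('data_%04d.txt', i)), 'r');
    data = textscan(fid, '%f %f %f %f', 'HeaderLines', nHeader);
    fclose(fid);
    zSerial(:, i) = data{3};
end
tSerial = toc;

% Parallel read of the same files, timed the same way.
zPar = zeros(nRows, nFiles);
tic;
parfor i = 1:nFiles
    fid = fopen(fullfile(dataDir, sprintf('data_%04d.txt', i)), 'r');
    data = textscan(fid, '%f %f %f %f', 'HeaderLines', nHeader);
    fclose(fid);
    zPar(:, i) = data{3};
end
tPar = toc;

fprintf('serial: %.3f s, parfor: %.3f s\n', tSerial, tPar);
```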
