Opening 14000 tif files...

Hello,
I am trying to minimize the time taken to open ~14000 tif files of about 400KB each.
I have tried different approaches, essentially based on stripping out everything I do not need and monitoring the result with the profiler. But the timings seem inconsistent from one run to the next.
Essentially my test is the following: I evaluate 3 different ways of operating (from the worst to the best, I hope...); for the last one I have copied rtifc.mexw64 into the current folder.
%%imread
im = zeros(424,424,14000,'uint16');
tic
for k = 1:length(fname)
    tmpf = [Folder fname{k}];
    tim = imread(tmpf);
    im(:,:,k) = tim;
end
T = toc;
[T T/14]
clear im tim tmpf

%%feval tifread
im = zeros(424,424,14000,'uint16');
tic
tf = imformats('tif');
for k = 1:length(fname)
    tmpf = [Folder fname{k}];
    tim = feval(tf.read, tmpf, 1);
    im(1:424,1:424,k) = tim;
end
T = toc;
[T T/14]
clear im tim tmpf

%%rtifc
im = zeros(424,424,14000,'uint16');
tic
tmp.index = 1;
tmp.PixelRegion = {[1 424],[1 424]};
tmp.info = imfinfo([Folder fname{1}]);
for k = 1:length(fname)
    tmp.filename = [Folder fname{k}];
    [tim, trash1, trash2] = rtifc(tmp);
    im(1:424,1:424,k) = tim;
end
T = toc;
[T T/14]
clear im tim tmp
From this I have 2 points:
1. There is no big improvement between the 3 methods...
  • 70s 67s 66s
  • 69s 63s 64s
  • 66s 64s 61s
2. From one run to another I sometimes see massive differences for all 3 methods: 29s 18s 16s
Do you have any suggestion? What am I doing wrong? THANK YOU!!! :)


Jan on 18 Mar 2013
im(1:424,1:424,k) = tim; is slower than im(:,:,k) = tim;, but this will not affect the total runtime significantly.
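Jan's point can be checked with a quick timing sketch (illustrative only; 1000 frames are used here to keep it fast, and the exact ratio depends on the machine):

```matlab
% Illustrative comparison of the two assignment forms.
% The colon form avoids building an explicit index vector on every
% iteration, so it is usually a little faster.
im  = zeros(424,424,1000,'uint16');
tim = ones(424,424,'uint16');

tic
for k = 1:1000
    im(1:424,1:424,k) = tim;   % explicit range indexing
end
t_range = toc;

tic
for k = 1:1000
    im(:,:,k) = tim;           % colon (full-slice) indexing
end
t_colon = toc;

fprintf('range: %.3fs  colon: %.3fs\n', t_range, t_colon);
```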


 Accepted Answer

Walter Roberson on 18 Mar 2013


Part of the time spent reading is getting the files into the operating system's memory cache. If you have read moderately sized files recently, then it might not be necessary to fetch them from the hard disk into main memory; instead a memory-to-memory copy might be all that is needed.


Jobi on 18 Mar 2013
The thing is that these files come from an acquisition system, so at some point they are copied onto the hard drive of a computer that is used for analysis. So I am not sure that I can circumvent that, if I have properly understood your idea?
Walter Roberson on 18 Mar 2013
I am answering your question about the massive difference in run times.
If you need repeatedly fast response then you could get a Solid State Drive.
Jobi on 18 Mar 2013
OK, meaning that there is no way to set something like a priority somewhere? So in the end, whatever method I use will have no influence if the time needed by the OS is that variable?
Walter Roberson on 18 Mar 2013
If you are going to be using the same files a number of times in a relatively short period, you can "warm up" the disk cache with some code that reads a bit from each file ahead of time. For example,
system('copy *.tif NUL:'); %MS Windows only
might do it.
If it is a disk effect you are seeing, then setting a priority is likely not going to help.
Jobi on 18 Mar 2013
No, a tif file is read only once in my application... Any other idea? :)
Walter Roberson on 18 Mar 2013
Throw a solid state disk at it; the time you would spend trying to work around hardware and operating system limits would be costly.
Jan on 18 Mar 2013
@Jobi: What is your actual problem? Does it matter if you wait for 30 or 60 seconds?
Jobi on 19 Mar 2013
Yes, because this code will run on different computers with different hardware configurations; moreover, the size of the tifs (mainly the number of pixels) is not constant. This case with 14000 tifs is not even our worst case.


More Answers (2)

If you have the Parallel Computing Toolbox, how about using parfor?
doc parfor
If you do not have the PCT, I'm sure your friendly Sales Rep could set you up with a trial.
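A minimal sketch of the parfor approach (assuming the R2013-era matlabpool syntax; Folder, fname, and the 424x424 frame size are taken from the question). Note that parfor helps only if decoding is the bottleneck; if the hard disk is, the workers just contend for the same drive:

```matlab
% Hedged sketch: parallel read with the Parallel Computing Toolbox.
matlabpool open 4            % R2013-era syntax; newer releases use parpool

im = zeros(424,424,length(fname),'uint16');
parfor k = 1:length(fname)
    im(:,:,k) = imread([Folder fname{k}]);   % im is a sliced output variable
end

matlabpool close
```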


Jobi on 18 Mar 2013
I have tried with 4 workers (4 CPUs) but it actually had the opposite effect (>300s...) :( If I am correct, I just need to add matlabpool and use parfor rather than for in the code?


Jan on 18 Mar 2013


Accessing the hard disk is influenced by many different factors: disk fragmentation, defragmentation tools, other jobs accessing the disk, weak blocks that are remapped transparently, virus checkers that scan modified files on first access, downloads of updates in the background, other tasks swapping data to the disk, etc. Therefore a difference of 50% is not very surprising.
The best setup for a speed measurement with disk access is using a dedicated disk (not partition!) for the data.


Jobi on 19 Mar 2013
OK, I just thought there would be a way to set priorities, or tricks that I had not thought of.

