imread - for vs. parfor - not seeing any gains

I am processing a lot of images for work and wanted to take advantage of the parallel architecture to save some time:
With a for loop, my script takes 11.0621 hrs to finish. With a parfor loop, my script takes 10.9742 hrs to finish.
The gain there is so minimal that it could just be fluctuation. The code I'm using calls imread and then regionprops about 800 times per region, and there are 8 regions for 13 samples. Trying to get at what the problem might be, I tried substituting a comparable image that was already in my workspace in lieu of the imread call and in that case the parfor loop went way faster than the for loop (2.692 hrs vs. 4.9351 hrs). This makes me think that my code is fine in terms of the amount of information being passed between workers, that I have a parallel pool, etc.
Anyone have any idea why the gains of using the parallel structure are so minimal with the imread call? Is it my hardware? All code was run on a 64-bit Windows 7 environment with 32 GB of RAM, a 3.4 GHz Intel i7 processor, and two NVIDIA GeForce GTX 570 graphics cards.
Here is a skeleton of my code:
Samples = {'H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'H7', 'H8', 'H9', 'H10', 'H11', 'H12', 'H13'};
Regions = {'R1', 'R2', 'R3', 'R4', 'R5', 'R6', 'R7', 'R8'};
try
    parpool('local', 4);
catch
end
AllSamplesTic = tic;
for S = Samples
    Sample = char(S);
    for R = Regions
        Region = char(R);
        cd([Sample '\' Region]);
        load('RegionInfo.mat');
        TotalImages = RegionInfo(:,5);
        for SliceNum = 1:size(TotalImages,1)
            % Gather some info
            parfor Ind = 1:TotalImages(SliceNum,1)
                I = imread(['pic' num2str(Ind) '.png']); % or I = dummyImage;
                props = regionprops(I, 'Area', 'Centroid', 'Eccentricity', 'Orientation');
                % Do some calculations here
                % Store results
            end
        end
    end
end
Thanks!

4 Comments

Other people have made suggestions that you might be saturating the bandwidth to disk, which is certainly possible. However, just in case, I'll ask the "are you sure it's plugged in?" question.
You have a "try/catch ignore" wrapped around your parpool. Why did you do that? Was it throwing errors? Perhaps your parpool threw an error and never opened; as written, your code would not show that if it happened.
STBLer on 13 Jun 2014
Edited: STBLer on 13 Jun 2014
The try catch was just there in case I had already started my parallel pool. The parallel pool indicator in the bottom left of the command window is green and showing that the workers are active.
Ok, I know this has nothing to do with your original problem, but you can use 'gcp' to do that. See:
help gcp
Yeah. I guess that's safer than the try/catch setup.
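(For readers following along: a sketch of the gcp-based idiom suggested above. gcp('nocreate') returns the current pool, or empty if none exists, without ever starting one, so parpool is only called when needed.)

    % Start a pool only if one is not already running.
    % gcp('nocreate') returns [] when no pool exists (it never creates one).
    if isempty(gcp('nocreate'))
        parpool('local', 4);
    end

This avoids silently swallowing a genuine parpool failure the way a bare try/catch does.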


Answers (2)

Matt J on 12 Jun 2014
Edited: Matt J on 13 Jun 2014
My guess would be that the parallel labs are all fighting each other for access to your hard drive and/or you have a slow hard drive. It might be worth a try to read the images in serially into a cell array (or into the slices of a 3D array if all are the same size) and then broadcast them to parfor.
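A sketch of that idea, assuming the images for the current slice are all 480-by-480 uint8 (the variable names follow the question's skeleton, but the structure here is illustrative, not the asker's actual code):

    % Serial read: one process streams images off the disk, one at a time.
    numImages = TotalImages(SliceNum, 1);
    stack = zeros(480, 480, numImages, 'uint8');   % preallocate if sizes match
    for Ind = 1:numImages
        stack(:, :, Ind) = imread(['pic' num2str(Ind) '.png']);
    end
    % Parallel compute: stack(:,:,Ind) is a sliced variable, so each worker
    % receives only the slices it needs.
    parfor Ind = 1:numImages
        props = regionprops(stack(:, :, Ind), 'Area', 'Centroid', ...
            'Eccentricity', 'Orientation');
        % Do some calculations here
        % Store results
    end

The point of the split is that the disk sees one sequential reader instead of four competing ones, while the CPU-bound regionprops work still runs on all workers.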

10 Comments

STBLer on 13 Jun 2014
Edited: STBLer on 13 Jun 2014
Why would there be conflict accessing the hard drive? They are all trying to access a different image on the same hard drive.
MATLAB is being run off a 128 GB solid-state drive and I am using a 64 GB solid-state drive as my cache drive. The images are stored on a conventional spinning-platter Seagate hard drive (ST3000DM001) at 7200 RPM with an advertised 6 Gb/s interface speed.
I will try your suggestion of reading the images into a cell array and then broadcasting them to the parallel workers. I don't expect to see huge gains here though as the computational part of it obviously isn't that intensive (only saved ~2.5 hours when using the dummy image)...
Matt J on 13 Jun 2014
Edited: Matt J on 13 Jun 2014
Why would there be conflict accessing the hard drive? They are all trying to access a different image on the same hard drive.
But the cores still share a common channel to the hard drive, no? You don't have parallel buses to the hard drive that each core can use independently. Each parallel worker has to wait its turn for the bus, keeping itself as busy as possible in the meantime with computations, e.g., regionprops operations. Since you don't seem to have a large compute-to-memory ratio, your workers will be idle much of the time.
I don't expect to see huge gains here though as the computational part of it obviously isn't that intensive (only saved ~2.5 hours when using the dummy image)...
I'm not terribly hopeful either.
You're not going to be able to fit 83,200 images into a cell array - you'll run out of memory.
I meant read in the numSlices=800 images into an array. i.e., do the serial imread() just prior to the inner parfor loop.
But... it's just intuition on my part that this might do any good. A speculation that a single core serial read from the hard drive will be faster than a multi-core read.
STBLer on 13 Jun 2014
Edited: STBLer on 13 Jun 2014
Each image is a 480 x 480 uint8 array. I don't think I'd have enough memory to fit 15,062 images into a cell array and then do anything useful, if that would fit into memory at all (15,062 x 480 x 480 = 3,470,284,800 bytes, so roughly 3.5 GB for the raw pixels alone).
Matt J on 13 Jun 2014
Edited: Matt J on 13 Jun 2014
I re-iterate. I only meant that you would read in numSlices images serially for the current Region and Samples. numSlices=17 seems like a very small loop to be parallelizing, however. I wager that's why you're seeing little benefit.
That number (15k) would be the number of images for the current Region and Sample. Considering I gain 2 hrs when I use the dummy image I am inclined to disagree that 17 slices is too small to be parallelizing...
I don't know how to reconcile that with the code you've posted. In that code, you perform numSlices imread's for every Region and Sample. So, "the number of images for the current Region and Sample" should be the same as numSlices=17.
STBLer on 14 Jun 2014
Edited: STBLer on 14 Jun 2014
Ah - that was my fault for oversimplifying the code when I posted it, and for the frustrated tone of my previous response. That parfor runs over 1:numImages, and each slice has its own specific number of images; on average you end up parallelizing over 886 images with this code on my machine. I updated the skeleton of my code to reflect this. Sorry about that.
Also, I missed your response regarding the hard drive access: Is there a way to track the idle time of a worker to see if your guess checks out? From my understanding, the run and time function would only tell me what my local worker is doing and nothing about the parallel workers.
I guess I naively thought that you would be capable of writing/reading the hard drive simultaneously on each of the 4 CPUs at least from my understanding of how addressing memory works on a CPU.
Matt J on 14 Jun 2014
Edited: Matt J on 14 Jun 2014
Considering I gain 2 hrs when I use the dummy image I am inclined to disagree that 17 slices is too small to be parallelizing...
The gain of 2 hours represents a factor of 2 speed-up. It's something, but with a parpool of 4 workers, you would hope for something closer to a factor of 4. Clearly the ratio of computation to communication is still not terribly favorable and the size of the parfor loop has a bearing on this.
Is there a way to track the idle time of a worker to see if your guess checks out?
Comment out all the processing steps inside the parfor loop apart from the imread and see how much things speed up. Incidentally, you could then repeat the same, but with 'parfor' replaced with plain 'for'. This would give us an idea how well we do reading from the hard drive in parallel vs. serially.
Rather than waiting 5-10 hours, I'd of course recommend you do these comparisons with the outer loop restricted to a smaller number of Regions and Samples.
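A minimal version of that read-only experiment might look like the following (the filename pattern and count are placeholders matching the question's skeleton):

    % Hypothetical benchmark: time a loop that does nothing but imread.
    % Run once with 'for' and once with 'parfor' and compare wall times.
    N = 800;                       % images in one representative slice
    t = tic;
    for Ind = 1:N                  % swap 'for' <-> 'parfor' between runs
        I = imread(['pic' num2str(Ind) '.png']); %#ok<NASGU>
    end
    fprintf('Read-only loop took %.1f s\n', toc(t));

If the parfor version is no faster than the for version here, the workload is disk-bound and more workers cannot help.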


Your hardware is pretty impressive. Doesn't seem like it should take 11 hours. What is the value of numSlices? The badly-named I is a binary image, right, not a grayscale image? Are there tons of regions in it? (That's the normal definition of regions, not your custom definition.) Perhaps some noise reduction would speed up regionprops() if you're spending a lot of time measuring useless little bits of noise.

3 Comments

I is not read in as a binary image. I obviously binarize it before running regionprops. There are however quite a few regions. I could potentially despeckle the image before running regionprops. I hadn't bothered thinking about optimizing yet because I was trying to figure out why the parfor was not running much faster than the for loop.
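(A sketch of the despeckling idea, in case it helps later readers. The binarization method and the 5-pixel threshold are illustrative assumptions, not taken from the asker's script.)

    BW = im2bw(I, graythresh(I));  % or whatever thresholding the script uses
    BW = bwareaopen(BW, 5);        % drop connected components under 5 pixels
    props = regionprops(BW, 'Area', 'Centroid', 'Eccentricity', 'Orientation');

Removing tiny noise components before regionprops cuts the number of regions it has to measure, which is where the suggested speedup would come from.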
You didn't answer the question about what numSlices is. Is it 800 slices? So you have 8*13*800 = 83,200 images to do regionprops() on?
STBLer on 13 Jun 2014
Edited: STBLer on 14 Jun 2014
Sorry about that. numSlices varies from sample to sample, but on average it is 16.8163 so call that 17. I oversimplified my skeleton and forgot to include a critical for loop. The code has been updated to reflect this. The number of images processed per slice also varies, but on average is 886, so ballpark there are 15,062 images that need regionprops performed on them for each region, bringing it to a total of 1,459,631 images that get processed over the 11 hrs.
Since you have a sufficiently high reputation, it might be helpful to clean this question up a bit. Also, what do you think of the suggestion that the parallel workers are spending a substantial amount of time being idle because of conflicts accessing the hard drive?


Asked: on 12 Jun 2014
Edited: on 14 Jun 2014
