Why are relational operators so slow in this case?

7 views (last 30 days)
Oliver on 21 Feb 2013
I have a 2305-by-4-by-4455 array, "P", from which I want to generate a logical vector, "isdis", for indexing as follows:
isdis = (P(:,2,:) >= P(:,3,:) & P(:,3,:) >= P(:,4,:) & P(:,4,:) >= 0) &...
(P(:,2,:) <= (sqrt(2)-1)*P(:,1,:)) &...
(P(:,2,:)+P(:,3,:)+P(:,4,:) <= P(:,1,:));
Unfortunately, this takes a relatively long time to evaluate for some reason. On my machine (Windows 8, 4GB memory, Core i7 processor, Matlab R2012b), this statement takes about 0.8 sec to evaluate. Which may not seem like a long time, but it accounts for 55% (2 sec) of the larger function that it is a part of, which has to be called many times in my application (at 2 sec a pop, that's an expensive function call). I would have thought that relational operators like this (especially when vectorized as it is) would cost virtually nothing, can anyone help me understand why this is so slow?
If it helps, the one multiplication of:
(sqrt(2)-1)*P(:,1,:))
takes 0.07 sec. And the one addition of:
P(:,2,:)+P(:,3,:)+P(:,4,:)
takes 0.19 sec.
Any suggestions? This line needs to evaluate in at least an order of magnitude less time, and it seems like it ought to be possible to do it in at least 2 orders of magnitude less time.
Oliver on 22 Feb 2013
Thanks for the comment, I guess I just assumed that relational operators would be cheaper than they really are.

Jan on 21 Feb 2013
Creating the temporary array P(:,i,:) is time consuming, because the memory must be copied from non-contiguous blocks. After applying permute(P, [1,3,2]) the comparison is much faster.
P(:,2,:)+P(:,3,:)+P(:,4,:) is 30% slower than sum(P, 2) - P(:,1,:) for the same reason.
Jan on 22 Feb 2013
The leasing dimensions of a multi-dim array have contiguous memory. Therefore the permute operation swaps the 2nd and 3rd dimension:
P = rand(2305, 4, 4455);
Q = permute(P, [1,3,2]);
size(Q) % 2305, 4455, 4
Does this clear the ordering?

Cedric Wannaz on 21 Feb 2013
Edited: Cedric Wannaz on 21 Feb 2013
The first thing that comes to my mind is that you could store slices in 2D variables to eliminate repetitive similar block-indexing operations in the expression for isdis:
slice1 = P(:,1,:) ;
slice2 = P(:,2,:) ;
slice3 = P(:,3,:) ;
slice4 = P(:,4,:) ;
You would then have something like (to check on your side):
isdis = (slice2 >= slice3 & slice3 >= slice4 & slice4 >= 0) &...
(slice2 <= (sqrt(2)-1)*slice1) &...
(slice2+slice3+slice4 <= slice1);
You won't get an order of magnitude improvement in computation time with only that though (maybe 20-30%).
Cedric Wannaz on 23 Feb 2013
Actually your solution is the fastest; I tried to beat it using some hybrid approach, but I can't even get close to your permute().