Reading out vector by index matrix too slow?

Hey folks, I'm trying to implement a 3D acoustic source localization algorithm in real time. Unfortunately, I have to handle a lot of data, i.e., big matrices. To keep memory usage down, I generally work with index matrices or index vectors. Right now, one particular index operation is the weak point of my implementation in terms of processing time. The problem is as follows:
for k = 1:270
    ... % Some code for block-processing (processing time negligible)
    for j = 1:5
        for i = 1:70
            tmp = sum( DataArray( idxCell{i,j}{1} ), 2 );           % (1)
            ErgArray(:,i) = ErgArray(:,i) + tmp( idxCell{i,j}{2} ); % (2)
        end
    end
end
I've pre-allocated memory for ErgArray. I have to do the computations with a lot of data (which is already optimized):
size(tmp)             -> 142 x 1    (single)
size(idxCell{i,j}{1}) -> 142 x 5    (uint16)
size(idxCell{i,j}{2}) -> 51120 x 1  (uint16)
size(ErgArray)        -> 51120 x 70 (single)
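To make the two index steps concrete, here is the same pattern on tiny synthetic data, sketched in NumPy (0-based indexing; all names and values here are illustrative stand-ins, not the real data):

```python
import numpy as np

data = np.arange(10, dtype=np.float32)     # plays the role of DataArray
idx1 = np.array([[0, 1], [2, 3], [4, 5]])  # like idxCell{i,j}{1}: gather, then row-sum
idx2 = np.array([0, 0, 1, 2, 2, 2])        # like idxCell{i,j}{2}: expand with repeats

tmp = data[idx1].sum(axis=1)  # step (1): a few compressed sums -> [1, 5, 9]
erg = tmp[idx2]               # step (2): map back to the long domain -> [1, 1, 5, 9, 9, 9]
```

Step (2) is pure duplication: the 51120-element result is built by repeating entries of the 142-element tmp.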
If I eliminate the second for-loop by vectorization, the whole computation takes much longer. According to profile and tic/toc, line (1) requires 4.8 s and line (2) 19.6 s. I work in single-core mode, i.e., using -singleCompThread. The whole process should take at most 3 seconds. Using MEX-files doesn't reduce the processing time.
By the way: idxCell{i,j}{1} contains specific indices. Using this cell avoids complicated filters, i.e., very (!) big matrices and for-loops.
Have you ever encountered such a problem? Why does vectorization take more time than the additional for-loop? Is there a more efficient computation? If so, which one? Is this way of filtering inefficient?
PS: Computer info: 1) AMD FX(tm)-8150 Eight-Core Processor, 2) 1.4 GHz, 3) 2048 kB cache size, 4) 15 GB RAM

 Accepted Answer

Jan on 4 Apr 2013
In general, vectorized methods can require more time when they have to create large intermediate arrays, because allocating memory can consume a lot of time.
What exactly does "big" mean? Some users of this forum are impressed by 100 elements already; others think that everything that fits neatly into 192 GB of RAM is not "big".
Nested cells are usually not efficient. What is the contents of idxCell{i,j}{2}? If they are index vectors, logical indexing could be more efficient.
You are calculating sum(DataArray(idxCell{i,j}{1}), 2) again in each of the 270 iterations over k. Wouldn't it be faster to avoid such repeated computations?

4 Comments

Hey Jan, thank you for your response. Let me answer your questions step by step.
1) What exactly does "big" mean?
Up to 42 MB, i.e., far away from 192 GB.
2) Nested cells are usually not efficient.
The cell element idxCell{i,j}{2} contains a vector of numerical indices. And yes, I'm going to use a matrix instead, because all vectors in idxCell{:,:}{2} have the same length, so a matrix is feasible. Thank you for this hint!
3) What is the content of idxCell{i,j}{2}?
That's the tricky part. The content of idxCell{i,j}{2} is a numeric index vector with more elements (51120 x 1) than the vector tmp (142 x 1), and it contains repeated elements. I need this kind of numeric index vector to map data from an optimized domain (1) back to the original domain (2). I use this optimization to reduce the number of sums shown in the code above. Both idxCell{i,j}{1} and idxCell{i,j}{2} are generated by unique(), i.e., [idxCell{i,j}{1}, ~, idxCell{i,j}{2}] = unique( ). Performing all computations on the optimized vector idxCell{i,j}{1} dramatically reduces the number of sums and, consequently, the processing time.
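The round trip through unique() can be illustrated on a tiny example, here sketched in NumPy, where return_inverse plays the role of the third output ic of MATLAB's unique() (values and names are made up for illustration):

```python
import numpy as np

# A short signal with repeated values, standing in for the redundant domain
x = np.array([3.0, 1.0, 3.0, 2.0, 1.0], dtype=np.float32)

# MATLAB: [u, ~, ic] = unique(x);  NumPy's return_inverse is the ic mapping
u, ic = np.unique(x, return_inverse=True)

# Any per-element computation can be done on the short vector u ...
y_small = u * 2.0
# ... and mapped back to the original (longer, repetitive) domain with ic
y_full = y_small[ic]  # identical to computing x * 2.0 directly, with fewer operations
```

The back-mapping y_small[ic] is exactly the role of tmp(idxCell{i,j}{2}) in the question's line (2).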
4) If they are index vectors, logical indexing could be more efficient.
Yes, you are right. But the problem is that tmp consists of 142 elements while idxCell{i,j}{2} consists of 51120 elements, so logical indexing is no longer valid. Or is there a way to solve this problem with logical indexing after all?
Looking forward to your answer/ideas/recommendations.
And by the way: in the case mentioned above, using a cell instead of a matrix always yields lower processing time.
And I forgot to mention that I have to compute the sum in the innermost loop, because k represents the frame index and I have to calculate the sum for each time frame (I cannot pull it in front of the innermost loop).
Yes, logical indexing works only if no repetitions are required.
Instead of idxCell{:,:}{2} the unnested idxCell{:,:,2} could be faster, but I do not expect this to be significant here.
If the index vector contains repetitions and you want to calculate the sum only, a dot product might be faster:
x = 1:5;
idx = [1, 1, 2, 2, 3, 4, 4, 5];
r1 = sum(x(idx))          % expand to 8 elements, then sum -> 22
r2 = x * [2; 2; 1; 2; 1]  % dot product with the repetition counts -> 22, summed implicitly
This avoids creating larger intermediate arrays which contain redundant information. Here the vector for the dot product is the "runlength" of idx - I mention this because I'm going to publish a fast runlength MEX in the FEX this week...
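The same arithmetic can be checked outside MATLAB; in this NumPy sketch, np.bincount produces exactly those repetition counts, so the dot product replaces the expanded temporary (0-based indices, illustrative values):

```python
import numpy as np

x = np.arange(1.0, 6.0)                    # x = 1:5
idx = np.array([0, 0, 1, 1, 2, 3, 3, 4])   # 0-based version of [1,1,2,2,3,4,4,5]

r1 = x[idx].sum()                          # expand to 8 elements, then sum -> 22.0
w = np.bincount(idx, minlength=x.size)     # repetition count per index -> [2, 2, 1, 2, 1]
r2 = x @ w                                 # dot product: same sum, no expanded temporary
```

With the question's sizes, the weight vector would be 142 elements instead of the 51120-element expansion, which is where the saving comes from when only the sum is needed.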
