Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
please help optimize this ('find' is too slow)

Subject: please help optimize this ('find' is too slow)

From: Jayveer

Date: 2 Dec, 2008 20:40:04

Message: 1 of 22

can anyone please help me speed this up?the line with find is way too slow!

any suggestions would be much appreciated.

Description of inputs:
P: 3Xn matrix where each row represents x,y and z coordinates of P(n)
NCx,NCy,NCz: mX8 matrix each where each of the 8 columns represent the coordinate (x or y or z depending NCx or NCy or NCz) of nodes one to 8 of a cubic cell.

function [PiC]=PinC(P,NCx,NCy,NCz)
if length(NCx(:,1))<=32767,PiC=zeros(1,length(P),'int16');
else PiC=single(zeros(1,length(P)));end
% Simulataneously finding all Material Points that could be in cell c using
% the x,y and z limits of the cell
for c=1:length(NCx(:,1))% For each cell
    nx=find(P(1,:)>=min(NCx(c,:))&P(1,:)<max(NCx(c,:))); % bottleneck!
    if ~isempty(nx);
        ny=nx(P(2,nx)>=min(NCy(c,:))&P(2,nx)<max(NCy(c,:)));
        if ~isempty(ny);
            nz=ny(P(3,ny)>=min(NCz(c,:))&P(3,ny)<max(NCz(c,:)));
            if ~isempty(nz);PiC(nz)=c;end
        end
    end
end

---------older version - a lot slower!!!
function [PiC]=PinC(P,NCx,NCy,NCz)
PiC=zeros(1,length(P));
% Simulataneously findind all Material Points that could be in cell c using
% the x,y and z limits of the cell
for c=1:length(NCx(:,1))% For each cell
    n=logical(P(1,:)>=min(NCx(c,:))&P(1,:)<max(NCx(c,:))&P(2,:)>=min(NCy...
        (c,:))&P(2,:)<max(NCy(c,:))&P(3,:)>=min(NCz(c,:))&P(3,:)<max...
        (NCz(c,:)));PiC(n)=c;
end

Subject: please help optimize this ('find' is too slow)

From: Stefan

Date: 2 Dec, 2008 21:25:03

Message: 2 of 22

Some sample data would be helpful.

Regards,
Stefan

"Jayveer " <jveer@jveer.com> wrote in message <gh46f4$pg2$1@fred.mathworks.com>...
> can anyone please help me speed this up?the line with find is way too slow!
>
> any suggestions would be much appreciated.
>
> Description of inputs:
> P: 3Xn matrix where each row represents x,y and z coordinates of P(n)
> NCx,NCy,NCz: mX8 matrix each where each of the 8 columns represent the coordinate (x or y or z depending NCx or NCy or NCz) of nodes one to 8 of a cubic cell.
>
> function [PiC]=PinC(P,NCx,NCy,NCz)
> if length(NCx(:,1))<=32767,PiC=zeros(1,length(P),'int16');
> else PiC=single(zeros(1,length(P)));end
> % Simulataneously finding all Material Points that could be in cell c using
> % the x,y and z limits of the cell
> for c=1:length(NCx(:,1))% For each cell
> nx=find(P(1,:)>=min(NCx(c,:))&P(1,:)<max(NCx(c,:))); % bottleneck!
> if ~isempty(nx);
> ny=nx(P(2,nx)>=min(NCy(c,:))&P(2,nx)<max(NCy(c,:)));
> if ~isempty(ny);
> nz=ny(P(3,ny)>=min(NCz(c,:))&P(3,ny)<max(NCz(c,:)));
> if ~isempty(nz);PiC(nz)=c;end
> end
> end
> end
>
> ---------older version - a lot slower!!!
> function [PiC]=PinC(P,NCx,NCy,NCz)
> PiC=zeros(1,length(P));
> % Simulataneously findind all Material Points that could be in cell c using
> % the x,y and z limits of the cell
> for c=1:length(NCx(:,1))% For each cell
> n=logical(P(1,:)>=min(NCx(c,:))&P(1,:)<max(NCx(c,:))&P(2,:)>=min(NCy...
> (c,:))&P(2,:)<max(NCy(c,:))&P(3,:)>=min(NCz(c,:))&P(3,:)<max...
> (NCz(c,:)));PiC(n)=c;
> end

Subject: please help optimize this ('find' is too slow)

From: Jayveer

Date: 4 Dec, 2008 01:53:02

Message: 3 of 22


hey thank you for taking a look at my problem. the sample data, .m and profiled .jpg can be downloaded from http://idisk.mac.com/jveer-Public?view=web

i've also emailed you a copy of the files.


"Stefan" <gonseaw@yahoo.com> wrote in message <gh493f$j96$1@fred.mathworks.com>...
> Some sample data would be helpful.
>
> Regards,
> Stefan

Subject: please help optimize this ('find' is too slow)

From: Roger Stafford

Date: 4 Dec, 2008 06:02:03

Message: 4 of 22

"Jayveer " <jveer@jveer.com> wrote in message <gh46f4$pg2$1@fred.mathworks.com>...
> can anyone please help me speed this up?the line with find is way too slow!
> ........

  First of all, unless your cubic cell faces are each aligned with the three coordinate axes, your code does not appear to make much sense, so I assume that they are so aligned. If so, you should be dealing with six quantities per cell, not the 3 times 8 for NCx, NCy, and NCz. These should be for each cell, the lower and upper x-limits, the lower and upper y-limits, and the lower and upper z-limits, of that cell.

  Next, you should select one of the coordinates, say the x-axis, to be involved in a sort operation involving not only the x-values of P(1,:) but also the lower and upper x-limits for each of the cells. If Lx and Ux are respective row vectors of these limits, you could do a sort on [Lx,Ux,P(1,:)], and this need be done only once. After the sort is finished it is relatively easy to then extract the corresponding 'nx' for each cell with much less expenditure of execution time than doing a 'find' operation each time, making use of both the sorted list and the accompanying index vector of the sort command.

  After that, the number of points that need to be searched for each cell should be greatly reduced and you can proceed as you have already done in your code with finding ny and then nz.

  I realize the above "relatively easy" part leaves a lot of the work to you but I am hoping you will be able to carry it out on your own. If it gives you too much trouble, give a cry of distress and perhaps more help can be forthcoming.

Roger Stafford

Subject: please help optimize this ('find' is too slow)

From: Jayveer

Date: 4 Dec, 2008 07:12:08

Message: 5 of 22

Hello Roger

Thank you for the suggestion. However i dont see how it'll solve my problem.

1. yes the faces are aligned with the three coordinate axes
2. there are 8 nodes, hence the '8' in NCx,NCy and NCz but you are right: there are six limits given by max(NCx(1,:)) and so on.

The main flaw in your proposed solution is the sorting. The PinC function is part of a much larger program which calls this function hundreds of times. The index of the P values represent the associated particle number. The PiC end result needs to reflect that indexing otherwise it wont work with the rest of the program.

Please take a look at the sample data and solution provided on http://idisk.mac.com/jveer-Public?view=web

Have you got any other recommendations? I need the ouput PiC to be exactly the way this function outputs i.e with the correct indexing.


"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gh7ror$eno$1@fred.mathworks.com>...
> "Jayveer " <jveer@jveer.com> wrote in message <gh46f4$pg2$1@fred.mathworks.com>...

Subject: please help optimize this ('find' is too slow)

From: Roger Stafford

Date: 4 Dec, 2008 20:42:02

Message: 6 of 22

"Jayveer " <jveer@jveer.com> wrote in message <gh7vs8$fgq$1@fred.mathworks.com>...
> Thank you for the suggestion. However i dont see how it'll solve my problem.
> ........

  I stick by my recommendation, Jayveer. Below is an implementation of what I had in mind. You will recall how I defined the limits Lx and Ux. You'll note that the 'nx' generated will be the same as the one you generate except for a permutation. For each value of 'c' it references the same set of elements of P(1,:) as yours but in a different order. That shouldn't matter since you are placing the same value 'c' at each point of PiC. The order in which you do so shouldn't matter.

  In the "nx = nx(nx>2*m)-2*m;" line below there is of course a scanning that takes place similar to that of the 'find' function to test the inequality, but the advantage in execution time is that it occurs only over the number of elements defined by "s(c):s(c+m)" of the previous line, so it should execute much faster than your 'find', which has to scan the entire range of P(1,:) at each step in the for-loop. That is, the scanning/testing is limited to whatever elements of Lx, Ux, and P(1,:) happen to fall within a particular cell's lower and upper x-limits.

% An example:
P(1,:)= [9 6 2 13 4 3.4 6 11 7 4 1.2 5.7];
Lx = [2 4 9 1 8];
Ux = [6 11 13 14 7];

% The "engine"
n = size(P,2);
m = length(Lx);
[t,r] = sort([Lx,Ux,P(1,:)]);
s = 1:length(r); s(r) = s; % <-- s is the inverse of permutation r
for c = 1:m
 nx = r(s(c):s(c+m));
 nx = nx(nx>2*m)-2*m;
 P(1,nx) % <-- Display nx selections in P(1,:)
 % Then go on to use nx as you did in your code
end

Roger Stafford

Subject: please help optimize this ('find' is too slow)

From: Jayveer

Date: 4 Dec, 2008 23:50:19

Message: 7 of 22

i've changed my code to the following based on your recommendation. it doesnt produce the same result as the original one :S. i think i might be doing something wrong. could you please check it?

i understand the idea up to the sorting but can t really see whats going witht the remaining 3 lines you wrote. your solution gives a PiC full of zeros i.e it does go past the ifs!

new code based on your recommendation:

clear;load SampleData2.mat NCx NCy NCz P

m=length(NCx(:,1));
if m<=32767,PiC1=zeros(1,length(P),'int16');
else PiC1=single(zeros(1,length(P)));end

Lx=max(NCx,[],2)';Ux=min(NCx,[],2)'; % Generates lower and upper limits
n = size(P,2);
% m = length(Lx); % same as length(NCx(:,1));
[t,r] = sort([Lx,Ux,P(1,:)]);
s = 1:length(r); s(r) = s; % <-- s is the inverse of permutation r

% Simulataneously findind all Material Points that could be in cell c using
% the x,y and z limits of the cell
for c=1:m% For each cell
    nx = r(s(c):s(c+m));
    nx = nx(nx>2*m)-2*m;
% nx=find(P(1,:)>=min(NCx(c,:))&P(1,:)<max(NCx(c,:))); % bottleneck!
    if ~isempty(nx);
        ny=nx(P(2,nx)>=min(NCy(c,:))&P(2,nx)<max(NCy(c,:)));
        if ~isempty(ny);
            nz=ny(P(3,ny)>=min(NCz(c,:))&P(3,ny)<max(NCz(c,:)));
            if ~isempty(nz);PiC1(nz)=c;end
        end
    end
end

Subject: please help optimize this ('find' is too slow)

From: Jayveer

Date: 4 Dec, 2008 23:56:02

Message: 8 of 22

i changed my code to the following based on your recommendation. it produces a PiC full of zeros which implies its not going past the ifs. i must be doing something wrong. could you please have a run through it?

new code based on your recommendation:

clear;load SampleData2.mat NCx NCy NCz P

m=length(NCx(:,1));
if m<=32767,PiC1=zeros(1,length(P),'int16');
else PiC1=single(zeros(1,length(P)));end

Lx=max(NCx,[],2)';Ux=min(NCx,[],2)'; % Generates lower and upper limits
n = size(P,2);
% m = length(Lx); % same as length(NCx(:,1));
[t,r] = sort([Lx,Ux,P(1,:)]);
s = 1:length(r); s(r) = s; % <-- s is the inverse of permutation r

% Simulataneously findind all Material Points that could be in cell c using
% the x,y and z limits of the cell
for c=1:m% For each cell
    nx = r(s(c):s(c+m));
    nx = nx(nx>2*m)-2*m;
% nx=find(P(1,:)>=min(NCx(c,:))&P(1,:)<max(NCx(c,:))); % bottleneck!
    if ~isempty(nx);
        ny=nx(P(2,nx)>=min(NCy(c,:))&P(2,nx)<max(NCy(c,:)));
        if ~isempty(ny);
            nz=ny(P(3,ny)>=min(NCz(c,:))&P(3,ny)<max(NCz(c,:)));
            if ~isempty(nz);PiC1(nz)=c;end
        end
    end
end

Subject: please help optimize this ('find' is too slow)

From: Roger Stafford

Date: 5 Dec, 2008 00:29:02

Message: 9 of 22

"Jayveer " <jveer@jveer.com> wrote in message <gh9qmi$l5c$1@fred.mathworks.com>...
> i changed my code to the following based on your recommendation. it produces a PiC full of zeros which implies its not going past the ifs. i must be doing something wrong. could you please have a run through it?
> .......
> Lx=max(NCx,[],2)';Ux=min(NCx,[],2)'; % Generates lower and upper limits
> .......

  Jayveer, your Lx and Ux are defined the opposite of what I intended, so it's understandable that you got all zeros. I meant for Lx to have the lower limits and Ux to have the upper ones. You have them the other way around.

Roger Stafford

Subject: please help optimize this ('find' is too slow)

From: Jayveer

Date: 5 Dec, 2008 02:00:20

Message: 10 of 22

lol, so silly of me! you are right! :) thanks loads dude! the processing time went down from 28s to 13s!

however, considering right now i have a simulation running which initially said '10Hrs remaining', 13s is still quite slow coz the function is called 600k+ depending on the model being simulated.

you seem to have a pretty good understanding of code optimisation. have you got any more ideas how to make this even faster? i'm trying to bring this down to sort of 5s ish for this particular sample data.

this one appears to be the slowest one now -takes 6.7s out of 13s

        ny=nx(P(2,nx)>=min(NCy(c,:))&P(2,nx)<max(NCy(c,:)));

any ideas?

Subject: please help optimize this ('find' is too slow)

From: Roger Stafford

Date: 6 Dec, 2008 18:13:02

Message: 11 of 22

"Jayveer " <jveer@jveer.com> wrote in message <gha1vj$850$1@fred.mathworks.com>...
> lol, so silly of me! you are right! :) thanks loads dude! the processing time went down from 28s to 13s!
>
> however, considering right now i have a simulation running which initially said '10Hrs remaining', 13s is still quite slow coz the function is called 600k+ depending on the model being simulated.
>
> you seem to have a pretty good understanding of code optimisation. have you got any more ideas how to make this even faster? i'm trying to bring this down to sort of 5s ish for this particular sample data.
>
> this one appears to be the slowest one now -takes 6.7s out of 13s
>
> ny=nx(P(2,nx)>=min(NCy(c,:))&P(2,nx)<max(NCy(c,:)));
>
> any ideas?

  I notice a couple of other aspects of your code that you could change to possibly improve its timing efficiency, Jayveer.

  You are using the element-wise '&' operation rather than the short-circuit '&&' operation to do your AND-ing. Using the latter would avoid wasting time on testing the second operand whenever the first were false. That could make a noticeable difference on the line you mentioned:

 ny=nx(P(2,nx)>=min(NCy(c,:))&P(2,nx)<max(NCy(c,:)))

  Also this line might conceivably be speeded up if you wrote

 temp = P(2,nx);
 ny=nx(temp>=Ly(c) && temp<Uy(c));

so that P(2,nx) and the lower and upper limits need be calculated only once per loop circuit.

  Finally, I don't think any of those tests you're doing for empty arrays are necessary, such as "if ~isempty(nx)" or "~isempty(ny)". Matlab knows how to deal with an empty 'nx' in

 ny = nx(P(2,nx)>=...

It just comes up with 'ny' also empty, which is appropriate. You are wasting time on this testing where most of the cases are probably non-empty, at least for "~isempty(nx)" and probably for "~isempty(ny)".

  Of these various improvements, I would consider '&&' the most significant.

Roger Stafford

Subject: please help optimize this ('find' is too slow)

From: Jveer

Date: 6 Dec, 2008 21:36:03

Message: 12 of 22

hello Roger, thank you for the quick replies

i've tried your latest recommendations before. I'm afraid they either dont work or dont provide any speed improvements at all

1. the && doesnt work with vectors. it gives the following error msg: "??? Operands to the || and && operators must be convertible to logical scalar values."

2. as for keeping the min,max outside the loop, i did the following:
Uy=max(NCy,[],2)';Ly=min(NCy,[],2)';Uz=max(NCz,[],2)';Lz=min(NCz,[],2)'; (outside loop)
and then changed the inside loop statements to these:
        ny=nx(P(2,nx)>=Ly(c)&P(2,nx)<Uy(c));
            nz=ny(P(3,ny)>=Lz(c)&P(3,ny)<Uz(c));
however no incr in speed. the main drawback from this method is that i'll have FOUR additional very long variables coz the column length of P is typically more than 350k
for any decent simulation. therefore from a memory optimization perspective it is not desirable.

3. the ~isempty test hardly takes any time. also in MOST CASES they are empty. this test is kinda my workout not being able to use short-circuit &&.

removing the ~isempty tests dont increase speed. i understand your point that matlab understands how to deal with nx=[]. i included out of a 'paranoia' that matlab will run unnecessary of the inside brackets before realising that nx=[]; nx(P(2,nx)>=min(NCy(c,:))&P(2,nx)<max(NCy(c,:)));

correction from my last post the function PinC runs the same amount of the time steps, not 600k- more like 200-800 typically.

i was wondering if there any chance of using a similar method to the one u came up with to replace the 'find' function. i was well impressed by that and i still dont fully understand it.

the 2 time consuming lines in my current function are:
        ny=nx(P(2,nx)>=min(NCy(c,:))&P(2,nx)<max(NCy(c,:)));
            nz=ny(P(3,ny)>=min(NCz(c,:))&P(3,ny)<max(NCz(c,:)));

they both sort of do what that find function did except they use logicals. referring to an older version in the first post, i broke down that long logical so that the above two lines operate on subsets instead of the whole range of terms(typically 350k for every row)

any other recommendations you may have are most welcome.

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <ghefbe$ell$1@fred.mathworks.com>...
> "Jayveer " <jveer@jveer.com> wrote in message <gha1vj$850$1@fred.mathworks.com>...
> I notice a couple of other aspects of your code that you could change to possibly improve its timing efficiency, Jayveer.

Subject: please help optimize this ('find' is too slow)

From: Roger Stafford

Date: 7 Dec, 2008 04:42:01

Message: 13 of 22

"Jveer " <jveer@jveer.com> wrote in message <gher83$sfd$1@fred.mathworks.com>...
> .......
> i was wondering if there any chance of using a similar method to the one u came up with to replace the 'find' function. .........

  Sorry about the '&&' operator - my mistake. However, you can achieve the same effect by replacing

 ny = nx(P(2,nx)>=Ly(c) & P(2,ny)< Uy(c));

with

 ny = nx(P(2,nx)>=Ly(c));
 ny = ny(P(2,ny)< Uy(c));

The second line handles only those elements which passed the test in the first line, so there should be fewer comparisons made altogether. On my machine the two lines are in fact generally somewhat faster than the single line (though my version is admittedly rather ancient and therefore an unreliable measure of newer execution times.)

  As to coming up with a fundamentally improved method of indexing on the 'ny' line, I'm afraid that might be very difficult to do. This problem has come up in earlier threads and I was unable then to find a way of speeding up more than the first coordinate set computation. And unfortunately I'm no smarter now than I was then.

  The problem is that when one has done the all-important sort on x-values, then the corresponding y-values are not in sorted order, and the method used depends heavily on this sorted property in quickly picking out the x's that were in range. The corresponding y values can be all over the place and it's not easy to find those that also lie within the cell's lower and upper y-limits without tediously checking each one individually, as in the present code.

Roger Stafford

Subject: please help optimize this ('find' is too slow)

From: Roger Stafford

Date: 8 Dec, 2008 04:06:01

Message: 14 of 22

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message
> .......
> As to coming up with a fundamentally improved method of indexing on the 'ny' line, I'm afraid that might be very difficult to do. This problem has come up in earlier threads and I was unable then to find a way of speeding up more than the first coordinate set computation. And unfortunately I'm no smarter now than I was then.
> .......

  Perhaps I surrendered too soon in that previous article, Jayveer. A method has occurred to me that would first require doing the kind of sorting I previously mentioned three times rather than one time, once for each coordinate direction. That is, there would be three sorts done just one time each for a given run which look like this:

[t,rx] = sort([Lx,Ux,X]);
[t,ry] = sort([Ly,Uy,Y]);
[t,rz] = sort([Lz,Uz,Z]);

(where X = P(1,:), Y = P(2,:), and Z = P(3,:) in your terminology.)

  However sorts are time consuming operations and the question is whether doing these one time each would already make the total time excessive even before doing any other processing. How important is that one sort I previously recommended in the whole timing picture? You didn't mention how long it was taking as compared with other operations.

  Assuming the above timing is not excessive, then a for-loop can be devised which operates on the cells, fifty-three cells at a time, rather than one at a time. Each pass through the loop would take more time than one pass in your loop but there will be many fewer loops, 1/53-rd as many in fact.

  Just to give a tantalizing outline of the method, for each coordinate an array of elements as long as X, Y, or Z, is computed in which each element's 53 bits are 1 or 0 according as the corresponding point's coordinate falls within the 53 cells' various lower and upper limits. These three arrays are then combined using 'bitand' calls to obtain a single bit in each of the 53 bit positions that is 1 or 0 according as the corresponding whole cubic cell contains the point or not. Then the 'log2' function can be used to convert to the appropriate c value to be placed in the PiC array as you do in your code.

  As you can probably guess, the number 53 derives from the fact that this is the maximum number of bits that can be contained in any one double precision integer in Matlab's double format.

  So, does any of this interest you, or do the three fearsome sorts already rule this method out?

Roger Stafford

Subject: please help optimize this ('find' is too slow)

From: Jveer

Date: 9 Dec, 2008 04:06:02

Message: 15 of 22

hello Roger

thank you for the continued support. i apologise for not responding sooner.

well the sort worked miracles on the original version. it brought down the processing time on a sample data from 28s to 13s. Additionally, I've broken down all the & and replaced them with if ~isempty to mimick the &&. The profiler results on the sample data posted on http://idisk.mac.com/jveer-Public?view=web of the latest version of the code based on your recommendations were very satisfying. The processing time went down from 170s to 42s!

the latest code is as follows:

load SampleData.mat NCx NCy NCz P

m=length(NCx(:,1));
if m<=32767,PiC=zeros(1,length(P),'int16');
else PiC=single(zeros(1,length(P)));end

% Uy=max(NCy,[],2)';Ly=min(NCy,[],2)';Uz=max(NCz,[],2)';Lz=min(NCz,[],2)';
Ux=max(NCx,[],2)';Lx=min(NCx,[],2)';
[t,r]=sort([Lx,Ux,P(1,:)]);s=1:length(r);s(r)=s;

% Simulataneously findind all Material Points that could be in cell c using
% the x,y and z limits of the cell
for c=1:m% For each cell
    nx=r(s(c):s(c+m));nx=nx(nx>2*m)-2*m;
    if ~isempty(nx);
        ny=nx(P(2,nx)>=min(NCy(c,:)));
        if ~isempty(ny)
            ny=ny(P(2,ny)<max(NCy(c,:)));
            if ~isempty(ny);
                nz=ny(P(3,ny)>=min(NCz(c,:)));
                if ~isempty(nz)
                    nz=nz(P(3,nz)<max(NCz(c,:)));
                    if ~isempty(nz);
                        PiC(nz)=c;
                    end
                end
            end
        end
    end
end

Your new idea sounds very promising. I'm afraid i have no idea how to put that into code though. i reckon as long as the sort is outside the loop it'll be fine. it 0.09s on the sample data i used. profiler results .tiff can be found on the link above. i know it's a lot to ask but any chance you could put that into code?



"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <ghi6f9$3um$1@fred.mathworks.com>...
> "Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message

> Perhaps I surrendered too soon in that previous article, Jayveer. A method has occurred to me that would first require doing the kind of sorting I previously mentioned three times rather than one time, once for each coordinate direction. That is, there would be three sorts done just one time each for a given run which look like this:

Subject: please help optimize this ('find' is too slow)

From: Roger Stafford

Date: 10 Dec, 2008 20:35:04

Message: 16 of 22

"Jveer " <jveer@jveer.com> wrote in message <ghkqra$pq9$1@fred.mathworks.com>...
> .......
> Your new idea sounds very promising. I'm afraid i have no idea how to put that into code though. i reckon as long as the sort is outside the loop it'll be fine. it 0.09s on the sample data i used. profiler results .tiff can be found on the link above. i know it's a lot to ask but any chance you could put that into code?
> .......

  I worked out the details of that scheme I told you about, Jayveer, and include it below. However, I do so with misgivings, since my impression is that it is very much slower than your method. You can try it out and see.

  It does as I described with three preliminary sorts followed by a for-loop that handles fifty-three of the cubic cells at a time and therefore makes only some two percent as many trips through the loop. However the processing during each trip is much more time-consuming. Only at the point "q = find(r>0)" does it narrow down the length of the arrays it is dealing with to a small number. So use it at your own risk. I compared its results with those of yours on some fairly large random data sets and they do agree.

  Note that the routine fails unless the lower limit is less than or equal to the corresponding upper limit for each coordinate in each cell. The row vectors X, Y, and Z are your P(1,:), P(2,:), and P(3,:) and the row vectors Lx, Ly, Lz, Ux, Uy, and Uz come from your NCx, NCy, and NCz in the obvious way. The quantity called w refers to the number of bits available in one 'double' format number, and it must not exceed 53, though it may be made less if desired.

  If nothing else, this example can serve to show you things like 1) how to find the inverse of a permutation, 2) how 53 binary results can be packed into a single matlab 'double' number, or 3) how the 'cumsum' function can be used to assist in doing so.

n = length(X);
m = length(Lx);
w = 53;
[t,px] = sort([Lx,Ux,X]); px(px) = 1:2*m+n;
[t,py] = sort([Ly,Uy,Y]); py(py) = 1:2*m+n;
[t,pz] = sort([Lz,Uz,Z]); pz(pz) = 1:2*m+n;
PiC = zeros(1,n);
for k = 0:w:m-1
 u = k+1:min(k+w,m);
 v = 2.^(0:length(u)-1);
 u = [u u+m];
 v = [v -v];
 s = zeros(1,2*m+n);
 s(px(u)) = v;
 s = cumsum(s);
 rx = s(px(2*m+1:2*m+n));
 s = zeros(1,2*m+n);
 s(py(u)) = v;
 s = cumsum(s);
 ry = s(py(2*m+1:2*m+n));
 s = zeros(1,2*m+n);
 s(pz(u)) = v;
 s = cumsum(s);
 rz = s(pz(2*m+1:2*m+n));
 r = bitand(bitand(rx,ry),rz);
 q = find(r>0);
 [f,e] = log2(r(q));
 PiC(q) = e+k;
end

Subject: please help optimize this ('find' is too slow)

From: Jveer

Date: 11 Dec, 2008 18:48:02

Message: 17 of 22

First, thanks a lot for your time and effort, Roger. I really appreciate the help. The code you've provided is a very smart piece of coding and i humbly admit that i havent quite figured out how it works.

However, it isn't faster. it brought the time down by 2s on a sample data with 364685 particles and 18750 cells but increased the processing time by 20s on a sample with the same number of particles and 2535 cells. Very unfortunate to be honest coz i absolutely loved this solution.

do u think there is any way to optimize it wrt number of cells as well? there'll always be significantly more particles.

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <ghp95o$27o$1@fred.mathworks.com>...
> "Jveer " <jveer@jveer.com> wrote in message <ghkqra$pq9$1@fred.mathworks.com>...
> I worked out the details of that scheme I told you about, Jayveer, and include it below. However, I do so with misgivings, since my impression is that it is very much slower than your method. You can try it out and see.

Subject: please help optimize this ('find' is too slow)

From: Roger Stafford

Date: 11 Dec, 2008 20:37:02

Message: 18 of 22

"Jveer " <jveer@jveer.com> wrote in message <ghrn92$k9o$1@fred.mathworks.com>...
> .......
> However, it isn't faster. it brought the time down by 2s on a sample data with 364685 particles and 18750 cells but increased the processing time by 20s on a sample with the same number of particles and 2535 cells. Very unfortunate to be honest coz i absolutely loved this solution.
>
> do u think there is any way to optimize it wrt number of cells as well? there'll always be significantly more particles.
> ........

  I was afraid of that, based on some trials I ran. Here are some questions.

  Could you give me a feeling, not only for the numbers of your "particles" and "cells", but also the typical widths of these cells in relation to the typical coordinate ranges of the particles? I don't have a clear picture of how densely these particles are distributed with respect to the cell dimensions. How many particles are likely to find their way into each particular cell? That seems to me to be a crucial statistic.

  Also perhaps you could indicate the typical timing profiles you get with this latest solution of mine. It would be interesting to see what new bottlenecks I have created. I have particular doubts about the "bitand(bitand(rx,ry),rz)" step which takes two sweeps instead of one.

Roger Stafford

Subject: please help optimize this ('find' is too slow)

From: Roger Stafford

Date: 12 Dec, 2008 01:20:03

Message: 19 of 22

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <ghrtle$ngk$1@fred.mathworks.com>...
> I was afraid of that, based on some trials I ran. Here are some questions.
> .........

  Also there is one other question I neglected to ask. Can your cells ever overlap one another in three dimensional space? As things are at present in both your code and mine, the cell with the highest sequence number wins out in such a situation. Can we count on this never happening?

Roger Stafford

Subject: please help optimize this ('find' is too slow)

From: Roger Stafford

Date: 12 Dec, 2008 09:46:02

Message: 20 of 22

"Jveer " <jveer@jveer.com> wrote in message <ghrn92$k9o$1@fred.mathworks.com>...
> .......
> do u think there is any way to optimize it wrt number of cells as well? there'll always be significantly more particles.
> .......

  Here is my latest try, Jayveer. It differs from the most recent one in that 1) it reverts to taking cells one at a time in the for-loop, 2) it nevertheless still does three one-time sorts, 3) it does no size comparisons within the loop but depends entirely on index lists emanating from the sorts and the colon operator, and 4) it performs an AND on the three lists of resulting booleans, bx, by, & bz.

  Within the for-loop the three instructions to clear the three 2*m+n length booleans each time and the n-element AND operation may well be the most time-consuming steps. Each of the three steps that use the colon will presumably have a much reduced number of elements to deal with and be faster.

  The code assumes the same arrays as before: X, Y, Z, Lx, Ly, Lz, Ux, Uy, and Uz.

n = length(X);
m = length(Lx);

[nx,nx] = sort([Lx,Ux,X]);
lx = 1:2*m+n; lx(nx) = lx;
ux = lx(m+1:2*m)-1; lx = lx(1:m)+1;

[ny,ny] = sort([Ly,Uy,Y]);
ly = 1:2*m+n; ly(ny) = ly;
uy = ly(m+1:2*m)-1; ly = ly(1:m)+1;

[nz,nz] = sort([Lz,Uz,Z]);
lz = 1:2*m+n; lz(nz) = lz;
uz = lz(m+1:2*m)-1; lz = lz(1:m)+1;

br = 2*m+1:2*m+n;
PiC = zeros(1,n);

for c = 1:m
  bx = zeros(1,2*m+n);
  bx(nx(lx(c):ux(c))) = 1;
  by = zeros(1,2*m+n);
  by(ny(ly(c):uy(c))) = 1;
  bz = zeros(1,2*m+n);
  bz(nz(lz(c):uz(c))) = 1;
  PiC(bx(br)&by(br)&bz(br)) = c;
end

Roger Stafford

Subject: please help optimize this ('find' is too slow)

From: Roger Stafford

Date: 12 Dec, 2008 18:38:02

Message: 21 of 22

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <ghtbsq$p90$1@fred.mathworks.com>...
> ......
> Here is my latest try, Jayveer. It differs from the most recent one in that ......
> ......

  Jayveer, after a night's sleep I've made three changes to yesterday's code in the interests of possibly increased efficiency. A) I no longer reset bx, by, and bz to zeros with a 'zeros' command at the beginning of each loop. Instead, after adjusting PiC I just reset those elements of the b's which were previously set to ones using mx, my, and mz. B) Setting c values in PiC is no longer restricted to the range 2*m+1:2*m+n. After exiting the loop, the extra 1:2*m portion of PiC is eliminated. C) ux, uy, uz and lx, ly, lz no longer have 1's subtracted and added respectively in the beginning. It isn't really necessary.

  Of these changes, the one that might give you trouble is having arrays mx, my, and mz within the loop which are continually being allocated and reallocated to various different lengths. Perhaps that could cause memory fragmentation problems. If so, just eliminate mx and say simply "bx(nx(lx(c):ux(c)))=1" before, and "bx(nx(lx(c):ux(c)))=0" after, setting PiC, and similarly for my and mz.

n = length(X);
m = length(Lx);
m1 = 2*m+1; mn = 2*m+n;

[t,nx] = sort([Lx,Ux,X]);
lx = 1:mn; lx(nx) = lx;
ux = lx(m+1:2*m); lx = lx(1:m);

[t,ny] = sort([Ly,Uy,Y]);
ly = 1:mn; ly(ny) = ly;
uy = ly(m+1:2*m); ly = ly(1:m);

[t,nz] = sort([Lz,Uz,Z]);
lz = 1:mn; lz(nz) = lz;
uz = lz(m+1:2*m); lz = lz(1:m);

bx = zeros(1,mn);
by = zeros(1,mn);
bz = zeros(1,mn);
PiC = zeros(1,mn);

for c = 1:m
  mx = nx(lx(c):ux(c));
  my = ny(ly(c):uy(c));
  mz = nz(lz(c):uz(c));
  bx(mx) = 1;
  by(my) = 1;
  bz(mz) = 1;
  PiC(bx&by&bz) = c;
  bx(mx) = 0;
  by(my) = 0;
  bz(mz) = 0;
end
PiC = PiC(m1:mn);

Roger Stafford

Subject: please help optimize this ('find' is too slow)

From: Jveer

Date: 19 Dec, 2008 20:54:03

Message: 22 of 22

apologies for late reply. been away.

anyway i'm sorry to report that the code is slower. it's because of the use of '&'. somehow this is very slow. i tend to go around it by using ~if(isempty).

i've kinda given up coz i dont think it can be made to run faster. thanks a lot for trying.


"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <ghub2a$ned$1@fred.mathworks.com>...
> "Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <ghtbsq$p90$1@fred.mathworks.com>...
> > ......
> > Here is my latest try, Jayveer. It differs from the most recent one in that ......
> > ......
>
> Jayveer, after a night's sleep I've made three changes to yesterday's code in the interests of possibly increased efficiency. A) I no longer reset bx, by, and bz to zeros

Tags for this Thread

No tags are associated with this thread.

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us