Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
Help to improve speed of program

Subject: Help to improve speed of program

From: Dips Bhatia

Date: 12 Nov, 2009 16:56:02

Message: 1 of 10

Hi,
I have written a small script to carry out a certain task. However though the script does the task it is very very slow and more than 4 hours to finish it. The problem description and the script as as below.

Problem: There is a large matrix (200000 rows) containing 3 main columns. ID, Startnode,endnode.Each ID has a different startnode and endnode and all ids are unique.
The matrix has to be sorted out so that the endnode of the previous row conincides with the startnode of the following row. The example is shown below:
              Id Start End
Row 1: 1000469 930 931
Row2 : 1765957 931 2080
..................and so on
The problem is that all ends may not have a start so the series may terminate and new series starts. At the end of the program all unique ids must be represented in the sorted matrix.

My Solution:

B(1,1:3)=A(1,:); %Creating a copy of the matrix using unique ids
A(1,4)=1; % This is a flag which shows where new series will start-
flag=1;
i=1;

while (all(ismember(A(:,1),B(:,1)))~=1) %to ensure that all ids are represented
       [C,IA,IB]=intersect(B(i,3),A(:,2));
       if isempty(C) %incase the end node has no corresponding start node
           flag=flag+1;
           index=find(~ismember(A(:,1),B(:,1)), 1, 'first');
           B(i+1,1:3)= A(index,:);
           B(i+1,4)=flag; %this is to start the new series with an unused id
       else
            B(i+1,1:3)=A(IB,:); %#ok<AGROW>
            B(i+1,4)=flag;
       end
       i=i+1;
end
 
I know its a bit much to ask but if anyone has the time to give some suggestions to speed up my code, it will help me not only to save time but also gain some matlab knowledge.
I am including some sample data.
Any help will really be highly welcome

Sample data: (already sorted) Can be mixed up by sorting by id or similar.

Id Start End Flag
1000469 930 0931 0
1765957 931 2080 1
1032650 2080 2081 1
1000566 1120 1121 2
1530723 1121 8799 2
1827876 8799 3345 2
1451943 3345 5798 2
1323822 5798 7821 2
1636548 7821 0575 2
1065754 0575 2303 2
1001667 3297 3298 3
1687175 3298 3929 3
1696979 3929 1252 3
1298407 1252 1253 3
1001932 3824 3825 4
1765370 3825 4080 4
1712338 4080 6713 4
1265276 6713 6714 4
1566546 6714 5103 4
1002172 4299 4300 5
1129698 4300 8931 5
1002356 4663 4664 6
1002410 4771 4772 7
1264944 4772 6362 7
1775290 6362 9383 7
1715202 9383 5699 7
1853406 5699 1994 7
1568874 1994 0793 7
1473095 0793 7575 7
1046851 7575 7576 7

Eagerly waiting for some help..

Subject: Help to improve speed of program

From: Miroslav Balda

Date: 12 Nov, 2009 17:32:03

Message: 2 of 10

"Dips Bhatia" <deepakdbhatia@gmail.com> wrote in message <hdhen2$bbd$1@fred.mathworks.com>...
SNIP

Hi,
I would recommend you to show very short example of your data and what you expect to get. Then, you may expect some response.
Mira

Subject: Help to improve speed of program

From: Dips Bhatia

Date: 12 Nov, 2009 18:03:19

Message: 3 of 10

"Miroslav Balda" <miroslav.nospam@balda.cz> wrote in message <hdhgqi$nqk$1@fred.mathworks.com>...
> "Dips Bhatia" <deepakdbhatia@gmail.com> wrote in message <hdhen2$bbd$1@fred.mathworks.com>...
> SNIP
>
> Hi,
> I would recommend you to show very short example of your data and what you expect to get. Then, you may expect some response.
> Mira

Hi Mira,
Thanks for your answer. The short sample of my data is shown at the beginning of my query. The longer sample was in case anyone wanted to try out the script I had written to suggest any improvement. However taking your suggestion in some time I shall upload a short sample of my raw data. Thanks a lot...

Subject: Help to improve speed of program

From: Dips Bhatia

Date: 12 Nov, 2009 22:32:03

Message: 4 of 10

"Dips Bhatia" <deepakdbhatia@gmail.com> wrote in message <hdhen2$bbd$1@fred.mathworks.com>...
> Hi,
> I have written a small script to carry out a certain task. However though the script does the task it is very very slow and more than 4 hours to finish it. The problem description and the script as as below.
>
> Problem: There is a large matrix (200000 rows) containing 3 main columns. ID, Startnode,endnode.Each ID has a different startnode and endnode and all ids are unique.
> The matrix has to be sorted out so that the endnode of the previous row conincides with the startnode of the following row. The example is shown below:
> Id Start End
> Row 1: 1000469 930 931
> Row2 : 1765957 931 2080
> ..................and so on
> The problem is that all ends may not have a start so the series may terminate and new series starts. At the end of the program all unique ids must be represented in the sorted matrix.
>
> My Solution:
>
> B(1,1:3)=A(1,:); %Creating a copy of the matrix using unique ids
> A(1,4)=1; % This is a flag which shows where new series will start-
> flag=1;
> i=1;
>
> while (all(ismember(A(:,1),B(:,1)))~=1) %to ensure that all ids are represented
> [C,IA,IB]=intersect(B(i,3),A(:,2));
> if isempty(C) %incase the end node has no corresponding start node
> flag=flag+1;
> index=find(~ismember(A(:,1),B(:,1)), 1, 'first');
> B(i+1,1:3)= A(index,:);
> B(i+1,4)=flag; %this is to start the new series with an unused id
> else
> B(i+1,1:3)=A(IB,:); %#ok<AGROW>
> B(i+1,4)=flag;
> end
> i=i+1;
> end
>
> I know its a bit much to ask but if anyone has the time to give some suggestions to speed up my code, it will help me not only to save time but also gain some matlab knowledge.
> I am including some sample data.
> Any help will really be highly welcome
>
> Sample data: (already sorted) Can be mixed up by sorting by id or similar.
>
> Id Start End Flag
> 1000469 930 0931 0
> 1765957 931 2080 1
> 1032650 2080 2081 1
> 1000566 1120 1121 2
> 1530723 1121 8799 2
> 1827876 8799 3345 2
> 1451943 3345 5798 2
> 1323822 5798 7821 2
> 1636548 7821 0575 2
> 1065754 0575 2303 2
> 1001667 3297 3298 3
> 1687175 3298 3929 3
> 1696979 3929 1252 3
> 1298407 1252 1253 3
> 1001932 3824 3825 4
> 1765370 3825 4080 4
> 1712338 4080 6713 4
> 1265276 6713 6714 4
> 1566546 6714 5103 4
> 1002172 4299 4300 5
> 1129698 4300 8931 5
> 1002356 4663 4664 6
> 1002410 4771 4772 7
> 1264944 4772 6362 7
> 1775290 6362 9383 7
> 1715202 9383 5699 7
> 1853406 5699 1994 7
> 1568874 1994 0793 7
> 1473095 0793 7575 7
> 1046851 7575 7576 7
>
> Eagerly waiting for some help..

Hi Friends,
I am back with my Problem. As suggested I am simplifying the sample data hoping forsome solution. I have a mixed matrix as below containing Id, Start-ref and end-ref.
The matrix has to be sorted so that the start ref of each row has the end ref from the row above it. I have written a program but I have 200000 rows and it takes more than 4 hours with my script. Please could someone suggest an easier method. My script is included above.
 Id StartRef EndRef
469 930 931
566 120 121
650 080 081
822 798 821
943 345 798
723 121 799
957 931 080
876 799 345


Required Solution.

Id Start-Ref End-Ref
469 930 931
957 931 080
650 080 081
566 120 121
723 121 799
876 799 345
943 345 798
822 798 821

I am really in need of help with this one.

Subject: Help to improve speed of program

From: Matt

Date: 12 Nov, 2009 22:37:02

Message: 5 of 10

I propose the following. Basically, I pre-allocated B and got rid of all calls to ismember (lots of overhead from that).

It will not give the same sorting as your original code (the solution is not unique), but it satisfies the rules you gave.




A(1,4)=1; % This is a flag which shows where new series will start-

flag=1;


B=zeros(size(A)); %PREALLOCATE
B(1,1:4)=A(1,:); %Creating a copy of the matrix using unique ids

for i=1:size(A,1) %to ensure that all ids are represented

       IB=find(B(i,3)==A(:,2),1);
       if isempty(IB) %incase the end node has no corresponding start node
           flag=flag+1;
           index=find(~isinf(A(:,1)),1);
           B(i+1,1:4)= A(index,:);
            A(index,:)=inf;
       else
              B(i+1,1:4)=A(IB,:); %#ok<AGROW>
                A(IB,:)=inf;
       end

      B(i+1,4)=flag;
      
end

Subject: Help to improve speed of program

From: Matt

Date: 12 Nov, 2009 22:55:05

Message: 6 of 10

Sorry. A few minor fixes below. Anyway, I test it on a worst case data set
A=rand(200000,3)
and it runs in under 8 min. A lot better than 4 hours!!


A(1,4)=1; % This is a flag which shows where new series will start-

flag=1;


B=zeros(size(A)); %PREALLOCATE
B(1,1:4)=A(1,:); %Creating a copy of the matrix using unique ids
A(1,:)=inf;


for i=1:size(A,1)-1 %to ensure that all ids are represented

       IB=find(B(i,3)==A(:,2),1);
       if isempty(IB) %incase the end node has no corresponding start node
           flag=flag+1;
           index=find(~isinf(A(:,1)),1);
           B(i+1,1:4)= A(index,:);
            A(index,:)=inf;
       else
              B(i+1,1:4)=A(IB,:); %#ok<AGROW>
                A(IB,:)=inf;
       end

      B(i+1,4)=flag;
      
end

Subject: Help to improve speed of program

From: Dips Bhatia

Date: 13 Nov, 2009 01:09:02

Message: 7 of 10

"Matt " <xys@whatever.com> wrote in message <hdi3o9$6hm$1@fred.mathworks.com>...
> Sorry. A few minor fixes below. Anyway, I test it on a worst case data set
> A=rand(200000,3)
> and it runs in under 8 min. A lot better than 4 hours!!
>
>
> A(1,4)=1; % This is a flag which shows where new series will start-
>
> flag=1;
>
>
> B=zeros(size(A)); %PREALLOCATE
> B(1,1:4)=A(1,:); %Creating a copy of the matrix using unique ids
> A(1,:)=inf;
>
>
> for i=1:size(A,1)-1 %to ensure that all ids are represented
>
> IB=find(B(i,3)==A(:,2),1);
> if isempty(IB) %incase the end node has no corresponding start node
> flag=flag+1;
> index=find(~isinf(A(:,1)),1);
> B(i+1,1:4)= A(index,:);
> A(index,:)=inf;
> else
> B(i+1,1:4)=A(IB,:); %#ok<AGROW>
> A(IB,:)=inf;
> end
>
> B(i+1,4)=flag;
>
> end

Hi Matt,
Thanks a million for taking out the time and revamping my code. I certainly did pick up a few pointers and of course the code is running hundreds of times faster now....

Subject: Help to improve speed of program

From: Dips Bhatia

Date: 13 Nov, 2009 22:50:18

Message: 8 of 10

"Matt " <xys@whatever.com> wrote in message <hdi3o9$6hm$1@fred.mathworks.com>...
> Sorry. A few minor fixes below. Anyway, I test it on a worst case data set
> A=rand(200000,3)
> and it runs in under 8 min. A lot better than 4 hours!!
>
>
> A(1,4)=1; % This is a flag which shows where new series will start-
>
> flag=1;
>
>
> B=zeros(size(A)); %PREALLOCATE
> B(1,1:4)=A(1,:); %Creating a copy of the matrix using unique ids
> A(1,:)=inf;
>
>
> for i=1:size(A,1)-1 %to ensure that all ids are represented
>
> IB=find(B(i,3)==A(:,2),1);
> if isempty(IB) %incase the end node has no corresponding start node
> flag=flag+1;
> index=find(~isinf(A(:,1)),1);
> B(i+1,1:4)= A(index,:);
> A(index,:)=inf;
> else
> B(i+1,1:4)=A(IB,:); %#ok<AGROW>
> A(IB,:)=inf;
> end
>
> B(i+1,4)=flag;
>
> end

Hi Matt,
I have one more question. At the end of the program there will be a huge matrix with flag values from 1 to maybe 10000
eg. id start end flag
     11 23 34 1
     .. ... .... 1
     22 54 55 2
     .. ... ... 2
How can I perform certain mathematical operations on the entities having the same flag.
Basically I want to separate all the rows having the same flags..
I just want to know which type of data storage should I use eg. cell array, structure etc..
What I did was..
for i=1:max(B(:,4))
route{i}=B((B(:,4)==i),1:3); %#ok<AGROW>
end
and how can I preallocate? or can you suggest some other way to process the information.

Subject: Help to improve speed of program

From: Matt

Date: 13 Nov, 2009 23:05:19

Message: 9 of 10

"Dips Bhatia" <deepakdbhatia@gmail.com> wrote in message <hdknra$h54$1@fred.mathworks.com>...

> How can I perform certain mathematical operations on the entities having the same flag.
> Basically I want to separate all the rows having the same flags..
> I just want to know which type of data storage should I use eg. cell array, structure etc..
> What I did was..
> for i=1:max(B(:,4))
> route{i}=B((B(:,4)==i),1:3); %#ok<AGROW>
> end
> and how can I preallocate? or can you suggest some other way to process the information.

That's pretty much what I would do. You can pre-allocate the pointers used by route via

route=cell(1,max(B(:,4)));

Subject: Help to improve speed of program

From: Dips Bhatia

Date: 13 Nov, 2009 23:17:04

Message: 10 of 10

"Matt " <xys@whatever.com> wrote in message <hdkonf$al2$1@fred.mathworks.com>...
> "Dips Bhatia" <deepakdbhatia@gmail.com> wrote in message <hdknra$h54$1@fred.mathworks.com>...
>
> > How can I perform certain mathematical operations on the entities having the same flag.
> > Basically I want to separate all the rows having the same flags..
> > I just want to know which type of data storage should I use eg. cell array, structure etc..
> > What I did was..
> > for i=1:max(B(:,4))
> > route{i}=B((B(:,4)==i),1:3); %#ok<AGROW>
> > end
> > and how can I preallocate? or can you suggest some other way to process the information.
>
> That's pretty much what I would do. You can pre-allocate the pointers used by route via
>
> route=cell(1,max(B(:,4)));

Hi Matt,
Thanks a lot.

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us