Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
Vactorizating code / For loops

Subject: Vactorizating code / For loops

From: Marin Troselj

Date: 17 Dec, 2008 22:01:07

Message: 1 of 6

Hi there, I am new at using Matlab and I need some help!

Is there a way to vectorizate this code:
k=1;
for i=1:3841
for j=k:3841
distance(i,j)=strdist(strings{j,1}, string{j+1,1});
end
k=k+1;
end

I have putted different strings into cell array <strings> and i want to compute Levenshtein distance betwen every string in cell array <strings> getting as result matrix 3841x3841 with zeroes under main diagonal.

Thanks!
Marin

Subject: Vactorizating code / For loops

From: someone

Date: 17 Dec, 2008 22:38:03

Message: 2 of 6

"Marin Troselj" <mtroselj.einstein@gmail.com> wrote in message <gibsr3$r3o$1@fred.mathworks.com>...
> Hi there, I am new at using Matlab and I need some help!
>
> Is there a way to vectorizate this code:
> k=1;
> for i=1:3841
> for j=k:3841
> distance(i,j)=strdist(strings{j,1}, string{j+1,1});
> end
> k=k+1;
> end
>
> I have putted different strings into cell array <strings> and i want to compute Levenshtein distance betwen every string in cell array <strings> getting as result matrix 3841x3841 with zeroes under main diagonal.
>
> Thanks!
> Marin

% Without knowing what strdist is (toolbox?) or what it returns,
% I can only offer the following modifications:

distance = zeros(3841);
for i=1:3841
for j=i:3841 % replace k with i
distance(i,j)=strdist(strings{j,1}, strings{j+1,1});
end
end

% Notes:
% 1. pre-allocating distance should give a big performance boost
% 2. I assume string{j+1,1} should be strings{j+1,1}
% 3. make sure length(strings{:,1}) > 3841 or strings{j+1,1} will give you an error

Subject: Vactorizating code / For loops

From: Roger Stafford

Date: 18 Dec, 2008 00:14:03

Message: 3 of 6

"Marin Troselj" <mtroselj.einstein@gmail.com> wrote in message <gibsr3$r3o$1@fred.mathworks.com>...
> Hi there, I am new at using Matlab and I need some help!
>
> Is there a way to vectorizate this code:
> k=1;
> for i=1:3841
> for j=k:3841
> distance(i,j)=strdist(strings{j,1}, string{j+1,1});
> end
> k=k+1;
> end
>
> I have putted different strings into cell array <strings> and i want to compute Levenshtein distance betwen every string in cell array <strings> getting as result matrix 3841x3841 with zeroes under main diagonal.
>
> Thanks!
> Marin

  Assuming there is no error in your expression

 distance(i,j)=strdist(strings{j,1},string{j+1,1})

then you are computing the right side a very large number of times for each of the pairs j and j+1, a huge waste of computing power. Just compute it once for each pair and then extend copies in the rows of 'distance'.

 s = zeros(1,3841);
 for j = 1:3841
  s(j) = strdist(strings{j,1},string{j+1,1});
 end
 d = zeros(3841);
 for i = 1:3841
  d(i,i:3841) = s(i:3841);
 end

('d' is your 'distance' array.)

Roger Stafford

Subject: Vactorizating code / For loops

From: Roger Stafford

Date: 18 Dec, 2008 01:09:03

Message: 4 of 6

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gic4kb$jj5$1@fred.mathworks.com>...
> ........
> s = zeros(1,3841);
> for j = 1:3841
> s(j) = strdist(strings{j,1},string{j+1,1});
> end
> d = zeros(3841);
> for i = 1:3841
> d(i,i:3841) = s(i:3841);
> end
> ..........

  It occurs to me I didn't actually vectorize your code as you requested. I think you have no hope of vectorizing the calculation involving 'strdist'. However the second for-loop could be eliminated using the 'hankel' function:

 n = 3841;
 s = zeros(1,n);
 for j = 1:n
  s(j) = strdist(strings{j,1},string{j+1,1}); % <-- strings{j+1,1} ?
 end
 d = hankel([s,0],zeros(1,n));
 d = reshape(d(1:n^2),n,n).';

Possibly you would be better off with the previous for-loop, however.

Roger Stafford

Subject: Vactorizating code / For loops

From: Marin Troselj

Date: 18 Dec, 2008 18:40:34

Message: 5 of 6

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gic7rf$48g$1@fred.mathworks.com>...
> "Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gic4kb$jj5$1@fred.mathworks.com>...
> > ........
> > s = zeros(1,3841);
> > for j = 1:3841
> > s(j) = strdist(strings{j,1},string{j+1,1});
> > end
> > d = zeros(3841);
> > for i = 1:3841
> > d(i,i:3841) = s(i:3841);
> > end
> > ..........
>
> It occurs to me I didn't actually vectorize your code as you requested. I think you have no hope of vectorizing the calculation involving 'strdist'. However the second for-loop could be eliminated using the 'hankel' function:
>
> n = 3841;
> s = zeros(1,n);
> for j = 1:n
> s(j) = strdist(strings{j,1},string{j+1,1}); % <-- strings{j+1,1} ?
> end
> d = hankel([s,0],zeros(1,n));
> d = reshape(d(1:n^2),n,n).';
>
> Possibly you would be better off with the previous for-loop, however.
>
> Roger Stafford

I made a mistake in algorithm that I posted.
It should go like this:

for i = 1 : 3841
for j = i : 3841
distance( i , j )=strdist(strings{ i ,1 }, strings{ j+1 ,1 });
end
end

Example:
Let's say that we have 4 strings instead of 3842 and let's say that
a{1,1}='abcd'
a{2,1}='aabb'
a{3,1}='aaaa'
a{4,1}='abcd'

What I want to do is to compute Levenshtein distance from all strings getting a matrix as result wich should be like:
strdist(a{1,1},a{2,1}) | strdist(a{1,1},a{3,1}) | strdist(a{1,1},a{4,1})
             0 | strdist(a{2,1},a{3,1}) | strdist(a{2,1},a{4,1})
             0 | 0 | strdist(a{3,1},a{4,1})

or for this example I should get:
 3 3 0
 0 2 3
 0 0 0

It is obvious problem of time needed to execute computation if we have 3842 strings. For executing inner loop takes around 30 secodns wich means it would take around 37 hours to complete computation. So I was wondering if it is possible to vectorizate the loops and lower time needed. Perhaps there is some other way to make it faster?

Thank you for your time!

Marin Troselj
 

Subject: Vactorizating code / For loops

From: Roger Stafford

Date: 19 Dec, 2008 02:46:45

Message: 6 of 6

"Marin Troselj" <mtroselj.einstein@gmail.com> wrote in message <gie5f2$816$1@fred.mathworks.com>...
> .......
> I made a mistake in algorithm that I posted.
> .......

  I had a feeling your code had an error in it, Marin!

  In its corrected form I think you have no hope of vectorizing it in any significant way. As has been suggested, you should of course preallocate the 'distance' array in advance to its required size. However, you are finding the Levenshtein distance between every possible pairing of 3,842 strings, which amounts to a grand total of 7,378,561 pairings. Each string pairing requires an order m*n algorithm where m and n are the lengths of the two strings, and would therefore need a substantial amount of execution time for reasonably sizable strings. I would hazard the guess that the vast majority of those 37 hours would be used in executing 'strdist' itself and the remaining minuscule portion in carrying out everything else. It matters little how you organize things in loops.

  Your time would be better spent trying to optimize the 'strdist' algorithm itself. The wikipedia article at

 http://en.wikipedia.org/wiki/Levenshtein_distance

has a discussion of possible improvements to the algorithm as presented there.

  By the way, how do you arrive at a "distance" of zero between a{3,1} and a{4,1}?

Roger Stafford

Tags for this Thread

No tags are associated with this thread.

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us