Optimize code: find longest common sequence of two strings

Question

ely may on 25 Nov 2015

https://www.mathworks.com/matlabcentral/answers/257487-optimize-code-find-longest-common-sequence-of-two-strings

Commented: ely may on 26 Nov 2015

Trajectory.mat

Hi all! I have a problem: I wrote this code to compare the strings in Trajectory.mat (attached) and find their longest common subsequence. I used the function at this link: http://www.mathworks.com/matlabcentral/fileexchange/24559-longest-common-subsequence
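The File Exchange LCS function is used as a black box here. All of the ratios below only need the length of a longest common subsequence, and that length can be computed with the textbook dynamic-programming recurrence. The block below is a minimal sketch of that standard technique, not the File Exchange implementation. `lcsLength` is a hypothetical name, and the sketch assumes each trajectory is a char (or numeric) row vector whose elements can be compared with `==`. Save it as lcsLength.m.

```matlab
function n = lcsLength(X, Y)
% lcsLength  Length of a longest common subsequence of X and Y.
% Hypothetical helper (not the File Exchange LCS function): classic
% dynamic programming over an (m+1)-by-(k+1) table, O(m*k) time.
m = numel(X);
k = numel(Y);
C = zeros(m + 1, k + 1);   % C(i+1, j+1) = LCS length of X(1:i) and Y(1:j)
for i = 1:m
    for j = 1:k
        if X(i) == Y(j)
            % Matching elements extend the LCS of the two shorter prefixes
            C(i + 1, j + 1) = C(i, j) + 1;
        else
            % Otherwise drop the last element of X or of Y, whichever is better
            C(i + 1, j + 1) = max(C(i, j + 1), C(i + 1, j));
        end
    end
end
n = C(m + 1, k + 1);
end
```

With this helper, `nLCS = lcsLength(Trajectory{1}, Trajectory{2});` gives the same count as taking the length of the string returned by LCS.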

% Delete the cells of Trajectory that are empty
empties = find(cellfun(@isempty,Trajectory));
Trajectory(empties) = [];
% Compare strings in Trajectory: find the LCS (Longest Common Subsequence)
[D, dist, aLongestString] = LCS(Trajectory{1,1},Trajectory{2,1});
% Number of patterns in the LCS (named nLCS so it does not shadow the function LCS)
nLCS = size(aLongestString,2);
% Number of patterns in each of the two strings being compared
Q = size(Trajectory{2,1},2);
P = size(Trajectory{1,1},2);
% Participation ratio of the common part to each pattern
RatioQ = nLCS./Q;
RatioP = nLCS./P;
% MSTP-Similarity Equal Average
EA = (RatioP + RatioQ)/2;
% MSTP-Similarity Weighted Average
WA = (P*RatioP + Q*RatioQ)/(P+Q);
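Writing L for the LCS length (the number of patterns in aLongestString), with RatioP = L/P and RatioQ = L/Q, both measures simplify. This is a quick sanity check on the results, and it shows that the weighted average does not need the ratios at all:

```latex
\mathrm{EA} = \frac{1}{2}\left(\frac{L}{P} + \frac{L}{Q}\right) = \frac{L\,(P+Q)}{2PQ},
\qquad
\mathrm{WA} = \frac{P\cdot\frac{L}{P} + Q\cdot\frac{L}{Q}}{P+Q} = \frac{2L}{P+Q}.
```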

The code works, but I want to improve it: I want to compare all the strings in Trajectory, and with this code I have to repeat the same block for every pair of strings. Can you suggest how to restructure the code so that it compares all the strings in Trajectory? I tried using a for loop, but with disastrous results.


Answer 1

Walter Roberson on 25 Nov 2015

https://www.mathworks.com/matlabcentral/answers/257487-optimize-code-find-longest-common-sequence-of-two-strings#answer_201197


One thing you should be doing is using

[uTrajectory, ~, uTrajidx] = unique(Trajectory);

Now uTrajectory contains only the unique strings, and uTrajidx is a list the same size as Trajectory that tells you which entry of uTrajectory each entry of Trajectory corresponds to. There is no point in redoing the LCS calculation for pairs of strings that have already been compared: process the unique ones, then copy the results if you need to.
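A small illustration of the three outputs of unique, on a made-up cell array of strings (hypothetical toy data, not the contents of Trajectory.mat). unique on a cell array requires every cell to hold a character vector, so numeric [] empties must be removed first, as the question's code already does.

```matlab
% Hypothetical toy data standing in for Trajectory
Trajectory = {'ABCA'; 'BCD'; 'ABCA'; 'DDB'};
[uTrajectory, ~, uTrajidx] = unique(Trajectory);
% uTrajectory is sorted and duplicate-free: {'ABCA'; 'BCD'; 'DDB'}
% uTrajidx gives, for each original entry, its row in uTrajectory: [1; 2; 1; 3]
% Indexing uTrajectory with uTrajidx rebuilds the original list
rebuilt = uTrajectory(uTrajidx);
```

Only 3 unique strings remain, so the pairwise loop does 3 LCS calls instead of 6.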

After that:

nTraj = length(uTrajectory);
EA = zeros(nTraj, nTraj);   % the diagonal (each string vs itself) is left at 0
WA = zeros(nTraj, nTraj);
for pairidx1 = 1 : nTraj - 1
    string1 = uTrajectory{pairidx1};
    P = length(string1);
    for pairidx2 = pairidx1 + 1 : nTraj
        string2 = uTrajectory{pairidx2};
        Q = length(string2);
        [~, ~, aLongestString] = LCS(string1, string2);
        nLCS = length(aLongestString);    %do not call the result LCS, that is the function name!
        RatioQ = nLCS/Q;
        RatioP = nLCS/P;
        % MSTP-Similarity Equal Average
        this_EA = (RatioP + RatioQ)/2;
        EA(pairidx1, pairidx2) = this_EA;
        EA(pairidx2, pairidx1) = this_EA;
        % MSTP-Similarity Weighted Average
        this_WA = (P*RatioP + Q*RatioQ)/(P+Q);
        WA(pairidx1, pairidx2) = this_WA;
        WA(pairidx2, pairidx1) = this_WA;
    end
end

The outputs are EA and WA: symmetric matrices of the similarity measures, with one row (and column) for each unique string. If you need to, you can expand them to include the duplicate entries or to follow the original order of Trajectory.
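Expanding back to the original order is a single indexing step with the third output of unique. Below is a minimal sketch that uses made-up toy values in place of the real EA (WA is handled identically). It also fills the diagonal, which the loop leaves at 0; otherwise two identical strings in Trajectory would appear to have similarity 0 after expansion.

```matlab
% Hypothetical toy stand-ins for the outputs of the loop
uTrajidx = [1; 2; 1; 3];   % entries 1 and 3 of Trajectory are the same string
nTraj = 3;                 % number of unique strings
EA = [0    0.5  0.25;
      0.5  0    0.4 ;
      0.25 0.4  0   ];     % symmetric, diagonal left at 0 by the loop

% A string compared with itself has nLCS == P == Q, so both ratios are 1.
% 1:nTraj+1:end are the linear indices of the diagonal of an nTraj-by-nTraj matrix.
EA(1:nTraj+1:end) = 1;

% One row/column per original entry, in the original order, duplicates included
EA_all = EA(uTrajidx, uTrajidx);   % 4-by-4; expand WA the same way
```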


ely may on 26 Nov 2015

Very helpful!
