Optimize code: find longest common sequence of two strings

Hi all! I have a problem: I wrote this code to compare strings in Trajectory.mat (attached) and find the longest common sequence. I used the function from this link: http://www.mathworks.com/matlabcentral/fileexchange/24559-longest-common-subsequence
% Delete the cells of Trajectory that are empty
empties = find(cellfun(@isempty,Trajectory));
Trajectory(empties) = [];
% Compare strings in Trajectory: find LCS (Longest Common Sequence)
[D, dist, aLongestString] = LCS(Trajectory{1,1},Trajectory{2,1});
% Count patterns in LCS
LCS=size(aLongestString,2);
% Count patterns in strings that I compare
Q=size(Trajectory{2,1},2);
P=size(Trajectory{1,1},2);
% Participation ratio of the common part to each pattern (P and Q)
RatioQ = LCS./Q;
RatioP = LCS./P;
% MSTP-Similarity Equal Average
EA=(RatioP + RatioQ)/2;
% MSTP-Similarity Weighted Average
WA=(P*RatioP+Q*RatioQ)/(P+Q);
The code works, but I want to improve it: I want to compare all the strings in Trajectory, and with this code I would have to repeat the same lines for every pair of strings. Can you give me some suggestions for comparing all the strings in Trajectory? I tried to use a for loop, but with disastrous results.

Accepted Answer

Walter Roberson on 25 Nov 2015
One thing you should be doing is using
[uTrajectory, ~, uTrajidx] = unique(Trajectory);
Now uTrajectory will contain only the unique strings, and uTrajidx will be a list the same size as Trajectory that tells you which entry in uTrajectory each entry in Trajectory ended up as. There is no point in redoing the LCS calculation between strings that have already been compared: do the unique ones and then copy the results if you need to.
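As a small illustration of what the two outputs look like, here is a sketch on toy data. The four strings are made up for the example and are not taken from Trajectory.mat:

```matlab
% Hypothetical toy data: the first and third strings are identical.
Trajectory = {'ABCD'; 'ABD'; 'ABCD'; 'XYZ'};

[uTrajectory, ~, uTrajidx] = unique(Trajectory);

% uTrajectory holds each distinct string once, in sorted order.
% uTrajidx maps every original entry to its position in uTrajectory,
% so indexing with it rebuilds the original list.
rebuilt = uTrajectory(uTrajidx);
disp(isequal(rebuilt(:), Trajectory(:)))
```

Here only three strings need to be compared instead of four, and the saving grows with the number of duplicates.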
After that:
nTraj = length(uTrajectory);
EA = zeros(nTraj, nTraj);   % diagonal stays 0: a string is never compared with itself here
WA = zeros(nTraj, nTraj);
for pairidx1 = 1 : nTraj - 1
    string1 = uTrajectory{pairidx1};
    P = length(string1);
    for pairidx2 = pairidx1 + 1 : nTraj
        string2 = uTrajectory{pairidx2};
        Q = length(string2);
        [D, dist, aLongestString] = LCS(string1, string2);
        nLCS = length(aLongestString);   %do not call the result LCS, that is the function name!
        RatioQ = nLCS/Q;
        RatioP = nLCS/P;
        % MSTP-Similarity Equal Average
        this_EA = (RatioP + RatioQ)/2;
        EA(pairidx1, pairidx2) = this_EA;
        EA(pairidx2, pairidx1) = this_EA;   % mirror into the lower triangle
        % MSTP-Similarity Weighted Average
        this_WA = (P*RatioP + Q*RatioQ)/(P+Q);
        WA(pairidx1, pairidx2) = this_WA;
        WA(pairidx2, pairidx1) = this_WA;   % mirror into the lower triangle
    end
end
Outputs are EA and WA, symmetric matrices of the ratios, one row (or column) for each unique string. You could expand that out to have duplicate entries or to be in the original order if you need to.
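The expansion back to the original order could be sketched like this. The toy strings and the name EA_full are assumptions for the example. The EA values are worked out by hand from the formulas above rather than by calling LCS: 'ABCD' and 'ABD' share 'ABD', so EA = (3/4 + 3/3)/2 = 0.875, and 'XYZ' shares nothing with either.

```matlab
% Hypothetical toy data: the first and third strings are identical.
Trajectory = {'ABCD'; 'ABD'; 'ABCD'; 'XYZ'};
[uTrajectory, ~, uTrajidx] = unique(Trajectory);
nTraj = numel(uTrajectory);

% Stand-in for the EA matrix over the unique strings
% (order 'ABCD', 'ABD', 'XYZ'); values computed by hand, see above.
EA = [0     0.875 0;
      0.875 0     0;
      0     0     0];

% Duplicate strings map to the same unique index, so they land on the
% diagonal. Set it to 1 first: a string matches itself completely.
EA(1 : nTraj+1 : end) = 1;

% Index rows and columns with uTrajidx to get one row and column per
% original entry, in the original order.
EA_full = EA(uTrajidx, uTrajidx);
disp(EA_full)
```

The same indexing works for WA. Without the diagonal step, two identical trajectories would show a similarity of 0 in the expanded matrix.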
