MATLAB Answers

Maximum Edit Distance code

30 views (last 30 days)
abdulla elmourcy
abdulla elmourcy on 8 Aug 2018
Edited: Stephen Cobeldick on 13 Aug 2018
nwalign function calculates the minimum edit distance between two sequences to be like each other by calculating every possible alignment and choose the one with minimum modifications can I make matlab present these all possible alignment he did as I need to choose the one with maximum value


Sign in to comment.

Answers (1)

Stephen Cobeldick
Stephen Cobeldick on 8 Aug 2018
Edited: Stephen Cobeldick on 8 Aug 2018
The maximum edit distance between any two strings (even two identical ones) is infinity, unless you add some kind of restrictions on repetitions of edits. Even then you can create an arbitrarily large edit distance, with any arbitrarily large set character set. So unless you have some other restrictions the maximum edit distance is not likely to be very useful.
The minimum edit distance can easily be calculated using the Wagner–Fischer algorithm:
>> C = {'live','eve','believe','belive'};
>> cellfun(@(c)wfEdits(c,'beleive'),C)
ans =
3 4 2 1
Thus showing that 'belive' is the closest to 'beleive'. The function is:
function d = wfEdits(S1,S2)
% Wagner–Fischer algorithm to calculate the edit distance / Levenshtein distance.
N1 = 1+numel(S1);
N2 = 1+numel(S2);
D = zeros(N1,N2);
D(:,1) = 0:N1-1;
D(1,:) = 0:N2-1;
for r = 2:N1
for c = 2:N2
D(r,c) = min([D(r-1,c)+1, D(r,c-1)+1, D(r-1,c-1)+~strcmpi(S1(r-1),S2(c-1))]);
d = D(end);
If you are interested in the different possible direct edit paths (i.e. without repetition or unnecessary diversion to other characters) then you should investigate the Needleman–Wunsch algorithm.


abdulla elmourcy
abdulla elmourcy on 13 Aug 2018
Thanks a lot for the great answer, I just meant when using an edit distance algorithm like Needleman-wunsch`s, the algorithm compares two sequences and suggests many edits to make the 2 sequence much similar to each other and the best answer is that with the least edits. In my situation I need the opposite, I need to choose from these suggests the answer with the highest edit number. Is there any way to do that in Matlab? thanks a lot
abdulla elmourcy
abdulla elmourcy on 13 Aug 2018
In another way, the algorithm aligns the 2 sequences together and then suggests to insert, delete, mutate letters in certain positions to finally get the best solution with the least edit numbers, can I reverse these steps, as programming the algorithm when it should be insertion, do deletion instead, and when it should be deletion do insertion instead. could this be done by Matlab? sorry for being complicated but it is my situation
Stephen Cobeldick
Stephen Cobeldick on 13 Aug 2018
"In my situation I need the opposite, I need to choose from these suggests the answer with the highest edit number."
You need to be more specific than this, because currently this could be solved by several different interpretations. Are intermediate characters allowed (e.g. 'a' -> 'b' -> 'c')? Can edits be repeatedly applied (i.e. 'a' -> 'b' -> 'a' etc.)? Even without any intermediate characters and no edit repetitions, this still has a trivial solution:
  • delete all of the first string
  • add all of the second string
giving an edit distance of numel(str1)+numel(str2). If that is not what you want then you need to define your problem better.
Perhaps the Needleman–Wunsch algorithm might do what you want, once to you adjust the scoring values to suit your problem requirements.
Clearly you will have to do some experimentation yourself, e.g. trying different scoring schemes, and learn up about this algorithm and how it works. Before using nwalign (or perhaps something from FEX) you could try this on one of the several online Needleman–Wunsch tools, which graphically show the path (these are easy to find with an internet search). Keep experimenting!

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!