File Exchange

## Sequence alignment with arbitrary steps

version 1.0 (2.37 KB) by

Determine the best alignment, with allowed 'steps' <S>, between two sequences

Updated

Determine the best alignment, with allowed 'steps' <S>, between two sequences, and under a given similarity function.

In the special case where S=[1 0; 0 1; 1 1] (and for appropriate similarity function), the algorithm is identical to the classical Needleman-Wunsch sequence alignment procedure.

This can equivalently be used for computing a generalized edit distance between two sequences.

The implementation is based on:

Steffen Eger, Sequence alignment with arbitrary steps and further generalizations, with applications to alignments in linguistics. Information Sciences (2013), 237: 287--304.

B. John Oommen, String Alignment With Substitution, Insertion, Deletion, Squashing, and Expansion Operations. Information Sciences (1995), 83: 89--107.

iuvaris

### iuvaris (view profile)

In the below example, the alignment will be (formatting has been lost below)

x -- y

1 -- 1 3
2 -- 2
2 3 -- 3 2
3 -- 3
2 -- 2
1 -- 1
2 -- 2

iuvaris

### iuvaris (view profile)

Sample usage would be

x=[1,2,2,3,3,2,1,2];
y=[1,3,2,3,2,3,2,1,2];

steps = [1 1; % substitution/identity
1 2; % expansion
2 1; % squashing
2 2]; % e.g. transposition

[a,v]=seqAlign(x,y,steps,@mysimilarity);
% Output will be in this case
%a =
%
% 1 1 2 1 1 1 1
% 2 1 2 1 1 1 1
%
%v = 18
%
% Return value <a> corresponds to the
% alignment:
%
% 1 2 2 3 3 2 1 2
% 1 3 2 3 2 3 2 1 2
end

function sim=mysimilarity(sub1,sub2);
% just a very silly similarity function
% <sub1> and <sub2> are subsequences

sim=-2;
k1 = length(sub1);
k2 = length(sub2);

if isequal([k1 k2],[1 1])
if sub1==sub2 % 1-1 identity (match)
sim = 3;
end
end

if isequal([k1 k2],[2 2])
if sub1(1)==sub2(2) && sub1(2)==sub2(1) % transposition
sim = 2;
end
end

if isequal([k1 k2],[1 2]) || isequal([k1 k2],[2 1])
sim = sum(sub1==sub2); % in case of squashing or expansion, we let similarity be the number of matches

end

end

##### MATLAB Release
MATLAB 7.11 (R2010b)