Be the first to rate this file! 9 Downloads (last 30 days) File Size: 2.37 KB File ID: #44617

Sequence alignment with arbitrary steps

by

 

Determine the best alignment, with allowed 'steps' <S>, between two sequences

| Watch this File

File Information
Description

Determine the best alignment, with allowed 'steps' <S>, between two sequences, and under a given similarity function.

In the special case where S=[1 0; 0 1; 1 1] (and for appropriate similarity function), the algorithm is identical to the classical Needleman-Wunsch sequence alignment procedure.

This can equivalently be used for computing a generalized edit distance between two sequences.

The implementation is based on:

Steffen Eger, Sequence alignment with arbitrary steps and further generalizations, with applications to alignments in linguistics. Information Sciences (2013), 237: 287--304.

See also:
B. John Oommen, String Alignment With Substitution, Insertion, Deletion, Squashing, and Expansion Operations. Information Sciences (1995), 83: 89--107.

Required Products MATLAB
MATLAB release MATLAB 7.11 (R2010b)
Tags for This File   Please login to tag files.
Please login to add a comment or rating.
Comments and Ratings (2)
06 Dec 2013 iuvaris

In the below example, the alignment will be (formatting has been lost below)

x -- y

1 -- 1 3
2 -- 2
2 3 -- 3 2
3 -- 3
2 -- 2
1 -- 1
2 -- 2

06 Dec 2013 iuvaris

Sample usage would be

x=[1,2,2,3,3,2,1,2];
y=[1,3,2,3,2,3,2,1,2];

steps = [1 1; % substitution/identity
1 2; % expansion
2 1; % squashing
2 2]; % e.g. transposition

[a,v]=seqAlign(x,y,steps,@mysimilarity);
% Output will be in this case
%a =
%
% 1 1 2 1 1 1 1
% 2 1 2 1 1 1 1
%
%v = 18
%
% Return value <a> corresponds to the
% alignment:
%
% 1 2 2 3 3 2 1 2
% 1 3 2 3 2 3 2 1 2
end

function sim=mysimilarity(sub1,sub2);
% just a very silly similarity function
% <sub1> and <sub2> are subsequences

sim=-2;
k1 = length(sub1);
k2 = length(sub2);

if isequal([k1 k2],[1 1])
if sub1==sub2 % 1-1 identity (match)
sim = 3;
end
end

if isequal([k1 k2],[2 2])
if sub1(1)==sub2(2) && sub1(2)==sub2(1) % transposition
sim = 2;
end
end

if isequal([k1 k2],[1 2]) || isequal([k1 k2],[2 1])
sim = sum(sub1==sub2); % in case of squashing or expansion, we let similarity be the number of matches

end

end

Contact us