Sequence Distance
I am confused about how MATLAB computes its results for the various distance methods. My boss wants to know how MATLAB arrives at these answers.
I set up MATLAB to display answers as fractions. When I analyze two sequences of the same length, the denominator of the fraction is the length of the sequences (for example, if both amino acid sequences have a length of 327, the answer has a denominator of 327). I understood this until I analyzed two amino acid sequences of different lengths, one 369 amino acids long and the other 379 amino acids long. It gave me the answer 209/398. I don't understand how it arrived at a denominator of 398 (I specifically asked it to use p-distance). When I type "help seqpdist", it does not give a very clear explanation of how the p-distance works.
Can someone please help me out? I would greatly appreciate it!
Answers (1)
Lucio Cetto
on 20 Jul 2011
When you compare sequences, it is common to first align them using a dynamic programming algorithm. SEQPDIST uses NWALIGN to pairwise align all possible pairs of sequences and then computes the distance measure from each alignment.
Consider:
seqpdist({'AACGT','AAGT','AAT'},'alpha','nt','square',1,'method','p-dist')
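The alignment between 1 and 2 is 'AACGT' and 'AA-GT' => 1/5
The alignment between 1 and 3 is 'AACGT' and 'AA--T' => 2/5
The alignment between 2 and 3 is 'AAGT' and 'AA-T' => 1/4
In each case the denominator is the length of the alignment, gap columns included, so it can be larger than the length of either input sequence.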
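As a rough check of the arithmetic, the fractions can be reproduced by hand from the aligned strings. This is only a minimal sketch in plain MATLAB, not what SEQPDIST does internally: the aligned pairs are typed in by hand rather than produced by NWALIGN, the variable names are made up for illustration, and it assumes that a gap opposite a letter counts as a difference (which is what the numbers in the example imply; SEQPDIST's handling of gap sites may depend on its options).

```matlab
% Aligned pairs typed in by hand ('-' marks a gap); in practice the
% alignment would come from the pairwise alignment step.
pairs = {'AACGT', 'AA-GT'; ...
         'AACGT', 'AA--T'; ...
         'AAGT',  'AA-T'};

nPairs = size(pairs, 1);
p = zeros(nPairs, 1);
for k = 1:nPairs
    a = pairs{k, 1};
    b = pairs{k, 2};
    % Number of alignment columns where the two symbols differ.
    % Assumption: a gap against a letter is counted as a difference.
    nDiff = sum(a ~= b);
    % The denominator is the alignment length (number of columns),
    % not the length of either original sequence.
    alignLen = numel(a);
    p(k) = nDiff / alignLen;
    fprintf('%s vs %s: %d/%d\n', a, b, nDiff, alignLen);
end
```

Applied to the two amino acid sequences in the question, the same reasoning suggests looking at the length of their pairwise alignment to see where the denominator comes from.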
HTH