What format does the MSA data need to be in order to calculate pair-wise distances with seqpdist?

I am reading a ClustalW text-format MSA with multialignread. I have tried splitting the multialignread output into two cell arrays, and keeping the structure intact; neither method has been successful. seqpdist does accept the sequence cell output of fastaread for a FASTA text file of the same sequences.
% This works...
[heads,seqs]=fastaread('fastaformat.fasta');
distancematrix=seqpdist(seqs,'method',pam(250),'squareform',1);
% This does not...
[heads,seqs]=multialignread('clustalwmsa.aln1');
distancematrix=seqpdist(seqs,'method',pam(250),'squareform',1);
This is the error message:
??? Error using ==> cell.strmatch at 21
Requires character array or cell array of strings as inputs.
Error in ==> seqpdist at 258
distMethod = strmatch(lower(pval),distMethods);
Error in ==> cscalc at 14
dmat=seqpdist(seqs,'method',pam(250),'squareform',1);

Accepted Answer

Walter Roberson on 11 Jun 2011
What you pass for 'method' must be a string naming a distance method.
The reference to pam appears to be something appropriate for the 'ScoringMatrix' parameter instead, and the value you would pass for that would be the string 'pam250'.
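A minimal sketch of a call that keeps the two arguments separate, assuming the Bioinformatics Toolbox is installed. The three sequences are hypothetical toy data rather than the poster's alignment, and the choice of the 'alignment-score' method is an assumption about what was intended, not something stated in the thread:

```matlab
% Hypothetical toy protein sequences of equal length (stand-ins for an MSA).
seqs = {'MKVLAGT', 'MKVLSGT', 'MRVLAGS'};

% 'Method' names the distance method as a string; the substitution matrix
% is named separately through 'ScoringMatrix'.
D = seqpdist(seqs, ...
    'Method', 'alignment-score', ...   % assumed choice of distance method
    'ScoringMatrix', 'PAM250', ...
    'Alphabet', 'AA', ...
    'SquareForm', true);

disp(D)   % 3-by-3 distance matrix
```

With 'SquareForm' set to true the result is a full square matrix rather than the compact vector form, which is usually easier to inspect by eye.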
  1 Comment
Adam Quintero on 11 Jun 2011
That is absolutely correct, thank you. I was getting the semantics of this function wrong. This sets seqpdist to find the pairwise distance matrix of the MSA using PAM250 units.


More Answers (1)

Adam Quintero on 11 Jun 2011
To calculate a distance matrix from an MSA based on PAM250 scoring, the input needs to be a cell array of the sequence strings. multialignread formats the sequences from a ClustalW MSA file (*.aln1) into a structure with headers and sequences, or into separate cell arrays for each.
The reason seqpdist could not read the sequences is an incorrect use of its arguments. The 'method' argument is only used if the input sequences are not already aligned. Passing the sequence data from multialignread and trying to align it again with 'method' caused the error.
The correct argument to use in this case is 'scoringmethod', where the pre-aligned sequences are re-scored using the 'scoringmethod' value.
pamdistancematrix=seqpdist(sequence,...
'scoringmethod',pam250,'squareform',1)
  2 Comments
Adam Quintero on 11 Jun 2011
Wow, sorry. That was NOT the correct answer. Walter Roberson is correct in that I should use 'scoringmatrix' instead of 'method', so that the input is handled as MSA sequences and not raw FASTA.
My apologies, Robert.
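On the original question of input format, here is a small sketch suggesting that seqpdist should treat the two shapes multialignread can return in the same way: a structure array with Header and Sequence fields, or a plain cell array of sequence strings. The structure below is a hand-built, hypothetical stand-in for multialignread output, and 'p-distance' is used only to keep the example simple:

```matlab
% Hypothetical stand-in for multialignread output: a 1-by-3 struct array
% with Header and Sequence fields.
S = struct('Header',   {'seqA', 'seqB', 'seqC'}, ...
           'Sequence', {'MKVLAGT', 'MKVLSGT', 'MRVLAGS'});

% The same sequences as a plain cell array of strings.
C = {S.Sequence};

% Both input forms are passed with identical options.
D_struct = seqpdist(S, 'Method', 'p-distance', 'Alphabet', 'AA', 'SquareForm', true);
D_cell   = seqpdist(C, 'Method', 'p-distance', 'Alphabet', 'AA', 'SquareForm', true);

disp(isequal(D_struct, D_cell))
```

If the two matrices agree, the format of the MSA data was not the cause of the error above; the non-string value passed to 'method' was.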

