How can I make nt2aa skip codons with gaps?

Hello!
I am using MATLAB to analyze a large number of gene sequences in a .fasta file. Part of my analysis requires the amino acid sequences encoded by the genes, so I am using the nt2aa function. However, at least one of the sequences has a gap in at least one of its codons (A-A). As a result, I get the following error:
"Error using nt2aa (line 116) The sequence includes a codon A-A containing a gap. Gaps are supported only when a complete codon is made up of gaps (---)."
Any suggestions as to how I may be able to get around this? I am very hesitant to start messing with MATLAB's nt2aa function.
Thank you in advance for all of your time and attention!
Best,
Kendall

Cedric on 19 Jan 2013
Edited: Cedric on 21 Jan 2013
I don't know nt2aa well, but I had a quick look. Do you want to:
  • Modify nt2aa so that it skips codons with gaps? I am not sure what the license allows, but I guess it could be done.
  • Find a specialist who could tell you how to do it correctly with the Bioinformatics Toolbox? In that case, you might want to check what people on the newsgroup have to say. It is certainly possible, maybe even with nt2aa itself, as it seems to have features for handling ambiguous sequences.
  • Build your own solution that pre-processes or post-processes your codons/amino acid chains?
If you are game for the latter option, we can discuss a solution along the lines of this post.
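If your sequences come straight from the .fasta file, each one is a single long string rather than a cell array of codons, so the first step is to split it into codons. Here is a minimal sketch, assuming the reading frame starts at the first character; the file name 'genes.fasta' is hypothetical, and fastaread requires the Bioinformatics Toolbox:

```matlab
% Minimal sketch: split one nucleotide sequence into a 1-by-N cell array of codons.
% Assumption: with the Bioinformatics Toolbox, the sequences could be read with
%   data = fastaread('genes.fasta');   % 'genes.fasta' is a hypothetical file name
%   seq  = data(1).Sequence;
seq = 'AAAAATAAGA-TAGC--G';                         % example sequence with gapped codons

nCodons = floor(numel(seq) / 3);                    % ignore an incomplete trailing codon
codonMatrix = reshape(seq(1:3*nCodons), 3, []).';   % one codon per row
NT = cellstr(codonMatrix).';                        % 1-by-nCodons cell array of codons
```

For this example sequence, NT ends up being the same cell array as the one used in the example below.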
For example, if you have your codons in a cell array like
NT = {'AAA','AAT','AAG','A-T','AGC','--G'} ;
you can easily find cells that contain a codon with one or more '-':
>> hasDash = cellfun(@(x)any(x=='-'), NT)
hasDash = 0 0 0 1 0 1
and remove these cells:
>> NTclean = NT ;         % Copy, in case you want to keep the original cell array.
>> NTclean(hasDash) = []
NTclean = 'AAA' 'AAT' 'AAG' 'AGC'
Then you can pass the cleaned version of NT to nt2aa:
>> AAclean = nt2aa(NTclean)
AAclean = 'K' 'N' 'K' 'S'
If you want to keep a record of where the gapped codons were, you can rebuild a cell array with empty cells at those positions:
>> validId = find(~hasDash) ;
>> AA = cell(1, numel(NT)) ;
>> AA(validId) = AAclean(:)
AA = 'K' 'N' 'K' [] 'S' []
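Another pre-processing option, based on the error message itself ("Gaps are supported only when a complete codon is made up of gaps"), is to turn every partially gapped codon into '---' and translate the whole sequence in one call, so positions are preserved without the bookkeeping. This is a minimal sketch under that assumption; I have not checked exactly which character nt2aa returns for a '---' codon (presumably a gap):

```matlab
% Minimal sketch: keep every codon position by turning partially gapped
% codons (e.g. 'A-T', '--G') into full gaps ('---'), which the nt2aa error
% message says are supported. nt2aa requires the Bioinformatics Toolbox.
seq = 'AAAAATAAGA-TAGC--G';                         % example sequence

nCodons = floor(numel(seq) / 3);                    % drop an incomplete trailing codon
codonMatrix = reshape(seq(1:3*nCodons), 3, []).';   % one codon per row
isGap = (codonMatrix == '-');
partialGap = any(isGap, 2) & ~all(isGap, 2);        % some, but not all, positions are gaps
codonMatrix(partialGap, :) = '-';                   % e.g. 'A-T' -> '---'
seqMasked = reshape(codonMatrix.', 1, []);          % back to one row: 'AAAAATAAG---AGC---'

AA = nt2aa(seqMasked)                               % '---' codons should come out as gaps
```

The trade-off is that the gapped codons show up as gap characters in the amino acid string instead of empty cells, which may be easier to handle if you compare sequences position by position.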
Cheers,
Cedric
