How to correct nt2aa to skip codons with gaps?

Asked by Kendall

on 19 Jan 2013

Hello!

I am using MATLAB to analyze a large number of gene sequences in a .fasta file. Part of my analysis then requires the amino acid sequences coded by the genes. I am using the nt2aa function in MATLAB. However, at least one of the sequences has a gap in at least one of its codons (A-A). As such, I am receiving the following error:

"Error using nt2aa (line 116) The sequence includes a codon A-A containing a gap. Gaps are supported only when a complete codon is made up of gaps (---)."

Any suggestions as to how I may be able to get around this? I am very hesitant to start messing with MATLAB's nt2aa function.

Thank you in advance for all of your time and attention!

Best,

Kendall

1 Answer

Answer by Cedric Wannaz

on 19 Jan 2013
Edited by Cedric Wannaz

on 21 Jan 2013

I don't know nt2aa, but I just had a quick look. Do you want to:

  • Modify nt2aa so it eliminates codons with gaps? Not sure what the license says about it, but I guess that it could be done.
  • Find a specialist who could tell you how to do it correctly with the Bioinformatics Toolbox? In that case, you might want to check what folks on the newsgroup have to say. It is certainly possible, maybe even with nt2aa itself, as it seems to have features for handling ambiguous sequences.
  • Build some solution by yourself to pre-process or post-process your codons/AA chains?

If you are game for the last option, we can work out a solution along the lines of this post.

For example, if you have your codons in a cell array like

 NT = {'AAA','AAT','AAG','A-T','AGC','--G'} ;

you can easily find cells that contain a codon with one or more '-':

 >> hasDash = cellfun(@(x)any(x=='-'), NT)
 hasDash =  0     0     0     1     0     1

and remove these cells:

 >> NTclean = NT ;                               % In case you want to keep 
 >> NTclean(hasDash) = []                        % the original cell array.
 NTclean = 'AAA'    'AAT'    'AAG'    'AGC'

Then you can feed nt2aa with the 'cleaned' version of NT:

 >> AAclean = nt2aa(NTclean)
 AAclean = 'K'    'N'    'K'    'S'
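
One caveat: as far as I can tell from the documentation, nt2aa expects a single sequence (a character vector, a numeric vector, or a structure with a Sequence field) rather than a cell array. If the call above errors in your release, a minimal sketch of a per-codon translation with cellfun could look like this (NTclean is redefined here only to keep the snippet self-contained):

```matlab
% Sketch: translate each codon separately (needs the Bioinformatics Toolbox).
NTclean = {'AAA','AAT','AAG','AGC'};                          % cleaned codons from above
AAclean = cellfun(@nt2aa, NTclean, 'UniformOutput', false);   % one amino acid letter per cell
```

This should give the same 1-by-4 cell array as shown above, so the rest of the steps do not change.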

If you wanted to insert empty cells in AAclean afterwards at locations where there were codons with gaps (to have a record), you could do as follows:

 >> AA = cell(1, numel(NT)) ;                    % One empty cell per original codon.
 >> AA(~hasDash) = AAclean(:)                    % Fill in the positions without gaps.
 AA = 'K'    'N'    'K'    []    'S'    []
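
Since your data are whole sequences rather than codon cell arrays, the same idea can be applied directly to a character vector. The following is only a sketch under a few assumptions: the reading frame starts at position 1, trailing bases that do not fill a codon are dropped, and the example sequence seq is made up. Codons made entirely of gaps (---) are kept, because the error message says nt2aa supports them.

```matlab
% Sketch: translate one nucleotide sequence, skipping codons that are only
% partially gapped. Assumes the reading frame starts at position 1.
seq = 'AAAAATAAGA-TAGC--G';                  % hypothetical example sequence

nFull  = 3 * floor(numel(seq) / 3);          % ignore trailing bases that do not fill a codon
codons = reshape(seq(1:nFull), 3, []).';     % one codon per row (N-by-3 char matrix)

nGaps   = sum(codons == '-', 2);             % number of '-' characters in each codon
partial = nGaps > 0 & nGaps < 3;             % 1 or 2 gaps: the case nt2aa rejects

cleanSeq = reshape(codons(~partial, :).', 1, []);   % rejoin the codons that are kept
aa = nt2aa(cleanSeq);                        % requires the Bioinformatics Toolbox
```

For a FASTA file, the same steps could be run on the Sequence field of each entry returned by fastaread. Keep in mind that dropping codons shortens the protein, so amino acid positions no longer line up with the original nucleotide positions; the logical vector partial records which codons were skipped.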

Cheers,

Cedric
