How do I make a cell with the following contents?
Mohannad Abboushi
on 15 Jan 2017
Commented: Guillaume
on 16 Jan 2017
I am writing a program that takes a string s representing a single strand of DNA and returns the amino acid sequence of the longest gene it finds. Here, a gene is defined as a reading frame that starts with an AUG codon and ends with one of the UAA, UAG, or UGA codons.
I tried making a cell of different "frames", but since they are not the same length I can't put them into an array. How do I work around this? Here's my code:
function [ptn]=Seq_transcribe2(x)
y=seq_transcribe1(x);
frames={};
frames={x(1:end) x(2:end) x(3:end) y(1:end) y(2:end)
y(3:end)};
starts=[];
stops=[];
allorfs={};
for i=1:3:numel(frames)-2
codon= frames([i i+1 i+2])
if codon=='AUG'
starts(end+1)=codon;
if strcmp(codon,'UAA') || strcmp(codon,'UAG') || strcmp(codon,'UGA')
stops(end+1)=codon;
end
stops= find(stops>starts,1)
lengthofthisstart=stops-starts
allorfs{end+1}=frame(starts:stops-1)
Accepted Answer
Guillaume
on 15 Jan 2017
If I understood correctly, a simple way to find all genes would be:
[genesequences, starts, stops] = regexp(x, 'AUG.*?(UAA|UAG|UGA)', 'match', 'start', 'end');
And the longest sequence is of course:
[~, longestidx] = max(stops - starts);
longestsequence = genesequences{longestidx}
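As a minimal sketch of how these two steps fit together, the snippet below runs them on a made-up strand (the value of x is an assumption for illustration, not data from the question). It also adds a guard for the case where no gene is found, since max then returns an empty index and the brace indexing would be expected to fail.

```matlab
% Hypothetical strand containing two genes of different lengths.
x = 'CCAUGAAAUAGGGAUGCCCGGGUUUUAACC';

% Lazy match from each AUG to the nearest following stop codon.
[genesequences, starts, stops] = regexp(x, 'AUG.*?(UAA|UAG|UGA)', 'match', 'start', 'end');

if isempty(genesequences)
    longestsequence = '';                      % no start/stop pair found
else
    [~, longestidx] = max(stops - starts);     % index of the longest match
    longestsequence = genesequences{longestidx};
end
```

For this input the matches are 'AUGAAAUAG' and 'AUGCCCGGGUUUUAA', so longestsequence is the second one.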
2 Comments
Arthur Goldsipe
on 16 Jan 2017
I think you need a slight change to account for the fact that all codons are 3 characters long:
[genesequences, starts, stops] = regexp(x, 'AUG(...)*?(UAA|UAG|UGA)', 'match', 'start', 'end');
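To see the difference on a concrete input, here is a small sketch. The strand is a made-up example, chosen so that a stop-codon triplet appears out of frame before the real in-frame stop.

```matlab
% Hypothetical strand: 'UAA' appears at positions 5-7, which is out of
% frame relative to the AUG at positions 1-3.
x = 'AUGCUAAGGUGA';

% Without the frame constraint, the lazy match ends at the out-of-frame UAA.
loose = regexp(x, 'AUG.*?(UAA|UAG|UGA)', 'match');

% With (...)*? only whole codons may sit between the start and stop codon.
[inframe, starts, stops] = regexp(x, 'AUG(...)*?(UAA|UAG|UGA)', 'match', 'start', 'end');
```

Here loose is {'AUGCUAA'}, while inframe is {'AUGCUAAGGUGA'}, which reads as the codons AUG, CUA, AGG, UGA.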
Guillaume
on 16 Jan 2017
Oh yes, as I know nothing about genes and codons, I didn't know that the number of characters between the start and end codon must be a multiple of three, but I should have inferred that from the original code.
Thanks.
More Answers (1)
Niels
on 15 Jan 2017
If I understood you correctly, your problem is in one of the following lines:
frames={x(1:end) x(2:end) x(3:end) y(1:end) y(2:end)
allorfs{end+1}=frame(starts:stops-1)
If so, I can't replicate your problem. In cell arrays, the length of the individual elements is irrelevant:
>> a=1:3;
>> b=1:4;
>> c=1:5;
>> cell={a b c}
cell =
1×3 cell array
[1×3 double] [1×4 double] [1×5 double]
%=================================
>> a={}
a =
0×0 empty cell array
>> a{end+1}=1
a =
cell
[1]
>> a{end+1}=2
a =
1×2 cell array
[1] [2]
>> a{end+1}=[2 1]
a =
1×3 cell array
[1] [2] [1×2 double]
2 Comments
Guillaume
on 15 Jan 2017
If the following line
frames={x(1:end) x(2:end) x(3:end) y(1:end) y(2:end)
y(3:end)};
is indeed written on two lines, then yes, MATLAB is going to issue a concatenation error, since the line break is interpreted as a vertical concatenation (the start of a new row).
frames={x(1:end) x(2:end) x(3:end) y(1:end) y(2:end) y(3:end)};
or
frames={x(1:end) x(2:end) x(3:end) y(1:end) y(2:end) ...
y(3:end)};
would fix the error.
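A quick way to check how the line break is treated is to compare the sizes of the resulting cell arrays. This is only an illustrative sketch; the string x and the variable names rows and cont are made up for the example.

```matlab
x = 'AUGGCC';   % hypothetical short strand

% A line break inside the braces starts a new row, so both rows must have
% the same number of cells. Two cells per row gives a 2x2 cell array.
rows = {x(1:end) x(2:end)
        x(3:end) x(1:end)};

% With ... the statement continues on the next line and stays one row.
cont = {x(1:end) x(2:end) ...
        x(3:end)};

size(rows)   % 2 2
size(cont)   % 1 3
```

In the original code the first line holds five cells and the second line only one, so the rows cannot be stacked, which is what triggers the error.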