I have a loop that creates random sequences of strings. I need to save this into a matrix so that I can make sure that none of the lines repeat. How can I take the sequences I make in this loop and save it in a matrix?

I eventually need to make 1,000,000 unique sequences using these 2 letters (representing amino acids). Right now, I am only making 100 so my code can run faster. I can make all of these sequences and display them in the loop, but what I want to do is put them in a matrix so that I can use the unique function to compare all the rows and ensure that the sequences are in fact unique, with no repeats. How can I take the random sequences I create in the loop and put them in a matrix?
N = 50; %length of sequence
Num_Sequences = 100; %number of sequences to analyze later
q = 1; %counter for for loop
Num_N = 40; %number of Asparagines in the sequence
Num_V = 10; %number of Valines in the sequence
pf = zeros(1,N); %pf displays the location that valines were placed in sequence
aacids = []; %aacids is the string that contains the sequence
A = [];
while q <= Num_Sequences
    aacids = 'NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN';
    for i = 1:Num_V
        m = randi(N);
        while ismember(m,pf) %this while loop is to check if the location of aacids has already
            m = randi(N); %been selected and then picks another random number
        end
        aacids(m) = "V";
        pf(i) = m;
    end
    q = q+1;
end
disp(aacids);
%unique(aacids)
>> Random_Sequences
NNNVNNNNNNNVNNNNNVNNNVNVVVVNNNNNVNNNNNNNNVNNNNNNNN
  2 Comments
Image Analyst on 29 Dec 2018
Edited: Image Analyst on 29 Dec 2018
What is a repeat and what is not a repeat? Obviously in your output above, the sequence "NVN" is repeated several times. So is the sequence "NNNN", etc.
Katy Anderson on 29 Dec 2018
The repeat would look like this
NNNVVVNNVNV...
NNNVVVNNVNV...
where 2 lines are repeating. The pattern of the output is supposed to be random and there does not need to be any pattern for each line. The reason I put the output of the code was to show that my output is only one line, when I want it to be several lines long.


Accepted Answer

Rik on 30 Dec 2018
Edited: Rik on 30 Dec 2018
You can use something like the function below.
function seq=Random_Sequences(sequence_length,Num_sequences,Num_N)
%each row of the output matrix is a unique sequence
%validate the input first (nchoosek requires Num_N<=sequence_length)
if sequence_length<Num_N
    error('input error')
end
%check if it is even possible to have a unique output
if nchoosek(sequence_length,Num_N)<Num_sequences
    error('unique output not possible')
end
%pre-allocate to cell (easier for ismember)
base=repmat('N',1,sequence_length);
seq=repmat({base},Num_sequences,1);
Num_V=sequence_length-Num_N;
seq{1}(randperm(sequence_length,Num_V))='V';
for n=2:Num_sequences
    %set the requested number of positions to a V
    seq{n}(randperm(sequence_length,Num_V))='V';
    while ismember(seq(n),seq(1:(n-1)))
        %repeat until unique (guaranteed to happen)
        %reset to all N before redrawing, otherwise the Vs accumulate
        seq{n}=base;
        seq{n}(randperm(sequence_length,Num_V))='V';
    end
end
seq=cell2mat(seq);%convert back to single char array
end
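As a possible alternative for large counts such as the 1,000,000 mentioned in the question, the rows can be collected directly in a char matrix and de-duplicated with unique(...,'rows'), which is the check the question asks about. This is only a minimal sketch, not part of the function above: the variable names (A, batch, needed) are illustrative, and it assumes that enough distinct sequences exist (nchoosek(N,Num_V) >= Num_Sequences), otherwise the loop would never finish.

```matlab
% Sketch: build all sequences as rows of one char matrix, then drop
% repeated rows with unique(...,'rows'). Names are illustrative only.
N = 50;               % length of each sequence
Num_V = 10;           % number of valines per sequence
Num_Sequences = 100;  % number of unique sequences wanted

A = repmat('N', 0, N);                  % empty 0-by-N char matrix
while size(A,1) < Num_Sequences
    needed = Num_Sequences - size(A,1); % rows still missing
    batch = repmat('N', needed, N);     % each new row starts as all 'N'
    for k = 1:needed
        batch(k, randperm(N, Num_V)) = 'V';  % Num_V distinct positions
    end
    A = unique([A; batch], 'rows');     % removes duplicate rows (and sorts)
end
A = A(randperm(size(A,1)), :);          % optional: undo the sorted order

% Check: every row is distinct and contains exactly Num_V valines
assert(size(unique(A,'rows'),1) == Num_Sequences);
assert(all(sum(A == 'V', 2) == Num_V));
```

Because duplicates are removed in batches rather than by comparing each new row against all earlier ones with ismember, this should scale better to a large number of sequences, although that has not been timed here.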
