
I have sequences stored as character arrays. I need to find particular characters and replace each one with a vector (its Boolean representation).

So in the end I need a 3-D matrix.

It worked for one sequence, but I have 96000 more. I tried to do it with loops but I get an error.

This is my code for one sequence, but I need to do the same for 96000 sequences.

I need your help with this issue. Thanks in advance.

p1_1=sequences;
% first sequence selected and converted to character array
Chp1_1=char(p1_1(1,:));
% from first character to end of sequences search for every character to replace boolean representation
SeqL = length(Chp1_1);
for i=1:SeqL
    X = Chp1_1(1,i)
    switch X
        case 'A'
            M(i,:) = A1;
        case 'C'
            M(i,:) = C1;
        case 'D'
            M(i,:) = D1;
        case 'E'
            M(i,:) = E1;
        case 'F'
            M(i,:) = F1;
        case 'G'
            M(i,:) = G1;
        case 'H'
            M(i,:) = H1;
        case 'I'
            M(i,:) = I1;
        case 'K'
            M(i,:) = K1;
        case 'L'
            M(i,:) = L1;
        case 'M'
            M(i,:) = M1;
        case 'N'
            M(i,:) = N1;
        case 'P'
            M(i,:) = P1;
        case 'Q'
            M(i,:) = Q1;
        case 'R'
            M(i,:) = R1;
        case 'S'
            M(i,:) = S1;
        case 'T'
            M(i,:) = T1;
        case 'V'
            M(i,:) = V1;
        case 'W'
            M(i,:) = W1;
        case 'Y'
            M(i,:) = Y1;
    end
end

Guillaume
on 25 Nov 2019

First, probably the most important thing: numbered or sequentially named variables are always a very bad idea. They always make the code more complicated, not easier, to write. For example, with your protein_1, protein_2, ... protein_96000 you cannot easily apply the same code to each variable, whereas if you had just one variable, for example a cell array called protein, you could simply use a loop to apply the same code to each element:

for p = 1:numel(protein)
    dosomethingwith(protein{p});
end

Same with your horrible switch...case and your A1, C1, etc. You end up rewriting the same thing many times with only one variation, with an increased risk of making a mistake on one line. Computers are very good at doing repetitive things, so why do the repetition yourself?

Anything that is numbered or sequentially named should be just one variable that you index instead.

So, with regard to your transformation, first create two variables: the first is the list of letters to transform and the second is what they need to be transformed into, e.g.:

letters = 'ACDEFGHIKLMNPQRSTVWY'.'; %column vector of letters
acid = [1 0 0 0 0;
        0 1 0 0 0;
        0 0 1 0 0;
        0 0 0 1 0;
        ..etc.
        ];

For pretty display we could even put them into a table:

map = table(letters, acid);

Now that we have that, transforming a sequence of letters into a 2D matrix is trivial:

prot = 'ACDKLMEGAC'; %content and length don't matter
[found, whichrow] = ismember(prot, map.letters); %find which row of letters corresponds to each letter of prot
assert(all(found), 'some letters of the input are invalid');
transformed = map.acid(whichrow, :); %and use the corresponding row of acid instead
%all done!
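Since the acid matrix above is only partly written out, here is a minimal runnable sketch of the same lookup. It assumes a one-hot encoding (eye) as a stand-in for the real Boolean vectors A1, C1, etc., which the question does not show; substitute your own rows.

```matlab
% Assumption: one-hot rows stand in for the asker's A1, C1, ... vectors.
letters = 'ACDEFGHIKLMNPQRSTVWY'.';   % 20x1 column of letters
acid = eye(numel(letters));           % row k is the encoding of letters(k)
map = table(letters, acid);

prot = 'ACDKLMEGAC';                  % example sequence from the answer
[found, whichrow] = ismember(prot, map.letters);
assert(all(found), 'some letters of the input are invalid');
transformed = map.acid(whichrow, :);  % one row per letter of prot
```

With this stand-in encoding, transformed has one row per letter of prot and one column per entry of letters.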

And assuming protein is the above mentioned cell array where all the sequences are the same length, then:

transformed = zeros(numel(protein{1}), size(map.acid, 2), numel(protein)); %preallocated 3D array
for p = 1:numel(protein)
    [found, whichrow] = ismember(protein{p}, map.letters); %find which row of letters corresponds to each letter of the protein
    assert(all(found), 'some letters of protein %d are invalid', p);
    transformed(:, :, p) = map.acid(whichrow, :); %and use the corresponding row of acid instead
end

See how short the code can be once you don't have numbered variables and use indexing instead?
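If the equal-length sequences are stored as the rows of a single char matrix rather than in a cell array, the loop can also be replaced by one indexing step. This is only a sketch under assumptions: the variable sequences below is a hypothetical N-by-L char matrix, and the one-hot acid matrix is again a stand-in for the real encoding.

```matlab
letters = 'ACDEFGHIKLMNPQRSTVWY';
acid = eye(numel(letters));                   % assumed one-hot encoding
sequences = ['ACDK'; 'LMEG'; 'WYAC'];         % hypothetical N-by-L char matrix
[N, L] = size(sequences);

[found, idx] = ismember(sequences, letters);  % idx is N-by-L
assert(all(found(:)), 'some letters are invalid');

idxT = idx.';                                 % L-by-N, one column per sequence
flat = acid(idxT(:), :);                      % (L*N)-by-K, sequence after sequence
% Split the stacked rows back into sequences, then move K to dimension 2
transformed = permute(reshape(flat, L, N, []), [1 3 2]);  % L-by-K-by-N
```

Page p of transformed then matches what the loop stores in transformed(:, :, p). For 96000 sequences, building acid with logical(eye(20)) keeps the result logical, which takes one byte per element instead of eight.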

Philippe Lebel
on 25 Nov 2019

I am not sure what you are trying to do as a whole, but if you want to quickly find where a certain string occurs, use strfind().

a = 'aasdasffwfdasda';
your_sequence_of_bools_for_letter_a = [true false true];
idx = strfind(a,'a')

idx =

1 2 5 12 15

M=cell(1,length(a));
for i=1:length(idx)
    M{idx(i)} = your_sequence_of_bools_for_letter_a;
end
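For a single character, the same positions can also be obtained with a logical comparison, which lets the cell assignment happen without a loop. A small sketch using the same example string; bools_for_a is just a shorter name for the placeholder vector above.

```matlab
a = 'aasdasffwfdasda';
bools_for_a = [true false true];  % placeholder encoding for the letter 'a'

mask = (a == 'a');                % logical row, true where a contains 'a'
M = cell(1, numel(a));
M(mask) = {bools_for_a};          % same vector goes into every matching cell
idx = find(mask);                 % positions, same as strfind(a, 'a')
```

Cells at positions that do not hold an 'a' stay empty, as in the loop version.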

Philippe Lebel
on 25 Nov 2019

Now I understand.

Here is a solution that you can easily expand.

clear
protein(1).name = 'A';
protein(1).bool_value = [1 0 0];
protein(2).name = 'B';
protein(2).bool_value = [0 1 0];
protein(3).name = 'C';
protein(3).bool_value = [0 0 1];
protein_name_list = [protein.name];
sequences = ['ABC';'CCC';'CAB'];
n_sequences = size(sequences, 1); % one sequence per row
M = cell(1, n_sequences);
for i = 1:n_sequences
    resulting_bool = [];
    sequence = sequences(i,:);
    for j = 1:length(sequence)
        idx = strfind(protein_name_list, sequence(j)); % index of this letter in the list
        resulting_bool = [resulting_bool; protein(idx).bool_value];
    end
    M{i} = resulting_bool;
end
