Improving efficiency of a char array function
3 views (last 30 days)
Show older comments
I have built a function to be run 100000 times in a loop. Its input, "motifs", is a full 20x15 char array, containing only four types of characters, 'A', 'C', 'G', 'T'. I am wondering if my implementation, below, can be improved of if it is pretty much as fast as it gets:
% Find consensus string and score of a motifs array
function [score, count, consensus] = scoremotifs_2(motifs)
count = [ sum(motifs == 'A',1) ;
sum(motifs == 'C',1) ;
sum(motifs == 'G',1) ;
sum(motifs == 'T',1) ];
[count_max, consensus_num] = max(count);
consensus(consensus_num == 1) = 'A';
consensus(consensus_num == 2) = 'C';
consensus(consensus_num == 3) = 'G';
consensus(consensus_num == 4) = 'T';
score = sum(sum(motifs ~= consensus));
Accepted Answer
the cyclist
on 8 Jan 2017
Edited: the cyclist
on 8 Jan 2017
This is starting to get a little obfuscated, but it's significantly faster:
list ='ACGT';
count = sum(motifs == reshape(list,[1 1 4]));
[count_max, consensus_num] = max(count,[],3);
consensus = list(consensus_num);
score = sum(sum(motifs ~= consensus));
The key algorithmic difference here is that I am able to compare against all four elements 'ACGT' in parallel, by permuting them into a third dimension.
You need R2016b or later in order for MATLAB to make the implicit dimension expansion to compare the 20x15x1 array against the 1x1x4 array. If you have an older version, you will explicitly need to use repmat.
2 Comments
More Answers (1)
the cyclist
on 8 Jan 2017
It's somewhat faster to calculate consensus like this:
list = 'ACGT';
consensus = list(consensus_num);
0 Comments
See Also
Categories
Find more on Operators and Elementary Operations in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!