The Challenge is to rapidly find matches of DNA sequences of length 6 in a DNA file 1,800,000 letters long.
At IMACST, the paper "An Intelligent and Efficient Matching Algorithm to Finding a DNA Pattern" claimed an astounding time improvement from 9.94 seconds to 7.84 seconds, a 21% time reduction, to match six segments of length 6 in a 1.8M long DNA file. Basic probability predicts (1.8M / 4^6) * 6 = 2637 matches: for random data, each of the 4^6 = 4096 possible length-6 words is expected about 439 times, and there are six patterns. The paper's test case produced 2346 matches. The method used text processing in C++. The paper's L=25 and L=50 cases will be later challenges.
Matlab can match a set of six L=6 patterns in <15 msec (i5/16GB). This is merely a 99.8% time reduction.
Challenge Description: DNA is made of the letters ACGT (see wiki DNA), which for the purposes of this Matlab Cody Challenge are given the values 0 thru 3 (ACGT = 0123).
Input: [DNA, DNA_ID, Patterns]
Output: Locations of all start indices that match any of the patterns
Scoring: Average Time (msec) for a block of L=6 patterns
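One possible approach, offered only as a minimal sketch and not as the reference solution: treat each length-6 window as a base-4 number, so that matching becomes an integer lookup instead of text comparison. The toy data, the layout of Patterns as one pattern per row, and the names w, codes and patCodes are assumptions made for illustration; DNA_ID is not used here.

```matlab
% Sketch only: data sizes and variable layout are assumptions,
% not the actual Cody test suite.
rng(0);                                  % reproducible toy data
DNA      = randi([0 3], 1, 1800000);     % sequence coded ACGT -> 0123
Patterns = randi([0 3], 6, 6);           % six patterns, one per row, L = 6

L = size(Patterns, 2);
w = 4.^(L-1:-1:0);                       % base-4 place values, most significant first

% Encode every length-L window of DNA as one integer in 0..4^L-1.
% conv flips its kernel, so passing fliplr(w) weights DNA(i) by w(1).
codes = conv(DNA, fliplr(w), 'valid');

patCodes  = Patterns * w.';              % one integer per pattern
Locations = find(ismember(codes, patCodes));   % start indices of all matches

% Spot check: the window at the first reported index is one of the patterns.
i = Locations(1);
assert(ismember(DNA(i:i+L-1), Patterns, 'rows'));
```

Because every code is below 4^6 = 4096, the arithmetic is exact in double precision. The same encoding would need more care for the longer L=25 and L=50 cases, where 4^L exceeds what a double can represent exactly.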
Coming soon: Genome DNA sequencing of Phage PhiX174 and Haemophilus influenzae