An open exchange for the MATLAB and Simulink user community |
Hosted by The MathWorks |
RulesThe ChallengeYou are working on the human genome project, and your team is closing in on the last few base pairs needed to complete the project and go on to certain glory and perhaps a Nobel prize. Researchers on your team have managed to obtain several sets of segments needed to sequence the last remaining gene, but your help is urgently needed to lead the effort. For each set of segments given, you must devise an algorithm to sequence them in the correct order. If you succeed, your name will appear in all biology textbooks from this day forward. If you are too slow, someone else will take your place. Good luck.ProblemYour job is to write a MATLAB M-file function which can reconstruct the gene sequence using the given fragmentary segments of the larger gene. Note: While this problem is posed in the spirit of a real-life challenge, the problem has been modified to suit a MATLAB contest rather than test your knowledge of gene sequencing techniques. Make sure to look over the rules below!Gene SegmentsThe gene segments given to you have been chemically snipped from a single gene and sequenced individually. Your job is to reassemble them jigsaw-style into the most likely single sequence they could have come from. The gene segments will be presented to your function as a char (or string) matrix. For example, an input could look like:>> Segments ans = TCGG GCAG ATCG CAGC GATCOnly four letters will be used. Each stands for one of the nitrogenous bases in DNA: A (adenine), C (cytosine), G (guanine), and T (thymine). The segments in any single test given to your function have the same length (since they're passed as a char matrix). But that length, as well as the number of segments, will vary across the many tests included in our test suite - in other words, your function will be given inputs of different, unknown sizes. In this case, we have 5 segments of length 4. Another test case might provide 20 segments of length 12 (To provide a sense of scale, the entire human genome is approximately three billion base pairs long). Your function should return the shortest gene that can be made from these pieces. The rules for assembling a sequence are:
TCGG GCAG ATCG CAGC GATC ==============. CAGCAGATCGG
TCGG GCAG ATCG CAGC GATC ==============. GATCGGCAGC
TCGG GCAG ATCG CAGC GATC ====================. TCGGGCAGATCGCAGCGATC
Keep in mind that length isn't the only criteria. Your
entry will be judged also on its speed. So while a shorter result is better,
entries that take a long time will also be penalized.
Function headerIn MATLAB syntax, the function header for your solution should look like this:function Sequence = mygene(Segments)Your function should return a 1 row char array representing its solution. Developing and TestingTo help in developing your sequencing program, we've created a file you can use for running trials of your entry. Just click on the link below to download it:runtrials.m:
This function allows you to run your own trials of the contest.
Download the files above and place it on your MATLAB path, then write your contest entry. Here we're assuming you're calling it testsoln. Now you can test your entry with: % Run 10 trials using testsoln.m
Result = runtrials('testsoln',10);
Result will be a matrix with your score and runtimes. The
example here uses a fixed input size to your entry, so you may want to
experiment with different size inputs to see how your code performs.
The contest test suite contains a fixed (nonrandom) set of inputs of various
sizes.
Judging CriteriaThere are two criteria for the judging of this contest: length, the length of the sequence
that you produced (shorter is better)
time, the execution time of your algorithm (faster is better)score = (alpha * length) + (beta * time) where alpha and beta are constant scaling factors. The winning entry is the one with the lowest average score over the test suite. The MATLAB Contest server will also place a cap on the maximum time allowed for each entry over the test suite. Entries which exceed 180 seconds for the test suite will time out and fail. The test suite will be made available after the contest has been completed.
You might also want to look at the Contest FAQ for some commonly asked
questions which you might have. If your questions aren't answered there,
you can e-mail contest@mathworks.com
and we'll get back to you as soon as possible.
Optimizing your codeTry using the MATLAB profile command to help you optimize your code. For more information on the MATLAB profiler, type helpwin profile at the command line or doc profile to read the online documentation. The profiler has been enhanced significantly in MATLAB 5.3 (R11).DeadlineThe contest will close on Friday, March 24, 2000 at 5 PM EDT (21:00 GMT).At that time, the contest queue will be closed to new entries. All remaining entries in the queue will continue to be processed, after which we will declare a winner. Collaboration and editing existing entriesOnce an entry has been submitted, it cannot be changed. However, any entry can be viewed, edited, and resubmitted as a new entry. You are free to view and modify any entries in the queue. The contest server maintains a history for each modified entry. If your modification of an existing entry improves its score, then you are the "author" for the purpose of determining the winners of this contest. We encourage you to examine and optimize existing entries.We also encourage you to discuss your solutions and strategies with others. Consider using the Usenet newsgroup comp.soft-sys.matlab to share ideas. The PrizesIn addition to the grand prize, interim prizes will be awarded at various times throughout the period of the contest to the final author of the current highest ranking entry. Watch the Message of the Day for details and deadlines.The author of the final winning entry in the contest will receive their choice of:
Fine PrintThe allowable functions are those contained in the basic MATLAB package available in $MATLAB/toolbox/matlab, where $MATLAB is the root MATLAB directory. Functions from other toolboxes will not be available. Entries will be tested against MATLAB version 5.3.1 (R11.1).The following are prohibited: MEX-files
eval, feval, etc.
Shell escape such as !, dos, unix
Handle Graphics commands
ActiveX commands
File I/O commands
Debugging commands Printing commands Simulink commands Benchmark commands such as tic, toc, and flops FAQCheck our FAQ for answers to frequently asked questions about the contest.DisclaimerThis contest is not supported or endorsed by the Genome Data Base in any way. Further, we do not claim that this contest represents a real problem faced by the Genome Data Base. For more information on the GDB, visit the Genome Data Base Page. |
|
|||||||
| Related Topics |
| New Products | Support | Documentation | Training | Webinars | Careers | Newsletters | RSS |
| Problems? Suggestions? Contact us at contest@mathworks.com | © 1994-${current_year} The MathWorks, Inc. Trademarks Privacy Policy |