The genome of the budgerigar parrot, Melopsittacus undulatus, was successfully sequenced in July 2012 using long third-generation reads provided by PacBio. The Assemblathon genome contest led the team of Phillippy, Koren, and Jarvis to sequence the parrot DNA by combining PacBio third-generation data with Illumina second-generation data.

The third-generation PacBio reads are very long (1K-20K bases) but have a 15% error rate. The Illumina reads are 100-500 bases long with an error rate below 1%. Jarvis and his team combined the two data sets to achieve an error rate below 0.1%.

Genome Challenge 004 is the correction of simplified, simulated PacBio reads with a high error rate.

**Input:**

Call 1: empty array, segment width, flag=0

Call 2: N PacBio DNA vectors (N x width), segment width, flag=1

**Output:**

Call 1: empty vector, Number of Requested Vectors

Call 2: Corrected DNA vector, Number of Requested Vectors

**Score:** Number of vectors (N) used to produce the correct vector for the w=1024 case

The first call to the PacBio_fix routine returns the number of vectors requested to produce a final product. This may be a function of w.
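How many vectors to request depends on the per-symbol error rate, which the problem statement does not give. The following is a rough Python sketch under stated assumptions: substitution errors are independent, the rate is p = 0.15 (borrowed from the raw PacBio figure, not confirmed for this challenge), and the correction is a column-wise majority vote. A position is certainly correct whenever strictly more than half of the N reads are right, which gives a lower bound on the success probability. The function names and the 0.99 target are hypothetical.

```python
from math import comb

def p_position_ok(n, p):
    """Lower bound on P(majority vote is right at one position).

    The vote is guaranteed right when strictly more than half of the
    n reads are correct, i.e. when at most (n - 1) // 2 reads are wrong.
    """
    max_errors = (n - 1) // 2
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(max_errors + 1))

def p_vector_ok(n, p, w):
    """Lower bound on P(all w positions are right), assuming independence."""
    return p_position_ok(n, p) ** w

def smallest_n(p, w, target=0.99, n_max=201):
    """Smallest odd n whose bound reaches the (assumed) target, or None."""
    for n in range(1, n_max + 1, 2):
        if p_vector_ok(n, p, w) >= target:
            return n
    return None

if __name__ == "__main__":
    # Assumed error rate 0.15 and the scored width w = 1024.
    print(smallest_n(0.15, 1024))
```

Because the bound ignores cases where a plurality wins without a strict majority, the N it suggests is conservative; a real submission could likely request fewer vectors.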

The second call to PacBio_fix will have a DNA matrix (N x width) and flag=1.

The response to the second call is the fixed DNA sequence, a vector of width w.

**Example:**
First call returns: N=3

01230123111122223333 Truth

Input example (injected errors):
01232123112122221332
01130123111122123323
11230133121122223333

Output: 01230123111122223333 Truth, hopefully
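One simple way to handle this pre-aligned case is a column-wise majority vote across the N reads. The sketch below is a hypothetical Python analogue of the two-call PacBio_fix protocol, not the MATLAB reference solution and not necessarily what any accepted submission does; the `n_request` default of 3 just mirrors the example.

```python
from collections import Counter

def pacbio_fix(reads, width, flag, n_request=3):
    """Hypothetical Python analogue of the two-call routine.

    flag == 0: reads is empty; return (empty vector, reads requested).
    flag == 1: reads is a list of N sequences of length width;
               return (consensus vector, reads requested).
    """
    if flag == 0:
        return [], n_request
    if any(len(r) != width for r in reads):
        raise ValueError("every read must have the stated width")
    consensus = []
    for column in zip(*reads):
        # Most frequent symbol in this column; ties go to the first seen.
        symbol, _ = Counter(column).most_common(1)[0]
        consensus.append(symbol)
    return consensus, n_request

if __name__ == "__main__":
    reads = [[int(c) for c in s] for s in (
        "01232123112122221332",
        "01130123111122123323",
        "11230133121122223333",
    )]
    _, n = pacbio_fix([], 20, 0)          # call 1: ask how many reads to send
    fixed, _ = pacbio_fix(reads, 20, 1)   # call 2: correct the reads
    print(n, "".join(map(str, fixed)))    # 3 01230123111122223333
```

With three reads, a position is recovered only if at most one read is wrong there, so higher error rates call for a larger N.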

This data is simplified: it contains only substitution errors, and the data sets are provided pre-aligned.

The real PacBio data is quite a bit more complicated. Bases may be inserted, deleted, or substituted, and the reads are of varying lengths. This causes alignment issues.
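To see why insertions and deletions break position-by-position comparison, the Python sketch below applies a single substitution and a single deletion to the example truth sequence. The helper names are hypothetical. Both reads are one edit away from the truth, but the deletion shifts every later base, so a naive column-wise comparison reports several mismatches.

```python
def mismatches(a, b):
    """Count positions that differ, comparing column by column."""
    return sum(x != y for x, y in zip(a, b))

def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # match or substitute
        prev = curr
    return prev[-1]

truth = "01230123111122223333"
substituted = truth[:4] + "2" + truth[5:]  # one substitution at index 4
deleted = truth[:4] + truth[5:]            # one deletion at index 4

if __name__ == "__main__":
    print(mismatches(truth, substituted), edit_distance(truth, substituted))  # 1 1
    print(mismatches(truth, deleted), edit_distance(truth, deleted))          # 6 1
```

The repeated runs in this particular sequence hide some of the shift; on less repetitive data a single deletion would corrupt nearly every later column.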

Follow-Up Challenges: Sample Data from the PacBio site for Lambda Phage will be molded into various Challenges. Possible challenges are correcting individual long segments and assembling multiple long segments into the full Lambda Phage genome. The Parrot genome is too big for Cody to solve in 50 seconds.

4 correct solutions
14 incorrect solutions

Last solution submitted on Nov 08, 2013