Winner MR Keenan (Santa Claus)
This contest is similar to the last one in that it deals with molecules, but the problem is more current: protein folding. Proteins, which are initially synthesized in the cell as long chains of amino acid building blocks, automatically fold down into more compact working shapes. Exactly how they do this, and how they do it quickly and consistently, is poorly understood.
The problem is so notoriously difficult that even dramatic simplifications are difficult to solve. One stripped-down version of the protein-folding problem is the so-called HP lattice model introduced by Lau and Dill in 1989 (See the references at the bottom of this page). Our contest is based on a version of this problem.
[0 1 1 1 1 0]where hydrophobic amino acids are shown in red and hydrophilic amino acids are shown in blue.
Think of the amino acids that make up the protein as links in a chain. Each link may bend by a multiple of 90 degrees, but the distance between links may not change (and is assumed to be 1). No self-intersections are permitted, and the final result is constrained to lie in a single plane. Thus you can think of the chain as a line snaking through a Manhattan-style grid, one amino acid at a time. This is sometimes referred to as a self-avoiding walk.
We will refer to the complex plane to simplify the configuration of any given chain. The folding sequence your function returns will define the turn taken at each link by means of the complex numbers 1 (east), i (north), -1 (west), and -i (south). If you are given the amino acid sequence a
>> a = [0 1 1 1 1 0];you might return the configuration sequence
>> c = [1 1 1 1 1];which defines the straight line configuration shown in the diagram above. The configuration sequence is necessarily one element shorter than the sequence itself, since each element defines the relationship between two amino acids. If we assume the first amino acid is always anchored at 0, then the position of each amino acid can be succintly calculated as
>> p = [0 cumsum(c)]
p =
0 1 2 3 4 5
So c can be interpreted as: "from your starting location, go east five consecutive times."
To introduce a small turn in the configuration, we can put a northward turn (given by i) in the fourth element of the configuration vector. We expect to see a picture like the one below.
And indeed, the following MATLAB code confirms our expectation.
>> c = [1 1 i 1 1];
>> p = [0 cumsum(c)]
p =
0 1 2 2+1i 3+1i 4+1i
>> plot(p,'.-')
>> axis([-2 6 -2 3])
Complex arithmetic makes the folding calculations straightforward. If we want to fold the entire right half of the original sequence up by 90 degrees, we can multiply the original c vector by a fold vector that bends it like so
>> c = [1 1 1 1 1];
>> f = [1 1 i i i];
>> c2 = c.*f
c2 =
1 1 i i i
>> p2 = [0 cumsum(c2)]
p2 =
0 1 2 2+1i 2+2i 2+3i
thereby making the protein shown below.
By applying successive folding vectors, your code should try to minimize the result.
Here is one final configuration vector that tries to minimize the fold energy by bringing the four red amino acids close together. We apply yet another 90 degree turn to the last two elements.
>> f2 = [1 1 1 i i];
>> c3 = c2.*f2
c3 =
1 1 i -1 -1
>> p3 = [0 cumsum(c3)]
p3 =
0 1 2 2+1i 1+1i 1i
>> p3 = p3(logical(a));
>> [x,y] = meshgrid(1:length(p3));
>> dist = abs(p3(x) - p3(y))
dist =
0 1.0000 1.4142 1.0000
1.0000 0 1.0000 1.4142
1.4142 1.0000 0 1.0000
1.0000 1.4142 1.0000 0
Here is the distance matrix. The result is the sum of all remaining distances: 1 + 1 + 1 + 1 + sqrt(2) + sqrt(2)
>> result = sum(sum(dist))/2
result =
6.8284
This result, along with the associated runtime, gets passed to our scoring algorithm before returning a final score, according to the equation
score = k1 * result + k2* e (k3*runtime)
We don't publish the values k1, k2, and k3.The files you need to get started on the contest are included in a ZIP-file available here at the MATLAB Central File Exchange. If you download and uncompress this zip-file, you will have the files described below.
The routine that does the algorithmic work is solver.m, and the answer it returns is very simple: a list of ones. This means, take the amino acid vector a and string it out eastward.
function c = solver(a) % Direction: 1 = east, i = north, -1 = west, -i = south c = [ones(length(a)-1,1)];
A few points to keep in mind about this function and your contest entries:
function c = solver(a)
To test this function with the test suite in the zip-file, run runcontest.m.
>> [output, message] = runcontest
The first column of results contains your result, and the second column gives you an estimate of the runtime. message returns a string describing the overall statistics of your submission.
It's sometimes useful to visualize what your entry is doing; you can do this by passing runcontest an input of 1.
>> [output, message] = runcontest(1)This provides a plot of the resulting "protein" as each test is completed.
We also encourage you to discuss your solutions and strategies with others. You can do this by posting to the comp.soft-sys.matlab thread that we've started from our newsreader.
The following are prohibited:
Contests are divided into segments where some or all of the scores and code may be hidden for some users. Here are the segments for this contest: