Sudoku: Contest EvolutionOver the course of eight days, hundreds of contestants submitted thousands entries. With so much activity, it is hard to follow the action. This report will pick out some statistics and draw some pictures that give some indication of the progress. ContentsSubmissions Over TimeIt's impressive to look at the sheer volume of entries. This area plot shows how the total number of entries grew as the contest progressed. The green area represents the entries that passed the test suite, and the red area shows those that failed. Overall 2439 of 3061 entries passed the test suite, for a pass rate of 80%. Activity by HourThe previous area plot shows a lot of "lumpiness". A steepening slope shows an increase in activity. A histogram shows this more directly. Each bar represents an hour's worth of entries. The contest had three major phases. The first day was in "darkness", where contestants could submit entries but they couldn't see any of the entries or their scores. To win this phase, an entry must be general and robust. The second day was "twilight", where we showed the scores but not yet the code. This allowed contestants to develop their algorithm without anyone else being able to leverage their ideas. On the histogram, the darkness and twilight phases are the two large blue boxes on the left. The boxes call out two other time periods: the Sunday Push (we offered a prize to the greatest cumulative improvement to the score in 24 hours) and the Leap (where we recognized the biggest single improvement). The vertical gray lines show the other deadlines: Past the Post, Midnight, and Final. Activity of the winnersLooking at the activity of the contest winners shows their different participation styles. Each winner is listed in order and a red line marks the deadline for the phase he won. Most active participantsThis bar plot shows the number of entries submitted by our most prolific authros. Participants per DayWe know that one participant may submit hundreds of entries. Let's look at the number of unique participants on each day of the contest. The first few days draw entries from the most participants. Perhaps this shows that more people prefer the Darkness and Twilight phases of the contest, or that the cost of entry seems higher once the top entry gets complicated. ScoreThis is the most useful diagram for visualizing the contest. It shows the dramatic improvements that occurred over time. Each passing entry is a dot, with its submission time on the x-axis and it's score on the y-axis. Since a lower score is better, the dots push down further as time goes on. All entries that took the lead are colored red and the red lines marks the best score at any time. The sample entry is the leftmost red dot and the winning entry is the last red dot in the lower right. Some of the leading entries are circled and labeled with the author's name. They show who was making the biggest improvements in score (represented in this plot as a vertical drop in the red line) at any point in the contest. The improvement in score happens over a huge dynamic range. Early in the contest, it is easy to make big improvements in the score. As the algorithms get better, improvements become increasingly difficult. To show this, we normalize the scores so the best (smallest) score is 1 and the worst score is some power of 10. Then we plot them on a logarithmic scale. This exaggerates the improvements at the end of the contest. By increasing the number of decades we spread the scores over, we increasingly exaggerate the smaller improvements made at the end of the contest. Percent improvementHere is a plot of the percent improvement generated by each new leader relative to the previous leader. This lets us see who is responsible for the biggest changes over the contest. The upper frontier of this plot is a sort of hall of fame, and someone whose name appears there more than once managed to make several significant improvements to the score. results vs. cpu timeOne of the interesting aspects of the contest is that entries needed to minimize two things at once. Getting the best possible answer must be weighed against the time an entry takes to run. As discussed above, the entry's score is a combination of these two factors. Plotting these two against each other yields a very different picture of the contest. The leader line is shown in red again in this picture. The gray contours show lines of constant score. In general, the best score is somewhere along the lower-left frontier of shortest time and lowest results. Big improvements tend to move down and to the right, and they are followed by tweaking battles in which the new algorithm claws its way back down the time axis. In order to get a feel for pace of the contest, let's label the entry that was in the lead at the end of each day. Notice the huge gains in score made in the first few days. Here we zoom into just the Daylight portion of the contest to show more detail. The following zig-zag pattern appeared on Saturday. It starts with a classic tweak battle on the upper right. All those overlapping names represent contestants making small speed improvements, but not changing the answer. Then, unusually, we see lots of latteral movement along the contour lines of equal score. It makes me think of switchbacks climbing down into a valley. At first this next picture looked like a mistake. The third of Guy's submissions to take the lead looks like it is behind first and second. When we add the contour lines of equal score, we can see that this one is actually barely down the slope. Improvements by dayHighlighting all the entries submitted on each day in red shows shows how the overall progression down and to the left. A black circle indicates the leader at the end of each day. Entry lengthHere we look at a plot that shows how many characters of code are in each of the leading entries. In regions where you see entries of more or less the same length there are very few differences from one entry to the next. In other places you can see the code getting shorter or longer. The density of the lines also shows how often the lead is changing. It's impressive to see that several times during the contest shorter code ousted longer code from the top spot, either by pruning unneeded computation or by introducing new algorithms. The red line at the top shows the cap on entry length. The winning entry grew moderately long by the end of the contest. In the past, however, we've even bumped into the limit. Contributions per dayThese plots and statistics show the total contribution of each contestant to each day's improvement in score. They show which contestants did the heavy lifting for each day. Wed, 02-Nov-2005 61.71% 2 wuzhili 32.81% 1 Cobus Potgieter 2.09% 2 Nicke 1.57% 1 quo 1.56% 2 Per Rutquist 0.26% 2 GreatRumpuscat Thu, 03-Nov-2005 94.75% 1 Per Rutquist 5.25% 1 Marko Stefanovic Fri, 04-Nov-2005 56.30% 7 Stijn Helsen 20.66% 3 Guy Incognito 18.88% 3 Marko Stefanovic 2.66% 1 the cyclist 1.50% 1 Timothy Alderson Sat, 05-Nov-2005 62.59% 1 Johan Svahn 31.61% 5 Stijn Helsen 5.03% 2 Timothy Alderson 0.50% 1 Niilo Sirola 0.28% 1 Francis Esmonde-White Sun, 06-Nov-2005 75.57% 7 Niilo Sirola 9.38% 1 Claus Still 5.26% 1 Tristrom Cooke 3.41% 8 Stijn Helsen 2.55% 4 jan langer 2.37% 5 Timothy Alderson 1.45% 4 Other Mon, 07-Nov-2005 51.29% 2 Guy Incognito 34.20% 11 Timothy Alderson 4.57% 1 Brian Jones 3.23% 1 Niilo Sirola 2.74% 1 Fyodor 2.19% 2 ismail meriç can uygan 1.79% 4 Other Tue, 08-Nov-2005 64.33% 3 Claus Still 12.44% 8 Stijn Helsen 10.86% 2 nathan q 4.86% 6 Guy Incognito 3.49% 4 Timothy Alderson 2.01% 1 Tristrom Cooke 2.02% 9 Other Wed, 09-Nov-2005 51.92% 3 Satya 13.83% 1 Yi Cao 13.23% 1 Niilo Sirola 6.01% 4 grn 4.71% 1 Anders Björling 4.64% 7 Amr Tawfik 5.67% 4 Other ClansSubmitting an entry by using the "edit" button on an existing entry marks the new entry as a child of the first. Tracing each entry back from parent to parent identifies its oldest ancestor. All the entries that have the same oldest ancestor are in the same "clan". This plot draws lines between each child and its parent and colors the six largest clans. Entries in the same clan that don't have a line between them are connected by an entry that didn't pass (so it doesn't have a score to plot). The orange clan in the center claims the entry in the upper right corner, far out of the lead, as its ultimate ancestor. See the initial spray down and to the right that eventually takes the lead? This is "The Bomb" series that Niilo Sirola dropped just after midnight at the start of the Big Sunday Push challenge. Differences from winnerOne difficulty of the clan plot is that it relies on authors to properly credit their submission. A break between clans may be spurious. Also, some entries may be based on more than one parent. Another way to see groups of entries is to look at the percent-difference between each entry and the contest winner. In this scatter plot, the color of the dot represents the percentage of code that is different from the winning entry and the size of the dot is proportional to the number of lines of code it contains. Tracking down authorsIt is very difficult to track down who contributed what code to the final entry. We're always looking for new ways to help us sift through the data. Here's an annotated listing of the winning entry. The last column is the line of code from the winning entry. The middle column shows the author and entry ID when a leading entry first used this line of code. The first column shows, if different, the first time this line of code was used in any entry. There's a lot of noise in the result, but blocks of names indicate probable original authorship for that section of the entry (or at least the last person to tweak it). ConclusionFor analysis, including a listing of all the leaders, the author's contributions to the score by day, and other metrics, see the i page. Thanks for participating, whether by entering the contest many times, once, or merely checking out the site every now and again. Be sure to sign up on our notify list so you'll be ready to play the next contest: Send an e-mail to lists@mathworks.com with "subscribe contest-announce" in the body. ColophonThis contest analysis was calculated and published entirely from MATLAB. We used the Database Toolbox to pull the information directly from the contest database. This HTML document was automatically generated from a MATLAB script using "Publish to HTML", a feature introduced in R14. |