MATLAB Examples

Contest Statistics

Prepare the Data

contestName = 'jumping';
[s,p,leaders] = prepareData;
orders = 3;

Score

This is the most useful diagram for visualizing the contest. It shows the dramatic improvements that occur over time. Each passing entry is a dot, with its submission time on the x-axis and its score on the y-axis. Since a lower score is better, the dots push down further as time goes on. All entries that took the lead are colored red, and the red line marks the best score at any time. The sample entry is the leftmost red dot and the leader is the last red dot in the lower right.
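The red leader line described above is a running minimum over the scores in submission order. The following is a minimal sketch of that idea, not the code behind scoreStair; the vectors t and score are made-up sample data.

```matlab
% Hypothetical data: submission times (hours) and scores of passing entries.
t     = [0 1 2 3 4 5 6];
score = [100 120 80 90 60 60 45];

% Sort by submission time so the running minimum is chronological.
[t, idx] = sort(t);
score = score(idx);

% Best (lowest) score seen so far at each submission.
best = cummin(score);

% An entry takes the lead when it strictly beats every earlier entry.
% The first entry (the sample entry) leads by definition.
isLeader = [true, score(2:end) < best(1:end-1)];

leaderTimes  = t(isLeader);
leaderScores = score(isLeader);
```

Plotting best against t with stairs would give a step line like the red one, and leaderTimes and leaderScores pick out the red dots.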

Some of the leading entries are circled and labeled with the author's name. They show who was making the biggest improvements in score (represented in this plot as a vertical drop in the red line) at any point in the contest.

The improvement in score happens over a huge dynamic range. Early in the contest, it is easy to make big improvements in the score. As the algorithms get better, improvements become increasingly difficult. To show this, we normalize the scores so the best (smallest) score is 1 and the worst score is some power of 10. Then we plot them on a logarithmic scale. This exaggerates the improvements at the end of the contest. By increasing the number of decades we spread the scores over, we increasingly exaggerate the smaller improvements made at the end of the contest.
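One way to realize this normalization is a linear map that sends the best score to 1 and the worst score to 10^orders. This is only a sketch under that assumption; the mapping actually used inside scoreStair is not shown here, and the scores vector is made-up sample data.

```matlab
% Hypothetical scores, lower is better.
scores = [250 180 120 95 90 88.5 88.1 88.0];
orders = 3;                      % number of decades to spread the scores over

lo = min(scores);                % best score
hi = max(scores);                % worst score

% Linear map (an assumption): lo -> 1, hi -> 10^orders.
normalized = 1 + (scores - lo) ./ (hi - lo) .* (10^orders - 1);

% On a log axis, late improvements near the best score become visible.
% semilogy(normalized, '.-')     % uncomment to plot
```

With a larger value of orders, the same small improvements near the best score take up more of the vertical axis.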

clf
scoreStair(p,orders)

Results vs. CPU Time

One of the interesting aspects of the contest is that entries needed to minimize two things at once. Getting the best possible answer must be weighed against the time an entry takes to run. The entry's score is a combination of these two factors. Plotting these two against each other yields a very different picture of the contest.

The leader line is shown in red again in this picture. The gray contours show lines of constant score. In general, the best score is somewhere along the lower-left frontier of shortest time and lowest results. Algorithmic improvements tend to move down and to the right, and they are followed by tweaking battles in which the new algorithm claws its way back down the time axis.

clf
zigzag(p,orders)

Submissions Over Time

This area plot shows how the total number of entries grows. The green area represents the entries that passed the test suite, and the red area shows those that failed.
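The two areas are cumulative counts of passing and failing entries. A rough sketch of that bookkeeping follows; the passed vector is made-up sample data, and the real submissionsOverTime function may organize its data differently.

```matlab
% Hypothetical pass/fail flags, one per entry, in submission order.
passed = logical([1 0 1 1 0 1]);

% Running totals of passing and failing entries.
cumPassed = cumsum(passed);
cumFailed = cumsum(~passed);

% Stacked area plot: passing entries at the bottom, failing on top.
% area((1:numel(passed))', [cumPassed(:) cumFailed(:)])   % uncomment to plot
```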

clf
submissionsOverTime(s)

Activity by Hour

Each bar represents an hour's worth of entries. The contest has three major phases. The first day is in "darkness", where contestants can submit entries but they can't see any of the entries or their scores. To win this phase, an entry must be general and robust. The second day is "twilight", where we show the scores but not yet the code. This allows contestants to develop their algorithm without anyone else being able to leverage their ideas. The rest of the contest is "daylight", where both the scores and the code are visible.

On the histogram, the darkness and twilight phases are the two boxes on the left. The other boxes and vertical gray lines call out other mid-contest challenges.

clf
activityByHour(s,contestName)

Participants per Day

We know that one participant may submit hundreds of entries. Let's look at the number of unique participants on each day of the contest.
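Counting unique participants per day means collapsing repeated (day, author) pairs before counting. The sketch below shows one way to do that; the author and day variables are made-up sample data and do not reflect the actual layout of s.

```matlab
% Hypothetical data: one element per entry.
author = {'ann','bob','ann','cy','bob','ann'};
day    = [1 1 1 2 2 3];          % contest day of each submission

% Map author names to integer IDs.
[~, ~, authorId] = unique(author);

% Keep one row per distinct (day, author) pair, then count rows per day.
pairs = unique([day(:) authorId(:)], 'rows');
participantsPerDay = accumarray(pairs(:,1), 1)';
```

Here ann submits three times but is counted once on day 1 and once on day 3.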

clf
participantsByDay(s)

Most active participants

This bar plot shows the number of entries submitted by our most prolific authors.

clf
mostActive(s)

Entry length

This plot shows how many characters of code are in each of the leading entries. In regions where you see entries of more or less the same length there are very few differences from one entry to the next. In other places you can see the code getting shorter or longer. The density of the lines also shows how often the lead is changing. It's most impressive when shorter code takes the top spot, either by pruning unneeded computation or by introducing new algorithms. The red line at the top shows the cap on entry length.

clf
entryLength(p)

Percent improvement

This is a plot of the percent improvement generated by each new leader relative to the previous leader. This lets us see who is responsible for the biggest changes over the contest. The upper frontier of this plot is a sort of hall of fame, and someone whose name appears there more than once managed to make several significant improvements to the score.
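Because a lower score is better, the percent improvement of a new leader over the previous one can be computed as 100 * (previous - new) / previous. The sketch below assumes that definition; leaderScores is made-up sample data, and percentImprovement may compute it differently.

```matlab
% Hypothetical leader scores in chronological order (lower is better).
leaderScores = [100 80 60 45];

prev = leaderScores(1:end-1);    % score of the previous leader
next = leaderScores(2:end);      % score of the new leader

% Percent improvement of each new leader relative to the one it replaced.
pctImprovement = 100 * (prev - next) ./ prev;
```

The third leader gains fewer raw points than the second (15 versus 20) but improves by the same percentage, which is why relative improvement is the fairer measure late in the contest.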

clf
percentImprovement(leaders)

Improvements by day

Highlighting all the entries submitted on each day in red shows how the overall progression moves down and to the left. A black circle indicates the leader at the end of each day.

clf
dailyActivity(p)

Clans

Submitting an entry by using the "edit" button on an existing entry marks the new entry as a child of the first. Tracing each entry back from parent to parent identifies its oldest ancestor. All the entries that have the same oldest ancestor are in the same "clan". This plot draws lines between each child and its parent and colors the six largest clans. Entries in the same clan that don't have a line between them are connected by an entry that didn't pass (so it doesn't have a score to plot).
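The ancestor tracing described above can be sketched as a walk up a parent index until an entry with no parent is reached. The parent vector below is made-up sample data (0 means "no parent"), not the actual representation in s, and the sketch assumes parent links never form a cycle.

```matlab
% Hypothetical parent links: parent(k) is the index of entry k's parent,
% or 0 if entry k was not created by editing another entry.
parent = [0 1 2 0 4 1 0];

n = numel(parent);
root = zeros(1, n);
for k = 1:n
    r = k;
    while parent(r) ~= 0         % walk up until an entry with no parent
        r = parent(r);
    end
    root(k) = r;                 % oldest ancestor of entry k
end

% Group entries by oldest ancestor and rank clans by size, largest first.
[clanRoots, ~, clanId] = unique(root);
clanSize = accumarray(clanId(:), 1)';
[clanSize, order] = sort(clanSize, 'descend');
clanRoots = clanRoots(order);
```

Taking the first six elements of clanRoots would select the clans to color, provided at least six clans exist.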

clf
parents(s,p,orders)
legend off