Rules

The Data Visualization Contest is the 19th MATLAB Online Programming Contest.

The Problem: Visualizing the Contest

This contest, an open-ended data visualization challenge, is very different from our previous contests. We want you to visualize patterns of participation in the Peg Solitaire contest from May of 2007. We will provide the data. Because the goal is open-ended, a computer can't do the judging. Instead, you will all be the judges.

Here's how it's going to work. Instead of relying on a special contest page, you'll just submit your entry to the File Exchange. Give it the tag "vis2009" so that we'll know it's intended for the contest. You can enter as many times as you like. Your entry must contain two things:
  • An M-file that, when published, generates your HTML document.
  • The published HTML document (with its images) produced from that M-file.

Please do not include the dataset itself. We don't want to end up with hundreds of copies of the exact same dataset. What you will be judging is the HTML document produced by publishing the M-file. You can make the document long or short, and you can use a lot of text or no text at all. It's up to the group to decide what format is best. We will provide some examples of code and HTML files to work from.

Note: If you miss the old contest format, don't worry. We are overhauling our contest machinery and plan to bring it back in the Fall.

How It Works

You will be given a dataset that contains two variables.
  • d - A structure array with one element per contest entry, holding detailed information about each entry. Its fields are listed below.
  • allLineList - Contains every line written by anyone during the entire contest. Each entry's lines field indexes into this list, so you can reconstruct any individual entry.
The structure d contains the following fields.
  • id - Unique identifier for each entry.
  • title - The title of the entry.
  • author - Who wrote it.
  • timestamp - The time at which the entry was submitted, as a MATLAB serial date number. Use datestr to convert it to a readable date and time.
  • passed - 1 if the entry passed. 0 if it failed.
  • result - As explained in the scoring section of the Peg Solitaire rules, the result is the average score across all game boards. It is the part of the score that depends only on the quality of the answer provided by the algorithm.
  • cpu_time - How long the entry took to run.
  • score - The score is primarily a function of cpu_time and result, but it also includes a penalty for cyclomatic complexity. See the scoring section of the Peg Solitaire rules.
  • parent - A pointer back to the previously existing entry that this entry was cloned from. This field may be empty.
  • darkness - 1 if submission occurred during darkness, 0 otherwise.
  • twilight - 1 if submission occurred during twilight, 0 otherwise.
  • daylight - 1 if submission occurred during daylight, 0 otherwise.
  • maxComplexity - The maximum cyclomatic complexity of any function in the entry. To see the cyclomatic complexity numbers for a file, use M-Lint with the -cyc switch: mlint('foo.m','-cyc')
  • charCount - The number of characters in the entry.
  • nodeCount - The number of nodes in the parse tree. This is a measure of how much code there is. It's calculated like this: t = mtree('foo.m','-file'); length(t.nodesize). Note that nodeCount did not figure into the scoring for the Peg Solitaire contest.
  • mlintMsgCount - A measure of the number of M-Lint messages generated by this entry. It's calculated like this: m = mlint('foo.m'); length(m). Note that mlintMsgCount did not figure into the scoring for the Peg Solitaire contest.
  • lines - This list indexes into the allLineList variable supplied in contest_data.mat. The number of items in the list equals the number of lines in the file.
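To see what an entry actually looked like, you can index allLineList with that entry's lines field. The following is a minimal sketch, assuming allLineList is a cell array of character vectors and lines is a numeric vector of indices into it (the rules above don't spell out either type). The two variables are built by hand as hypothetical stand-ins so the snippet runs on its own; in a real entry you would replace them with load contest_data.

```matlab
% Minimal sketch: rebuild the source of one entry from its line indices.
% ASSUMPTIONS (not stated in the rules): allLineList is a cell array of
% character vectors, and d(k).lines is a numeric vector of indices into it.
% The two variables below are small hypothetical stand-ins so this runs on
% its own; a real entry would start with:  load contest_data
allLineList = {'function moves = solver(board)'; ...
               'moves = [];'; ...
               'moves = zeros(0,4);'; ...
               'end'};
d = struct('id',        {101, 102}, ...
           'timestamp', {733164.5, 733164.75}, ...
           'lines',     {[1 2 4], [1 3 4]});

k = 2;                              % which entry to inspect
src = allLineList(d(k).lines);      % cell array, one line per element, in file order
fprintf('Entry %d, submitted %s\n', d(k).id, datestr(d(k).timestamp));
fprintf('%s\n', src{:});            % print the reconstructed file
```

Because lines preserves the order of the file, the same indexing also lets you compare an entry with its parent, for example by applying setdiff to the two index vectors.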

The Details

  • To enter the contest, submit your file to the File Exchange and give it the tag vis2009.
  • When you submit your code, you are giving it to the group. As with the regular contest, everyone is encouraged to share, borrow, and build upon the code of every other entry.
  • The winner is the entry with the most "vis2009" tags applied as of 12 PM EDT on April 10, 2009. Ties will be broken by time priority: whoever submitted their entry first wins.
  • All the code necessary to regenerate the published HTML must be present in the entry.
  • Do not include the data file in your entry. One of the first lines of your entry will probably be load contest_data.
  • Except for minor changes, please don't update an entry once it has been submitted.

Judging and Voting

Judging criteria are up to you. Things you might consider are:
  • Novel insight
  • Clarity of explanation
  • Robust, well-documented code
  • Beautiful graphics

We encourage you to discuss your criteria and strategies with others, either by posting to the newsgroup thread we've started (available from our newsreader) or by commenting on the blog.

Anyone with a MathWorks Account can vote. Vote by applying the "vis2009" tag to any entry that you like. You can vote for as many entries as you like, but you may not vote for any entry more than once.

Examples

As an example, this code
load contest_data
semilogy([d.timestamp],[d.score],'.')
datetick
results in the following plot.

If we put this into cell-mode format, the whole entry might look like this.
%% Data Visualization Contest
% This is a very simple entry: a plot of the scores of incoming entries
% versus when they first appeared.

load contest_data
t = [d.timestamp];
s = [d.score];
semilogy(t, s, '.') 
datetick

%%
% If we normalize the scores so that the lowest score equals one, it's
% easier to see what's going on.

minScore = min(s);
normScore = s-minScore+1;
semilogy(t, normScore, '.') 
datetick
ylabel('Normalized Score')

%%
% Find the leaders: each entry that beat the best (lowest) score so far.
% This assumes the entries in d are in submission order.

best = s(1);
bestIndexList = 1;
for n = 2:numel(s)
    if s(n) < best
        best = s(n);
        bestIndexList(end+1) = n;
    end
end

%%
% Now plot them in red on top of the previous plot

hold on
plot(t(bestIndexList),normScore(bestIndexList),'r.-')
hold off

When published, the entry looks like this.
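The leader search in the example can also be written without a loop. This sketch sorts the entries by timestamp first, so it doesn't rely on d already being in submission order, and then uses cummin to track the best (lowest) score seen so far. The t and s values are hypothetical stand-ins; in an entry they would come from [d.timestamp] and [d.score]. Note that cummin is not available in older MATLAB releases; there, the loop from the example works just as well.

```matlab
% Sketch: find the leaders (entries that set a new best score) without a loop.
% Lower scores are better, as in the example entry.
% ASSUMPTION: t and s below are hypothetical stand-ins; in a real entry use
%   t = [d.timestamp]; s = [d.score];   after   load contest_data
t = [5 1 3 2 4];                 % submission times (arbitrary units here)
s = [7 10 6 8 9];                % corresponding scores

[t, order] = sort(t);            % put entries in submission order
s = s(order);                    % reorder scores to match

runningBest = cummin(s);         % best score seen up to and including each entry
% An entry is a leader if it beats the best score seen before it.
isLeader = [true, s(2:end) < runningBest(1:end-1)];
leaderIdx = find(isLeader);      % positions in the sorted order
originalIdx = order(leaderIdx);  % the same leaders as indices into the unsorted d
```

Since t and s are sorted here, plot(t(leaderIdx), s(leaderIdx), 'r.-') draws the leader trace directly, while originalIdx gives the positions in the original d.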

Themes and Ideas

Here are some ideas for what you might want to show in your entries.

  • Which entries were especially influential?
  • Which lines appeared in the most entries?
  • What characteristics are common to leading entries?
  • How do novel lines of code reach the leading entries?
  • How did one person's code change?
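As a starting point for the first two questions, the sketch below counts how many entries each line appears in, and how many times each entry was cloned directly (one rough proxy for influence). It assumes lines is a numeric vector of indices into allLineList and parent holds the id of the parent entry, empty when there is none; the small d and allLineList are hypothetical stand-ins for the real data.

```matlab
% Sketch for "Which lines appeared in the most entries?" and
% "Which entries were especially influential?".
% ASSUMPTIONS: d(k).lines is a numeric vector of indices into allLineList,
% and d(k).parent is the id of the parent entry (empty if none).
% The data below is a tiny hypothetical stand-in for contest_data.
allLineList = {'a = 1;'; 'b = 2;'; 'c = a + b;'; 'disp(c)'};
d = struct('id',     {1, 2, 3, 4}, ...
           'parent', {[], 1, 1, 2}, ...
           'lines',  {[1 2 3], [1 2 4], [1 3 4], [1 4]});

% 1) Line popularity. unique() counts a line once per entry even if it
%    repeats within that entry; .' forces a row so the cells concatenate.
perEntry = arrayfun(@(e) unique(e.lines(:)).', d, 'UniformOutput', false);
allIdx = [perEntry{:}];
lineUse = accumarray(allIdx(:), 1, [numel(allLineList) 1]);
[useCount, lineOrder] = sort(lineUse, 'descend');
fprintf('%3d entries: %s\n', useCount(1), allLineList{lineOrder(1)});

% 2) Direct clones per parent entry.
hasParent = ~arrayfun(@(e) isempty(e.parent), d);
parentIds = [d(hasParent).parent];
[ids, ~, grp] = unique(parentIds);      % grp maps each clone to its parent in ids
childCount = accumarray(grp(:), 1);
[~, top] = max(childCount);
fprintf('Entry %d was cloned %d times\n', ids(top), childCount(top));
```

Counting only direct clones ignores longer chains of descendants; following the parent links recursively would give a fuller picture of how far an entry's influence spread.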

Developing Your Entry

Find the data you need in this File Exchange submission.

Entry requirements:
  • ZIP file containing a single published M-file and any supporting files.
  • Do not include the data set.
  • Apart from the data set, it must contain everything needed to run.
  • It may use any MathWorks product.
  • Tag it with vis2009
  • Set "inspired by" to the file ID of the data set or to the IDs of any submissions that inspired you.

Schedule

All times are US Eastern Daylight Time.
  • Wednesday, April 1st, 12 PM. Data and full rules released.
  • April 8th, 12 PM. Deadline for submissions.
  • April 10th, 12 PM. Voting complete. Winners announced.