Data Visualization Contest: Some of My Plots

load contest_data

% First load all the fields into corresponding variables.

score=[d.score];
timestamp=[d.timestamp];
passed=[d.passed];
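titles={d.title};   % cell array; [d.title] would fuse all titles into one string and shadow title()
authors={d.author}; % cell array, one name per entry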
result=[d.result];
cpu_time=[d.cpu_time];
id=[d.id];
parent=[d.parent];
darkness=[d.darkness];
twilight=[d.twilight];
daylight=[d.daylight];
nodeCount=[d.nodeCount];
mlintMsgCount=[d.mlintMsgCount];
maxComplexity=[d.maxComplexity];
%lines=[d.lines];
charCount=[d.charCount];

Strategy Evolution Waterfall !!

The following 3D plot visualizes how the strategy evolved: score, result and CPU time are plotted along the X, Y and Z axes respectively.

A dense parabolic cluster is visible at the front of the graph. It shows that one of the final strategies for decreasing the score was to settle on some result and then reduce the CPU time (by tweaking other entries ;-). As we know, this happened during the end phase of the contest. Had result and CPU time improved together, one would instead expect a diagonal cluster connecting the two ends of the parabola.

figure('Position',[50 100 756 588])
plot3(score,result,cpu_time,'.');grid on;xlabel('Score'),ylabel('Result'),zlabel('CPU time (s)'),view([-57,28]);

Submission Distribution

Let's see how many submissions there were per day. The first day is tough, so it has the fewest entries; daylight raises interest, hence the most entries; twilight and the end phase of the contest show similar numbers of entries. However, the twilight entries could come from many different authors each posting a few entries, whereas during the end phase the same few authors submitted large numbers of entries. So the ratio of the number of entries to the number of authors could be very interesting.

md=datevec(timestamp);

entry=zeros(1,8); %entries per day, May 9 to May 16
for i=9:16
    entry(i-8)=sum(md(:,3)==i); %count the entries whose day of month is i
end
contestdate=9:16;
figure('Position',[50 100 800 800])
subplot(221),plot(contestdate,entry,'*-'),axis([9 16 min(entry)-10 max(entry)+10]),ylabel('Total number of entries'),xlabel('Day of May-2007');
subplot(222),plot(contestdate,cumsum(entry),'*-'),axis([9 16 0 sum(entry)+50]),ylabel('Cumulative number of entries'),xlabel('Day of May-2007');
subplot(223),pie(entry,{'May 9','May 10','May 11','May 12','May 13','May 14','May 15','May 16'});
subplot(224),pie(entry);
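
The entries-per-author ratio mentioned above is not computed in this script. A minimal sketch of how it could be done follows. It uses a small made-up struct array in place of the contest data, and assumes that `timestamp` holds serial date numbers and `author` holds character strings; the values and the variable names `authors`, `dv`, `days`, `onDay` and `ratio` are illustrative only.

```matlab
% Toy stand-in for the contest struct array d (hypothetical values, not contest data).
d = struct('timestamp', {datenum(2007,5,9,10,0,0), datenum(2007,5,9,15,0,0), ...
                         datenum(2007,5,10,9,0,0), datenum(2007,5,10,11,0,0), ...
                         datenum(2007,5,10,20,0,0)}, ...
           'author',    {'alice', 'bob', 'alice', 'alice', 'carol'});

authors = {d.author};            % cell array keeps each author name separate
dv      = datevec([d.timestamp]);
days    = unique(dv(:,3))';      % days of the month that have at least one entry

ratio = zeros(size(days));
for k = 1:numel(days)
    onDay    = dv(:,3) == days(k);                          % entries submitted on that day
    ratio(k) = sum(onDay) / numel(unique(authors(onDay)));  % entries per distinct author
end
```

A ratio close to 1 would mean most authors posted a single entry that day, while a large ratio would point to a few authors submitting many entries, which is the pattern suspected for the end phase.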

Code Complexity

Judging the complexity of code development is pretty interesting. The following plot shows the variance of the cyclomatic complexity over all entries submitted on each day. As expected, code of widely differing complexity was submitted during darkness and twilight, so the variances are high for those phases. From daylight until the end phase, on the other hand, development concentrated on a few particularly good entries from the earlier phases, so the complexity settles down to a steady state.

A comment: if at a later stage the contest test suite could somehow be modified so that entries of different complexities had to be developed again, things would get very interesting.

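% Row i of index holds the first and last entry number for day i
% (this assumes the entries in d are stored in chronological order).
ce=cumsum(entry);
index=[[1,ce(1:end-1)+1]; ce]';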

var_cmplx=zeros(1,8); %for eight days
for i=1:8
    var_cmplx(i)=var(maxComplexity(index(i,1):index(i,2)));
end
clf
plot(contestdate,var_cmplx,'*-'),ylabel('Variance of cyclomatic complexity'),xlabel('Day of May-2007'),title('Code complexity');
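
The per-day variance above relies on the entries being stored in chronological order. As an alternative sketch that groups by day directly, and so does not depend on ordering, `accumarray` can apply `var` to each group. The vectors `day` and `complexity` below are made-up toy values standing in for `md(:,3)` and `maxComplexity`; they are not contest data.

```matlab
% Hypothetical toy data: day of month and cyclomatic complexity per entry.
day        = [9 11 9 10 11 9 10 11]';   % deliberately not sorted
complexity = [4  3 6 10  5 8 10 10]';

% grp maps every entry to the position of its day in uniqueDays.
[uniqueDays, ~, grp] = unique(day);

% Apply var to the complexities of each day group.
varByDay = accumarray(grp, complexity, [], @var);
```

With the real data, `varByDay` could be plotted against `uniqueDays` in the same way as `var_cmplx` is plotted against `contestdate`.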