Data visualization contest : Been there done that
It seemed to me from past contests that most novel code development occurred during the darkness and twilight phases of the contest. During daylight, relatively few strategies were explored and then tweaked to death. I wondered to what extent having code in the open influenced others and how much novel code arose during daylight, so I thought it would be interesting to try to visualize these trends.
This entry takes a similar approach to quantifying the originality of code as the Creativity entry by Rafal Kasztelanic. However, instead of trying to identify the most original authors, the aim here is to depict the temporal progression of code originality.
load contest_data
n = length([d.id]);
t = [d.timestamp];
nlines = length(allLineList);
% cleanup author names
cleannames = regexprep(lower({d.author}), '_|\.|&| ', '')';
% author labels
lbls = strcat(cleannames,textwrap({sprintf(':%.5d',[d.id])}, 6));
% contest phases
[v,ind] = sort(find([d.twilight]));
twilight = v(ind([1;end]));
[v,ind] = sort(find([d.daylight]));
daylight = v(ind([1;end]));
Line history
Examine all unique lines of code and locate the earliest timestamp when a particular line of code was first observed [linefirst].
Admittedly, this is a simplistic approach to a rather difficult problem, since it does not distinguish between trivial and large changes in code, nor does it attempt to compare code for functional similarity. Nonetheless, some interesting patterns emerge from the data.
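% linefirst(k) holds the index of the earliest entry containing line k
% (entries are assumed to be stored in submission order); max(t) only
% serves as an initial value larger than any entry index
linefirst = ones(nlines,1)*max(t);
for ii = 1:n
    linefirst(d(ii).lines) = min(linefirst(d(ii).lines), ii);
end
ut0 = t(min(linefirst));
utmax = t(max(linefirst));
% discretize the vertical axis to build a display matrix
minutes_per_row = 5;
nut = round((utmax-ut0)*24*(60/minutes_per_row));
bins = linspace(ut0,utmax, nut);
board = zeros(nut, n, 'uint8');
for ii = 1:n
    % mark every time bin in which a line of entry ii was first seen
    idx = hist(t(linefirst(d(ii).lines)),bins)>0;
    board(idx,ii) = 1;
end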
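To make the first-seen bookkeeping concrete, here is a minimal sketch on made-up data. The struct `toy`, its line ids and the variable `firstseen` are hypothetical and are not part of the contest data set; the loop mirrors the `linefirst` loop above under the assumption that entries are listed in submission order.

```matlab
% Toy data (hypothetical): three entries in submission order, each listing
% the ids of the lines it contains.
toy(1).lines = [1 2 3];
toy(2).lines = [2 3 4];
toy(3).lines = [1 4 5 6];
ntoylines = 6;

% firstseen(k) = index of the earliest entry that contains line k
firstseen = inf(ntoylines, 1);
for ii = 1:numel(toy)
    firstseen(toy(ii).lines) = min(firstseen(toy(ii).lines), ii);
end
disp(firstseen')   % 1 1 1 2 3 3
```

Lines 2 and 3 are reused by the second entry but keep their original first-seen index of 1, which is what produces the horizontal streaks in the figure below.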
Progression of code novelty
This figure depicts all the lines of code written over the duration of the contest. Each pixel represents a line of code in a given entry (lines first seen within the same 5-minute bin share a pixel). The timestamp of each entry is represented on the x-axis. A vertical column of pixels represents the linefirst timestamps of all lines for a given submission.
Horizontal streaks indicate that particular code segments were retained across multiple entries. This is quite noticeable in the daylight phase. There are two rather conspicuous breaks in the continuity of the streaks: one just after the twilight phase on 3/11, when the code became viewable for the first time, and the other during the 1000 character challenge on 3/15, when the code needed to be refactored significantly.
The envelope of the plot indicates the rate at which new lines are being added. There is the initial jump at the start of the contest after which it plateaus for a while. Things pick up again during the last three days of the contest, with regular bumps occurring in the early hours of the morning prior to the submission deadline for the day. This is followed by a plateau where the tweakers take over.
figure;
patch([twilight(1),twilight(1),twilight(2),twilight(2)],[1,nut,nut,1],[0.75,0.75,0.75]);
hold on
spy(board)
axis xy
axis square
xt=floor(linspace(1,n,7));
yt=floor(linspace(1,nut,7));
set(gca,'tickdir','out','ytick',yt,'yticklabel',datestr(bins(yt), 'mm/dd'));
set(gca,'xtick',xt,'xticklabel',datestr(t(xt),'mm/dd'));
title('Progression of code novelty', 'fontsize', 12, 'fontweight', 'bold')
xlabel('submission time', 'fontweight', 'bold')
ylabel('linefirst timestamp', 'fontweight', 'bold')
text(mean(twilight)-250, nut-200, 'Twilight', 'fontweight', 'bold')
text(mean(daylight)-250, nut-200, 'Daylight', 'fontweight', 'bold')
Novelty of entries
The novelty of a submission is the fraction of its lines that are original (i.e. their linefirst timestamp matches the entry's timestamp).
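The same definition can be written as a formula. The notation is introduced here for illustration only: $L_i$ is the list of line ids in entry $i$, $t_i$ is the entry's timestamp, and $f(\ell)$ is the index of the entry in which line $\ell$ was first seen.

```latex
\mathrm{novelty}_i \;=\; \frac{\bigl|\{\ell \in L_i : t_{f(\ell)} = t_i\}\bigr|}{|L_i|}
```

An entry with no lines gives $0/0$, which is why the code that follows replaces NaN values with 0.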
entrylen=cellfun(@length, {d.lines});
entrynovelty=zeros(n,1);
for ii=1:n
entrynovelty(ii) = nnz(t(linefirst(d(ii).lines))==d(ii).timestamp)/entrylen(ii);
end
% discard some bad apples
entrynovelty(isnan(entrynovelty))=0;
A measure of novelty
A plot of the novelty measure. As one would expect, the entries in the twilight phase are novel. However, there are bursts of novel entries during daylight as well, most notably during the 1k challenge.
fig=figure;
% keep all passing entries with code length of at least 20 lines
keep=(entrylen>19 & [d.passed])';
%keep=true(n,1);
y=entrynovelty.*keep;
patch(([twilight(1),twilight(1),twilight(2),twilight(2)]),[0,1,1,0],[0.75,0.75,0.75]);
hold on
plot(y);
xt=floor(linspace(1,n,7));
set(gca,'xtick',xt,'xticklabel',datestr(t(xt),'mm/dd'));
xlim(gca, ([twilight(1), daylight(2)]))
text(mean(twilight)-250, 0.9, 'Twilight', 'fontweight', 'bold')
text(mean(daylight)-250, 0.9, 'Daylight', 'fontweight', 'bold')
xlabel ('submission time', 'fontweight', 'bold')
ylabel ('novelty measure', 'fontweight', 'bold')
title('Novelty of entries', 'fontweight', 'bold', 'fontsize', 12)
% add a custom datacursor
ud.author=lbls;
set(fig, 'userdata',ud);
h = datacursormode(fig);
set(h,'UpdateFcn',@figupdatefcn,'SnapToDataVertex','on');
datacursormode on
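The helper `figupdatefcn` is not listed in this entry. The following is only a guess at what such a data-cursor callback might look like, assuming the x coordinate of the selected point is the entry index and that the labels sit in the figure's `userdata` as set up above. It would be saved as `figupdatefcn.m`.

```matlab
function txt = figupdatefcn(~, event_obj)
% Hypothetical sketch of a data-cursor callback; not the original helper.
pos = get(event_obj, 'Position');                    % [x y] of the selected point
fig = ancestor(get(event_obj, 'Target'), 'figure');  % figure that owns the plot
ud  = get(fig, 'userdata');                          % struct stored by the plotting code
ii  = round(pos(1));                                 % x is assumed to be the entry index
txt = {ud.author{ii}, sprintf('novelty: %.2f', pos(2))};
end
```

Returning a cell array of strings makes the data tip show one line per element, here the author label followed by the novelty value.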
