Meet the family
This entry is an effort to construct a family tree of entries using some code analysis as well as the "based on" field. The results let us visualise some of the interesting exchanges in the solitaire contest.
Housekeeping
Organise the data into some useful arrays (this takes about an hour to run the first time - might as well enjoy the absence of CPU time penalties).
prepData;
Identifying influences without the "based on" label
The code below looks for likely parents of entries that declare no parent in the "based on" field, judged by the fraction of lines in common. It turns out that there are 130 entries with no parents. There are 126 unique author names in the contest, so, not surprisingly, most contestants don't code from scratch more than once.
parent2 = findParent(parent,nshare,nULines);
children2 = findChildren(parent2);
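The source of findParent is not shown here, so the following is only a minimal sketch of the general idea, under stated assumptions: each entry is held as a cell array of code lines, entries are ordered by submission time, and an earlier entry counts as a likely parent when it shares at least half of the later entry's unique lines. The names entriesSketch, thresholdSketch and likelyParentSketch, and the 0.5 cut-off, are hypothetical and not taken from the original analysis.

```matlab
% Hypothetical toy data: one cell array of code lines per entry,
% ordered by submission time.
entriesSketch = { ...
    {'a = 1;', 'b = 2;', 'c = a + b;'}, ...
    {'x = 5;', 'y = x;'}, ...
    {'a = 1;', 'b = 2;', 'c = a * b;', 'd = c + 1;'}};

thresholdSketch = 0.5;   % assumed cut-off, not from the original
likelyParentSketch = zeros(1, numel(entriesSketch));

for k = 2:numel(entriesSketch)
    mine = unique(entriesSketch{k});
    best = 0;
    for j = 1:k-1        % only earlier entries can be parents
        shared = intersect(mine, unique(entriesSketch{j}));
        frac = numel(shared) / numel(mine);
        if frac > best
            best = frac;
            likelyParentSketch(k) = j;
        end
    end
    if best < thresholdSketch
        likelyParentSketch(k) = 0;   % too little overlap: treat as original
    end
end

disp(likelyParentSketch)
```

In this toy data the third entry shares two of its four unique lines with the first entry, so the first entry is reported as its likely parent, while the second entry has no parent at all.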
Lay out the family tree
We'll plot entries in 2D space with time on the horizontal axis. The y axis has no meaning - drawtree4 attempts to assign y values that result in a nicely laid out tree.
y=drawtree4(parent,parent2,children,children2,t,nDesc);
ylead = y(bestIndexList);
Draw the family tree
Every entry is a pale grey dot; contest leaders are shown as large coloured dots. A line coloured by author links every entry to its parent (for reasons of space, only contest leaders are identified in the legend). Influences declared by the author using the "based on" field get a solid line; likely influences are shown with a dotted line.
cmap=colormap(jet(max(authorid)));
dots1 = line(t,y,'marker','.','color',[0.8 0.8 0.8],'markerfacecolor',[0.8 0.8 0.8],...
    'marker','o','markersize',4,'linestyle','none');
plotColorAuthor1;
for i=1:numel(bestIndexList);
    hleader(i)=line(tlead(i),ylead(i),'color',cmap(authorid(bestIndexList(i)),:),...
        'markerfacecolor',cmap(authorid(bestIndexList(i)),:),'marker','o');
end
[a,iunique] = unique(authorid(bestIndexList));
hleg=legend(hleader(iunique),uAuthors(authorid(bestIndexList(iunique))),'fontsize',6);
set(hleg,'position',[0.77 0.25 0.1862 0.6246])
axis([0 7.999 0 360]);
xlabel('time (days)');
set(gca,'ytick',[]);
Changes of leader
Let's look more closely at some action. In this region of the tree on the last day of competition, the lead changes several times and each new leader prompts a spate of derivatives.
axis([5.96 6.54 116 196])
Below the radar
This window on the contest shows frantic work by David Jones (blue) starting from entry 2601 by Steve Hoelzer, with the cyclist (red), SY (yellow), Jon Davidson (cyan) and others picking up these entries. No leading entries are involved - the contestants are working on code down the leaderboard.
axis([5.1 5.5 50 110]);
Hybrids
The contest doesn't provide a formal mechanism for authors to declare that an entry is a hybrid based on multiple parents, even if they want to. The function findMultiParent analyses the code to look for likely multiple influences. If an entry uses lines that last appeared in code other than its declared or likely parents, it may have been influenced by more than one previous entry. Some generic lines like while 1 are ignored in the calculation. This is not a foolproof algorithm - it's likely that other simple lines were created independently more than once. However, it should give at least a relative measure of the degree of hybridisation. By this definition, 37% of entries draw on more than one source, and 8% use more than two. The average number of parents per entry is 1.45.
parent2 = findParent(parent,nshare,nULines);
children2 = findChildren(parent2);
[multiParents,nParents] = findMultiParent(parent,parent2,A);
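As an illustration of the "last appeared in" rule described above, here is a minimal sketch on toy data. It is not the code of findMultiParent, which is not shown: the names entriesSketch, declaredParentSketch, genericSketch and nParentsSketch are hypothetical, and the list of generic lines is an assumption. For each non-generic line of an entry, the sketch walks back through earlier entries to find the most recent one containing that line, then counts the distinct sources together with the declared parent.

```matlab
% Hypothetical toy data: entries in submission order.
entriesSketch = { ...
    {'while 1', 'a = 1;', 'end'}, ...            % entry 1
    {'while 1', 'b = 2;', 'end'}, ...            % entry 2
    {'while 1', 'a = 1;', 'b = 2;', 'end'}};     % entry 3, declared child of 2
declaredParentSketch = [0 0 2];                  % 0 means no declared parent
genericSketch = {'while 1', 'end'};              % assumed list of ignored lines

nParentsSketch = zeros(1, numel(entriesSketch));
for k = 1:numel(entriesSketch)
    sources = [];
    codeLines = setdiff(unique(entriesSketch{k}), genericSketch);
    for i = 1:numel(codeLines)
        for j = k-1:-1:1                         % most recent earlier entry first
            if any(strcmp(codeLines{i}, entriesSketch{j}))
                sources(end+1) = j;              % line last appeared in entry j
                break
            end
        end
    end
    sources = unique([sources, declaredParentSketch(k)]);
    nParentsSketch(k) = numel(sources(sources > 0));
end

disp(nParentsSketch)
```

Here entry 3 declares entry 2 as its parent but also carries a line that last appeared in entry 1, so it is counted as having two parents. As the text notes, short lines can be written independently, so a count like this is a relative measure at best.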
Sunday Push
The map of multiple parents leads to new visualisations of competitive exchanges of code. Markus Buehren has described how his innovations in markus9 were bypassed by subsequent contestants, allowing him to recombine his code with more recent leaders to win the Sunday Push. Entries are plotted on the familiar logarithmic score scale. The prize-winning code is highlighted along with all of the entries that influenced it over several previous generations, coloured by author, with key entries labelled.
clf
p = [ findAncestry1(2015,3,multiParents) findAncestry1(1731,12,multiParents)];
scoreStair2((t-3)*24,scoreN,bestIndexList)
[hLines,hDots,aDots] = plotColorAuthor2(p,multiParents,authorid,(t-3)*24,log10(scoreN));
[a,iunique] = unique(aDots);
axis([15 26 0.52 0.85]);
set(gca,'ytick',[]);
hleg= legend(hDots(iunique),uAuthors(aDots(iunique)),'fontsize',6);
set(hleg,'position',[0.80 0.4 0.1839 0.5540])
ids=[1713 1747 1975 2015];
for i=1:length(ids);
    j=ids(i);
    ht(i)=text((t(j)-3)*24,log10(scoreN(j)),[d(j).title ' '],...
        'horizontalalignment','right','rotation',30,'interpreter','none');
end
ylabel('normalized logarithmic score');
xlabel('time (hours after start of day 4)');
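The calls to findAncestry1 above appear to take an entry number, a number of generations and the multiple-parent map. Its source is not shown, so the sketch below only illustrates one plausible way to collect ancestors over a fixed number of generations. It assumes the parent map is a cell array in which element k lists the parents of entry k. The names multiParentsSketch, startEntry, nGenerations and ancestors are hypothetical, and whether the real function includes the starting entry in its output is not known.

```matlab
% Hypothetical parent map: element k lists the parents of entry k.
multiParentsSketch = {[], 1, 1, [2 3], 4, [4 5]};
startEntry = 6;
nGenerations = 2;

ancestors = startEntry;      % this sketch includes the starting entry itself
frontier = startEntry;
for g = 1:nGenerations
    next = [];
    for k = frontier         % gather parents of everything found last round
        next = [next, multiParentsSketch{k}];
    end
    frontier = setdiff(unique(next), ancestors);   % keep only new entries
    ancestors = [ancestors, frontier];
end
ancestors = sort(ancestors);

disp(ancestors)
```

With two generations, entry 6 pulls in its parents 4 and 5, and then their parents 2 and 3. Entry 1 is three generations back and is left out.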
Grand Prize
Yi Cao's overall winner combined several late developments, including some by matlabboy, Jacob and tgs (who shares an email address with Yi) that were disabled for low-visibility testing on the contest server. The ancestry of the overall winner, Buy a ticket, is mapped in this plot.
clf
p = [ findAncestry1(3840,3,multiParents) ];
scoreStair2((t-6)*24,scoreN,bestIndexList)
[hLines,hDots,aDots] = plotColorAuthor2(p,multiParents,authorid,(t-6)*24,log10(scoreN));
[a,iunique] = unique(aDots);
axis([6 28 -0.2 3]);
hleg= legend(hDots(iunique),uAuthors(aDots(iunique)),'fontsize',6);
set(hleg,'position',[0.8 0.3 0.1839 0.6444])
ids=[ 3840 3651 3407 ];
for i=1:length(ids);
    j=ids(i);
    ht(i)=text((t(j)-6)*24,log10(scoreN(j)),[d(j).title ' '],...
        'horizontalalignment','right','rotation',0,'interpreter','none');
end
ylabel('normalized logarithmic score');
xlabel('time (hours after start of day 6)');