Visualization of Algorithm Development and Identification of Good Ones
Contents
Algorithm Development vs Tweak
% It is always interesting to differentiate between algorithm development % and tweaking. For that, I plot below the 3D plots of result vs score vs % CPU time, for the (i)darkness, (ii)twilight, (iii)daylight 1 and (iv)daylight end. % We can notice significant differences among the plots. In the darkness % period, we see the points predominantly in scattered form. This is % because people are developing different algorithms to reduce score with % different speed. During twilight, the scores etc are visible. Therefore, % people who developed good entries to decrease the result in acceptable time, % concentrate more on improving time. That's why the scatter curve of % darkness gets the parabolic compact shape (compact in terms of result % reduction at reduced speed). Once daylight begins, people can see others' % entries, so they start combining the 'good' entries with their own/other % entries, that's the start of tweaking. That's why again some scattered % points appear in the bottom left plot. During the end-phase of the contest, % the compact nature of the curve gets really tight (bottom right plot), % with few tweaking efforts going wrong. Nevertheless, we can visualize how % algorithms were developed. The twilight good entries really motivates the % contest solution. The point of inflexion of the parabolic curve in twilight % probably points towards the most influential developments. load contest_data score=[d.score]; timestamp=[d.timestamp]; passed=[d.passed]; result=[d.result]; cpu_time=[d.cpu_time]; charCount=[d.charCount]; md=datevec(timestamp); %date vector entry=[]; for i=9:16 entry(end+1)=numel(find(md(:,3)==i)>0); %third column of date vector gives day info end contestdate=9:16; %9-16 May 2007 ce=cumsum(entry); index=[[1,ce(1:end-1)+1]; ce]'; %mapping date (9-16 May) to entries (1 to 3914) figure('Position',[50 100 1000 777]) subplot(2,2,1),plot3(score(index(1,1):index(1,2)),result(index(1,1):index(1,2)),cpu_time(index(1,1):index(1,2)),'.');grid on; xlabel('Score (Darkness)');ylabel('Result (Darkness)');zlabel('CPU time (Darkness)');view([-57,28]); subplot(2,2,2),plot3(score(index(2,1):index(2,2)),result(index(2,1):index(2,2)),cpu_time(index(2,1):index(2,2)),'.');grid on; xlabel('Score (Twilight)');ylabel('Result (Twilight)');zlabel('CPU time (Twilight)');view([-57,28]); subplot(2,2,3),plot3(score(index(3,1):index(3,2)),result(index(3,1):index(3,2)),cpu_time(index(3,1):index(3,2)),'.');grid on; xlabel('Score (Daylight 1)');ylabel('Result (Daylight 1)');zlabel('CPU time (Daylight 1)');view([-57,28]); subplot(2,2,4),plot3(score(index(end,1):index(end,2)),result(index(end,1):index(end,2)),cpu_time(index(end,1):index(end,2)),'.');grid on; xlabel('Score (Daylight end)');ylabel('Result (Daylight end)');zlabel('CPU time (Daylight end)');view([-57,28]);
'Leading' entries
% We have seen how good entries during the twilight phase influence the % key development of the contest. How do we judge whether an entry is good % once daylight breaks? From my personal experience, it is impossible to % look at all entries. Rather, I think, it is more useful to optimize the % choice in terms of performance and code length (easiness to read). So, let's % quantitatively visualize that. We would first take out the length of the % codes and scores of the passing entries at twilight: day=2; %1:9 May, 2:10 May, ...,8:16 May p=passed(index(day,1):index(day,2)); %passing entries at twilight c=charCount(index(day,1):index(day,2)); %codelengths of entries at twilight cp=c(p==1); %codelength of passing entries at twilight s=score(index(day,1):index(day,2)); %scores of entries at twilight sp=s(p==1); %scores of passing entries at twilight % Next, we form an optimization variable by linearly combining the % codelength with a factor and the score, and minimize it. % The codelength factor starts at 1, indicating that we're just interested % in the best entries, i.e., short entries with reduced score. % We plot then the score vs code length, and the 10 red circles % (at the bottom left corner) indicate which 10 'leading' entries we would % like to follow. number_of_entries_to_lookat=10; codelength_factor=1; xp=cp*codelength_factor+sp; %optimization variable [i,j]=sort(xp,'ascend'); %minimize figure('Position',[50 100 756 588]) plot(cp,sp,'.');grid on;hold on; plot(cp(j(1:number_of_entries_to_lookat)),sp(j(1:number_of_entries_to_lookat)),'ro'),ylabel('Score of passing entries at twilight'),xlabel('Code length of passing entries at twilight'),title(strcat('Optimal ',num2str(number_of_entries_to_lookat), ' entries, with codelength factor= ',num2str(codelength_factor)));
'Good' entries
% Now, we might be interested to relax the code length factor a bit so that % we want to see some more complicated codes which also achieved good scores. % We see the optimal choice (red circles at bottom left corner) move upwards % pointing towards some 'good' entries but little bit longer to read. codelength_factor=8; xp=cp*codelength_factor+sp; %optimization variable [i,j]=sort(xp,'ascend'); %minimize figure('Position',[50 100 756 588]) plot(cp,sp,'.');grid on;hold on; plot(cp(j(1:number_of_entries_to_lookat)),sp(j(1:number_of_entries_to_lookat)),'ro'); ylabel('Score of passing entries at twilight'),xlabel('Code length of passing entries at twilight'),title(strcat('Optimal ',num2str(number_of_entries_to_lookat), ' entries, with codelength factor= ',num2str(codelength_factor)));
Performance of the 'good' entries
% Once we identify (programmatically) which entries we would like to know % more about, we would be interested to know more about those, so we plot % the result vs cpu time. We see that some (around cpu time 36 sec) of % the 10 good entries are quite similar in terms of result and speed, % while the others are different. r=result(index(day,1):index(day,2)); %results of entries at twilight rp=r(p==1); %results of passed enries at twilight cpu=cpu_time(index(day,1):index(day,2)); %cpu times of entries at twilight cpup=cpu(p==1); %cpu times of passed entries at twilight figure('Position',[50 100 756 588]) plot(rp(j(1:number_of_entries_to_lookat)),cpup(j(1:number_of_entries_to_lookat)),'*');grid on;ylabel('CPU time of good entries at twilight'),xlabel('Result of good entries at twilight'),title(strcat('Optimal ',num2str(number_of_entries_to_lookat), ' entries, with codelength factor= ',num2str(codelength_factor))); % These plots show the twilight analysis, changing the 'day' variable, we % could also look at different stages of the contest.
