image thumbnail
from How Things Develop & Which Ones to Follow by Abhisek Ukil
Visualization of Algorithm Development and Identification of Good Ones

Visualization of Algorithm Development and Identification of Good Ones

Visualization of Algorithm Development and Identification of Good Ones

Contents

Algorithm Development vs Tweak

% It is always interesting to differentiate between algorithm development
% and tweaking. For that, I plot below the 3D plots of result vs score vs
% CPU time, for the (i)darkness, (ii)twilight, (iii)daylight 1 and (iv)daylight end.

% We can notice significant differences among the plots. In the darkness
% period, we see the points predominantly in scattered form. This is
% because people are developing different algorithms to reduce score with
% different speed. During twilight, the scores etc are visible. Therefore,
% people who developed good entries to decrease the result in acceptable time,
% concentrate more on improving time. That's why the scatter curve of
% darkness gets the parabolic compact shape (compact in terms of result
% reduction at reduced speed). Once daylight begins, people can see others'
% entries, so they start combining the 'good' entries with their own/other
% entries, that's the start of tweaking. That's why again some scattered
% points appear in the bottom left plot. During the end-phase of the contest,
% the compact nature of the curve gets really tight (bottom right plot),
% with few tweaking efforts going wrong. Nevertheless, we can visualize how
% algorithms were developed. The twilight good entries really motivates the
% contest solution. The point of inflexion of the parabolic curve in twilight
% probably points towards the most influential developments.

load contest_data

score=[d.score];
timestamp=[d.timestamp];
passed=[d.passed];
result=[d.result];
cpu_time=[d.cpu_time];
charCount=[d.charCount];

md=datevec(timestamp); %date vector

entry=[];
for i=9:16
    entry(end+1)=numel(find(md(:,3)==i)>0);  %third column of date vector gives day info
end
contestdate=9:16; %9-16 May 2007

ce=cumsum(entry);
index=[[1,ce(1:end-1)+1]; ce]'; %mapping date (9-16 May) to entries (1 to 3914)


figure('Position',[50 100 1000 777])
subplot(2,2,1),plot3(score(index(1,1):index(1,2)),result(index(1,1):index(1,2)),cpu_time(index(1,1):index(1,2)),'.');grid on; xlabel('Score (Darkness)');ylabel('Result (Darkness)');zlabel('CPU time (Darkness)');view([-57,28]);
subplot(2,2,2),plot3(score(index(2,1):index(2,2)),result(index(2,1):index(2,2)),cpu_time(index(2,1):index(2,2)),'.');grid on; xlabel('Score (Twilight)');ylabel('Result (Twilight)');zlabel('CPU time (Twilight)');view([-57,28]);
subplot(2,2,3),plot3(score(index(3,1):index(3,2)),result(index(3,1):index(3,2)),cpu_time(index(3,1):index(3,2)),'.');grid on; xlabel('Score (Daylight 1)');ylabel('Result (Daylight 1)');zlabel('CPU time (Daylight 1)');view([-57,28]);
subplot(2,2,4),plot3(score(index(end,1):index(end,2)),result(index(end,1):index(end,2)),cpu_time(index(end,1):index(end,2)),'.');grid on; xlabel('Score (Daylight end)');ylabel('Result (Daylight end)');zlabel('CPU time (Daylight end)');view([-57,28]);

'Leading' entries

% We have seen how good entries during the twilight phase influence the
% key development of the contest. How do we judge whether an entry is good
% once daylight breaks? From my personal experience, it is impossible to
% look at all entries. Rather, I think, it is more useful to optimize the
% choice in terms of performance and code length (easiness to read). So, let's
% quantitatively visualize that. We would first take out the length of the
% codes and scores of the passing entries at twilight:

day=2; %1:9 May, 2:10 May, ...,8:16 May
p=passed(index(day,1):index(day,2)); %passing entries at twilight
c=charCount(index(day,1):index(day,2)); %codelengths of entries at twilight
cp=c(p==1); %codelength of passing entries at twilight
s=score(index(day,1):index(day,2)); %scores of entries at twilight
sp=s(p==1); %scores of passing entries at twilight

% Next, we form an optimization variable by linearly combining the
% codelength with a factor and the score, and minimize it.
% The codelength factor starts at 1, indicating that we're just interested
% in the best entries, i.e., short entries with reduced score.
% We plot then the score vs code length, and the 10 red circles
% (at the bottom left corner) indicate which 10 'leading' entries we would
% like to follow.

number_of_entries_to_lookat=10;
codelength_factor=1;
xp=cp*codelength_factor+sp; %optimization variable
[i,j]=sort(xp,'ascend'); %minimize
figure('Position',[50 100 756 588])
plot(cp,sp,'.');grid on;hold on; plot(cp(j(1:number_of_entries_to_lookat)),sp(j(1:number_of_entries_to_lookat)),'ro'),ylabel('Score of passing entries at twilight'),xlabel('Code length of passing entries at twilight'),title(strcat('Optimal ',num2str(number_of_entries_to_lookat), ' entries, with codelength factor= ',num2str(codelength_factor)));

'Good' entries

% Now, we might be interested to relax the code length factor a bit so that
% we want to see some more complicated codes which also achieved good scores.
% We see the optimal choice (red circles at bottom left corner) move upwards
% pointing towards some 'good' entries but little bit longer to read.

codelength_factor=8;
xp=cp*codelength_factor+sp; %optimization variable
[i,j]=sort(xp,'ascend'); %minimize
figure('Position',[50 100 756 588])
plot(cp,sp,'.');grid on;hold on; plot(cp(j(1:number_of_entries_to_lookat)),sp(j(1:number_of_entries_to_lookat)),'ro');
ylabel('Score of passing entries at twilight'),xlabel('Code length of passing entries at twilight'),title(strcat('Optimal ',num2str(number_of_entries_to_lookat), ' entries, with codelength factor= ',num2str(codelength_factor)));

Performance of the 'good' entries

% Once we identify (programmatically) which entries we would like to know
% more about, we would be interested to know more about those, so we plot
% the result vs cpu time. We see that some (around cpu time 36 sec) of
% the 10 good entries are quite similar in terms of result and speed,
% while the others are different.

r=result(index(day,1):index(day,2)); %results of entries at twilight
rp=r(p==1); %results of passed enries at twilight
cpu=cpu_time(index(day,1):index(day,2)); %cpu times of entries at twilight
cpup=cpu(p==1); %cpu times of passed entries at twilight

figure('Position',[50 100 756 588])
plot(rp(j(1:number_of_entries_to_lookat)),cpup(j(1:number_of_entries_to_lookat)),'*');grid on;ylabel('CPU time of good entries at twilight'),xlabel('Result of good entries at twilight'),title(strcat('Optimal ',num2str(number_of_entries_to_lookat), ' entries, with codelength factor= ',num2str(codelength_factor)));

% These plots show the twilight analysis, changing the 'day' variable, we
% could also look at different stages of the contest.

Contact us at files@mathworks.com