Ants: Contest Evolution
Over the course of eight days, 153 contestants submitted 2206 entries to the Ants contest. With so much activity, it is hard to follow the action. This report will pick out some interesting statistics and draw some pictures that show how the contest progressed.
It’s impressive to look at the sheer volume of entries. This area plot shows how the total number of entries grew as the contest progressed. The green area represents the entries that passed the test suite, and the red area shows those that failed. Overall 2206 of 2440 entries passed the test suite, for a pass rate of 90%.
The previous area plot shows a lot of “lumpiness”. A steepening slope shows an increase in activity. A histogram shows this more directly. Each bar represents an hour’s worth of entries.
The contest had three major phases. The first day was in “darkness”, where contestants could submit entries but they couldn’t see any of the entries or their scores. To win this phase, an entry must be general and robust. The second day was “twilight”, where we showed the scores but not yet the code. This allowed contestants to develop their algorithm without anyone else being able to leverage their ideas.
On the histogram, the darkness and twilight phases are the two large blue boxes on the left. The boxes call out two other time periods: the Sunday Push (we offered a prize to the greatest cumulative improvement to the score in 24 hours) and the Three Minute Challenge (when we ignored the time penalty and rewarded only results). The vertical gray lines show the other deadlines: Midnight in Chemnitz, Crack the Nut, and Final.
Looking at the activity of the contest winners shows their different participation styles. Each winner is listed in order and a red line marks the deadline for the phase he won.
This bar plot shows the number of entries submitted by each author. Of our 153 contestants, these were the most active. Note that the top five are all contest winners.
This is the most useful diagram for visualizing the contest. It shows the dramatic improvements that occurred over time. Each passing entry is a dot, with its submission time on the x-axis and it’s score on the y-axis. Since a lower score is better, the dots push down further as time goes on. All entries that took the lead are colored red and the red lines marks the best score at any time. The sample entry is the leftmost red dot and the winning entry is the last red dot in the lower right.
Some of the leading entries are circled and labeled with the author’s name. They show who was making the biggest improvements in score (represented in this plot as a vertical drop in the red line) at any point in the contest.
The improvement in score happens over a huge dynamic range. Early in the contest, it is easy to make big improvements in the score. As the algorithms get better, improvements become increasingly difficult. To show this, we normalize the scores so the best (smallest) score is 1 and the worst score is some power of 10. Then we plot them on a logarithmic scale. This exaggerates the improvements at the end of the contest. By increasing the number of decades we spread the scores over, we increasingly exaggerate the smaller improvements made at the end of the contest.
Unusal in this contest is the flat stretch in score near last two days in the contest. In this period, noone could improve on the result. All the changes in leadership were due to an improvement in running time.
Here is a plot of the percent improvement generated by each new leader relative to the previous leader. This lets us see who is responsible for the biggest changes over the contest. The upper frontier of this plot is a sort of hall of fame, and someone whose name appears there more than once managed to make several significant improvements to the score.
One of the interesting aspects of the contest is that entries needed to minimize two things at once. Getting the best possible answer must be weighed against the time an entry takes to run. As discussed above, the entry’s score is a combination of these two factors. Plotting these two against each other yields a very different picture of the contest.
The leader line is shown in red again in this picture. The gray contours show lines of constant score. In general, the best score is somewhere along the lower-left frontier of shortest time and lowest results. Big improvements tend to move down and to the right, and they are followed by tweaking battles in which the new algorithm claws its way back down the time axis.
allEntriesHandle = 154.0139
In order to get a feel for pace of the contest, let’s label the entry that was in the lead at the end of each day. Notice that there was almost no movement on Saturday or Tuesday.
Highlighting all the entries submitted on each day in red shows shows how the overall progression down and to the left. A black circle indicates the leader at the end of each day.
Here we look at a plot that shows how many characters of code are in each of the leading entries. In regions where you see entries of more or less the same length there are very few differences from one entry to the next. In other places you can see the code getting shorter or longer. The density of the lines also shows how often the lead is changing. It’s impressive to see that several times during the contest shorter code ousted longer code from the top spot, either by pruning unneeded computation or by introducing new algorithms.
The code was much shorter in this contest than in past contests. In the past, entries tended to grow by adding a new algorithm, running the old algorithm and the new algorithm, and picking the best. In this contest, each ant could only make a local decision and couldn’t judge the overall effect of its move, making this technique impossible.
Submitting an entry by using the “edit” button on an existing entry marks the new entry as a child of the first. Tracing each entry back from parent to parent identifies its oldest ancestor. All the entries that have the same oldest ancestor are in the same “clan”. This plot draws lines between each child and its parent and colors the six largest clans. Entries in the same clan that don’t have a line between them are connected by an entry that didn’t pass (so it doesn’t have a score to plot).
Notice the spray of red from a single entry. This is Timothy Alderson’s “TLL280″, which held the top spot for more than 16 hours.
The parentage is often accurate enough to notice interesting interactions. For example, let’s zoom into the leaders on the final day of the contest.
Notice that JohanH’s “soolver2″, the entry that broke through the result barrier that had lasted two days, was based on Timothy Alderson’s “TLL280″ (the heart of the red clan). “TLL280″, however, was no longer the best entry. Timothy Alderson’s “TLL337″ (the heart of the orange clan) had taken the lead by improving the running time. JohanH quickly noticed that “soolver2″ lacked the many improvements in “TLL280″, so he quickly isolated his change and applied it to “TLL337″, resulting in “tweaky4″. A couple minutes later the cyclist did the exact same thing with “soo sneaky”. Although it was submitted after, it was luckier with the timing and took first place.
One difficulty of the clan plot is that it relies on authors to properly credit their submission. A break between clans may be spurious. Also, some entries may be based on more than one parent. Another way to see groups of entries is to look at the percent-difference between each entry and the contest winner. In this scatter plot, the color of the dot represents the percentage of code that is different from the winning entry and the size of the dot is proportional to the number of lines of code it contains.
It is very difficult to track down who contributed what code to the final entry. We’re always looking for new ways to help us sift through the data. Here’s an annotated listing of the winning entry. The last column is the line of code from the winning entry. The middle column shows the author and entry ID when a leading entry first used this line of code. The first column shows, if different, the first time this line of code was used in any entry. There’s a lot of noise in the result, but blocks of names indicate probable original authorship for that section of the entry (or at least the last person to tweak it).
For analysis, including a listing of all the leaders, the author’s contributions to the score by day, and other metrics, see the Statistics page.
Thanks for participating, whether by entering the contest many times, once, or merely checking out the site every now and again.
Be sure to sign up on our notify list so you’ll be ready to play the next contest: Send an e-mail to email@example.com with “subscribe contest-nnounce” in the body.
This contest analysis was calculated and published entirely from MATLAB. We used the Database Toolbox to pull the information directly from the contest database. This HTML document was automatically generated from a MATLAB script using “Publish to HTML”, a feature introduced in R14.
Published with MATLAB® 7.1