Agent vs. Agent: The MATLAB Programming Contest Revisited

By Ned Gulley, April 2, 2026


Many years ago we ran an online MATLAB Programming Contest. It was fun! We had a contest once every six months or so from around 2000 to 2010. Fun side note: after that experience, we wanted to build a coding game that ran continuously rather than in discrete bursts. That's how Cody came about.

Here's how the original contest worked. Players were given a difficult optimization problem to solve (something NP-hard like the Traveling Salesman Problem). They would submit a function that would be tested against a hidden test suite and given a score that combined the quality of the optimization (lower is better) and the run time of the calculation (lower is better). Once scored, the entry would appear on the leaderboard with its code freely available for all to see and use.
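To make the scoring idea concrete, here is a minimal sketch in Python (used purely for illustration; the contest entries themselves were MATLAB functions). The contest's actual scoring formula isn't given here, so the weighted sum in `combined_score`, the `time_weight` parameter, and every function name below are assumptions. The toy problem is a Traveling Salesman tour, where the result is the total tour length.

```python
import math
import time


def tour_length(points, order):
    """Length of the closed tour that visits points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def combined_score(result, seconds, time_weight=1.0):
    """Hypothetical scoring rule: a weighted sum of two lower-is-better terms."""
    return result + time_weight * seconds


def score_entry(solver, test_suite, time_weight=1.0):
    """Run one entry on every problem in the suite and return its score."""
    start = time.perf_counter()
    orders = [solver(points) for points in test_suite]
    seconds = time.perf_counter() - start
    result = sum(tour_length(p, o) for p, o in zip(test_suite, orders))
    return combined_score(result, seconds, time_weight)


def in_order_solver(points):
    """Trivial baseline entry: visit the cities in the order they are listed."""
    return list(range(len(points)))
```

With a rule like this, an entry can take the lead either by finding shorter tours or by producing the same tours in less time, which is why a one-line speedup could be enough to top the leaderboard.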

In other words, it was an open source programming contest. This had some big implications. Let's say your code was in the lead and someone modified a single line to make it go a tiny bit faster. BOOM! That person is crowned as the new leader. Seems unfair, huh? Players found this situation irritating but also very motivating. Getting humans to cooperate on improving the same code is hard. These rules ended up being an excellent social engine to drive cooperative coding. We saw some amazing results, with talented players around the world losing sleep to stay atop the leaderboard.

That old contest has faded into the past, but I always felt like we had stumbled onto something special. I never forgot about it, and the rise of AI programming gave me an excuse to revisit the idea. Only this time the competitors would be AI agents.

I hadn't created anything with agents before, but I opened up Claude Code and just started describing what I wanted. Within a few hours, I had a working version.

Here's the result. In the top left, there's a leaderboard where we see entries that have been submitted by various players (agents). I have created multiple players, and each one has its own personality. There's the Innovator, who always wants to create something brand new. The Tweaker likes to take other people's entries and tune them. There's also a Speed Demon and an Analyst. Each of these players has a file to describe their personality and another file that acts as a notebook for them to keep notes as they play. It's fun to get some insight into what they're thinking.
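The two-files-per-player setup could be represented along these lines. This is only a sketch under stated assumptions: the file names (`personality.md`, `notebook.md`), the folder layout, and the prompt wording are all hypothetical, and the call to the agent itself is left out.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Player:
    """One contest agent, backed by two plain-text files (hypothetical layout)."""
    name: str
    personality_file: Path  # fixed description of how this player behaves
    notebook_file: Path     # running notes the player appends to as it plays

    def add_note(self, note: str) -> None:
        """Append one line to the player's notebook."""
        with self.notebook_file.open("a", encoding="utf-8") as f:
            f.write(note.rstrip("\n") + "\n")

    def build_prompt(self, leaderboard: str) -> str:
        """Assemble the text handed to the agent for its next turn."""
        personality = self.personality_file.read_text(encoding="utf-8").strip()
        notes = self.notebook_file.read_text(encoding="utf-8").strip()
        return (
            f"You are {self.name}. {personality}\n\n"
            f"Your notes so far:\n{notes or '(none yet)'}\n\n"
            f"Current leaderboard:\n{leaderboard}"
        )


def create_player(root: Path, name: str, personality: str) -> Player:
    """Create both files for a new player under root and return the Player."""
    folder = root / name.lower().replace(" ", "_")
    folder.mkdir(parents=True, exist_ok=True)
    personality_file = folder / "personality.md"
    notebook_file = folder / "notebook.md"
    personality_file.write_text(personality + "\n", encoding="utf-8")
    notebook_file.touch()
    return Player(name, personality_file, notebook_file)


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    tweaker = create_player(root, "Tweaker", "You take existing entries and tune them.")
    tweaker.add_note("The leading entry spends most of its time in one loop.")
    print(tweaker.build_prompt("1. Innovator  score 105.2"))
```

Keeping the notebook as an append-only text file means a human can open it at any point and read what the player has been thinking.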

Here's the score plot where you can see the improvement of the score over time. Incidentally, it resembles a similar plot from Andrej Karpathy's recent work on autoresearch. Indeed, the ideas are quite similar. You can think of this robot contest as an adversarial agentic optimization arena. It works well.

Following the contest was a bit of a challenge because all this code comes flying in so quickly. So I made another agent called the Commentator. Whenever the lead changes hands, the Commentator writes an enthusiastic post in the spirit of a sports commentator. Example: "Tweaker snatches the lead from Innovator with a surgical strike!" It also gets into some of the details of what changed in the code.
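The trigger for the Commentator can be sketched as a leaderboard that reports when first place passes from one player to another. Everything here is illustrative: the `Entry` and `Leaderboard` classes, the choice not to count the very first entry or a leader improving their own score as a lead change, and the prompt text are all assumptions, and the actual agent call is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Entry:
    player: str
    score: float  # lower is better


@dataclass
class Leaderboard:
    entries: List[Entry] = field(default_factory=list)

    def leader(self) -> Optional[Entry]:
        """Best (lowest-scoring) entry so far, or None if the board is empty."""
        return min(self.entries, key=lambda e: e.score) if self.entries else None

    def submit(self, entry: Entry) -> Optional[Tuple[Entry, Entry]]:
        """Record an entry; return (old_leader, new_leader) if the lead changed hands."""
        previous = self.leader()
        self.entries.append(entry)
        took_lead = previous is not None and entry.score < previous.score
        if took_lead and entry.player != previous.player:
            return previous, entry
        return None


def commentator_prompt(old: Entry, new: Entry) -> str:
    """Build the text a commentator agent would be asked to respond to."""
    margin = old.score - new.score
    return (
        f"{new.player} just took the lead from {old.player}, "
        f"improving the best score by {margin:.2f}. "
        "Write a short, enthusiastic sports-style post about it."
    )
```

For example, submitting `Entry("Innovator", 100.0)` and then `Entry("Tweaker", 99.5)` returns a lead-change pair on the second call, which can be passed straight to `commentator_prompt`.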

This agentic programming contest was a fun experiment, and it points to how we might coordinate agents and generative AI to do computational work in the future.