Hash Code: AI from DeepMind beats top human programmers

🏆 A "Deep Blue" moment for competitive programming?

Dec 08, 2024

This week researchers from Google DeepMind demonstrated an AI-system beating top human competitive programmers on past Hash Code competition problems.

It’s a big deal! Let’s see what the paper is about and how the method used by DeepMind researchers works.

(I co-created Hash Code and led the engineering team behind the competition until its sunset last year.)

Battle of heuristics

Hash Code is a battle of heuristics. A “heuristic” is a program that produces imperfect results for a specific problem.

Hash Code competition problems are mathematically very hard. In fact, the authors of the competition themselves do not know the optimal solution for each problem.

Paradoxically, this makes such a competition easier (more approachable) for contestants at all levels. This is because they can start with a very simple, not-good-at-all solution, get their first few points, and then iteratively improve it over time.

(This is as opposed to the classical “exact-solution” type of programming competitions known from TopCoder, Codeforces, ACM ICPC, etc. In such competitions each submission is either accepted or fails, with no intermediate states in between.)

An example Hash Code problem from 2018 was optimizing self-driving rides:

The contestants were given a list of passenger requests (getting from A to B at a given time). The task was to assign rides to cars in a way that’s as effective as possible.

DeepMind approach

DeepMind used an AI-powered “function search” mechanism to find the best heuristics for past Hash Code problems.

Like a Hash Code contestant, this method starts with a very simple and ineffective solution, and then experimentally evolves it in different directions. (The researchers wrote the base solution for each of the problems themselves, so their results represent an example of human-AI collaboration 💫.)

For example, for the self-driving rides problem, the base solution simply picks the next doable ride for any car that finishes their previous ride.

The real magic is in the experimental evolution: that’s the part done by the AI. The method is called FunSearch and was originally published in December 2023.

FunSearch uses an LLM to propose changes to a given program (or multiple programs). Starting with the base solution, FunSearch creates modified versions of it.

In the image above, each dot represents a solution, made by experimentally modifying one or more previous solutions.

Evolutionary approach

Each of the experimental modifications of the program is scored. In the picture above, the size of each dot represents how well it performed.

This way, the experimental modifications are not random. The system keeps picking the best scoring existing solutions, and asking the LLM to write a new solution based on these examples. The search is guided by how well the existing solutions score.

This is the evolutionary approach. In living organisms, the “score” are the biological results of each organism’s life: its survival and successful reproduction. For algorithms, the “score” is a number; in this case calculated using a scoring formula indicated in a Hash Code problem statement.

Results

The DeepMind method gets into the top 1% of solutions for all 8 past editions of Hash Code under evaluation; and gets the top spot in five of them.

Conclusion

Is this a “Deep Blue moment” for competitive programming, akin to the famous victory of a computer system over Garry Kasparov in a game of chess?

Kind of. Not quite.

The main difference is that these results don’t come from a live match: that couldn’t happen, as Hash Code was sunset in 2023. Moreover, these results concern specifically the “battle of heuristics” type of programming contest.

In “exact solution” competitions like CodeForces, the best humans are beating LLM-based systems. For now 💫.

Postcard from Paris

Sunshine wins over the general grayness and wetness.

Stay warm 🍵,
– Przemek

pnote

Discussion about this post