GPT-4 Vs. AlphaCode: Comparing Two Leading Code Generation Tools

Itamar Friedman August 23, 2023 3 min

The contest: Codeforces

GPT-4 and AlphaCode are two code-generation tools.

Both were evaluated on Codeforces programming contests, with the Codeforces Rating as the benchmark [1].

First, let’s look at what the GPT-4 report [0] says.

GPT-4’s Codeforces Rating is 392, an improvement over GPT-3.5’s 260.

Both ratings fall below the 5th percentile of (human) Codeforces contest participants, which places them in the Newbie rank [1].
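To make the rank boundaries concrete, here is a small Python sketch that maps a rating to its Codeforces rank title. The thresholds are the rating bands as we understand them from the Codeforces rating system [1]; Codeforces has adjusted its bands in the past, so treat the table as an assumption and check [1] for the authoritative values. The helper name `codeforces_rank` is hypothetical.

```python
# Assumed Codeforces rank bands (lower bound of each band), highest first.
# Verify against the official Codeforces rating documentation [1].
CODEFORCES_RANKS = [
    (3000, "Legendary Grandmaster"),
    (2600, "International Grandmaster"),
    (2400, "Grandmaster"),
    (2300, "International Master"),
    (2100, "Master"),
    (1900, "Candidate Master"),
    (1600, "Expert"),
    (1400, "Specialist"),
    (1200, "Pupil"),
]


def codeforces_rank(rating: int) -> str:
    """Return the rank title for a rating (hypothetical helper)."""
    for lower_bound, title in CODEFORCES_RANKS:
        if rating >= lower_bound:
            return title
    return "Newbie"  # every rating below 1200


if __name__ == "__main__":
    for model, rating in [("GPT-3.5", 260), ("GPT-4", 392)]:
        print(f"{model}: {rating} -> {codeforces_rank(rating)}")
```

Both GPT-3.5 (260) and GPT-4 (392) land in the Newbie band, since it covers every rating under 1200.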

Surprisingly low, isn’t it?

After all, don’t many of us developers enjoy using ChatGPT, Copilot, or both while coding?

We’ll attempt to answer this question soon.

Interestingly, of all the exams and contests in the GPT-4 report [0], GPT-4 scores its lowest percentile on the Codeforces coding contest (the 3rd bar from the left in the figure below).

Figure: exam and contest percentiles from the GPT-4 report [0]

It seems that the problem-solving and programming abilities required to excel at these contests are beyond the capabilities of existing LLMs.

However, DeepMind managed to build a system that reaches the 45th percentile!

That is roughly 9x better than GPT-4, percentile-wise (45th vs. below the 5th)!

According to the Codeforces Rating, AlphaCode’s [2] result positions it approximately as a top-level Pupil.

Figure: Codeforces Rating comparison

*AlphaCode’s naive rating (code generation alone, without filtering) was not reported. The estimate shown was calculated by us, based on the fact that only 1% (!!) of the generated solutions pass newly generated tests. AlphaCode’s full solution with Code Integrity (filtering and clustering) solves that. Keep reading for more on the full solution.

AlphaCode: Architecture includes both code generation and integrity agents

How could it be that AlphaCode beats GPT-4?

After all, those who have tried Bard or Codey [3] will very likely agree that Google’s models and solutions are not better than OpenAI’s. So, what is going on here?

Below is an overview of AlphaCode:

Figure: AlphaCode overview [2]

AlphaCode’s three key components were critical to achieving good and reliable performance:

An extensive and clean competitive programming dataset for training and evaluation;
Large language models (LLMs) that are sampled to explore a wide variety of solutions;
A filtering method, based on program behavior and testing, applied to the sampled solutions.


In other words, AlphaCode includes two major components (agents):

[Code Gen], an agent that generates a variety of possible (not necessarily working) solutions for a given problem, and [Code Integrity], an agent that generates tests, analyzes the solutions’ behaviors and selects the top solutions.

The code integrity agent, the one in charge of filtering and clustering, is actually the secret sauce.

Overall, fewer than 1% (!!) of the solutions generated by the code generation agent actually pass the example tests.
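The filter-then-cluster idea behind the code integrity agent can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not AlphaCode’s implementation: candidate programs are plain Python callables rather than sampled source code run in a sandbox, the extra test inputs are a hard-coded list (AlphaCode generates them with a separate model [2]), and there are no timeouts. The helper names (`filter_by_examples`, `cluster_by_behavior`, `select_submissions`) and the toy problem are hypothetical. The `budget=10` default mirrors AlphaCode’s limit of 10 submissions per problem.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple

# Hypothetical simplification: a candidate "program" is a function from
# stdin text to stdout text. AlphaCode instead executes sampled source code.
Program = Callable[[str], str]


def run_safely(program: Program, test_input: str) -> str:
    """Run one candidate; a crash becomes its own distinct output."""
    try:
        return program(test_input)
    except Exception:  # no timeout handling in this sketch
        return "<error>"


def filter_by_examples(
    candidates: Sequence[Program], examples: Sequence[Tuple[str, str]]
) -> List[Program]:
    """Keep only candidates that reproduce every example output from the statement."""
    return [
        p for p in candidates
        if all(run_safely(p, inp) == expected for inp, expected in examples)
    ]


def cluster_by_behavior(
    candidates: Sequence[Program], generated_inputs: Sequence[str]
) -> List[List[Program]]:
    """Group candidates whose outputs agree on every generated input, largest group first."""
    clusters = defaultdict(list)
    for p in candidates:
        signature = tuple(run_safely(p, inp) for inp in generated_inputs)
        clusters[signature].append(p)
    return sorted(clusters.values(), key=len, reverse=True)


def select_submissions(
    candidates: Sequence[Program],
    examples: Sequence[Tuple[str, str]],
    generated_inputs: Sequence[str],
    budget: int = 10,
) -> List[Program]:
    """Filter, cluster, then submit one representative per cluster, largest first."""
    survivors = filter_by_examples(candidates, examples)
    clusters = cluster_by_behavior(survivors, generated_inputs)
    return [cluster[0] for cluster in clusters[:budget]]


# Toy problem (hypothetical): print the sum of two space-separated integers.
TOY_CANDIDATES: List[Program] = [
    lambda s: str(sum(map(int, s.split()))),               # correct
    lambda s: str(int(s.split()[0]) + int(s.split()[1])),  # correct, written differently
    lambda s: str(int(s.split()[0]) * int(s.split()[1])),  # wrong: multiplies
    lambda s: str(int(s.split()[0]) + 2),                  # wrong, but passes "1 2"
]
TOY_EXAMPLES = [("1 2", "3")]
TOY_GENERATED_INPUTS = ["5 7", "0 0", "-3 4"]

if __name__ == "__main__":
    survivors = filter_by_examples(TOY_CANDIDATES, TOY_EXAMPLES)
    print(len(survivors), "of", len(TOY_CANDIDATES), "candidates pass the example test")
    picks = select_submissions(TOY_CANDIDATES, TOY_EXAMPLES, TOY_GENERATED_INPUTS)
    print(picks[0]("10 20"))  # first submission comes from the largest cluster
```

In the toy run, the multiplication candidate is filtered out by the example test, while the `+ 2` candidate slips through it because it happens to print 3 for `1 2`. Clustering on the extra inputs then separates it from the two correct candidates, which agree with each other and form the largest cluster, so a correct program is submitted first.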

Note #1: AlphaCode’s approach is very costly. Its code generation agent produces 10K or even 1M solutions before the code integrity agent examines them. Future work will have a tighter loop; we already see such work, which we’ll cover in another post :)

Note #2: Why don’t we see a code integrity component integrated into Bard, for example?
Answer: it isn’t at all simple to make this technology work on real-world code (disclaimer and spoiler: this is what we solve at qodo, formerly Codium).

Code Generation + Code Integrity:

As shown by AlphaCode, code generation and integrity tools could be a powerful combination.

One can see similarities between the well-known Generative Adversarial Networks (GANs) [10] for high-quality content generation and Generative-Integrity-Agents for high-quality code generation.

For further reading about Code Generation and Integrity tools, and how they can be combined to create powerful software development agents, see “Code Integrity Supercharges Code Generation” [11], part two of this series.

References:

[0] GPT-4: GPT-4 Technical Report (paper, blog)
[1] Codeforces Rating (post)
[2] AlphaCode: Competition-Level Code Generation with AlphaCode (paper, blog)
[3] Codey (API doc)
[10] Overview of GAN Structure (doc)
[11] Code Integrity Supercharges Code Generation (blog)