Benchmarks

GPT-5 beats top models from Google, Anthropic, xAI, and Alibaba—but just barely

Jakob Steinschaden, 8 August 2025, 10:00


The launch of OpenAI’s GPT-5 on Thursday evening was a huge success: OpenAI is positioning its latest AI model as a health and coding assistant, among other things, and, of course, as smarter than its predecessors once again. GPT-5, which will replace the older LLMs of the GPT-4 and o1/o3/o4 series, is also expected to excel in benchmarks such as AIME, SWE-bench Verified, and HealthBench Hard.

So far, so good. But how does GPT-5 compare to competing AI models? As reported, some LLMs from Google, Anthropic, and xAI have already surpassed OpenAI’s previous top models in various disciplines (e.g., coding) – so it was imperative for OpenAI to get back to the top.

Shortly after its launch, it’s safe to say that it has succeeded. Both in the Artificial Analysis Intelligence Index and in various rankings by the highly regarded LMArena, GPT-5 is ahead of its competitors, or at least tied for the lead.

Here are the results from LMArena, where users evaluate the results of AI models in blind tests:

[Charts: LMArena rankings for Mathematics, Instruction Following, and Creative Writing]

As can be seen, GPT-5 was able to take the lead in almost all important categories or at least catch up with the competition. However, it is also evident that the gap between GPT-5 and its competitors is very small in some areas, and for laypeople, the results in areas such as text, coding, mathematics, and the like will hardly differ from those of its rivals. In this respect, OpenAI has managed to regain a slight lead, but it is no longer in a class of its own.

Accordingly, it will be particularly interesting to see what Google delivers next, since it is currently still at an intermediate version, namely Gemini 2.5 Pro. Anthropic (Claude 4), xAI (Grok 4), and Alibaba (Qwen 3) released their models only recently and will need many months to launch truly new ones. In the medium term, the question is how Meta will get back into the game: after the humiliating failure of Llama 4, Mark Zuckerberg has spent a lot of money to poach talent from OpenAI, Apple, and others. In 2026, we will probably see what they can achieve.