Benchmarks

GPT-5 beats top models from Google, Anthropic, xAI, and Alibaba—but just barely

GPT-5. © OpenAI

The launch of OpenAI’s GPT-5 on Thursday evening was a huge success: OpenAI is positioning the latest AI model behind ChatGPT as a health and coding assistant, among other things, and it is, of course, once again said to be smarter than its predecessors. GPT-5, which will replace the older LLMs of the GPT-4 and o1/o3/o4 series, is also expected to excel in benchmarks such as AIME, SWE-bench Verified, and HealthBench Hard.

So far, so good. But how does GPT-5 compare to competing AI models? As reported, some LLMs from Google, Anthropic, and xAI have already surpassed OpenAI’s previous top models in various disciplines (e.g., coding) – so it was imperative for OpenAI to get back to the top.

Shortly after its launch, it’s safe to say that it has succeeded. Both in the Artificial Analysis Intelligence Index and in various rankings by the highly regarded LMArena, GPT-5 is ahead of its competitors – or at least tied for the lead:

Here are the results from LMArena, where users evaluate the results of AI models in blind tests:

[Charts: LMArena rankings for Mathematics, Instruction Following, and Creative Writing]

As can be seen, GPT-5 was able to take the lead in almost all important categories or at least catch up with the competition. However, it is also evident that the gap between GPT-5 and its competitors is very small in some areas, and for laypeople, the results in areas such as text, coding, mathematics, and the like will hardly differ from those of its rivals. In this respect, OpenAI has managed to regain a slight lead, but it is no longer in a class of its own.

Accordingly, it will be particularly interesting to see what Google delivers next, since it is currently still on an intermediate version, Gemini 2.5 Pro. Anthropic (Claude 4), xAI (Grok 4), and Alibaba (Qwen 3) have only recently shipped new models and will need many months to launch truly new ones. In the medium term, the question is how Meta will get back into the game: after the humiliating flop of Llama 4, Mark Zuckerberg has spent a lot of money to poach talent from OpenAI, Apple, and others. We will probably see in 2026 what they can achieve.
