I/O

Google Launches Gemini 3.5 Flash With Higher Prices but No Generational Leap

Jakob Steinschaden20. May 2026, 08:16

The focus was actually on things like the video model Omni, the OpenClaw alternative Gemini Spark, or the biggest overhaul of Google Search in 25 years — not on new LLMs.

However: Google presented the new AI model Gemini 3.5 Flash at its developer conference I/O. The model is set to power a wide range of Google products, but comes with a noticeable price increase and without the major performance leap many observers had expected.

In the current Arena.ai ranking, Gemini 3.5 Flash manages “only” (with preliminary figures) to reach 9th place. It is quite possible, however, that the “Pro” version, expected soon, will make the leap to the top, where Anthropic currently holds sway:

What Google wants to use the model for

Gemini 3.5 Flash is now active in numerous Google products and platforms. The company is deploying the model for both end consumers and developers and businesses alike.

In the Gemini app and in the AI mode of Google Search for all users worldwide
In the agentic development platform Google Antigravity as well as in Google AI Studio and Android Studio for developers
In the Gemini Enterprise Agent Platform and in Gemini Enterprise for enterprise customers

Google emphasizes that the model has already had a significant effect internally. Daily token processing via internal AI developer tools rose from half a trillion in March to now more than three trillion per day.

Significant price increase compared to predecessors

Despite broad availability in free consumer products, Gemini 3.5 Flash is considerably more expensive for API customers than its predecessors. At a price of $1.50 per million input tokens and $9 per million output tokens, the model costs three times as much as its predecessor Gemini 3 Flash Preview and even six times as much as Gemini 3.1 Flash-Lite.

This brings Gemini 3.5 Flash closer to the price range of Google’s own Pro model, which is priced at $2 (input) and $12 (output). The price increase fits into an industry-wide trend: OpenAI and Anthropic have also priced their latest models higher than their respective predecessors.

Google itself positions the model, despite the price increase, as a cost-effective alternative to other top models. The company calculates that businesses shifting 80 percent of their workloads to Gemini 3.5 Flash could save over one billion dollars annually.

Performance: Improvements, but no breakthrough

According to Google, Gemini 3.5 Flash performs better than its predecessor Gemini 3.1 Pro on almost all benchmarks and is said to have made particular progress in agentic coding and complex, multi-step tasks. Speed is especially highlighted: the model is said to be four times faster than comparable top models in terms of output tokens per second.

Nevertheless, this is not the major generational leap many observers had hoped for. The model carries version number 3.5, not 4, and thus remains within the existing model family. The technical specifications are solid: a context window of over one million input tokens, up to 65,536 output tokens, and a knowledge cutoff of January 2025.

The big leap will probably only come with Gemini 4

At the I/O conference, disappointment was palpable in the audience when Google CEO Sundar Pichai announced that the eagerly anticipated model Gemini 3.5 Pro is not yet ready.

“I know you can hardly wait to get your hands on it. Give us until next month to bring it to you.”

Pichai gave no reasons for the delay, but praised the model as already in use internally with “significant improvements in performance.” Gemini 3.5 Pro is expected to launch in June 2026, presumably at an even higher price than Flash.

All signs point to Google saving the truly significant innovations for a future Gemini 4 generation. Gemini 3.5 Flash is therefore more of a solid incremental improvement than a milestone, intended to impress primarily through its speed and broad integration into Google’s product ecosystem.