Moonshot AI launches Kimi K2 Thinking: The next open-source shot from China

Jakob Steinschaden, 6 November 2025, 20:25

The Kimi K2 Thinking model in operation. © Moonshot AI

The next challenge follows immediately: after the Chinese startup MiniMax recently launched an open-source model that is ten times cheaper than its US competition, Moonshot AI is now following suit. The startup, also based in China, is launching Kimi K2 Thinking, a new open-source reasoning model that is meant to compete with those from OpenAI and Anthropic in many respects.

Moonshot AI has introduced Kimi K2 Thinking, a new open-source AI model that directly competes with the established reasoning models from OpenAI and Anthropic. The company describes K2 Thinking as “our best open-source thinking model” and positions it as a “thinking agent” that “thinks step by step while using tools”.

The race is on: while OpenAI and Anthropic have been offering reasoning models for months, Moonshot AI is now following with a freely available model. K2 Thinking is available on kimi.com and can also be accessed via an API.

Moonshot AI wants to compete with OpenAI and Anthropic

The benchmark figures are ambitious: K2 Thinking achieves 44.9 percent on Humanity’s Last Exam (HLE), a benchmark with “thousands of expert-level questions across more than 100 disciplines.” On BrowseComp, which tests the ability to browse and research persistently, the model scores 60.2 percent, well above the human baseline of 29.2 percent. On the coding benchmark SWE-Bench Verified, K2 Thinking achieves 71.3 percent. What sets it apart: the model can “execute 200 to 300 sequential tool calls without human intervention” and “think coherently across hundreds of steps.”

Moonshot AI demonstrates this performance with a PhD-level mathematics problem from hyperbolic geometry, which K2 Thinking solves through 23 interleaved reasoning steps and tool calls. The model searches scientific papers, executes Python code, verifies intermediate results, and finally derives a closed-form expression. According to the company, this capacity for “planning, reasoning, execution, and adaptation across hundreds of steps” distinguishes K2 Thinking from classic large language models.

From code generation to complex web research

In practice, K2 Thinking shows broad application possibilities: For coding tasks, the model achieves 61.1 percent on SWE-Multilingual and delivers “remarkable improvements in HTML, React, and component-intensive frontend tasks.” Moonshot AI demonstrates how the model creates complete, responsive websites or Word clones from a single prompt. For agent-based search tasks, K2 Thinking goes through dynamic cycles of “thinking → searching → using browser → thinking → code” to break down and solve ambiguous, open-ended problems.
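The cycle of thinking, searching, browsing and coding described above can be pictured as a simple loop with a cap on tool calls. The following is only a minimal sketch under stated assumptions, not Moonshot AI's implementation: `Step`, `run_agent`, `scripted_policy` and the three stub tools are hypothetical names invented for illustration, and a fixed script stands in for the model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# One entry per executed tool call: (tool name, argument, result).
History = List[Tuple[str, str, str]]


@dataclass
class Step:
    """One decision of the (hypothetical) model: a thought plus an optional tool call."""
    thought: str
    tool: Optional[str] = None  # None means the policy has finished
    argument: str = ""


def run_agent(
    policy: Callable[[str, History], Step],
    tools: Dict[str, Callable[[str], str]],
    task: str,
    max_tool_calls: int = 300,
) -> Tuple[Optional[str], History]:
    """Alternate between thinking and tool use until the policy stops or the budget runs out."""
    history: History = []
    while len(history) < max_tool_calls:
        step = policy(task, history)
        if step.tool is None:
            return step.thought, history  # final answer
        result = tools[step.tool](step.argument)
        history.append((step.tool, step.argument, result))
    return None, history  # budget exhausted without a final answer


# Stub tools: placeholders for a search engine, a browser and a code sandbox.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query!r}",
    "browse": lambda url: f"page text of {url}",
    "code": lambda source: f"ran {source!r}",
}


def scripted_policy(task: str, history: History) -> Step:
    """Stand-in for the model: follows a fixed search -> browse -> code plan."""
    plan = [("search", task), ("browse", "https://example.com/result"), ("code", "1 + 1")]
    if len(history) < len(plan):
        tool, argument = plan[len(history)]
        return Step(thought=f"next I will use {tool}", tool=tool, argument=argument)
    return Step(thought=f"answer after {len(history)} tool calls")


if __name__ == "__main__":
    answer, trace = run_agent(scripted_policy, TOOLS, "which actor matches the clues?")
    print(answer)
    for tool, argument, _ in trace:
        print(tool, argument)
```

The `max_tool_calls` default of 300 only mirrors the upper end of the “200 to 300 sequential tool calls” quoted in the article; in a real system the policy would be a model call and the history would be fed back into its context.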

An example illustrates the research capabilities: starting from a complex description with multiple criteria (university degree, NFL career, sci-fi movie role, prison drama appearance, interview statement), K2 Thinking systematically identifies the actor in question, Jimmy Gary Jr., and his film role, Rudy Cox. The model conducts over 20 searches, verifies information via Wikipedia, IMDb, and interviews, and combines the results into a coherent answer. According to Moonshot AI, this “long-horizon planning” and “adaptive reasoning” distinguish K2 Thinking from pure language models.

With K2 Thinking, competition in the reasoning segment intensifies: while OpenAI and Anthropic operate proprietary models, Moonshot AI is betting on open source and aggressive benchmark performance. Test-time scaling, meaning extending thinking time through more reasoning tokens and tool calls, is becoming the new battlefield in the AI race. Whether the Chinese model can hold its own against the GPT models and Claude in practice will become clear in the coming months. The technical foundations are certainly in place.