Speech-to-Text

Mistral AI launches “Voxtral Transcribe 2” for real-time speech recognition

The Mistral team. © Mistral AI

French AI startup Mistral AI has released Voxtral Transcribe 2, a pair of next-generation speech-to-text models that the company says deliver state-of-the-art transcription quality at “ultra-low latency”. The family comprises Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications.

Voxtral Realtime is available as an open-weights model under the Apache 2.0 license and targets applications where latency is critical. It uses a novel streaming architecture that transcribes audio as it arrives. According to Mistral, the model delivers transcriptions with latency under 200 milliseconds, which the company says unlocks a new class of voice-based applications.

The new speech model family natively supports 13 languages, including English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch.

Challenge to ChatGPT and competitors

With Voxtral Mini Transcribe V2, Mistral AI is launching a transcription model meant to set itself apart from established solutions like ChatGPT. The model aims to improve transcription and speaker recognition quality and to work reliably across different languages and use cases. With a word error rate of around four percent on the FLEURS benchmark, Voxtral achieves very high accuracy at a price of just $0.003 per minute. That combination makes it one of the most attractive offerings currently on the market.
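For readers unfamiliar with the metric: word error rate (WER) is commonly defined as the number of word-level substitutions, insertions, and deletions needed to turn the model's output into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition. It is not Mistral's evaluation code, and the lowercasing and whitespace splitting are simplifying assumptions; published benchmark figures usually rely on more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase + whitespace split stands in for real normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of around four percent corresponds to roughly one wrong, missing, or extra word per 25 words of reference text.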

In direct comparison, Voxtral Mini Transcribe V2 is said to outperform models like GPT-4o mini Transcribe, Gemini 2.5 Flash, Assembly Universal, and Deepgram Nova in accuracy. At the same time, according to Mistral, it processes audio data roughly three times faster than ElevenLabs Scribe v2 at comparable quality and about one-fifth the cost.

Technical design and enterprise suitability

Technically, Voxtral 2 is clearly designed as a cost-effective enterprise solution. Context biasing, which lets users supply specific words or phrases so that they are transcribed correctly, is currently optimized for English. Additionally, the model is said to show low susceptibility to background noise and to deliver stable results even in acoustically challenging environments such as factory floors or call centers.

For testing, the AI company provides an audio playground in Mistral Studio. There, up to ten audio files can be uploaded simultaneously, speaker recognition can be enabled or disabled, timestamp granularity can be selected, and context bias terms can be added. Common audio formats such as MP3, WAV, M4A, FLAC, and OGG are supported, with a maximum file size of one gigabyte per file.
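The playground limits described above can be sketched as a simple pre-upload check. This is an illustration only and not part of any Mistral tooling: the function name `check_batch` is hypothetical, "one gigabyte" is assumed to mean 10^9 bytes, and the extension list covers only the formats named in the article, which may not be exhaustive.

```python
from pathlib import Path

# Limits as reported in the article; see the assumptions in the text above.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}
MAX_FILES = 10
MAX_BYTES = 10**9  # assumption: decimal gigabyte


def check_batch(files: dict[str, int]) -> list[str]:
    """Return a list of problems for a batch given as {file name: size in bytes}."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"too many files: {len(files)} > {MAX_FILES}")
    for name, size in files.items():
        if Path(name).suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"{name}: unsupported format")
        if size > MAX_BYTES:
            problems.append(f"{name}: exceeds size limit")
    return problems
```

An empty result means the batch fits within the stated limits; anything else lists what would need to change before uploading.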

Data protection, availability, and pricing

As a European company, Mistral AI is positioning itself as an alternative that is independent of US providers. Both new Voxtral models support GDPR-compliant deployments, for example on-premise or in private cloud environments. Voxtral Mini Transcribe V2 is available immediately via API at a price of $0.003 per minute. Voxtral Realtime, intended for real-time applications, costs $0.006 per minute and can also be downloaded as an open-weights model on Hugging Face.
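To put the per-minute prices in perspective, here is a small cost estimate based only on the two figures quoted above. The dictionary keys `"batch"` and `"realtime"` are illustrative labels rather than official model identifiers, and pro-rata billing by the second is an assumption; the article does not say how partial minutes are charged.

```python
from decimal import Decimal

# Per-minute API prices in USD as reported in the article.
PRICE_PER_MINUTE = {
    "batch": Decimal("0.003"),     # Voxtral Mini Transcribe V2
    "realtime": Decimal("0.006"),  # Voxtral Realtime
}


def transcription_cost(mode: str, audio_seconds: int) -> Decimal:
    """Estimated cost in USD, assuming pro-rata billing (an assumption)."""
    minutes = Decimal(audio_seconds) / Decimal(60)
    return PRICE_PER_MINUTE[mode] * minutes


# One hour of audio in each mode.
print(transcription_cost("batch", 3600))     # 0.180
print(transcription_cost("realtime", 3600))  # 0.360
```

At these rates, an hour of audio comes to $0.18 in batch mode and $0.36 in real-time mode.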
