Mistral Launches Voxtral—Open‑Source AI for Speech

July 16, 2025 Ai Binger News Desk

Mistral AI has released Voxtral, its first family of open-source AI audio models.

This move aims to democratize access to advanced speech understanding and generation capabilities, offering a powerful and cost-effective alternative to proprietary solutions.

Key Features and Capabilities of Voxtral:

Answer Questions Directly: Users can ask questions about audio content, and Voxtral can provide relevant answers.

Generate Summaries: It can automatically summarize lengthy audio, making it well suited to condensing meetings, lectures, or podcasts.

Function Calling from Voice: Voxtral can interpret spoken commands and directly trigger actions, such as executing API calls or workflows, which enables more intuitive voice-controlled applications.

Multilingual Prowess: Voxtral automatically detects the spoken language and delivers state-of-the-art transcription and understanding in several widely used languages, including English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian.

Long-form Context: The models can handle extended audio sessions, processing up to 30 minutes for transcription and 40 minutes for understanding, thanks to a 32,000-token context length.

Two Variants for Diverse Needs: Mistral has released two versions of Voxtral:

Voxtral Small (24 billion parameters): Designed for high-performance, production-scale deployments, competing with leading proprietary models like ElevenLabs Scribe, GPT-4o-mini Transcribe, and Gemini 2.5 Flash.

Voxtral Mini (3 billion parameters): A more compact version optimized for local and edge deployments. Mistral also offers a dedicated “Voxtral Mini Transcribe” API for cost-sensitive, transcription-only use cases, which it claims outperforms OpenAI’s Whisper for less than half the price.
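The long-form limits quoted above (30 minutes for transcription, 40 minutes for understanding) suggest that longer recordings need to be split before processing. The following is a minimal Python sketch of that planning step only. The function name `plan_chunks` and the fixed-window strategy are illustrative assumptions, not part of Mistral's API; a real pipeline would likely cut at pauses rather than at fixed offsets.

```python
# Duration limits as reported in the article (in minutes).
TRANSCRIPTION_LIMIT_MIN = 30.0
UNDERSTANDING_LIMIT_MIN = 40.0


def plan_chunks(total_minutes, limit_minutes=TRANSCRIPTION_LIMIT_MIN):
    """Return consecutive (start, end) windows, each no longer than the limit.

    Hypothetical helper: it only computes time windows and does not
    read audio or call any Voxtral endpoint.
    """
    total = float(total_minutes)
    limit = float(limit_minutes)
    if total < 0:
        raise ValueError("total_minutes must be non-negative")
    if limit <= 0:
        raise ValueError("limit_minutes must be positive")

    chunks = []
    start = 0.0
    while start < total:
        end = min(start + limit, total)  # the last window may be shorter
        chunks.append((start, end))
        start = end
    return chunks


if __name__ == "__main__":
    # A 75-minute recording planned for transcription and for understanding.
    print(plan_chunks(75))
    print(plan_chunks(75, UNDERSTANDING_LIMIT_MIN))
```

A recording that already fits within the limit comes back as a single window, so the same call can be used for short and long files.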

Availability & Pricing

Developers can access Voxtral through Mistral’s API, with pricing starting at $0.001 per minute, or by downloading the model weights from Hugging Face.
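To put the starting price in perspective, the arithmetic can be sketched in a few lines of Python. This assumes the flat $0.001 per minute figure quoted above applies to every minute of audio; actual rates may differ by model and tier, and `estimate_cost_usd` is a hypothetical helper, not a Mistral API call.

```python
from decimal import Decimal

# Starting price quoted in the article; treated here as a flat rate (assumption).
PRICE_PER_MINUTE_USD = Decimal("0.001")


def estimate_cost_usd(audio_minutes, price_per_minute=PRICE_PER_MINUTE_USD):
    """Estimate the API cost in USD for a given number of audio minutes."""
    minutes = Decimal(str(audio_minutes))
    if minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return minutes * price_per_minute


if __name__ == "__main__":
    examples = [("30-minute meeting", 30), ("1,000 hours of calls", 1000 * 60)]
    for label, minutes in examples:
        print(f"{label}: ${estimate_cost_usd(minutes)}")
```

At that rate, a 30-minute meeting comes to about three cents, and 1,000 hours of audio to about $60.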

Users can also experience Voxtral’s capabilities through Mistral’s Le Chat platform.

What’s Next?

Mistral plans to add advanced features like speaker identification, emotion detection, audio segmentation, timestamps, and non-speech audio recognition.

A webinar with Inworld showcasing voice-based agents is scheduled for August 6.

News Gist

Mistral AI has launched Voxtral, its first open-source audio model, featuring high-accuracy transcription, multilingual support, and voice function calling.

Available under the Apache 2.0 license, Voxtral rivals Whisper and ElevenLabs Scribe, and Mistral claims its transcription API costs less than half as much as Whisper.

