
Mistral Launches Voxtral—Open‑Source AI for Speech

Mistral AI has released Voxtral, its first family of open-source AI audio models.

This move aims to democratize access to advanced speech understanding and generation capabilities, offering a powerful and cost-effective alternative to proprietary solutions.

Key Features and Capabilities of Voxtral:

Answer Questions Directly: Users can ask questions about audio content, and Voxtral can provide relevant answers.

Generate Summaries: It can automatically summarize lengthy audio, which makes it well suited to condensing meetings, lectures, or podcasts.

Function Calling from Voice: Voxtral can interpret spoken commands and directly trigger actions, such as executing API calls or workflows, enabling more intuitive voice-controlled applications.

Multilingual Prowess: Voxtral automatically detects the spoken language and delivers state-of-the-art transcription and understanding in several widely used languages, including English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian.

Long-form Context: The models can handle extended audio sessions, processing up to 30 minutes for transcription and 40 minutes for understanding, thanks to a 32,000-token context length.

Two Variants for Diverse Needs: Mistral has released two versions of Voxtral:

Voxtral Small (24 billion parameters): Designed for high-performance, production-scale deployments, competing with leading proprietary models like ElevenLabs Scribe, GPT-4o-mini Transcribe, and Gemini 2.5 Flash.

Voxtral Mini (3 billion parameters): A more compact version optimized for local and edge deployments. Mistral also offers a dedicated “Voxtral Mini Transcribe” API for cost-sensitive, transcription-only use cases, which it claims outperforms OpenAI’s Whisper at less than half the price.
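
To illustrate the voice function calling feature listed above, here is a minimal Python sketch of the application side. It does not call Mistral's API. It assumes the model has already turned a spoken command into a structured tool call with a name and JSON-encoded arguments. The payload shape and the `set_timer` and `send_message` functions are hypothetical and are not taken from Mistral's documentation.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry of actions that a voice command may trigger.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so it can be looked up later."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def set_timer(minutes: int) -> str:
    return f"Timer set for {minutes} minutes"


@tool
def send_message(recipient: str, body: str) -> str:
    return f"Message to {recipient}: {body}"


def dispatch(tool_call: Dict[str, Any]) -> Any:
    """Run the registered function named in an assumed tool-call payload."""
    name = tool_call["name"]
    if name not in TOOLS:
        raise KeyError(f"Unknown tool: {name}")
    # Assumption: arguments arrive as a JSON-encoded string.
    args = json.loads(tool_call["arguments"])
    return TOOLS[name](**args)


# Assumed model output for the spoken command "set a timer for ten minutes".
example_call = {"name": "set_timer", "arguments": '{"minutes": 10}'}
print(dispatch(example_call))  # prints "Timer set for 10 minutes"
```

Keeping the callable actions in an explicit registry means the model can only trigger functions the application has deliberately exposed.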

Availability & Pricing

Developers can access Voxtral through Mistral’s API, with pricing starting at $0.001 per minute, or by downloading the model weights from Hugging Face.

Users can also experience Voxtral’s capabilities through Mistral’s Le Chat platform.
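
As a rough worked example of the figures quoted above, the following Python sketch estimates what a transcription job might cost and how many requests it could need. It makes two assumptions that the article does not confirm: the $0.001 starting price applies to every minute, and longer recordings are split into separate requests of at most 30 minutes. The function name `estimate` is purely illustrative.

```python
import math

PRICE_PER_MINUTE_USD = 0.001   # starting API price quoted in the article
MAX_TRANSCRIBE_MINUTES = 30    # transcription length limit quoted in the article


def estimate(audio_minutes: float) -> dict:
    """Return the assumed number of segments and the estimated cost in USD."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    # Assumption: audio longer than the limit is split into equal-limit segments.
    segments = math.ceil(audio_minutes / MAX_TRANSCRIBE_MINUTES)
    cost = round(audio_minutes * PRICE_PER_MINUTE_USD, 6)
    return {"segments": segments, "cost_usd": cost}


print(estimate(95))
```

Under these assumptions, a 95-minute recording would be split into four segments and cost about $0.095 to transcribe.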

What’s Next?

Mistral plans to add advanced features like speaker identification, emotion detection, audio segmentation, timestamps, and non-speech audio recognition.

A webinar with Inworld showcasing voice-based agents is scheduled for August 6.

News Gist

Mistral AI has launched Voxtral, its first family of open-source audio models, featuring high-accuracy transcription, multilingual support, and voice function calling.

Available under Apache 2.0, Voxtral rivals Whisper and ElevenLabs while offering enterprise-grade AI at half the typical cost.
