ElevenLabs Unveils Scribe, a High-Accuracy Speech-to-Text Model
ElevenLabs has introduced Scribe, a stand-alone speech-to-text model, claiming the highest accuracy in the industry.
Key points
- Scribe transcribes speech in 99 languages, with features such as word-level timestamps, speaker diarisation, and audio-event tagging.
- ElevenLabs reports that Scribe achieves the lowest word error rate among automated transcription models, with accuracy of 98.7% in Italian and 96.7% in English.
- ElevenLabs tested Scribe on the FLEURS and Common Voice benchmarks, where it reportedly outperformed Gemini 2.0 Flash, Whisper Large V3, and Deepgram Nova-3 across all supported languages.
- Scribe offers API-based access, making it ideal for businesses requiring high-volume transcription services.
- Scribe is priced at $0.40 per hour of input audio, with an additional introductory discount for the next six weeks.
- ElevenLabs also plans to release a low-latency version, which would position Scribe as a viable option for real-time communication tools.
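As a rough illustration of the quoted price, the sketch below estimates transcription cost from audio duration. It assumes billing is strictly proportional to duration at $0.40 per hour, with no rounding, minimum charge, or introductory discount; the function name `estimate_cost` is hypothetical and not part of any ElevenLabs tooling.

```python
# Assumption: cost scales linearly with audio length at the list price
# quoted in this article; real billing rules may differ.
PRICE_PER_HOUR_USD = 0.40


def estimate_cost(audio_seconds: float,
                  price_per_hour: float = PRICE_PER_HOUR_USD) -> float:
    """Return the estimated cost in USD for a given audio duration."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    hours = audio_seconds / 3600
    return hours * price_per_hour


if __name__ == "__main__":
    # Example: a 25-hour archive of recordings.
    print(f"${estimate_cost(25 * 3600):.2f}")
```

Under these assumptions, cost is simply hours of audio multiplied by the hourly rate, so high-volume users can budget directly from the total length of their recordings.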
Scribe vs. The Competition: The Future of AI-Powered Transcription
ElevenLabs’ Scribe faces competition from leading speech-to-text models such as OpenAI’s Whisper, Deepgram Nova-3, and Google’s Gemini 2.0 Flash.
While Whisper and Deepgram offer robust multilingual support, Scribe stands out with a reported 96.7% English accuracy rate and low word error rates across its 99 supported languages.
Although Gemini 2.0 Flash offers AI-powered enhancements, it falls short in raw transcription accuracy.
Scribe’s combination of speed, accuracy, and API integration positions it as a leading choice.
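For readers unfamiliar with the metric behind these comparisons, here is a minimal sketch of how word error rate (WER) is commonly computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. It is a generic illustration, not ElevenLabs’ evaluation code, and it skips the text normalisation (casing, punctuation) that real benchmarks apply. On the assumption that accuracy is reported as 1 minus WER, a 96.7% accuracy figure would correspond to a WER of about 3.3%.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.3f}, accuracy (assumed 1 - WER): {1 - wer:.3f}")
```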
News Gist
ElevenLabs has launched Scribe, a speech-to-text model supporting 99 languages with industry-leading accuracy (98.7% in Italian, 96.7% in English).
It outperforms competitors like Whisper and Gemini 2.0 Flash, offers API access, and features introductory discounts for early adopters.