ElevenLabs Scribe Transcribe Engine
Tim von Känel and Flavio Schneider, on the ElevenLabs blog:
Scribe, our first Speech to Text model, is the world’s most accurate transcription model. Built to handle the unpredictability of real-world audio, Scribe transcribes speech in 99 languages, featuring word-level timestamps, speaker diarization, and audio-event tagging—all delivered in a structured response for seamless integration.
Scribe is engineered for precision. In FLEURS & Common Voice benchmark tests across 99 languages, it consistently outperforms leading models like Gemini 2.0 Flash, Whisper Large V3 and Deepgram Nova-3. Whether it’s meeting summaries, movie subtitles, or even song lyrics, Scribe delivers the lowest automated transcription word error rate in Italian (98.7%), English (96.7%) and 97 other languages.
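A note on the figures: the quote says "word error rate" but the percentages read like accuracy scores, presumably 100% minus the WER, although the announcement does not spell that out. As a minimal sketch of the standard definition (not ElevenLabs' evaluation code), WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference words.

    Illustrative helper only; real benchmarks also normalize casing,
    punctuation, and numerals before comparing.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (word-level Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.3f}, accuracy: {1 - wer:.3f}")
```

If the quoted numbers are indeed accuracies, 96.7% for English would correspond to a WER of about 3.3%.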
I missed this when it came out a few days ago, but the claims are impressive. I’ve been able to test it in the Whisper Memos app, and it seems to work fine[1]. Hopefully Superwhisper will get access to it and I’ll be able to play with it on macOS soon. Exciting times!
[1] Being able to transcribe anything at all with any app is magical. Fine here means “as magical as usual”.