We’ve trained and are open-sourcing a neural net called Whisper that approaches human level robustness and accuracy on English speech recognition.
Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web.
Looks amazing. The fact that it was trained on multilingual data makes it especially interesting — at least for those of us who speak with an accent.
Applications for automatic speech recognition (ASR) go way beyond dictation. But I think the UX of voice/keyboard/pen input is still lacking. There's no "mouse pointer" equivalent — yet?