#speech-recognition
Microsoft's MAI-Transcribe-1.5: Production-Ready Speech Recognition
Microsoft's MAI-Transcribe-1.5 improves speech-to-text with 43-language support, 5x faster long-form inference, and entity-aware keyword biasing for enterprise accuracy.
MarkTechPost
NVIDIA's Nemotron 3.5 ASR: Efficient Multilingual Streaming Speech
NVIDIA's Nemotron 3.5 ASR is a 600M-parameter, cache-aware streaming model that transcribes 40 languages in real time from a single checkpoint, with latency-accuracy trade-offs that can be configured without retraining.
MarkTechPost
Building Robust Voice AI: Beyond Simple Transcription
Speaker diarization is essential for understanding conversations, but combining it with transcription is difficult because of overlapping speech, mismatched timestamps, and ASR models that generalize poorly to multi-speaker environments.
AI Engineer