- Media recognition
Large vocabulary speech recognition
Recognizes spoken language audio with high accuracy and detects words that are not needed to understand the meaning (e.g., fillers and hesitations).
- Uses neural networks to model fillers and hesitations within the same framework as short speech units roughly one syllable long.
- Depending on the application, display or remove detected fillers and hesitations.
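As an illustration of the display-or-remove step, here is a minimal sketch. It assumes the recognizer returns tokens that are already tagged as filler or not. The `Token` class, the `is_filler` flag, and the `render` function are hypothetical names chosen for this example, not part of Toshiba's system.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    is_filler: bool  # hypothetical flag, assumed to be set by the recognizer


def render(tokens, show_fillers):
    """Join token texts, keeping fillers only when show_fillers is True."""
    kept = [t.text for t in tokens if show_fillers or not t.is_filler]
    return " ".join(kept)


tokens = [
    Token("um", True),
    Token("the", False),
    Token("meeting", False),
    Token("uh", True),
    Token("starts", False),
    Token("now", False),
]

print(render(tokens, show_fillers=True))   # verbatim view
print(render(tokens, show_fillers=False))  # cleaned view, e.g. for minutes
```

A verbatim view might suit subtitling, where the text should track what is said, while the cleaned view might suit meeting minutes. The choice is left to the application.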
References:
Toshiba Clip: “Voice instantly transformed into text: AI changes Japanese workstyles” (in Japanese)
Applications
- Supports increased understanding of online lectures and presentations.
- Ensures information accessibility for people with hearing impairments.
- Supports the creation of meeting minutes.
Benchmarks, strengths, and track record
- Achieves a speech recognition rate of 85%, which is considered sufficient to understand spoken content.
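The catalog does not state how the recognition rate is computed. One common convention, assumed here purely for illustration, is accuracy = 1 - (edit distance / reference length), measured over words (or over characters for languages such as Japanese). The function names below are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def recognition_accuracy(ref, hyp):
    """Assumed definition: 1 - errors / reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


ref = "the meeting starts at ten".split()
hyp = "the meeting start at ten".split()
print(round(recognition_accuracy(ref, hyp), 2))  # one substitution in five words
```

Under this assumed definition, an 85% rate would correspond to roughly 15 errors per 100 reference units.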
Inquiries
Please include the title “Toshiba AI Technology Catalog: Large vocabulary speech recognition” or the URL in the inquiry text.
Please note that because this technology is currently the subject of R&D activities, immediate responses to inquiries may not be possible.
References:
- H. Fujimura, M. Nagao and T. Masuko, “Simultaneous Speech Recognition and Acoustic Event Detection using an LSTM-CTC Acoustic Model and a WFST Decoder”, ICASSP 2018.
- Toshiba’s Auto-Subtitling System for Online Classes is a Win-Win for Educators and Students