Toshiba AI Technology Catalog

Media recognition

Large vocabulary speech recognition

Recognizes spoken audio with high accuracy and detects words that are not needed to understand the meaning, such as fillers and hesitations.

Uses a neural network in which fillers and hesitations are modeled within the same framework as short speech units, each roughly equivalent to one syllable.
Depending on the application, detected fillers and hesitations can be either displayed or removed.
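The idea of treating fillers as just another output unit can be illustrated with a minimal sketch. It assumes a CTC-style acoustic model (the approach named in the Fujimura et al. ICASSP 2018 paper) whose output layer contains hypothetical filler labels alongside syllable-like units. For simplicity it uses greedy best-path decoding instead of a WFST decoder. The label inventory, the romanized syllables, and the functions `ctc_greedy_decode` and `render` are illustrative assumptions, not Toshiba's implementation.

```python
from typing import List, Sequence

# Hypothetical label inventory: index 0 is the CTC blank; the rest are
# syllable-like units plus special filler/hesitation labels.
BLANK = 0
LABELS = ["<blank>", "ka", "i", "gi", "o", "ha", "ji", "me", "ma", "su",
          "<filler:eto>", "<filler:ano>"]
FILLER_PREFIX = "<filler:"


def ctc_greedy_decode(frame_label_ids: Sequence[int]) -> List[str]:
    """Best-path CTC decoding: collapse repeated frame labels, then drop blanks."""
    tokens: List[str] = []
    prev = None
    for idx in frame_label_ids:
        if idx != prev and idx != BLANK:
            tokens.append(LABELS[idx])
        prev = idx
    return tokens


def render(tokens: Sequence[str], keep_fillers: bool) -> str:
    """Show fillers in parentheses, or drop them, depending on the application."""
    out: List[str] = []
    for tok in tokens:
        if tok.startswith(FILLER_PREFIX):
            if keep_fillers:
                out.append("(" + tok[len(FILLER_PREFIX):-1] + ")")
        else:
            out.append(tok)
    return " ".join(out)


# Hypothetical per-frame argmax outputs for "eto, kaigi o hajimemasu".
frames = [10, 10, 0, 1, 2, 2, 0, 3, 0, 4, 0, 5, 6, 6, 7, 8, 0, 9]
tokens = ctc_greedy_decode(frames)
print(render(tokens, keep_fillers=True))   # (eto) ka i gi o ha ji me ma su
print(render(tokens, keep_fillers=False))  # ka i gi o ha ji me ma su
```

Because the filler is decoded by the same mechanism as the syllables, choosing whether to display it is a post-processing decision rather than a separate detection step.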

References:
Toshiba Clip: “Voice instantly transformed into text: AI changes Japanese workstyles” (in Japanese)

Applications

Helping audiences understand online lectures and presentations
Ensuring information accessibility for people with hearing impairments
Supporting the creation of meeting minutes

Benchmarks, strengths, and track record

Achieves a speech recognition rate of 85%, a level considered sufficient for understanding the spoken content.
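The catalog does not state how the 85% figure is measured. One common definition, assumed here, is token accuracy: one minus the minimum number of substitutions, deletions, and insertions needed to turn the recognized sequence into the reference, divided by the number of reference tokens (for Japanese, tokens are often characters). Some evaluations instead report a "correct rate" that ignores insertions. The helper names below are illustrative.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions (Levenshtein distance)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of a reference token
                         cur[j - 1] + 1,          # insertion of an extra token
                         prev[j - 1] + (r != h))  # substitution (or match, cost 0)
        prev = cur
    return prev[-1]


def recognition_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Assumed metric: 1 - (S + D + I) / N, with N reference tokens."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


# 20 reference tokens; the hypothesis has 2 substitutions and 1 deletion.
ref = [f"w{i}" for i in range(20)]
hyp = list(ref)
hyp[3], hyp[7] = "x", "y"
del hyp[12]
print(round(recognition_accuracy(ref, hyp), 2))  # 0.85
```

Under this definition, 85% means roughly 3 errors for every 20 reference tokens.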

Inquiries

Inquiries to Toshiba Corporate Laboratory (Komukai region)

Please include the title “Toshiba AI Technology Catalog: Large vocabulary speech recognition” or the URL in the inquiry text.
Please note that because this technology is still under research and development, we may not be able to respond to inquiries immediately.

References:

H. Fujimura, M. Nagao and T. Masuko, “Simultaneous Speech Recognition and Acoustic Event Detection using an LSTM-CTC Acoustic Model and a WFST Decoder”, ICASSP 2018.
Toshiba’s Auto-Subtitling System for Online Classes is a Win-Win for Educators and Students

Return to “Toshiba AI Technology Catalog” page