Speech recognition and speech synthesis functions are often used in user interfaces, so rapid, smooth response greatly increases their value to users. Quick response is one of the key features of our speech recognition and speech synthesis middleware. Because both products run on local devices, they respond far faster than cloud-based speech recognition and speech synthesis systems, which must wait for input data to be transferred and processing results to be returned via the internet.
Low computational cost and a small memory footprint are also key features of our speech recognition and speech synthesis middleware. Both products need less memory to store data such as dictionaries and acoustic models, an advantage achieved through Toshiba's many years of research and development in the speech AI field.
The low computational cost means that the speech middleware can run on general-purpose processors, so the speech recognition and speech synthesis functions can be deployed using the hardware resources already available in devices*. This reduces development costs and eliminates the time that would otherwise be spent replacing hardware with higher-performance components.
* Devices running speech middleware must have sufficient processing power to execute software functions on general-purpose processors.
Because they use fewer resources, our two speech middleware products stand out for their ability to operate independently on local devices, without relying on the cloud for the primary speech recognition and speech synthesis processing. What's more, they can be accessed as speech recognition and speech synthesis application programming interfaces (APIs).
Almost 30 languages* are supported. Thanks to Toshiba's many years of experience developing speech technology for Japanese, a language quite different from European languages in both vocabulary and grammar, the middleware is a reliable, easy-to-use choice for companies in Japan that are developing products for overseas markets.
* The number of supported languages varies depending on the product version.
In addition to providing these functions, we also offer technical support to customers who are unsure whether they can make full use of the middleware. In general, speech recognition performance is significantly affected by the acoustic environment, including ambient sound in the space where recognition takes place. Toshiba can advise on how to tune parameters or select keywords for better performance. Likewise, when speech synthesis is used to convert text into speech, Toshiba can advise on how to adjust the text: even simple changes to how words are read, or where pauses are placed, can make synthesized speech noticeably easier to listen to. We have an extensive track record of speech middleware deployment, which has given us a great deal of knowledge and expertise regarding speech middleware in embedded systems.
We supply two speech middleware products with these features, dedicated respectively to speech recognition and speech synthesis.
VoiceTrigger RECAIUS speech recognition middleware ("VoiceTrigger") provides a function for analyzing waveform data captured by a microphone, including ambient sounds, to identify speech containing pre-defined keywords ("spoken keywords"). ToSpeak RECAIUS speech synthesis middleware ("ToSpeak") converts text data into synthesized speech.
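The essential idea behind spoken-keyword detection can be sketched in a few lines. The code below is a conceptual illustration only, not the VoiceTrigger API: the function name and data are hypothetical, and the real middleware operates on microphone waveform data rather than text. What it illustrates is the core behavior described above: matching predefined keywords against locally recognized speech, entirely on the device, with no network round trip.

```python
# Conceptual sketch of on-device keyword spotting (hypothetical names;
# not the VoiceTrigger API). Real middleware analyzes audio waveforms;
# here we match against a word sequence a local recognizer might emit.

def detect_keywords(recognized_words, keywords):
    """Return (position, keyword) pairs for each predefined keyword
    found in a locally recognized word sequence."""
    hits = []
    for kw in keywords:
        kw_words = kw.lower().split()
        n = len(kw_words)
        # Slide a window of the keyword's length across the word stream.
        for i in range(len(recognized_words) - n + 1):
            window = [w.lower() for w in recognized_words[i:i + n]]
            if window == kw_words:
                hits.append((i, kw))
    return sorted(hits)

# Example: words as a local recognizer might emit them, checked
# against two predefined spoken keywords (both hypothetical).
words = ["please", "turn", "on", "the", "light", "now"]
print(detect_keywords(words, ["turn on", "good morning"]))
# → [(1, 'turn on')]
```

Because all of the matching happens in local memory, the latency of this step is independent of network conditions, which is the property the middleware's quick response relies on.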