Corporate Research & Development Center

Expressive Text-to-Speech Synthesis for e-Book Reading

To provide new and enjoyable experiences for e-book readers, Toshiba has been developing expressive technologies related to e-book reading. We have now developed a technology that automatically estimates implicit emotions from the dialogues of an e-book.

The technology first learns from a large volume of example sentences paired with emotion labels such as "joy," "anger," and "sadness." It then assigns emotion labels to text by means of a method based on complement naïve Bayes, and it integrates this method with an expectation-maximization (EM) algorithm so that unlabeled data can also be classified. The estimation results allow a text-to-speech system to read dialogue expressively by selecting a voice font or prosodic parameters associated with the estimated emotion.
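The following is a minimal sketch of this idea, not Toshiba's implementation. It assumes whitespace tokenization, add-one smoothing, and a softmax over complement naïve Bayes scores as a stand-in for class posteriors in the EM step. The toy sentences, the function names, and the voice settings in VOICE_STYLE are all hypothetical and exist only for illustration.

```python
import math
from collections import Counter

EMOTIONS = ["joy", "anger", "sadness"]

# Toy data invented for illustration only.
LABELED = [
    ("what a wonderful happy day", "joy"),
    ("i am so happy and glad", "joy"),
    ("i hate this stop it now", "anger"),
    ("how dare you i am furious", "anger"),
    ("i miss her and i cry", "sadness"),
    ("so lonely and sad tonight", "sadness"),
]
UNLABELED = ["glad and delighted today", "furious i hate you", "sad and lonely i cry"]

# Hypothetical voice settings; names and values are placeholders.
VOICE_STYLE = {
    "joy": {"voice_font": "bright", "pitch_scale": 1.15, "rate_scale": 1.10},
    "anger": {"voice_font": "harsh", "pitch_scale": 0.95, "rate_scale": 1.20},
    "sadness": {"voice_font": "soft", "pitch_scale": 0.90, "rate_scale": 0.85},
}

def tokenize(text):
    return text.lower().split()

def fit_complement_nb(docs, soft_labels, vocab, alpha=1.0):
    """For each class c, estimate smoothed word log-probabilities from the
    (possibly fractional) counts of every class except c."""
    counts = {c: Counter() for c in EMOTIONS}
    for tokens, probs in zip(docs, soft_labels):
        for w in tokens:
            for c in EMOTIONS:
                counts[c][w] += probs[c]
    weights = {}
    for c in EMOTIONS:
        comp = {w: sum(counts[o][w] for o in EMOTIONS if o != c) for w in vocab}
        total = sum(comp.values()) + alpha * len(vocab)
        weights[c] = {w: math.log((comp[w] + alpha) / total) for w in vocab}
    return weights

def posterior(tokens, weights):
    """A class scores high when the text is unlikely under its complement.
    The softmax over scores is an assumption made for the EM step."""
    scores = {c: -sum(weights[c][w] for w in tokens if w in weights[c])
              for c in EMOTIONS}
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def train_em(labeled, unlabeled, iterations=5):
    lab_docs = [tokenize(text) for text, _ in labeled]
    unl_docs = [tokenize(text) for text in unlabeled]
    vocab = sorted({w for doc in lab_docs + unl_docs for w in doc})
    hard = [{c: 1.0 if c == label else 0.0 for c in EMOTIONS}
            for _, label in labeled]
    weights = fit_complement_nb(lab_docs, hard, vocab)  # labeled data only
    for _ in range(iterations):
        soft = [posterior(doc, weights) for doc in unl_docs]  # E-step
        weights = fit_complement_nb(lab_docs + unl_docs, hard + soft, vocab)  # M-step
    return weights

def classify(text, weights):
    post = posterior(tokenize(text), weights)
    return max(post, key=post.get)

def select_voice(text, weights):
    """Map the estimated emotion to the (hypothetical) voice settings."""
    return VOICE_STYLE[classify(text, weights)]

if __name__ == "__main__":
    model = train_em(LABELED, UNLABELED)
    for line in ["wonderful happy day", "stop it now", "miss her tonight"]:
        print(line, "->", classify(line, model), select_voice(line, model))
```

In this sketch the labeled sentences keep their fixed labels throughout, while each unlabeled sentence contributes fractional counts in proportion to its current posterior. A real system would need a far larger corpus, proper morphological analysis, and a calibrated posterior in place of the softmax shortcut.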

We are working toward a product module with a built-in reading function that embeds this technology, and toward applying the technology to voice content authoring tools.

Basic approach to expressive e-book reading
