|
Natural Language Processing Technologies: On the Occasion of the IEEE Milestone Award for Toshiba's Japanese-Language Word Processor

Innovation Creation by Natural Language Processing
MORI Kenichi

Toshiba Natural Language Processing Technologies Starting from Japanese-Language Word Processors: History and Prospects
SUMITA Kazuo
Natural language is the fundamental means by which we convey our intentions to others and record our knowledge. Since its development of the first Japanese-language word processor, Toshiba has been further advancing natural language processing technologies and developing applications and services using them. In 2008, the Japanese-language word processor was recognized as an IEEE (Institute of Electrical and Electronics Engineers, Inc.) Milestone. We will continue to promote the research and development of natural language processing for multiple languages in order to provide new intelligent solutions that improve the efficiency of office work, to realize new intelligent functions for digital media products, and to create innovative products such as speech-to-speech translators.

Introduction of IEEE Milestone Program Recognizing Important Historical Achievements
OHNO Eiichi
The IEEE (Institute of Electrical and Electronics Engineers, Inc.) Milestones are a program to recognize important historical achievements in the electrical, electronic, information, and communication system fields, which are the technology areas of IEEE. The IEEE Milestones are awarded in recognition of technological innovation and excellence for the benefit of society and industry. Seventy-eight milestones had been awarded worldwide, including seven in Japan, as of December 2007. In 2008, the first Japanese-language word processor, which was developed by Toshiba Corporation in 1978, was selected as the recipient of the eighth IEEE Milestone in Japan.

Machine Translation Technology to Accelerate Globalization of Intellectual Property
KUMANO Akira
There is a large volume of intellectual property documentation, including patent documents, in Japan. Although these documents are worth accessing from other countries, very few of them are written in English. Machine translation is essential as a means of translating them into English. However, specific problems are encountered in the machine translation of patent documents, particularly the difficulty of translating the long sentences in claims. Pre-editing is of assistance in this area. Dictionary building technology using a parallel corpus is also useful for the compilation of technical terms. Toshiba has developed a machine translation technology as an accumulation of these technologies. This machine translation technology makes it possible to realize high-quality translations for widespread use in commercial products and Internet services.

Natural Language Information Retrieval for XML Database System
MANABE Toshihiko / KOKUBU Tomoharu
Toshiba has been developing an extensible markup language (XML) database system with flexible search functions. To enhance the search capability of this XML database system, we have newly developed a natural language information retrieval function on the system. XML documents are ranked in descending order by relevance scores in response to a user's natural language query. In addition, both query expansion and query-based document summarization are realized in this function. This natural language information retrieval function allows users to utilize the query language of the XML database in combination with Boolean search and full-text search.
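
The relevance ranking mentioned in the XML database abstract above can be illustrated with a small sketch. The abstract does not say how relevance scores are computed, so the example below substitutes a generic TF-IDF score as an assumption; the function names (`tokenize`, `xml_text`, `rank`) and the sample documents are hypothetical and are not taken from the Toshiba system. Query expansion and summarization are not covered.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def xml_text(xml_string):
    """Concatenate all text nodes of an XML document into one string."""
    root = ET.fromstring(xml_string)
    return " ".join(root.itertext())


def rank(xml_docs, query):
    """Return (doc_index, score) pairs in descending score order.

    The score is a plain TF-IDF sum over the query terms (an assumed
    stand-in for the unspecified relevance score). Documents that
    share no term with the query are omitted.
    """
    doc_counts = [Counter(tokenize(xml_text(d))) for d in xml_docs]
    n_docs = len(doc_counts)
    query_terms = set(tokenize(query))
    scored = []
    for index, counts in enumerate(doc_counts):
        score = 0.0
        for term in query_terms:
            # Document frequency: number of documents containing the term.
            df = sum(1 for c in doc_counts if term in c)
            if df == 0:
                continue
            idf = math.log((1 + n_docs) / (1 + df)) + 1.0
            score += counts[term] * idf
        if score > 0:
            scored.append((index, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample documents, for illustration only.
docs = [
    "<doc><title>Patent translation</title>"
    "<body>Machine translation of patent claims</body></doc>",
    "<doc><title>Word processor</title>"
    "<body>Japanese word processor history</body></doc>",
    "<doc><title>Search</title>"
    "<body>Full-text search of patent documents and patent claims</body></doc>",
]
ranking = rank(docs, "Which documents discuss patent claims?")
for index, score in ranking:
    print(index, round(score, 3))
```

In this sketch the query is treated as a bag of words, so question words that appear in no document simply contribute nothing to the score. A production system would more plausibly apply linguistic analysis to the query and make use of the XML structure, neither of which is attempted here.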

Advanced Text Mining Technology for Corporate Reputation Information
SAKURAI Shigeaki
Toshiba has developed a technology that automatically discovers, at an early stage, important threads that might cause significant damage to a particular corporation or organization from sets of articles related to specific topics on bulletin board sites. Using both text mining and natural language processing techniques, this technology performs its own characterization of threads and uses these characterizations to extract important threads and expressions related to the topics in them. We evaluated the effectiveness of the newly developed technology using articles collected from bulletin board sites, and confirmed that its results agreed with users' judgments with high probability.

XML Structuring Technology for Various Types of Document Applications
FUME Kosei / ISHITANI Yasuto / GOTO Kazuyuki
The dramatic increase in the volume of electronic documents in the office environment has spurred demand for easy access to information resources and for their effective management. To meet these needs, Toshiba has developed an extensible markup language (XML) document structuring technology that facilitates the exploitation of information resources. Utilizing natural language processing and XML, this technology makes it possible to extract document attributes, such as logical elements, logical structures, and term semantics, and embed them as machine-processable metadata. We have built various applications based on this technology, such as a system that transforms paper documents into XML, a document categorization system, and an information access interface.

Japanese/Chinese/English Hybrid Speech Translation System
CHINO Tetsuro / KAMATANI Satoshi
Toshiba has proposed a new hybrid machine translation (MT) method to overcome the language barrier in cross-linguistic communication. The proposed method combines two complementary MT methods: the example-based MT (EBMT) method, which can produce natural translations within restricted domains, and the rule-based MT (RBMT) method, which produces less fluent translations but offers wide coverage. We have developed an experimental hybrid speech translation system for Japanese, Chinese, and English, and confirmed a task achievement rate of about 70% within about two minutes in typical tasks in travel situations through field tests conducted in Japan, China, and Australia.

Approach to Development of Business Support Solutions Utilizing Japanese-Language Analysis Technologies
HAYAKAWA Rumi / MATSUMOTO Shigeru / SAITO Yoshimi
Toshiba Solutions Corporation has been advancing the research and development of technologies for more practical use of the business documents that are created and stored every day. For this purpose, our technologies support improvements in document quality and classification accuracy. We are currently focusing on the utilization of business document checking technology, business document classification technology, and paraphrase searching technology. Utilizing these technologies, we are building and evaluating prototype systems such as a document checking system for offshore development with China and an automatic classification system for patent documents.
|