Toshiba Develops Similar Data Retrieval AI That Detects Past Operating States from Large and Complex Plant Data with World-Class Accuracy

-Enabling rapid access to past response records and contributing to efficient operation and maintenance-

12 Nov, 2025
Toshiba Corporation

Overview

KAWASAKI—Toshiba Corporation (Toshiba) has developed a Similar Data Retrieval AI capable of detecting past operating data similar to the current operating state with world-class accuracy, using massive time-series data collected from thousands of sensors installed in large and complex plants. This AI quickly presents the occurrence dates and response records of similar past cases, supporting cause investigations and countermeasure planning and significantly contributing to stable plant operation and more efficient maintenance work.

In large plants and factories such as power plants, water treatment facilities, and chemical plants, anomaly detection through installed sensors and rapid responses based on past cases are essential. However, automatically detecting past operating conditions similar to an anomaly from vast and constantly changing sensor data has been difficult.

Toshiba has developed a proprietary Two-stage Auto-encoder*1, an AI that detects signs of anomalies at an early stage. The newly developed Similar Data Retrieval AI builds on this technology, using deep learning to capture subtle differences in the features extracted from sensor data, so that similar conditions can be detected with high accuracy even when the changes are slight (Figure 1).
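The release does not describe the internals of the Two-stage Auto-encoder. As a rough illustration of the general principle only, the Python sketch below uses a hypothetical single-stage linear autoencoder with one feature value, fitted by principal component analysis computed with power iteration (a linear autoencoder trained on squared reconstruction error learns the same subspace as PCA). The encoder turns a multi-sensor reading into a feature value, the decoder reconstructs the reading, and a large reconstruction error flags a reading that breaks the relationships between sensors learned from normal data. The class name, the fitting method, and the two-sensor sample data are assumptions for illustration, not Toshiba's method.

```python
import math
from typing import List

Vector = List[float]


class LinearAutoencoder:
    """Hypothetical single-stage, one-feature linear autoencoder (a PCA stand-in).

    Not Toshiba's Two-stage Auto-encoder: it only illustrates compressing several
    sensor signals into a feature value and reconstructing them.
    """

    def fit(self, rows: List[Vector], iters: int = 200) -> "LinearAutoencoder":
        n, dim = len(rows), len(rows[0])
        self.mean = [sum(col) / n for col in zip(*rows)]
        centred = [[x - m for x, m in zip(r, self.mean)] for r in rows]
        # Covariance matrix of the sensor signals in the training data.
        cov = [[sum(r[i] * r[j] for r in centred) / n for j in range(dim)]
               for i in range(dim)]
        # Power iteration converges to the leading principal direction.
        v = [1.0 + 0.1 * i for i in range(dim)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        self.direction = v
        return self

    def encode(self, x: Vector) -> float:
        # Feature value: projection of the centred reading onto the learned direction.
        return sum((xi - mi) * vi for xi, mi, vi in zip(x, self.mean, self.direction))

    def decode(self, z: float) -> Vector:
        # Reconstruct all sensor signals from the single feature value.
        return [mi + z * vi for mi, vi in zip(self.mean, self.direction)]

    def reconstruction_error(self, x: Vector) -> float:
        # Squared error; a large value suggests the reading departs from normal behavior.
        x_hat = self.decode(self.encode(x))
        return sum((a - b) ** 2 for a, b in zip(x, x_hat))


# Hypothetical normal data: two sensors that move together (sensor_b = 2 * sensor_a).
normal = [[float(t), 2.0 * t] for t in range(10)]
ae = LinearAutoencoder().fit(normal)
print(ae.reconstruction_error([20.0, 40.0]))  # follows the learned relationship
print(ae.reconstruction_error([5.0, 0.0]))    # breaks the learned relationship
```

The first reading lies on the learned sensor relationship and reconstructs almost perfectly; the second does not, so its error is large even though neither value is extreme on its own.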

This AI can simultaneously analyze data from thousands of sensors and accurately capture subtle differences in operating states caused by adjustments to plant operating parameters or changes in environmental conditions such as temperature. It is the first AI to achieve high-accuracy similarity search under such complex conditions*2.

When tested using publicly available data from a pulp-and-paper mill*3, this AI improved the accuracy of similar case detection*4 by a factor of 1.8 compared to conventional methods*5, achieving world-leading performance*6. In further validation using operating data from an actual plant in operation, the AI was able to detect past similar data with 95% accuracy. This AI is expected to support rapid formulation of countermeasures in response to signs of anomalies or deterioration, enabling efficient operation and maintenance through condition-based maintenance (CBM)*7 and improving operating rates.

Toshiba will present the details of this technology at the ICDM 2025 AI4TS workshop*8, to be held on November 12.

Figure 1: Anomaly Sign Detection AI and Similar Data Retrieval AI utilizing the Two-stage Auto-encoder to support stable plant operation

Development background

Large and complex plants are equipped with thousands of sensors to monitor a range of systems and equipment. To efficiently operate and maintain the plants that support the foundations of industry and society, it is important to monitor the ever-changing operating data from these sensors and detect and address anomalies at an early stage before their effects escalate. However, after detecting an anomaly, it is not easy to identify the cause and formulate countermeasures because of the wide range of equipment and systems involved. At many sites, the initial step is to search for similar past cases, but this process relies heavily on the experience and knowledge of skilled personnel. With the aging of skilled workers and ongoing labor shortages, there is growing concern that transfer of such knowledge will become increasingly difficult in the future.

Against this backdrop, there is a growing need for technologies that can rapidly and accurately reference past operating data and response records. The development of AI technologies capable of accurately detecting past cases similar to current operating data from vast volumes of sensor data is an urgent task.

Various types of AI for similar data retrieval are currently under development. However, in environments such as plants, where large numbers of sensors interact in complex ways, improving search accuracy remains a major challenge. Even during normal operation, sensor data such as temperature and pressure from pumps, piping, and other equipment fluctuate due to the intricate interplay between the behavior of individual components and the overall plant status.

Conventional technologies struggle to learn such subtle differences in operating conditions amid these complex fluctuations, so detecting similar cases has often been difficult.

Features of the technology

To address this, Toshiba has developed a Similar Data Retrieval AI that uses a proprietary deep learning technology called the Two-stage Auto-encoder to accurately detect past operating data similar to the current operating state. After signs of an anomaly are detected, the AI searches historical data for similar cases based on the current sensor data and quickly presents relevant information such as the date and time of occurrence, subsequent changes, and response records. This enables immediate reference to appropriate countermeasures when similar anomalies occur, supporting on-site decision-making.
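The retrieval step described above can be sketched as a nearest-neighbor search over past operating windows, each stored with its date and time of occurrence and its response record. This is a hypothetical illustration: the `PastCase` record, the plain Euclidean distance (a stand-in for the distance the AI actually learns), and the sample history entries are all assumptions, not Toshiba's implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PastCase:
    """Hypothetical record of one historical operating window."""
    occurred_at: str        # date and time of occurrence (ISO 8601 string)
    features: List[float]   # feature values extracted from the sensor data
    response_record: str    # what operators did at the time


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Stand-in distance; the actual AI learns its own measure of similarity.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_similar(current: Sequence[float], history: List[PastCase],
                   k: int = 3) -> List[PastCase]:
    """Return the k past cases whose features are closest to the current state."""
    return sorted(history, key=lambda case: euclidean(current, case.features))[:k]


# Hypothetical history for illustration only.
history = [
    PastCase("2019-03-02T14:00", [0.9, 0.1], "Replaced pump seal"),
    PastCase("2021-07-18T09:30", [0.2, 0.8], "Cleaned strainer"),
    PastCase("2023-11-05T22:15", [0.85, 0.15], "Adjusted valve setpoint"),
]
for case in search_similar([0.88, 0.12], history, k=2):
    print(case.occurred_at, case.response_record)
```

Returning the whole record rather than just a timestamp is what lets operators see the date, the later course of events, and the response taken in one place.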

This AI uses the core Two-stage Auto-encoder technology to convert sensor data into feature values, and then uses deep learning to learn the subtle differences between those features. This makes it possible to detect, with high accuracy, past data showing patterns similar to subtle anomalies that were previously difficult to identify (Figure 1).

A particularly important question is what the sensor data are converted into, that is, which feature values the retrieval AI learns from. Toshiba focused on the feature values generated by the Two-stage Auto-encoder. This technology simultaneously converts signals from numerous sensors into feature values and reconstructs the original signals from them. Because it takes into account the complex interrelationships among multiple sensor signals, the Two-stage Auto-encoder expresses differences in sensor data caused by changes in operating conditions as differences in feature values. Training the AI on these feature values to learn the subtle differences across various operating states enables it to distinguish fine differences in operating states within large volumes of interrelated sensor data, even in complex plants with many interacting factors, significantly improving the accuracy of similar data retrieval (Figure 2).
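The release does not say how the AI learns differences between feature values; the paper cited in *8 describes a "deep distance measurement" method. As a generic, hypothetical stand-in, the sketch below learns one non-negative weight per feature for a squared distance using a triplet margin loss: an anchor window should end up closer to a "positive" window from the same operating state than to a "negative" window from a different state. In an unsupervised setting, positives might be, for example, temporally adjacent windows and negatives distant ones; that pairing rule, the function names, and the sample triplet are assumptions.

```python
from typing import List, Sequence, Tuple

Triplet = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def weighted_sq_distance(w: Sequence[float], a: Sequence[float],
                         b: Sequence[float]) -> float:
    """Squared distance with one non-negative weight per feature."""
    return sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, b))


def triplet_loss(w: Sequence[float], anchor: Sequence[float],
                 positive: Sequence[float], negative: Sequence[float],
                 margin: float) -> float:
    """Hinge loss: positive should be closer to the anchor than negative by `margin`."""
    return max(0.0, weighted_sq_distance(w, anchor, positive)
               - weighted_sq_distance(w, anchor, negative) + margin)


def train_weights(triplets: List[Triplet], dim: int, margin: float = 0.5,
                  lr: float = 0.1, epochs: int = 20) -> List[float]:
    """Hypothetical trainer: gradient descent on the triplet loss over feature weights."""
    w = [1.0] * dim
    for _ in range(epochs):
        for a, p, n in triplets:
            if triplet_loss(w, a, p, n, margin) == 0.0:
                continue  # already separated by the margin, so no gradient
            for i in range(dim):
                # d(loss)/d(w_i) while the hinge is active.
                grad = (a[i] - p[i]) ** 2 - (a[i] - n[i]) ** 2
                w[i] = max(0.0, w[i] - lr * grad)  # projected step keeps weights >= 0
    return w


# Feature 0 varies within one operating state; feature 1 separates the states.
triplets = [([0.0, 0.0], [1.0, 0.1], [0.1, 1.0])]
print(train_weights(triplets, dim=2))
```

After training, the feature that actually separates operating states carries more weight than the one that merely fluctuates, which is the kind of "subtle difference" a learned distance is meant to pick up.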

In validation tests involving searches using publicly available data from a pulp-and-paper mill, the accuracy of similar case detection improved by a factor of 1.8 compared to conventional technologies, achieving world-leading performance. Additional validation using 10 years of operating data from an actual plant confirmed that highly accurate searches for similar cases were possible in approximately one hour.
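Footnote *4 gives MAP@10 as the search accuracy measure. The Python sketch below shows one common way to compute it: for each query, precision is taken at every rank from 1st to 10th that holds a correct result, summed, and divided by min(10, number of correct items); the per-query values are then averaged. Conventions for the denominator differ between sources, and the exact variant Toshiba used is not stated, so treat this as an assumption.

```python
from typing import List, Set


def average_precision_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """AP@k, normalized by min(k, number of relevant items) (one common convention)."""
    hits, total = 0, 0.0
    for r, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / r  # precision at rank r
    denom = min(k, len(relevant))
    return total / denom if denom else 0.0


def map_at_k(results: List[List[str]], truths: List[Set[str]], k: int = 10) -> float:
    """MAP@k: AP@k averaged over all queries."""
    return sum(average_precision_at_k(r, t, k) for r, t in zip(results, truths)) / len(results)


# Two hypothetical queries: ranked search results and the correct items for each.
print(map_at_k([["a", "x", "b"], ["y", "c"]], [{"a", "b"}, {"c"}]))
```

Because precision is weighted by rank, a correct result in 1st place raises the score far more than one in 10th, which matches the footnote's point that a higher value means correct data tend to appear at the top.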

In the past, when expert knowledge was unavailable, investigating similar cases could take several days. With this AI, similar cases can be found and acted on in a short time, helping to offset the effects of an aging skilled workforce and labor shortages.

Figure 2: A screen displaying search results for similar operating states. The system detects times at which the behavior of multiple sensor signals is similar*9

Future developments

Toshiba is currently verifying, at multiple plant sites, the effectiveness of this AI in cause investigations and countermeasure planning after signs of anomalies are detected. Going forward, the company will continue research and development with the goal of putting the technology into practical use from fiscal 2026 onward, contributing to more efficient operation and maintenance of a wide range of plants and facilities, including power plants, water treatment facilities, and chemical plants.


  *1 https://www.global.toshiba/ww/technology/corporate/rdc/rd/topics/21/2112-01.html
  *2 Toshiba research, as of November 12, 2025.
  *3 Pulp-and-paper mill: Actual operating data from a paper mill. The target anomaly is paper breakage, which occurred 124 times. C. Ranjan, et al., “Dataset: Rare Event Classification in Multivariate Time Series,” 2019, arXiv:1809.10717.
  *4 Search accuracy: MAP@10 (mean average precision at 10), which averages over all queries a precision-based score computed from the ranks (1st to 10th) at which correct data appear in the search results. A higher value indicates a higher likelihood of displaying correct data at the top.
  *5 Multiple state-of-the-art machine-learning methods for similar data retrieval, including E2USD (2024) and TS2Vec (2022). A comparison with each method is provided in the paper cited in *8.
  *6 World’s highest MAP@10 score, as of December 7, 2021.
  *7 Condition-based maintenance (CBM): A maintenance approach that monitors equipment conditions and performs maintenance in response to signs of anomalies or deterioration.
  *8 ICDM 2025 AI4TS: IEEE International Conference on Data Mining (ICDM) Workshop on AI for Time Series Analysis (AI4TS), held on November 12 in Washington, D.C.
     S. Naito, K. Nakata, Y. Taguchi, “Deep Distance Measurement Method for Unsupervised Multivariate Time Series Similarity Retrieval”
  *9 Data: EEG Eye State dataset
     https://archive.ics.uci.edu/dataset/264/eeg+eye+state