Corporate Research & Development Center

Toshiba's Time-series Waveform Anomaly Detection AI Minimizes
Missed Anomalies or False Alarms While Offering High Explainability
-Improves anomaly detection by 7%, and will contribute to failure prediction and efficient maintenance of manufacturing equipment and infrastructure facilities-

June 2, 2020
Toshiba Corporation

TOKYO - Toshiba Corporation (TOKYO: 6502) has developed “Learning Time-series Shapelets for Optimizing Partial AUC” (LTSpAUC) (Note 1), a recent advance in applying machine learning to the analysis of time-series waveform data that delivers approximately 7% more accurate anomaly detection than current technologies.

Applying AI to the analysis of waveform data collected by sensors (time-series instances) has produced techniques for classifying waveforms and for learning shapelets, which are segments of a time series that can be used to classify the instances. This has led to studies of how to learn classifiers and shapelets together. LTSpAUC learns the classifier and the related shapelets simultaneously and automatically classifies each instance as positive (abnormal) or negative (normal). It minimizes missed anomalies while keeping false alarms below a set maximum, or minimizes false alarms while keeping missed anomalies below a set maximum. Toshiba, a leader in applying AI to manufacturing solutions, has already made important contributions in this area (Note 2).
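A shapelet-based classifier can be sketched in a few lines. The example below assumes one common choice of shapelet feature, the minimum mean squared distance between a shapelet and every equal-length window of a series, fed into a linear classifier. The names `shapelet_distance` and `classify`, the weights and the bias are hypothetical illustrations, not Toshiba's implementation. Returning the start index of the best-matching window shows which part of the waveform drove the decision.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum mean squared distance between `shapelet` and any equal-length
    window of `series`, together with the start index of the best window."""
    L = len(shapelet)
    if L > len(series):
        raise ValueError("shapelet is longer than the series")
    best, best_start = math.inf, -1
    for start in range(len(series) - L + 1):
        d = sum((series[start + l] - shapelet[l]) ** 2 for l in range(L)) / L
        if d < best:
            best, best_start = d, start
    return best, best_start


def classify(series, shapelets, weights, bias):
    """Hypothetical linear classifier on shapelet distances.

    Returns (label, evidence): label 1 = abnormal, 0 = normal; evidence holds
    (distance, start index) per shapelet so a person can inspect the match."""
    evidence = [shapelet_distance(series, s) for s in shapelets]
    score = bias + sum(w * d for w, (d, _) in zip(weights, evidence))
    return (1 if score > 0 else 0), evidence


spike = [0.0, 1.0, 0.0]  # hypothetical "abnormal" waveform pattern
print(classify([0, 0, 0, 0, 1, 0, 0], [spike], weights=[-3.0], bias=0.5))
# (1, [(0.0, 3)])  -> abnormal, pattern found starting at index 3
print(classify([0] * 7, [spike], weights=[-3.0], bias=0.5))
# (0, [(0.3333333333333333, 0)])  -> normal, no close match
```

A negative weight means that a close match to the shapelet (a small distance) pushes the score toward "abnormal".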

The market for IoT systems for monitoring equipment is expected to grow to US$3.5 billion a year by 2024 (Note 3), as sensors are increasingly used to detect and collect data on variables such as vibration, temperature, voltage and electric current. It is essential that reliable and useful results can be derived from the collected data. LTSpAUC advances this goal by analyzing the time-series waveforms in the data, and can provide experts in domains from medicine to manufacturing with results they can act on.

Figure 1: Overview of LTSpAUC

The people who make decisions based on AI must be able to understand and trust the results. Stopping a machine or process cuts into productivity and incurs costs, especially if the decision is based on a false alarm, while overlooking an anomaly can cause problems of its own. Experts in the infrastructure field typically want explainable AI with accurate, transparent results: a classification system that reduces missed anomalies while keeping the false positive rate (FPR) low.

A widely used measure for evaluating classifiers in data mining and machine learning is the pAUC, the partial Area Under the receiver operating characteristic (ROC) Curve. Unlike the full AUC, the pAUC only measures the part of the ROC curve where the FPR is at or below a predefined threshold, so maximizing it reduces overlooked anomalies while keeping the FPR below that threshold. However, studies on optimizing the pAUC usually do not consider the characteristics of time-series waveforms. This has real-world consequences, particularly overlooked anomalies in the data and false alarms that increase equipment downtime and the workload of maintenance and service engineers.
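The pAUC can be computed directly from a classifier's anomaly scores. The sketch below assumes that a higher score means "more likely abnormal" and that labels are 1 = abnormal, 0 = normal; it builds the ROC curve and integrates it with the trapezoidal rule only up to the FPR limit `max_fpr` (a hypothetical parameter name). It is a generic illustration of the metric, not Toshiba's evaluation code.

```python
def roc_curve(scores, labels):
    """ROC points (fpr, tpr) from sweeping the threshold from the highest
    score to the lowest. labels: 1 = abnormal (positive), 0 = normal."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores cross the threshold together, so treat them as one step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def partial_auc(scores, labels, max_fpr, normalize=False):
    """Area under the ROC curve for FPR in [0, max_fpr] (trapezoidal rule).
    With normalize=True the area is divided by max_fpr (1.0 = perfect)."""
    if not 0.0 < max_fpr <= 1.0:
        raise ValueError("max_fpr must be in (0, 1]")
    fpr, tpr = roc_curve(scores, labels)
    area = 0.0
    for k in range(1, len(fpr)):
        x0, x1, y0, y1 = fpr[k - 1], fpr[k], tpr[k - 1], tpr[k]
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate the TPR at the FPR cutoff and stop the segment there.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / max_fpr if normalize else area


scores = [0.9, 0.7, 0.6, 0.4]  # anomaly scores from some classifier
labels = [1, 0, 1, 0]          # 1 = abnormal, 0 = normal
print(partial_auc(scores, labels, max_fpr=1.0))                  # 0.75 (full AUC)
print(partial_auc(scores, labels, max_fpr=0.5))                  # 0.25
print(partial_auc(scores, labels, max_fpr=0.5, normalize=True))  # 0.5
```

Library implementations may report a differently scaled value; for example, scikit-learn's `roc_auc_score` with `max_fpr` returns a standardized partial AUC (McClish correction), so its numbers are not directly comparable with this raw area.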

Because LTSpAUC learns the shapelets and the classifier together, it can optimize performance to reduce missed anomalies while keeping the FPR below the permitted maximum. By learning multiple waveform patterns, it achieves an acceptable trade-off: it minimizes missed anomalies while keeping false positives within an acceptable range. This approach not only meets the difficult goal of minimizing missed anomalies or false positives, but also makes it possible to learn rare anomalous patterns that are beyond the scope of other methods. Another advantage of LTSpAUC is that users can inspect the learned waveform patterns and understand how and why the AI judges whether equipment is anomalous.
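Joint learning of shapelets and a classifier against a pAUC-style objective can be illustrated with a toy model. The sketch below is not the LTSpAUC algorithm from the paper. It assumes (1) a single linear score on shapelet distances, (2) a pairwise logistic loss between abnormal series and only the highest-scoring normal series, the ones that determine the FPR up to `max_fpr`, which is one common way to approximate the pAUC, and (3) plain subgradient descent through the hard minimum over windows. All function names, the toy data and the hyperparameters are hypothetical. Because every score comes from a best-matching window, `best_match` also reports where in a series the learned pattern was found, which is the kind of evidence that makes shapelet-based decisions explainable.

```python
import math


def best_match(series, shapelet):
    """(min mean squared distance, start index) of `shapelet` over `series`."""
    L = len(shapelet)
    best, best_start = math.inf, 0
    for start in range(len(series) - L + 1):
        d = sum((series[start + l] - shapelet[l]) ** 2 for l in range(L)) / L
        if d < best:
            best, best_start = d, start
    return best, best_start


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def anomaly_scores(data, shapelets, weights):
    """Linear score on shapelet distances; higher = more anomalous."""
    return [sum(w * best_match(x, s)[0] for s, w in zip(shapelets, weights)) for x in data]


def top_negatives(scores, labels, max_fpr):
    """Highest-scoring normal series: the ones that set the FPR in [0, max_fpr]."""
    neg = sorted((i for i, y in enumerate(labels) if y == 0), key=lambda i: -scores[i])
    return neg[: max(1, math.ceil(max_fpr * len(neg)))]


def pauc_surrogate(scores, labels, max_fpr):
    """Mean pairwise logistic loss between positives and top-ranked negatives."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = top_negatives(scores, labels, max_fpr)
    return sum(softplus(scores[j] - scores[i]) for i in pos for j in neg) / (len(pos) * len(neg))


def train_step(data, labels, shapelets, weights, max_fpr, lr):
    """One subgradient step that updates the weights AND the shapelets."""
    matches = [[best_match(x, s) for s in shapelets] for x in data]
    scores = [sum(w * d for w, (d, _) in zip(weights, row)) for row in matches]
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = top_negatives(scores, labels, max_fpr)
    n_pairs = len(pos) * len(neg)
    grad_score = [0.0] * len(data)  # dLoss/dScore for each series
    for i in pos:
        for j in neg:
            s = sigmoid(scores[j] - scores[i]) / n_pairs
            grad_score[i] -= s
            grad_score[j] += s
    grad_w = [0.0] * len(weights)
    grad_s = [[0.0] * len(sh) for sh in shapelets]
    for n, x in enumerate(data):
        for k, (sh, w) in enumerate(zip(shapelets, weights)):
            dist, start = matches[n][k]
            grad_w[k] += grad_score[n] * dist
            # The minimum is attained at `start`, so only that window contributes.
            for l in range(len(sh)):
                grad_s[k][l] += grad_score[n] * w * 2.0 * (sh[l] - x[start + l]) / len(sh)
    new_weights = [w - lr * g for w, g in zip(weights, grad_w)]
    new_shapelets = [[v - lr * g for v, g in zip(sh, gs)] for sh, gs in zip(shapelets, grad_s)]
    return new_shapelets, new_weights


# Hypothetical toy data: normal series are nearly flat, abnormal ones contain a spike.
normal = [[0.0] * 8, [0.1, 0.0] * 4, [0.0, 0.1] * 4, [0.05] * 8]
abnormal = [[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0] + [0.0] * 6]
data, labels = normal + abnormal, [0] * 4 + [1] * 2
shapelets, weights = [[0.0, 1.0, 0.0]], [0.0]  # hypothetical initial values
before = pauc_surrogate(anomaly_scores(data, shapelets, weights), labels, 0.25)
for _ in range(20):
    shapelets, weights = train_step(data, labels, shapelets, weights, max_fpr=0.25, lr=0.5)
scores = anomaly_scores(data, shapelets, weights)
after = pauc_surrogate(scores, labels, 0.25)
print(before, after)
# Explanation: start index of the window that matched in the first abnormal series.
print(best_match(abnormal[0], shapelets[0])[1])  # 2
```

Only the top-ranked normal series enter the loss, so the optimizer concentrates on the low-FPR region of the ROC curve rather than on the whole curve.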

Evaluation of LTSpAUC’s effectiveness using the UCR Time Series Classification Archive, a repository of time-series datasets for research and testing, confirmed a 7% improvement over recently reported technologies (Note 4) in reducing missed anomalies while limiting the FPR (Note 5). Tests conducted by Toshiba also found that LTSpAUC can identify shapelets that contribute to improved pAUC performance in commonly used time-series datasets (Note 6) from semiconductor manufacturing, and in results of experiments measuring roller degradation in office automation equipment.

As Toshiba promotes its own digital transformation toward becoming one of the world's leading cyber-physical systems (CPS) technology companies, it will continue to develop capabilities and know-how in AI technologies that improve manufacturing productivity, provide reliable early identification of potential problems, and contribute to more time- and cost-efficient maintenance of equipment and systems. In line with this, Toshiba will enhance LTSpAUC’s performance and scope by extending it to multivariate time series and applying it to manufacturing equipment and infrastructure facilities.

(Note 1)
Akihiro Yamaguchi et al., LTSpAUC: Learning Time-series Shapelets for Optimizing Partial AUC, SIAM International Conference on Data Mining (SDM20) pp.1-9, 2020/5.

https://epubs.siam.org/doi/pdf/10.1137/1.9781611976236.1

(Note 2)
LTSpAUC was introduced at the Society for Industrial and Applied Mathematics (SIAM) International Conference on Data Mining (SDM). Toshiba was also invited to contribute to the Big Data special issue of Best of SDM 2020 journal.

https://www.siam.org/conferences/cm/conference/sdm20

(Note 3)

https://www.marketresearch.com/MarketsandMarkets-v3719/Machine-Condition-Monitoring-Technique-Vibration-11495425/ (MarketResearch.com)

(Note 4)
Technologies introduced at international conferences: Josif Grabocka et al., Learning Time-Series Shapelets, KDD 2014; and Shoumik Roychoudhury et al., Cost Sensitive Time-series Classification, ECML/PKDD 2017.
(Note 5)
Critical difference diagrams, a statistical method for comparing several methods across multiple datasets, showed that the pAUC of the proposed method is significantly improved.
(Note 6)
Using UCR Time Series Classification Archive data sets