To ensure stable device operation and greater customer satisfaction, many manufacturing companies keep components in stock long after the corresponding products have been discontinued. Guarding against inventory depletion in this way incurs major procurement and storage costs for maintenance components. Toshiba Digital Solutions has taken on the challenge of highly accurate failure prediction for the maintenance service it provides for its MAGNIA series of PC servers, seeking to improve service quality by optimizing its inventory. We leveraged our unique analysis methods, AI, and more than a decade of historical data on MAGNIA manufacturing and maintenance to create a system that performs failure prediction at the individual component level. Let's look at this innovative prediction system, which makes full use of our analysis modeling and data mining technologies.
The longstanding challenge of stable operation
The MAGNIA series of PC servers is used to support a variety of business and societal applications through its high reliability and extendibility. The MAGNIA lineup is extensive, including 1Way servers whose compact chassis make them well suited to stores and small offices, and 2Way servers that offer the high performance needed for core systems. The series is notable for its Build To Order (BTO) production system, which enables customers to freely select CPU, memory, hard disk, and other options. This makes it possible to design servers optimized for each customer's uses, scale, and installation environment.
For all devices, including MAGNIA servers, the longer a device is in operation the higher the likelihood of a hardware failure. Manufacturers that produce and sell servers have maintenance systems for immediately responding to repair requests after products are shipped, so that the products can be kept in optimal condition over long-term usage periods.
Having a maintenance system is not enough to rapidly provide appropriate maintenance. The components required for maintenance must also be available when they are needed. Technical innovations are shortening component product lifespans, but in order to meet maintenance demand, manufacturers must procure and keep enough components to avoid running out of maintenance components or supplies before product support periods end.
To avoid depleting maintenance component inventories, companies must keep sufficiently large spare component inventories, which creates the problem of ballooning storage and procurement costs. For Toshiba Digital Solutions, optimizing maintenance component inventories while at the same time enabling customers to use their MAGNIA products with peace of mind was a major, long-running challenge.
The state of the collected data presented a major bottleneck
In order to tackle this challenge, we used AI technology to predict when components would fail, and in what numbers, with a high level of accuracy. The goal for this innovative prediction system was to foresee demand for the over 1,000 maintenance components, supplies, and limited-lifespan components used in MAGNIA systems, to reduce inventory levels while preventing reserves from becoming depleted.
The key to creating this prediction system was selecting the right analysis approach. Generally speaking, failure properties differ for printed circuit board (PCB) components and mechanical components. Even for identical components, lifespans vary depending on the usage environments of the devices in which they are installed, so a standard time-series approach would not be capable of producing highly accurate predictions.
Instead, we chose to build a model that links operation start dates and failure dates for individual components in the data we had accrued, enabling it to predict when failures would occur. The model uses survival analysis, a statistical method for modeling remaining lifespans that is widely used in fields such as medicine and manufacturing, in order to arrive at more accurate prediction results.
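As a rough illustration of what linking start dates to failure dates can look like, the following minimal sketch turns the two kinds of dates into one lifetime record per component. It is not the actual MAGNIA implementation: the function name, the component IDs, the dates, and the idea of a fixed observation end date are all assumptions made for the example.

```python
from datetime import date

def build_lifetimes(start_dates, failure_dates, observation_end):
    """Return {component_id: (duration_days, failed)}.

    Components with no recorded failure are kept, with their duration
    measured up to the (assumed) end of the observation period.
    """
    records = {}
    for comp_id, started_on in start_dates.items():
        failed_on = failure_dates.get(comp_id)
        if failed_on is not None:
            records[comp_id] = ((failed_on - started_on).days, True)
        else:
            records[comp_id] = ((observation_end - started_on).days, False)
    return records

# Hypothetical example data, not taken from the article.
start_dates = {
    "hdd-001": date(2010, 1, 1),
    "hdd-002": date(2010, 1, 1),
}
failure_dates = {"hdd-001": date(2010, 1, 31)}

lifetimes = build_lifetimes(start_dates, failure_dates, date(2010, 12, 31))
print(lifetimes)
```

Each record pairs an elapsed time with a flag saying whether a failure was actually observed, which is the input shape survival analysis expects.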
Survival analysis focuses on the amount of time that elapses before an event occurs, and its relation to the event itself. Plotting product lifespans in the form of cumulative survival rate (y axis) over time (x axis) produces a step-like graph (survival curve). From this curve, failure incidence rates can be estimated for components under specified conditions, and the estimates are not heavily affected by factors such as outliers.
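The article does not name the estimator behind the survival curve. A step-like curve of this kind is typically produced by the Kaplan-Meier estimator, so the sketch below assumes that method. The function name and the sample durations are hypothetical and serve only to show how the steps arise.

```python
def kaplan_meier(durations, failed):
    """Return [(time, survival_probability)] at each observed failure time.

    durations: elapsed time per component (any consistent unit)
    failed:    True if the component failed at that time, False if it was
               still working when observation stopped (censored)
    """
    events = sorted(zip(durations, failed))
    at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        deaths = 0
        removed = 0
        # Group all components that share the same time t.
        while i < len(events) and events[i][0] == t:
            deaths += int(events[i][1])
            removed += 1
            i += 1
        if deaths:
            # The curve steps down only at times where a failure occurred.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical lifetimes in days; False marks a component that had not failed.
durations = [100, 200, 200, 300, 400]
failed = [True, True, False, True, False]

curve = kaplan_meier(durations, failed)
for t, s in curve:
    print(f"day {t}: survival {s:.2f}")
```

Components that have not failed still count toward the number at risk until they leave observation, which is why they are kept in the input rather than discarded.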
The problem lay in preparing the data to be entered into the survival analysis tool. No matter how advanced an AI system is, to produce highly accurate inferences it must correctly process past data, so as much accurate past data as possible must be prepared.
However, we found that there were no records of usage start dates for many of the components used in MAGNIA products. We therefore needed to pinpoint component operation start dates based on manufacturing history data, which included both data from the pre-processing stage of assembly and data from the final manufacturing processes.
When we examined the maintenance component replacement history data managed by the maintenance service division, we found that much of it consisted of hand-written notes made by maintenance personnel in the field, and that there were numerous cases of incorrect or missing data. Furthermore, some components were replaced because it was clear that they had failed, while others were replaced only because they might have failed. For the latter, we needed to use verification data gathered after the components were retrieved from the field to determine whether they had actually failed, and to exclude the normal components so that only genuine failures remained.
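The filtering step described above could be sketched roughly as follows. The record layout, the "failed" and "precautionary" labels, and the sample rows are all hypothetical; the actual replacement history was far less tidy than this.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Replacement:
    component_id: str
    reason: str                             # "failed" or "precautionary" (assumed labels)
    verified_failed: Optional[bool] = None  # post-retrieval check; None if not performed

def confirmed_failures(records):
    """Keep clear failures plus precautionary replacements verified as failed."""
    kept = []
    for r in records:
        if r.reason == "failed":
            kept.append(r)
        elif r.reason == "precautionary" and r.verified_failed is True:
            kept.append(r)
        # Precautionary replacements found to be normal, or never verified,
        # are not counted as failures.
    return kept

# Hypothetical replacement history.
history = [
    Replacement("psu-001", "failed"),
    Replacement("psu-002", "precautionary", verified_failed=True),
    Replacement("psu-003", "precautionary", verified_failed=False),
    Replacement("psu-004", "precautionary"),
]

failures = confirmed_failures(history)
print([r.component_id for r in failures])
```

How to treat precautionary replacements with no verification result is a judgment call. This sketch leaves them out of the failure set, which is only one possible choice.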
Using outstanding analysis methods to generate input data
Toshiba Digital Solutions has developed technologies for analyzing and utilizing the varied data gathered from the field through its wide-ranging business activities. Furthermore, we have been at the forefront of data mining technologies that can be used to perform highly accurate analysis even of diverse data.
We used our experience and know-how, accumulated through the years, to take on this challenge. We used advanced database technology and data science to generate a sufficient amount of reliable input data. It took a year to extract the operation start dates and failure data from the source data, which spanned various formats and was stored in scattered locations, and to verify what we had extracted. We created an analysis platform that cleanses the data (for example, by correcting input errors and omissions), integrates it, generates sufficient input data, and predicts, with a high level of precision, when failures will occur for individual components.
The results of verification testing were excellent. The prediction system's effectiveness was verified by comparing the number of failures it predicted (the predicted total number of procurements required) against actual total numbers of procurements for all components requiring maintenance. For PCB components, the actual amount of excess inventory over the total number of failures was 18 components, but the verification showed that the prediction system would have reduced this to just four components. Furthermore, for mechanical components the excess inventory would be just one component, indicating that compared to the conventional inventory management approach, MAGNIA's maintenance costs could be reduced by roughly 30% (Fig. 1). The AI-based failure prediction system was proven to have tremendous potential for optimizing maintenance component inventories.
Optimizing inventory management using highly accurate failure forecasts would also have the potential to affect the formulation of production plans that anticipate future demand and the optimization of personnel placement, greatly accelerating operation process innovation and sweeping business model reforms (Fig. 2).
Based on these results, we have begun full-fledged implementation of the failure prediction system for MAGNIA, and have begun applying it to other industrial devices produced by the Toshiba Group for use in social infrastructure, the energy industry, and the like. We plan to use the AI findings and know-how produced by these efforts to help solve the problems faced by our customers and to transform business.
* The corporate names, organization names, job titles and other names and titles appearing in this article are those as of February 2018.