AI-based Image Anomaly Detection Developed for Infrastructure and Plant Inspections, Capable of Handling Detection Conditions Given as Ambiguous Prompts

-AI optimizes text prompts and combines them with normal images, reducing false positives to approximately half the conventional level-

12 Sep, 2025
Toshiba Corporation

Overview

KAWASAKI - Toshiba Corporation has developed a new AI that can accurately detect a wide range of anomalies and early-stage defects in inspections of aging infrastructure and plant facilities such as railways, roads, factories, power installations, and plants. In addition to conventional image-based detection, the AI lets users specify inspection conditions through text instructions, even the ambiguous ones used on-site. This technology promotes the automation of inspections in hazardous or hard-to-access locations, reduces labor requirements, and improves inspection accuracy, thereby contributing to the long-term stable operation of social infrastructure and advancing the digital transformation (DX) of maintenance and inspections.
The difficulty in acquiring images of inspection sites in hazardous or remote locations has been a barrier to the introduction of AI into the inspection of infrastructure and plant facilities. Conventionally, a large number of site images had to be collected for AI to detect irregularities in the images. Toshiba previously developed a method called “difference-based image anomaly detection technology”^*1, which can identify anomalies by comparing inspection images with only a few normal images. This technology can accurately detect anomalies even when the inspection image is taken from a different position or angle than the normal image. It can also reduce false positives, where distinctive but normal patterns are mistakenly identified as anomalies. However, its false-positive suppression is limited in images with complex backgrounds or surrounding structures, so the need to prepare a large number of normal images resurfaced as an issue.
The newly developed technology utilizes a vision-language model (VLM) that integrates images and language, enabling the AI to optimize even ambiguous text prompts and flexibly define detection conditions. In addition, combining this approach with normal images through the difference-based image anomaly detection technology achieves highly accurate anomaly detection while suppressing false positives, even in situations where it is difficult to obtain large numbers of normal images. Toshiba verified the effectiveness of this technology using a public dataset and confirmed that it reduced false positives to about half the level of conventional methods^*2, demonstrating top-level performance^*3.
Toshiba will present the details of this technology at ICIAP2025 (23rd International Conference on Image Analysis and Processing), which will be held from September 15 to 19, 2025.

Development background

Infrastructure and plant facility maintenance is becoming increasingly important to ensure the long-term stable operation of social infrastructure. In Japan in particular, roads, bridges, and tunnels built during the country’s period of rapid economic growth are now deteriorating rapidly. At the same time, challenges such as the aging and shortage of inspection personnel, as well as the heavy burden of work in hazardous or remote locations, are becoming more serious. Against this backdrop, the use of AI analysis of inspection images taken by drones, robots, and fixed cameras is expected to enable safer and more efficient inspections and the early detection of defects.
There are three main approaches to using AI to detect the many unspecified and diverse types of anomalies that occur in infrastructure and plant facilities: (1) preparing large volumes of training data for each type of anomaly and training a model accordingly, (2) precisely aligning inspection images with images taken under normal conditions and comparing pixel-level differences, and (3) using generative AI to detect anomalies by providing text prompts describing the image and the anomalies to be found (Figure 1). For example, method (1) can detect specific anomalies such as cracks or rust by collecting a large number of images showing various crack and rust patterns along with normal images, and training a model to recognize them. However, anomalies and early-stage defects found in infrastructure and plant facilities are not limited to cracks and rust. They also include water or oil leaks, fallen objects, foreign material, and detached components. Because these types of anomalies occur infrequently, it is difficult to prepare comprehensive training data for them in advance. In addition, at sites such as transmission towers in mountainous areas, the undersides of bridges, offshore wind turbines, or the reverse sides of solar panels, inspections are often dangerous, access is restricted, or traveling to the site is burdensome. In such environments, collecting the large volume of training data needed, or capturing images that are precisely aligned with normal-condition images, is extremely difficult.
Toshiba has been developing a difference-based image anomaly detection technology based on method (2). This approach utilizes feature representations extracted from deep learning models that have been pretrained on large image datasets. The system identifies the locations of anomalies by calculating the differences between the deep feature representations of inspection images and normal images. Because it uses a pre-trained model, there is no need to collect data and retrain for each inspection site, allowing the technology to be applied immediately across various locations while maintaining high detection accuracy. In addition, Toshiba’s method can suppress false positives even when the inspection image is taken from a different position or angle than the normal image. However, its ability to suppress false positives is limited in cases where the image contains complex backgrounds or surrounding structures (Figure 2). Method (3) is also effective in reducing false positives, but ambiguous instructions can lead to decreased detection accuracy.
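As a rough illustration of comparing deep features rather than pixels, the following Python sketch scores each region of an inspection image by its distance to the most similar region of a normal image. It is not Toshiba's implementation: the three-dimensional vectors are made-up stand-ins for features that a pre-trained model would produce, and the function names `cosine_distance` and `anomaly_scores`, as well as the nearest-match scoring rule, are assumptions made for this example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an all-zero feature as maximally different
    return 1.0 - dot / (norm_a * norm_b)


def anomaly_scores(inspection_regions, normal_regions):
    """Score each inspection region by its distance to the closest normal region."""
    return [
        min(cosine_distance(region, normal) for normal in normal_regions)
        for region in inspection_regions
    ]


# Hypothetical features: two regions of a normal image.
normal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# Inspection image: the same two patterns in swapped positions (as if the
# camera had moved), plus one pattern that never appears in the normal image.
inspection = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

scores = anomaly_scores(inspection, normal)
print(scores)  # the first two regions score low, the unseen pattern scores high
```

Matching each region against every region of the normal image, instead of only the region at the same coordinates, is one simple way to tolerate a change in camera position or angle.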

Figure 1: Example of anomaly location detection using images

Features of the technology

To enhance detection accuracy and flexibility, Toshiba combined method (2), which detects anomalies from images, with method (3), which uses language-based instructions. To maximize the effectiveness of this hybrid approach, the company focused on the VLM, a technology that has recently gained attention. By leveraging a VLM, the system can semantically identify the most relevant image content based on user-provided text inputs. However, input of ambiguous text prompts can lead to unstable detection accuracy. Accordingly, Toshiba implemented a mechanism that automatically refines ambiguous expressions into more precise ones, thereby enabling flexible and highly accurate anomaly detection.
For example, when specifying an “obstacle” on the road as the detection target, alternative expressions exist such as “barrier” or “obstruction.” When a user inputs the instruction “detect obstacles” in text form, the proposed method automatically generates similar expressions like “barrier” and “obstruction.” These alternative expressions are also treated as detection candidates: the normal image is input into the VLM along with text instructions such as “Barrier is a detection target” and “Obstruction is a detection target,” and by comparing the anomaly scores output by the VLM, the system selects the optimal detection target expressions.
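The selection step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the source does not give the exact selection criterion, so the sketch assumes that the preferred expression is the one producing the lowest anomaly score on a normal image, which by definition contains nothing to detect. The `score_fn` callable stands in for a real VLM, and `toy_vlm_score` with its hard-coded numbers is invented purely so the example runs.

```python
from typing import Callable, Sequence


def select_expression(
    candidates: Sequence[str],
    normal_image: object,
    score_fn: Callable[[object, str], float],
) -> str:
    """Pick the candidate whose prompt yields the lowest score on a normal image.

    Assumption: a lower anomaly score on a normal image means the expression
    causes fewer false alarms, so it is the better detection target wording.
    """
    scored = {
        word: score_fn(normal_image, f"{word.capitalize()} is a detection target")
        for word in candidates
    }
    return min(scored, key=scored.get)


# Invented scores standing in for VLM output; these are not measured values.
TOY_SCORES = {
    "Obstacle is a detection target": 0.42,
    "Barrier is a detection target": 0.35,
    "Obstruction is a detection target": 0.18,
}


def toy_vlm_score(image: object, prompt: str) -> float:
    """Hypothetical stand-in for a VLM that returns an anomaly score."""
    return TOY_SCORES[prompt]


best = select_expression(
    ["obstacle", "barrier", "obstruction"],
    normal_image=None,  # a real system would pass the normal image here
    score_fn=toy_vlm_score,
)
print(best)
```

Passing the scoring function in as a parameter keeps the selection logic independent of any particular VLM, so the same loop could be reused with a different model or a different criterion.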
Furthermore, by calculating the differences between the deep feature representations of normal images using the difference-based image anomaly detection technology, it became possible to extract patterns that are likely to be mistakenly detected as anomalies due to differences in appearance. By adjusting the anomaly scores accordingly, the system can further suppress false positives (Figure 3).
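One simple way to picture this score adjustment is shown in the Python sketch below. It is an assumption-laden illustration, not the published method: it presumes that regions are already aligned across images, measures how much each region varies between two normal images, and subtracts that variation from the raw difference score, clipping at zero. The function names, the `weight` parameter, and the two-dimensional feature values are all hypothetical.

```python
import math


def region_differences(features_a, features_b):
    """Euclidean distance between corresponding region features of two images."""
    return [math.dist(a, b) for a, b in zip(features_a, features_b)]


def corrected_scores(inspection, normal_ref, normal_other, weight=1.0):
    """Subtract normal-to-normal variation from the raw difference scores."""
    raw = region_differences(inspection, normal_ref)
    # Regions that already differ between two normal images are likely
    # sources of false positives, so their scores are reduced.
    variation = region_differences(normal_other, normal_ref)
    return [max(0.0, r - weight * v) for r, v in zip(raw, variation)]


# Hypothetical features for three aligned regions.
normal_ref = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
normal_other = [[0.0, 0.0], [1.0, 0.6], [0.0, 1.0]]  # region 1 varies although normal
inspection = [[0.0, 0.0], [1.0, 0.5], [0.9, 1.0]]  # region 2 holds a genuine change

scores = corrected_scores(inspection, normal_ref, normal_other)
print(scores)  # region 1 is suppressed, region 2 keeps a high score
```

In this toy case, region 1 would raise a false alarm on the raw difference alone, but its score drops to zero because the same region also differs between the two normal images.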
The proposed method successfully reduced the false positive rate by approximately half compared with conventional methods using a VLM, based on evaluations with a public dataset (Figure 4).

Figure 2: Limitations of the conventional difference-based image anomaly detection technology (conceptual illustration)

Figure 3: Key points of the proposed method

**1: ShanghaiTech Campus (STC) dataset
・Surveillance footage captured across 12 different scenes using fixed-position cameras
・Liu, W., Luo, W., Lian, D., Gao, S.: Future frame prediction for anomaly detection – a new baseline. In: Proc. CVPR (2018)
**2: AprilGAN: Chen, X., Han, Y., Zhang, J.: A zero-/few-shot anomaly classification and segmentation method for CVPR 2023 VAND workshop challenge tracks 1&2: 1st place on zero-shot AD and 4th place on few-shot AD. arXiv preprint arXiv:2305.17382 (2023)
**3: Image anomaly detection technology using difference between an inspection image and reference images | Toshiba AI | Toshiba

Figure 4: Comparison of accuracy with conventional technology using a public dataset^**1

The proposed method can be applied to various infrastructure and plant facilities, including railways, roads, factories, power systems, and plants. For example, using drone imagery, the method can detect beehives formed on the undersides of solar panels or bridges, as well as anomalies such as foreign objects or broken cables in hard-to-capture locations. Using images captured by cameras mounted on moving vehicles, it can detect defects such as cracks or rust on equipment, broken transmission lines, or detached components. In addition, it is capable of detecting anomalies that occur only rarely in infrastructure and plant facilities, such as water leaks, oil leaks, and cracks (Figure 5).

Figure 5: Anticipated applications of the proposed method

Future developments

Going forward, Toshiba aims to apply this technology to inspection operations in collaboration with its Railway Systems Division, Energy Aggregation Division, and other relevant departments. To support practical implementation, the company will continue research and development to enhance system functionality and detection accuracy, while also working with the ICT Solutions Division and others to create new services.

*1: https://www.global.toshiba/ww/technology/corporate/rdc/rd/topics/22/2205-01.html
*2: AprilGAN: Chen, X., Han, Y., Zhang, J.: A zero-/few-shot anomaly classification and segmentation method for CVPR 2023 VAND workshop challenge tracks 1&2: 1st place on zero-shot AD and 4th place on few-shot AD. arXiv preprint arXiv:2305.17382 (2023)
*3: Unsupervised Anomaly Localization In the Wild via Token Optimization and Test-Time Score Correction, Naoki Kawamura, Gaku Minamoto, Tomohiro Nakai, Satoshi Ito, Osamu Yamaguchi, Takahiro Takimoto, 23rd International Conference on Image Analysis and Processing 2025