Toshiba Develops AI that Segments Regions of Individual Packages in Images from Visible-Light Cameras with World's Highest Accuracy
-Aims to commercialize logistics robots capable of highly accurate recognition without onsite pre-learning in FY2021, helping to accelerate automation of the logistics industry-
Toshiba Corporation
TOKYO―Toshiba Corporation (TOKYO: 6502) has developed AI that accurately estimates the regions(Note 1) of individual, randomly piled objects in images taken with standard, visible-light cameras. Logistics robots integrating the AI can perform highly accurate unloading and picking, and in tests against a public dataset(Note 2) the AI achieved a 45% improvement over the previous best result, the world’s best record for measurement accuracy(Note 3). Because it uses images from a standard camera, the AI has a much shorter learning process than AI based on three-dimensional sensors. It is also easy to implement, as it requires no onsite learning before installation.
Toshiba aims to commercialize unloading robots with the AI in FY2021. Details of the AI will be presented at ACCV 2020, an international computer vision conference that will be streamed online from November 30 to December 4, 2020.
Originally used to transport packages in warehouses, logistics robots are now also unloading and picking packages. This progress is driving a global logistics robots market that is expected to grow from US$4.35 billion in 2018 to approximately US$20.29 billion in 2027(Note 4). Unloading and picking robots must be able to correctly recognize and handle objects of various shapes and sizes, which requires technologies to identify the regions of individual objects, even those randomly piled up and overlapping each other in images taken from above.
One approach uses 3D sensors that can measure depth and accurately identify the regions of overlapping packages, but such sensors are expensive, and collecting three-dimensional data for the learning process adds to the workload. Use of images from a standard camera has attracted interest as a low-cost solution, but until now it has carried the risk of the AI misidentifying multiple objects as a single object. The result has been a trade-off between cost and efficiency on the one hand and accuracy on the other.
Toshiba’s new AI uses points to extract object regions. It delivers highly accurate estimates of the regions of individual packages from standard camera images, even for randomly piled, overlapping packages.
One current method(Note 5) of object area extraction that uses a standard camera recognizes candidate object areas by enclosing each identified object in the image in a rectangle, and then detects the area of the object within each rectangle as a pixel-wise mask. However, this method returns inaccurate results for overlapping objects: if objects significantly overlap each other, so too do the rectangles, and the AI identifies them as a single object (Figure 1).
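The overlap problem can be illustrated with a minimal sketch. It assumes, as one common mechanism in rectangle-based detectors, that a greedy non-maximum suppression step discards any rectangle that overlaps a higher-scoring one too heavily. The function names, the 0.5 threshold, and the box coordinates are hypothetical and are not taken from Toshiba's tests or from the method in Note 5.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes; drop boxes that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(box_iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Hypothetical scene: packages 0 and 1 overlap heavily, package 2 lies apart.
boxes = [(0, 0, 100, 100), (10, 10, 110, 110), (200, 0, 300, 100)]
scores = [0.9, 0.8, 0.7]

overlap = box_iou(boxes[0], boxes[1])   # 8100 / 11900, about 0.68
kept = greedy_nms(boxes, scores)        # box 1 is suppressed by box 0
print(round(overlap, 2), kept)
```

In this toy scene three packages are present but only two rectangles survive, so the two overlapping packages end up treated as one.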
Toshiba’s AI resolves this problem. It recognizes a smaller part of the image as a candidate object, so it can estimate each object area accurately without mistaking multiple overlapping objects for one.
In operation, the AI passes the image through its trained neural network and examines every pixel. It looks for feature values characteristic of particular objects and identifies pixels with similar feature values, which indicate an individual object. It then groups these pixels and selects a specific pixel as the representative point of that object. Finally, it estimates the region of the object by comparing pixels with similar feature values against this representative point.
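The flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Toshiba's implementation: the per-pixel feature values are synthetic rather than produced by a neural network, similarity is taken to be Euclidean distance within a hypothetical `radius`, and the representative point is chosen as the pixel with the most similar neighbours. The source specifies none of these details.

```python
import numpy as np


def segment_by_representative_points(features, foreground, radius=0.5):
    """Group pixels by feature similarity around one representative pixel per object.

    features:   (H, W, D) array of per-pixel feature values (assumed given).
    foreground: (H, W) boolean array marking pixels that belong to some object.
    Returns an (H, W) label map (0 = background) and the representative points.
    """
    labels = np.zeros(foreground.shape, dtype=int)
    unassigned = foreground.copy()
    points = []
    next_label = 1
    while unassigned.any():
        ys, xs = np.nonzero(unassigned)
        feats = features[ys, xs]                                  # (N, D)
        # Pairwise feature distances between all unassigned pixels.
        dists = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=2)
        # Assumption: the representative is the pixel with the most similar pixels.
        best = int(np.argmax((dists < radius).sum(axis=1)))
        member = dists[best] < radius                             # includes best itself
        labels[ys[member], xs[member]] = next_label
        unassigned[ys[member], xs[member]] = False
        points.append((int(ys[best]), int(xs[best])))
        next_label += 1
    return labels, points


# Synthetic 4x6 image: two adjacent packages with distinct feature values.
rng = np.random.default_rng(0)
features = np.zeros((4, 6, 2))
features[:, 3:] = 1.0
features += rng.normal(scale=0.01, size=features.shape)
foreground = np.ones((4, 6), dtype=bool)

labels, points = segment_by_representative_points(features, foreground)
print(labels)
print(points)
```

Because every pixel is assigned by comparing its features with those of a single representative pixel, two touching packages receive separate labels as long as their feature values differ. The pairwise distance matrix makes this sketch suitable only for small images.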
Testing confirms that Toshiba’s new AI successfully applies these technologies to make highly accurate estimates of the regions of individual packages in images, even when packages overlap (Figure 2). Tested against a public dataset(Note 2) (Figure 3), it achieved a 45% improvement over the previous best result, the world’s best performance for measurement accuracy.
With the WISDOM dataset (SD Mask R-CNN)
To help further accelerate automation of the logistics industry, Toshiba will bring unloading robots with the new AI to market in FY2021.
- (Note 1) “Estimating regions” refers to recognizing objects from their features, even if only partially visible.
- (Note 2) WISDOM Dataset: https://sites.google.com/view/wisdom-dataset/dataset_links (SD Mask R-CNN)
- (Note 3) A figure comparing the correct answer rate when the overlap between the estimated area and the correct answer area is 75% or more. Research by Toshiba (November 2020)
- (Note 4) Source: Market research report “Logistics Robots Market to 2027 - Global Analysis and Forecasts by Function, Industry, Robot Type” by Global Information, Inc.
- (Note 5) Mask R-CNN by Kaiming He et al., winner of the best paper award at ICCV 2017
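
The accuracy measure in Note 3 can be illustrated with a minimal sketch. It assumes that "overlap" means intersection over union between the estimated mask and the correct mask, and that each estimate has already been paired with its correct answer. The function names and the toy masks are hypothetical, and the exact evaluation protocol used in Toshiba's research may differ.

```python
import numpy as np


def mask_iou(pred, truth):
    """Intersection over union of two boolean masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, truth).sum() / union)


def correct_rate(pairs, threshold=0.75):
    """Fraction of (estimated, correct) mask pairs whose overlap meets the threshold."""
    hits = sum(mask_iou(pred, truth) >= threshold for pred, truth in pairs)
    return hits / len(pairs)


# Toy example: one correct mask of 20 pixels and two hypothetical estimates.
truth = np.zeros((10, 10), dtype=bool)
truth[0:4, 0:5] = True

good = np.zeros_like(truth)
good[0:4, 0:4] = True      # 16 of the 20 pixels: IoU = 0.8, counted as correct

poor = np.zeros_like(truth)
poor[0:4, 0:2] = True      # 8 of the 20 pixels: IoU = 0.4, counted as incorrect

rate = correct_rate([(good, truth), (poor, truth)])
print(mask_iou(good, truth), mask_iou(poor, truth), rate)
```

Under these assumptions, an estimate counts as correct only when it covers the package closely, which is why a 75% threshold is a demanding test for overlapping packages.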