Overview
TOKYO—Toshiba Corporation has developed the world’s first AI (*1) for precise control of complex robot operations using ‘offline reinforcement learning’ with a small amount of data. Toshiba evaluated the AI in simulations of eight types of tasks, such as picking and placing objects, in a publicly available benchmark environment. The average success rate, which was 36% with conventional methods, improved to 72% with the newly developed AI, achieving world-class precision (*2).
Reinforcement learning is normally performed through an interactive process of trial and error; the approach of learning instead from pre-collected data is known as ‘offline reinforcement learning’. Conventionally, this approach required thousands of training data points to achieve high precision, and preparing the data took several weeks to a month or more. The newly developed AI achieves high precision with a small amount of training data (about 100 demonstrations), which can be collected in as little as half a day.
This enables automation in fields where training data is scarce or where trial-and-error data collection is difficult, including the operation and autonomous running of manufacturing equipment and medical devices. The technology can help address labor shortages while ensuring safety.
This AI technology was created jointly with Professor Masashi Sugiyama, Director of the RIKEN Center for Advanced Intelligence Project and a professor at the Graduate School of Frontier Sciences at the University of Tokyo (*3). The collaboration developed a two-step control technique that crops regions of interest from images captured of the robot arm’s work process and then adjusts the position of the robot arm, leading to a significant improvement in precision.
Toshiba will present the details of this technology at the international conference ICRA (IEEE International Conference on Robotics and Automation), one of the most prestigious international conferences in the Robotics field, to be held from May 13 to 17 in Yokohama, Japan.
Development background
In recent years, automation using robots has been rapidly advancing in various industrial fields such as manufacturing, maintenance, and logistics due to labor shortages and a decline in skilled workers. The global industrial automation market is predicted to grow from $205.86 billion in 2022 to $395.09 billion by 2029 (with an average annual growth rate of 9.8%) (*4). The demand for technologies that can automate more complex tasks is increasing.
Currently, introducing robots to perform complex tasks at manufacturing sites and elsewhere requires experts to design and develop, for each set of conditions, systems that estimate the position and orientation of objects and plan the robot’s movements. One promising approach to robot control is reinforcement learning, a branch of machine learning that allows robots to learn control autonomously from camera-captured images. To achieve high precision, however, the AI must run the robot (online) and learn by trial and error through interaction with its environment, and such online learning can be challenging or even impossible due to safety constraints. There is therefore growing interest in “offline reinforcement learning,” in which control is learned from pre-collected data (offline), without trial and error in the real world. To achieve high accuracy, offline reinforcement learning requires large and diverse datasets covering object configurations and task patterns; this typically means thousands of training data points, which can take weeks to months to collect. Offline reinforcement learning that can safely and accurately learn complex robot control from small amounts of data is therefore in demand in the many fields that seek robotic automation.
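The defining property of offline learning described above, fitting a policy to a fixed set of pre-collected demonstrations with no further interaction with the environment, can be illustrated with a minimal sketch. The linear policy, the synthetic dataset, and the single least-squares fit (behavior cloning, the simplest offline approach) are illustrative assumptions, not Toshiba’s actual method.

```python
import numpy as np

# Minimal illustration of offline learning: the policy is fit to
# pre-collected (observation, action) pairs, with no further
# interaction with the environment during training.
rng = np.random.default_rng(0)

# Pre-collected demonstrations: observations and the actions a human took.
obs = rng.normal(size=(100, 4))               # e.g. 100 demonstrations
true_w = np.array([[0.5], [-1.0], [0.3], [2.0]])
actions = obs @ true_w                        # demonstrated actions

# "Training" is a single batch fit on the stored data (least squares
# here), standing in for gradient-based policy learning.
w, *_ = np.linalg.lstsq(obs, actions, rcond=None)

# The learned policy can now act on new observations, without any
# trial-and-error having taken place in the real environment.
new_obs = rng.normal(size=(1, 4))
predicted_action = new_obs @ w
```

In online reinforcement learning, by contrast, the loop would repeatedly execute actions on the robot and update the policy from the outcomes, which is exactly the trial-and-error step that safety constraints can rule out.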
Features of the technology
Toshiba has developed an AI technology for precise control of complex robot operations using offline reinforcement learning with a small amount of data collected from human demonstrations. The AI is a two-step learning system that combines a first-step control, which determines a tentative destination for the robot arm from images of the arm’s operating range, and a second-step control, which corrects the tentative destination based on images cropped from its vicinity (Figure 1).
Conventional methods learn only the first-step control. The second-step control enables more precise robot control because 1) its input images contain only the region of interest, 2) the cropped images can be augmented (*5) for training, and 3) it learns only corrections to the destination (Figure 2).
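The coarse-then-refine structure described above can be sketched as follows. The `coarse_model` and `refine_model` functions are toy stand-ins for the two learned controls (here, simple peak-finding rather than trained networks), and the image size, crop size, and target location are illustrative assumptions, not Toshiba’s actual configuration.

```python
import numpy as np

CROP = 8  # half-width of the cropped region of interest, in pixels

def coarse_model(image):
    # First step: a tentative destination from a low-resolution view of
    # the whole workspace image (coarse, so it may be a few pixels off).
    small = image[::4, ::4]
    y, x = np.unravel_index(np.argmax(small), small.shape)
    return 4 * y, 4 * x

def refine_model(crop):
    # Second step: a small correction predicted from the cropped image,
    # expressed as an offset from the crop centre.
    y, x = np.unravel_index(np.argmax(crop), crop.shape)
    return y - CROP, x - CROP

def two_step_destination(image):
    ty, tx = coarse_model(image)                                 # tentative
    crop = image[ty - CROP:ty + CROP + 1, tx - CROP:tx + CROP + 1]
    dy, dx = refine_model(crop)                                  # correction
    return ty + dy, tx + dx                                      # refined

# Toy workspace image: a bright blob marks the true target point (22, 31).
yy, xx = np.mgrid[0:64, 0:64]
image = np.exp(-((yy - 22) ** 2 + (xx - 31) ** 2) / 20.0)
```

In this sketch the first step lands near the target but not on it, and the second step, seeing only the high-resolution crop, supplies the few-pixel correction; this mirrors why cropped inputs and correction-only targets make the second-step control easier to learn precisely.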
We evaluated this AI in simulation, training it with image data from 100 robot operations for each of eight different tasks, such as picking and placing objects, in a public benchmark environment (RLBench). The average success rate over 500 trials of each of the eight tasks improved significantly, from 36% with the conventional method to 72% with this method, the highest accuracy in the world (Figure 3). For the picking task, the success rate improved from up to 79% with the conventional method to up to 99% with this method.
Training data for 100 operations can be collected in only about half a day, and the cropped images used as training data for the second-step control are generated automatically when the robot arm’s destination is determined for the first-step control, so no additional work is required. The technology is expected to be used in fields where little training data can be collected or where trial-and-error data collection in the real world is difficult.
This AI can improve the control accuracy of equipment that requires safe learning, such as the operation and automated running of manufacturing and medical equipment, in a short time and at low cost. For example, if this AI is applied to the automation of welding equipment in manufacturing plants, where safety is critical because of the high heat involved, robots could take over the work of skilled workers, helping to address labor shortages.
Future developments
Toshiba will verify the effectiveness of this technology using real-world data and further improve its accuracy, with the aim of early practical application.
*1: The world’s first AI model for highly accurate robot control in offline reinforcement learning, trained in simulation on a small amount of image data. (According to a survey by Toshiba as of April 2024.)
*2: The AI model with the world’s highest accuracy for robot control using offline reinforcement learning on a small amount of training data (100 operations): an average success rate of 72% over 500 trials of each of 8 different tasks, such as picking and placing objects, in a public benchmark environment (RLBench simulation). Based on a Toshiba survey conducted in April 2024.
*3: Prof. Sugiyama is Director of the RIKEN Center for Advanced Intelligence Project and a professor at the Graduate School of Frontier Sciences at the University of Tokyo. Awards: IBM Faculty Award, IPSJ Nagao Special Researcher Award, MEXT Young Scientists’ Prize, JSPS Prize, Japan Academy Medal, MEXT Ministerial S&T Award, Funai Achievement Award, and others.
The developed AI technology will be presented at ICRA in a paper co-authored by Prof. Sugiyama and Toshiba.
*4: FORTUNE BUSINESS INSIGHTS The global industrial automation market report
https://www.fortunebusinessinsights.com/jp/%E6%A5%AD%E7%95%8C-%E3%83%AC%E3%83%9D%E3%83%BC%E3%83%88/%E7%94%A3%E6%A5%AD%E3%82%AA%E3%83%BC%E3%83%88%E3%83%A1%E3%83%BC%E3%82%B7%E3%83%A7%E3%83%B3%E5%B8%82%E5%A0%B4-101589 (in Japanese)
*5: Data augmentation is a method of increasing the amount of training data by transforming existing images, for example by rotating, cropping, or combining them, when there is not enough image data for training.