Google’s DeepMind says it has developed an artificial intelligence model called RoboCat that can control different robot arms to perform a range of tasks. That alone isn’t particularly novel, but DeepMind claims the model is the first able to solve and adapt to multiple tasks, and to do so using different, real-world robots.
RoboCat was inspired by Gato, another DeepMind AI model that can analyze and process text, images, and events. RoboCat’s training data consists of images and motion data from simulated and real robots, drawn from other robot-control models running in virtual environments, from human-controlled robots, and from previous versions of RoboCat itself.
“We demonstrated that a single large model can resolve diverse tasks and can quickly adapt to new tasks and entities,” said Alex Lee of DeepMind.
To train RoboCat, DeepMind researchers first collected 100 to 1,000 demonstrations of each task or robot, in simulation or in the real world, using a human-controlled robotic arm, for example having the arm pick up gears or stack blocks. They then fine-tuned RoboCat on each task, creating a specialized “derived” model, and had that model practice the task an average of 10,000 times. By combining the data generated by the derived models with the demonstration data, the researchers continuously expanded RoboCat’s training data set and trained new versions of RoboCat.
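The training cycle described above can be sketched as a short loop. This is only an illustrative toy, not DeepMind’s code: every function name here (collect_demonstrations, fine_tune, practice, train_generalist) is hypothetical, the "models" are plain placeholders that do no learning, and retraining the generalist after every single task is an assumption made to keep the example small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    task: str
    source: str  # "human_demo" or "self_practice"


def collect_demonstrations(task, n_demos):
    # Hypothetical stand-in for a human teleoperating a robot arm.
    return [Episode(task, "human_demo") for _ in range(n_demos)]


def fine_tune(generalist_version, task, demos):
    # Hypothetical stand-in: a "derived" model is just a record of what it
    # was specialized on. No real learning happens here.
    return {"base_version": generalist_version, "task": task, "n_demos": len(demos)}


def practice(derived_model, n_episodes):
    # Hypothetical stand-in for the derived model practicing its task and
    # logging every attempt as new training data.
    return [Episode(derived_model["task"], "self_practice") for _ in range(n_episodes)]


def train_generalist(dataset, previous_version):
    # Hypothetical stand-in for training a new generalist on the enlarged
    # data set; only the version counter changes.
    return previous_version + 1


def self_improvement_cycle(tasks, n_demos=1_000, n_practice=10_000):
    dataset = []
    version = 0
    for task in tasks:
        demos = collect_demonstrations(task, n_demos)     # 100 to 1,000 in the article
        derived = fine_tune(version, task, demos)         # specialized "derived" model
        practice_data = practice(derived, n_practice)     # about 10,000 runs on average
        dataset.extend(demos)                             # demonstration data ...
        dataset.extend(practice_data)                     # ... plus self-generated data
        version = train_generalist(dataset, version)      # assumption: retrain per task
    return version, dataset


version, dataset = self_improvement_cycle(
    ["pick up gears", "stack blocks"], n_demos=100, n_practice=1_000
)
print(version, len(dataset))  # prints: 2 2200
```

The point of the sketch is the data flow: human demonstrations seed a specialized model, that model’s own practice runs are added back to the shared data set, and each new generalist is trained on everything gathered so far.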
The final version of RoboCat was trained on a total of 253 tasks and tested on 141 variants of these tasks, both in simulation and in the real world. DeepMind claims that RoboCat learned to operate different types of robotic arms after observing 1,000 human-controlled demonstrations collected over several hours. Although RoboCat had been trained on four robots with two-pronged arms, the model was able to adapt to a more complex arm with a three-finger gripper and twice as many controllable inputs.
Even so, RoboCat’s success rate varied widely across tasks in DeepMind’s tests, from a low of 13 percent to a high of 99 percent. Those figures are for 1,000 demonstrations in the training data; with half as many demonstrations, success rates were lower. In some cases, though, DeepMind claims that RoboCat needs to observe only 100 demonstrations to learn a new task.
Alex Lee believes that RoboCat may make it easier to solve new tasks. “Given a certain number of demonstrations of new tasks, RoboCat can fine-tune to new tasks and can self-generate more data to improve further,” he added.
In the future, the research team aims to reduce the number of demonstrations needed to teach RoboCat a new task to fewer than 10.