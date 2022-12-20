



Overview

Google presents a new AI model and a large dataset for real-time robot control.

Recent successes in developing AI systems for image and natural language processing are based on a common approach of large and diverse datasets processed by powerful and efficient models.

Generative AI models for text or images, such as GPT-3 and DALL-E, are trained on data collected from the internet and rely on specialized datasets only for fine-tuning. For example, OpenAI uses human feedback datasets to better adapt large-scale AI models to human needs.

Robotics does not have the massive datasets that exist for text and images. Large amounts of robot data must be collected through autonomous operation or human teleoperation, which makes the data expensive and difficult to produce. Moreover, there are still no AI models that can learn from such data and generalize in real time.

Some researchers instead train robot AI in simulation. Others are trying to train AI from videos on the internet.

Google’s Robotics Transformer 1 learns from different modalities

Google is now introducing Robotics Transformer 1 (RT-1), an AI model for robot control. The model comes with a large real-world dataset for robot training.

The model takes text instructions and images as inputs. These are converted into tokens by a FiLM EfficientNet model and compressed with an additional method (TokenLearner). The resulting tokens are then passed to a Transformer, which outputs commands for the robot. According to Google, this makes the model fast enough to control robots in real time.
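The compression step can be illustrated with a minimal sketch. It assumes that compression works by soft attention pooling, where each output token is a weighted average of the input tokens. This is the general idea behind TokenLearner, not Google's implementation, and the function names and toy numbers are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compress_tokens(tokens: List[Vector], scores: List[List[float]]) -> List[Vector]:
    """Pool N input tokens into K output tokens.

    tokens: N vectors of equal length D (for example, image features).
    scores: K rows of N raw attention scores. In a trained model these
            would be predicted from the tokens; here they are passed in
            directly (an assumption that keeps the sketch short).
    """
    dim = len(tokens[0])
    compressed = []
    for row in scores:
        weights = softmax(row)  # one weight per input token, summing to 1
        pooled = [
            sum(w * tok[d] for w, tok in zip(weights, tokens))
            for d in range(dim)
        ]
        compressed.append(pooled)
    return compressed


# Four 2-dimensional tokens compressed into two tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
scores = [
    [0.0, 0.0, 0.0, 0.0],   # uniform attention: a plain average
    [50.0, 0.0, 0.0, 0.0],  # attention focused on the first token
]
compact = compress_tokens(tokens, scores)
```

Fewer tokens mean less work for the Transformer at each control step, which is one plausible reason such a step helps with real-time operation.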

According to Google, RT-1 can efficiently process images and text instructions. | Image: Google

To train RT-1, Google used a large dataset of 130,000 examples covering over 700 robot tasks, such as picking up, placing, and unpacking objects. The data was collected over 17 months with 13 robots from Everyday Robots, a robotics company owned by Alphabet. It includes the robot’s joint movements, the movements of the robot’s base, camera footage, and a textual description of the task.
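As a rough illustration, one training example with these four kinds of data could be represented as follows. The class names, field names, vector lengths, and sample instruction are all hypothetical and do not reflect the actual format of Google's dataset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    image: bytes             # one camera frame (encoding left unspecified)
    arm_action: List[float]  # joint movement values (length is an assumption)
    base_action: List[float] # movement of the robot's base (length is an assumption)


@dataclass
class Episode:
    instruction: str         # textual description of the task
    steps: List[Step] = field(default_factory=list)


# A hypothetical one-step episode with placeholder values.
episode = Episode(instruction="pick up the apple")
episode.steps.append(
    Step(image=b"", arm_action=[0.0] * 7, base_action=[0.0] * 3)
)
```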

After training, the team at Google compared RT-1 to other methods on a variety of seen and unseen tasks to test how robustly the models coped with different environments.

RT-1 clearly outperformed other methods, including DeepMind’s Gato, in all scenarios. Google also experimented with data from other sources, including a different robot model. The results of these experiments suggest that RT-1 can learn new skills using training data from other robots, Google writes.

Google’s RT-1 improves SayCan

The team also checked whether the performance of Google’s SayCan could be improved with RT-1. In fact, the combined system improved its performance by nearly 20 percent and was able to maintain that success rate in a more complex kitchen environment.

The RT-1 Robotics Transformer is a simple, scalable action generation model for real-world robotics tasks. It tokenizes all inputs and outputs and uses a pre-trained EfficientNet model with early language fusion and a token learner for compression. RT-1 delivers strong performance across hundreds of tasks and exhibits extensive generalization capabilities and robustness in real-world settings.

Google
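The statement above mentions that outputs are tokenized as well as inputs. One common way to turn a continuous robot command into a token is uniform binning, sketched below. The bin count of 256, the value range, and the function names are assumptions chosen for illustration, not details confirmed by this article.

```python
NUM_BINS = 256  # assumed bin count, chosen for illustration


def action_to_token(value: float, low: float, high: float, bins: int = NUM_BINS) -> int:
    """Map a continuous value in [low, high] to an integer bin index."""
    clipped = min(max(value, low), high)       # keep the value inside the range
    fraction = (clipped - low) / (high - low)  # position within the range, 0..1
    return min(int(fraction * bins), bins - 1)


def token_to_action(token: int, low: float, high: float, bins: int = NUM_BINS) -> float:
    """Map a bin index back to the centre of its bin."""
    width = (high - low) / bins
    return low + (token + 0.5) * width


# A hypothetical command of 0.3 in the range [-1, 1].
token = action_to_token(0.3, -1.0, 1.0)
restored = token_to_action(token, -1.0, 1.0)
```

The round trip loses at most half a bin width of precision, which is the price of letting the model predict actions the same way it predicts any other token.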

The team hopes to scale up the number of robot skills more quickly in the future. To do this, it plans to have people with no robot teleoperation experience contribute to the training dataset. It also aims to further improve the model’s reaction time and its ability to retain context over time.

The code for RT-1 is available on GitHub.

