Google introduces innovative techniques for robot training utilizing video data and advanced language models

ByYasmeeta Oon

Jan 9, 2024


The year 2024 is poised to be a monumental year in the domain of technological advancements, particularly at the intersection of generative AI, large foundational models, and robotics. This period is charged with palpable excitement and anticipation as various applications ranging from learning algorithms to innovative product design start to take shape and redefine the technological landscape.

At the forefront of this revolutionary wave are researchers from Google’s DeepMind Robotics, alongside numerous other teams fervently exploring the potential of this space. Their collective endeavors are directed towards imbuing robotics with a profound understanding of human needs and expectations. This is a significant shift from traditional robotics, which typically focuses on performing repetitive, singular tasks throughout their operational life. Although single-purpose robots excel in their designated tasks, they encounter challenges when faced with unexpected changes or errors.

In a groundbreaking development, the DeepMind team has introduced AutoRT, a system designed to leverage the power of large foundational models for diverse applications. This system begins its operation by utilizing a Visual Language Model (VLM) to enhance situational awareness. AutoRT is engineered to manage a fleet of robots, each equipped with cameras to comprehend the layout of their environment and the objects within. This enables the robots to work in tandem effectively.

Moreover, a large language model (LLM) is integral to the system, suggesting feasible tasks that the robots can perform, including actions involving their end effectors. LLMs are widely recognized as the key to developing robotics that can understand and respond to natural language commands. This reduces the reliance on hard coding skills and paves the way for more intuitive human-robot interactions.

Over the past seven months or so, the system has undergone extensive testing. AutoRT has demonstrated its capability to orchestrate up to 20 robots simultaneously, managing a total of 52 different devices. Throughout these trials, DeepMind has accumulated a staggering 77,000 trials, encompassing more than 6,000 tasks.

Additionally, the team has unveiled RT-Trajectory, an innovative approach that utilizes video input for robotic learning. This method isn’t completely new as many teams explore using YouTube videos for large-scale robot training. However, RT-Trajectory introduces a novel aspect by overlaying a two-dimensional sketch of the robot’s arm over the video. These trajectories, represented as RGB images, provide practical visual cues to the model, facilitating the learning of robot-control policies.

DeepMind reports that the training involving RT-Trajectory resulted in double the success rate of its predecessor, RT-2 training. This was evident in the testing of 41 tasks, with a success rate of 63% compared to the previous 29%. The team emphasizes that RT-Trajectory is not just a step towards creating robots capable of moving with precision in novel situations but also a method to unlock the wealth of knowledge latent in existing datasets.

The year 2024 marks a pivotal moment in technological evolution. The advancements in the cross-section of generative AI, large foundational models, and robotics are not just incremental improvements but are shaping up to be quantum leaps that will redefine the paradigms of human interaction, production, and creativity. As we stand on the cusp of this new era, the work done by teams like Google’s DeepMind Robotics illuminates the path forward, promising a future where robots are not just tools but collaborators, capable of understanding and augmenting human endeavor like never before. The excitement and anticipation surrounding these developments are a testament to the transformative power of technology and the endless possibilities that lie ahead. As these technologies continue to evolve and intertwine, they will undoubtedly unlock new horizons and possibilities, making 2024 a year to watch in the annals of technological history.

Yasmeeta Oon

Just a girl trying to break into the world of journalism, constantly on the hunt for the next big story to share.