In the pursuit of a future where robots actively assist humans with everyday tasks, Google has brought us one step closer with Robotics Transformer 2, or RT-2. This artificial intelligence model is designed to teach robots real-world actions, such as throwing away trash, and it marks a significant step toward more helpful and adaptable robots.
ADDRESSING CHALLENGES IN ROBOT TRAINING
Chatbots have become familiar to us, but robots face a harder problem: they need a grounded understanding of the real world and the ability to handle complex, unfamiliar situations. Google acknowledges that training robots to perform general tasks has been slow and expensive, because it requires training on large amounts of data covering a vast range of objects, environments, and scenarios.
THE VISION-LANGUAGE-ACTION (VLA) MODEL APPROACH
With RT-2, Google presents a fresh approach to these challenges. RT-2 is a vision-language-action (VLA) model, built on the Transformer architecture and trained on text and images from the web. Just as language models learn general concepts from web data, RT-2 draws on that knowledge to tell robots which actions to carry out.
SPEAKING “ROBOT” – REASONING AND DECISION-MAKING
The true strength of RT-2 lies in its ability to speak the language of robots. It enables robots to reason and make decisions based on their training data, recognizing objects in context and understanding how to interact with them. For instance, RT-2 can identify and dispose of trash without being explicitly trained on that specific task. It grasps the abstract idea of trash, recognizing that a bag of chips or a banana peel becomes trash once it has been used. This approach also simplifies robot design. Previous robotic systems relied on complex stacks of separate components, with high-level reasoning systems passing instructions down to low-level manipulation systems to control the robot’s actions. RT-2 consolidates these steps into a single model that can both perform complex reasoning and output robot actions directly.
REMARKABLE RESULTS AND FUTURE PROSPECTS
Google’s team evaluated RT-2 in more than 6,000 robotic trials. On tasks that RT-2 was trained on (“seen” tasks), its performance matched that of its predecessor, RT-1. The most notable improvement came in novel, unseen scenarios, where RT-2’s performance nearly doubled, reaching 62 per cent compared with RT-1’s 32 per cent.
Robots equipped with RT-2 are better able to adapt to new situations and environments, much as humans apply familiar concepts to novel scenarios. Although much work remains before robots can operate fully in human-centred environments, RT-2 offers a promising glimpse of what lies ahead in the field of robotics.