Generative AI + Robotics = Awesome!
Super excited to share some cool stuff we’ve been working on: using generative AI to make robots way more adaptable and easier to work with!
Generative AI has become one of the most influential trends of recent years, powering tools like ChatGPT and Copilot that have transformed how we live and work. Beyond these domains, generative AI is also reshaping industries, and robotics is no exception. At MathWorks, we are excited to explore how this technology can simplify and enhance the way robots operate, making advanced robotics more accessible and versatile.
One notable example in this field is Google’s Robotics Transformer 2 (RT-2). RT-2 exemplifies the potential of vision-language-action (VLA) models by enabling robots to perceive, plan, and act with a level of adaptability previously unattainable. These models utilize web-scale data to help robots generalize knowledge, perform tasks in unstructured environments, and require minimal task-specific training. While this sounds promising, there are still challenges, such as integrating these models into real-world workflows, and that’s where MathWorks comes in.
Moving from Traditional to Generative AI Approaches
Traditionally, autonomous systems were built with separate modules for perception, planning, and control. While functional, these modules required significant effort to integrate and to adapt to new environments.
Figure: Existing approach to perform robotic tasks – In traditional robotics, tasks are broken into sub-tasks such as object detection, grasping, and motion planning. Modules like perception detect and estimate object poses, while motion planning computes trajectories for task execution. These steps often require reruns in dynamic environments, leading to complexity and inefficiencies for multi-step tasks like placing multiple objects.
For instance, in a traditional perception pipeline (a minimal sketch follows the figure below):
- Object Detection: A YOLOv4 detector might be trained to identify objects in an image (example).
- Pose Estimation: A multi-step process estimates the 3D position and orientation of the detected objects (example).
- Training and Deployment: Each new object or environment necessitates retraining and reconfiguration, which can be time-consuming and difficult to scale.
Figure: Existing pose estimation approach to perform robotics tasks (see this example). In the existing perception pipeline, the objective is to detect objects in an image and estimate their 3D poses, often using a YOLOv4 detector. While effective, this process requires extensive training and pose estimation steps, making it tedious and challenging to scale for new objects or environments.
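To make this concrete, here is a minimal MATLAB sketch of the detection step, assuming the pretrained COCO-based YOLO v4 detector from Computer Vision Toolbox; the image file name is illustrative, and the pose-estimation stage is indicated only as a comment since it is application-specific.

```matlab
% Minimal sketch of the traditional detection step using a pretrained
% YOLO v4 detector from Computer Vision Toolbox (COCO classes).
detector = yolov4ObjectDetector("csp-darknet53-coco");

I = imread("workspaceScene.png");               % camera frame (file name is illustrative)
[bboxes, scores, labels] = detect(detector, I); % 2D detections only

% Visualize the detections
annotated = insertObjectAnnotation(I, "rectangle", bboxes, string(labels));
imshow(annotated)

% A separate, multi-step pose-estimation stage (e.g., point-cloud
% registration against a known CAD model) would then recover each
% object's 3D position and orientation before motion planning.
```

Note that each new object class would mean collecting data and retraining this detector, which is exactly the scaling burden described above.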
Generative AI changes this approach by combining perception, planning, and control into a single, end-to-end system. VLA models process text instructions and camera images to predict robot actions, iteratively refining these actions based on feedback. These models are:
- Built on transformer architectures—the same foundation as models like ChatGPT.
- Capable of reasoning and generating actions based on combined vision and language inputs.
- Embodied AI systems that connect abstract understanding to physical actions.
This end-to-end approach simplifies development and makes it easier to adapt robots to new tasks and environments.
Figure: Robotics VLA (vision-language-action) models, based on transformer architecture, predict robot actions from text instructions and camera images in a single streamlined step, unlike traditional systems with separate stages for task decomposition, perception, and motion planning. These models iteratively refine actions using visual feedback, ensuring better accuracy, but still rely on low-level controllers for execution and require a safety layer for real-world applications. Unlike models like ChatGPT and DALL-E, VLA models enable embodied AI by integrating decision-making into physical robotic systems.
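To illustrate the loop described above and in the figure, here is a minimal MATLAB sketch of the perceive-predict-act cycle; getCameraImage, queryPolicy, applyAction, and taskComplete are hypothetical helper functions standing in for the simulated camera, the VLA model call, the low-level controller, and a success check.

```matlab
% Hedged sketch of the VLA closed loop. getCameraImage, queryPolicy,
% applyAction, and taskComplete are hypothetical helpers, not shipping APIs.
instruction = "pick up the red block and place it in the bin";
maxSteps = 200;

for k = 1:maxSteps
    img = getCameraImage();                  % current camera observation
    action = queryPolicy(instruction, img);  % VLA model predicts, e.g., an
                                             % end-effector pose delta + gripper state
    applyAction(action);                     % low-level controller tracks the action
    if taskComplete()                        % visual/metric check for success
        break
    end
end
```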
Generative AI Meets Robotics at MathWorks
At MathWorks, we’re committed to bridging the gap between groundbreaking research in generative AI and practical applications in robotics. Our tools in MATLAB and Simulink complement robotics foundation models by offering, for example:
- Plug and Play: Access and deploy models like RT1-X and Octo directly within MATLAB & Simulink (see the sketch after this list).
- Test it Out: Simulate robot dynamics, refine motion planning, and tune trajectory control using outputs from generative models (using Robotics System Toolbox).
- See it Live: Realistic 3D animations bring robot behaviors to life, making it easier to evaluate performance in simulated environments.
- Stay Safe: Tools to verify and validate robotic systems for real-world use, ensuring safety-critical applications.
- Real World Ready: Seamlessly transition from simulation to real-world deployment, including testing on resource-constrained devices or leveraging cloud-based inference.
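As a rough sketch of the “Plug and Play” idea, a Python-hosted foundation model can be queried through MATLAB’s Python interface; the module name vla_policy and its predict_action function are assumptions for illustration, not a shipping API.

```matlab
% Hedged sketch: querying a Python-hosted foundation model from MATLAB.
% The module "vla_policy" and its predict_action function are hypothetical;
% substitute the actual package for the model you use (e.g., from Hugging Face).
pe = pyenv;                                   % confirm the configured Python environment
policy = py.importlib.import_module("vla_policy");

img = imread("observation.png");              % camera observation (illustrative file)
out = policy.predict_action("stack the green block on the red one", ...
                            py.numpy.array(img));
action = double(out);                         % convert the returned vector to MATLAB doubles
```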
For example, we’ve developed a Simulink block called “RobotPolicy,” which integrates with foundation models to demonstrate their capabilities in closed-loop setups. This block accepts task instructions and visual observations, outputs robot actions, and is compatible with pre-trained small models like RT1-X and Octo.
Figure: Simulate and test the robotics foundation model in Simulink. “RobotPolicy” block in Simulink integrates with Python-based foundation models from sources like HuggingFace. It processes task instructions and observation images to generate robot actions, specifying desired end-effector positions and orientations. The workflow includes position control for natural movements, 3D simulation with realistic environments, and iterative action generation until task completion, enabling seamless testing and deployment of generative AI for robotics.
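Once the block is wired into a model, a closed-loop run can be launched programmatically. Here is a minimal sketch, assuming a model named robotPolicyDemo with a logged signal eePosition (both names are assumptions):

```matlab
% Hedged sketch: running a Simulink model that contains the RobotPolicy block.
% The model name "robotPolicyDemo" and logged signal name are assumptions.
out = sim("robotPolicyDemo", "StopTime", "60");

% Inspect the logged end-effector position over the run
ee = out.logsout.get("eePosition");
plot(ee.Values)
title("End-effector position during the pick-and-place task")
```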
Real-World Applications and Future Prospects
Generative AI with MATLAB and Simulink opens up exciting possibilities across various robotics domains, such as:
- Zero-Shot Deployment: Robots can perform tasks in environments they’ve never encountered before, thanks to the extensive training of foundation models on diverse datasets.
- Emergent Capabilities: Beyond basic commands, robots can perform complex tasks requiring reasoning, like selecting healthy drinks or understanding symbolic instructions.
- Simulation-Driven Development: High-fidelity simulations help refine models and accelerate testing, reducing the gap between simulation and real-world deployment.
- Fine-Tuning for Specific Tasks: Robots can be adapted to new tasks or environments with minimal data, leveraging the priors learned from foundation models. For instance, fine-tuning models for precision manipulation or long-horizon tasks can be achieved in hours with limited samples.
Try It Yourself
We’re eager to help you explore how generative AI can transform robotics workflows. While we are in the process of making an example available on GitHub, you can reach out to us directly to request trial code access. This example will demonstrate:
- Integration of robotics foundation models with Simulink.
- Simulation and visualization of robotic tasks.
- How you can adapt these models for specific applications.
Please connect with us to obtain the trial codes and experience the possibilities firsthand. We look forward to your feedback and insights!
Join the Conversation
Generative AI is still evolving, and there are challenges to overcome, such as improving success rates and ensuring scalability. As more data and high-fidelity simulations become available, we expect rapid advancements. At MathWorks, we’re excited to contribute to this journey and look forward to hearing how you envision generative AI unlocking new opportunities in robotics.
Let us know your thoughts, and reach out for the trial code to see the possibilities firsthand.
- Are you currently exploring generative AI applications in robotics? If so, in which robotics applications do you think generative AI can make a significant impact?
- The robotics VLA foundation models, such as Google’s RT-X and Covariant’s RFM-1, can perform tasks in an end-to-end manner (perception, planning, and actuation). Do you think these models can replace classical algorithms?
- Robotics foundation models still need low-level controllers, additional safety features, and extensive testing for deployment in production. Do you think Model-Based Design can play a key role in ensuring the functional safety of these models?
- MATLAB/Simulink offers easy connectivity to foundation models and provides tools for simulation, testing, and deployment. Would you be interested in using MATLAB/Simulink for this purpose?