{"id":19069,"date":"2026-01-13T10:00:49","date_gmt":"2026-01-13T15:00:49","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=19069"},"modified":"2026-01-07T14:42:58","modified_gmt":"2026-01-07T19:42:58","slug":"reinforcement-learning-on-hardware-explained-by-brian-douglas","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2026\/01\/13\/reinforcement-learning-on-hardware-explained-by-brian-douglas\/","title":{"rendered":"Reinforcement Learning on Hardware, explained by Brian Douglas"},"content":{"rendered":"<div class=\"rtcContent\">\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">\r\n<h6><\/h6>\r\n<table style=\"background-color: #e2f0ff;\">\r\n<tbody>\r\n<tr>\r\n<td style=\"width: 120px; padding: 3px; vertical-align: middle;\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-19070\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2025\/12\/brian-150x150.jpg\" alt=\"\" width=\"100\" height=\"100\" \/><\/td>\r\n<td style=\"vertical-align: middle; padding: 3px;\"><strong>Co-author: <a href=\"https:\/\/www.linkedin.com\/in\/brian-douglas-505b7175\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Brian Douglas<\/a><\/strong>\r\n\r\nBrian is a Technical Content Creator at MathWorks and control engineer with over 20 years of experience in the field, and a passion for sharing his knowledge with others. He creates and posts engaging videos, drawings, and short writings on various engineering topics. 
You can find his Tech Talks series on mathworks.com and YouTube.<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<\/div>\r\n<h6><\/h6>\r\n<div class=\"rtcContent\">\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">You've probably seen impressive demos of reinforcement learning agents doing amazing things\u2014balancing poles, playing games, controlling robots. But here's the thing: getting from \"cool simulation\" to \"actually running on hardware\" is where it gets tricky.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">My colleague Brian Douglas just released a Tech Talk that tackles exactly this question. Not the specifics of RL algorithms (<a href=\"https:\/\/www.mathworks.com\/videos\/reinforcement-learning-part-1-what-is-reinforcement-learning-1551974943006.html\">we have a whole series for that<\/a>), but something that doesn't get nearly enough attention: <span style=\"font-style: italic;\">which approach should you take<\/span> to get a good policy running on your hardware?<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">I know that doesn't sound terribly exciting at first. 
But trust me\u2014it matters a lot for hardware safety and how long you'll spend waiting for your policy to converge.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<h3 style=\"margin: 15px 10px 5px 4px; padding: 0px; line-height: 20.4px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 17px; font-weight: bold; text-align: left;\">The setup<\/h3>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Brian demonstrates this with a <a href=\"https:\/\/www.quanser.com\/products\/qube-servo-2\/\" target=\"_blank\" rel=\"noopener\">Quanser\u00ae Qube-Servo 2<\/a> rotary pendulum, controlled by a policy running on a Raspberry Pi\u00ae, which is connected to a PC running MATLAB\u00ae and Simulink\u00ae. 
Three pieces of hardware working together\u2014and how you use them depends entirely on your approach to training.<\/div>\r\n<h6><\/h6>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">If you want to follow along with your own hardware, check out the\u00a0<a href=\"https:\/\/github.com\/mathworks\/Reinforcement-Learning-Inverted-Pendulum-with-QUBE-Servo2\" data-href=\"https:\/\/github.com\/mathworks\/Reinforcement-Learning-Inverted-Pendulum-with-QUBE-Servo2\">GitHub example<\/a>\u00a0that covers the complete workflow from plant modeling to deployment.<\/div>\r\n<\/div>\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<div class=\"rtcContent\">\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-19088\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2025\/12\/3-hardware.png\" alt=\"\" width=\"800\" height=\"450\" \/><\/div>\r\n<h3 style=\"margin: 15px 10px 5px 4px; padding: 0px; line-height: 20.4px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 17px; font-weight: bold; text-align: left;\">The decision flowchart<\/h3>\r\n<h6><\/h6>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Here is the complete decision 
tree that Brian covers in his video:<\/div>\r\n<h6><\/h6>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-19067\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2025\/12\/rl-hardware-brian-douglas_3.png\" alt=\"\" width=\"800\" height=\"450\" \/><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Let me walk you through the key decision points:<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><span style=\"font-weight: bold;\">1. 
Do you have existing data?<\/span><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">You can learn a policy offline from data collected by humans or another controller. This is great for bootstrapping\u2014getting a decent starting point before you do any online training.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><span style=\"font-weight: bold;\">2. 
Where do you get additional data?<\/span><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">This is the big fork in the road: train directly on hardware (real environment) or train in simulation (modeled environment). Both have trade-offs. Brian covers the <a href=\"https:\/\/www.mathworks.com\/videos\/reinforcement-learning-part-2-understanding-the-environment-and-rewards-1551976590603.html\">environment setup<\/a> in detail in Part 2 of the RL series.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<h3 style=\"margin: 15px 10px 5px 4px; padding: 0px; line-height: 20.4px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 17px; font-weight: bold; text-align: left;\">Training directly on hardware<\/h3>\r\n<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><span style=\"font-weight: bold;\"> Where does inference happen vs. 
where does learning happen?<\/span><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">This one's subtle but important. You might have a low-power embedded processor (like a Raspberry Pi) and a more powerful PC or cloud server. You can mix and match:<\/div>\r\n<table style=\"margin: 3px; border: 0.666667px solid #bfbfbf; border-collapse: collapse;\">\r\n<tbody>\r\n<tr style=\"background-color: #f5f5f5;\">\r\n<td style=\"border: 0.666667px solid #bfbfbf; vertical-align: top;\">\r\n<div style=\"margin: 2px 10px 2px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: break-spaces; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Approach<\/div><\/td>\r\n<td style=\"border: 0.666667px solid #bfbfbf; vertical-align: top;\">\r\n<div style=\"margin: 2px 10px 2px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: break-spaces; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Pros<\/div><\/td>\r\n<td style=\"border: 0.666667px solid #bfbfbf; vertical-align: top;\">\r\n<div style=\"margin: 2px 10px 2px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: break-spaces; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 
14px; font-weight: 400; text-align: left;\">Cons<\/div><\/td>\r\n<\/tr>\r\n<tr style=\"background-color: rgba(0, 0, 0, 0);\">\r\n<td style=\"border: 0.666667px solid #bfbfbf; vertical-align: top;\">\r\n<div style=\"margin: 2px 10px 2px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: break-spaces; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Train &amp; run on embedded<\/div><\/td>\r\n<td style=\"border: 0.666667px solid #bfbfbf; vertical-align: top;\">\r\n<div style=\"margin: 2px 10px 2px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: break-spaces; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Self-contained, minimal latency<\/div><\/td>\r\n<td style=\"border: 0.666667px solid #bfbfbf; vertical-align: top;\">\r\n<div style=\"margin: 2px 10px 2px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: break-spaces; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Embedded processor may be too weak for training<\/div><\/td>\r\n<\/tr>\r\n<tr style=\"background-color: rgba(0, 0, 0, 0);\">\r\n<td style=\"border: 0.666667px solid #bfbfbf; vertical-align: top;\">\r\n<div style=\"margin: 2px 10px 2px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: break-spaces; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Train &amp; run on remote PC<\/div><\/td>\r\n<td style=\"border: 0.666667px solid #bfbfbf; vertical-align: top;\">\r\n<div style=\"margin: 2px 10px 2px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: break-spaces; 
color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Faster learning<\/div><\/td>\r\n<td style=\"border: 0.666667px solid #bfbfbf; vertical-align: top;\">\r\n<div style=\"margin: 2px 10px 2px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: break-spaces; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Susceptible to latency and connection issues<\/div><\/td>\r\n<\/tr>\r\n<tr style=\"background-color: rgba(0, 0, 0, 0);\">\r\n<td style=\"border: 0.666667px solid #bfbfbf; vertical-align: top;\">\r\n<div style=\"margin: 2px 10px 2px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: break-spaces; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Train on PC, run on embedded<\/div><\/td>\r\n<td style=\"border: 0.666667px solid #bfbfbf; vertical-align: top;\">\r\n<div style=\"margin: 2px 10px 2px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: break-spaces; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Best of both worlds<\/div><\/td>\r\n<td style=\"border: 0.666667px solid #bfbfbf; vertical-align: top;\">\r\n<div style=\"margin: 2px 10px 2px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: break-spaces; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">More complex setup<\/div><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; 
color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Brian goes with that third option\u2014and honestly, it's often the sweet spot for real hardware applications.<\/div>\r\n<div class=\"rtcContent\">\r\n<div><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">The results? Well...<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">The first episodes are rough. Random commands, motor jittering, the pendulum occasionally banging into the power cord (which, let's admit, isn't great for the hardware). 
After several hours and almost 1,500 episodes, the policy could balance the pendulum\u2014but with noticeable wobble and plenty of room for improvement.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">The takeaway: training on hardware is straightforward, but it is time-consuming and potentially dangerous, and it makes it hard to cover all the operating states your system might encounter in the real world.<\/div>\r\n<h3 style=\"margin: 15px 10px 5px 4px; padding: 0px; line-height: 20.4px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 17px; font-weight: bold; text-align: left;\">Training in simulation<\/h3>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Here's where things get more practical. 
With a model of your environment, you can:\r\n- Train faster than real-time\r\n- Explore without putting the hardware at risk\r\n- Easily test different initial conditions and scenarios<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Brian trained against a simulated Qube model for about 1,200 episodes. The result? A policy that holds the pendulum rock-steady in simulation\u2014reward over 800, nice and settled.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">But the real question: does it work on actual hardware?<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">For a detailed answer with code, see this example from the documentation: <a 
href=\"https:\/\/www.mathworks.com\/help\/reinforcement-learning\/ug\/train-agents-to-control-quanser-qube-pendulum.html\">Train Reinforcement Learning Agents to Control Quanser QUBE Pendulum<\/a><\/div>\r\n<h3 style=\"margin: 15px 10px 5px 4px; padding: 0px; line-height: 20.4px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 17px; font-weight: bold; text-align: left;\">The sim2real gap<\/h3>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">When Brian deployed the simulation-trained policy to the real Qube, it worked pretty well! No terrible jittering, no banging into cables. But if you look closely, there's a small amount of wobble that wasn't there in simulation.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">That's the sim2real gap\u2014the difference between what your model predicts and what happens in reality. Now, this isn't necessarily a problem. Models don't have to be perfect; they just have to be useful. 
If the behavior meets your requirements, you're done.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">But if it's not good enough, you have options:<\/div>\r\n<div><strong>\u00a01. Improve the model<\/strong>\u00a0\u2014 Add more dynamics, use domain randomization to make your policy more robust<\/div>\r\n<div>\u00a0<strong>2. Fine-tune on hardware<\/strong> \u2014 Start from your simulation-trained policy and continue learning on the real system<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">That second approach is particularly nice because you're not starting from scratch. 
Your policy already knows roughly what to do, so training is faster and puts less stress on the hardware.<\/div>\r\n<div><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<h3 style=\"margin: 15px 10px 5px 4px; padding: 0px; line-height: 20.4px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 17px; font-weight: bold; text-align: left;\">Why this matters<\/h3>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">There's no single best way to learn a policy for hardware. It depends on how well you can model your environment, how hard it is to reset between episodes, and how much you care about hardware safety (spoiler: probably a lot).<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">But this workflow\u2014train in simulation, fine-tune on hardware, run the policy on an embedded processor, learn on a remote computer\u2014is actually ideal for many situations. 
It's also the foundation for continuous learning, where your policy keeps adapting as components wear out or conditions change, without requiring a full retraining cycle.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Brian goes into much more detail in the video, including live demos of all three approaches and the actual Simulink models involved. 
Highly recommend watching the full thing:<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">\ud83d\udcfd\ufe0f <a href=\"https:\/\/www.mathworks.com\/videos\/reinforcement-learning-on-hardware-1765346150358.html\">Reinforcement Learning on Hardware<\/a><\/div>\r\n<div class=\"row\"><div class=\"col-xs-12 containing-block\"><div class=\"bc-outer-container add_margin_20\"><videoplayer><div class=\"video-js-container\"><video data-video-id=\"6386267250112\" data-video-category=\"blog\" data-autostart=\"false\" data-account=\"62009828001\" data-omniture-account=\"mathwgbl\" data-player=\"rJ9XCz2Sx\" data-embed=\"default\" id=\"mathworks-brightcove-player\" class=\"video-js\" controls><\/video><script src=\"\/\/players.brightcove.net\/62009828001\/rJ9XCz2Sx_default\/index.min.js\"><\/script><script>if (typeof(playerLoaded) === 'undefined') {var playerLoaded = false;}(function isVideojsDefined() {if (typeof(videojs) !== 'undefined') {videojs(\"mathworks-brightcove-player\").on('loadedmetadata', function() {playerLoaded = true;});} else {setTimeout(isVideojsDefined, 10);}})();<\/script><\/div><\/videoplayer><\/div><\/div><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; 
padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">And if you're new to reinforcement learning, check out Brian's multi-part <a href=\"https:\/\/www.mathworks.com\/videos\/reinforcement-learning-part-1-what-is-reinforcement-learning-1551974943006.html\">Tech Talk series<\/a> that covers the fundamentals.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Happy reinforcement learning! 
\ud83e\udd16<\/div>\r\n<\/div>\r\n<script type=\"text\/javascript\">var css = ''; var head = document.head || document.getElementsByTagName('head')[0], style = document.createElement('style'); head.appendChild(style); style.type = 'text\/css'; if (style.styleSheet){ style.styleSheet.cssText = css; } else { style.appendChild(document.createTextNode(css)); }<\/script>","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2025\/12\/rl-hardware-brian-douglas_2.png\" class=\"img-responsive attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" decoding=\"async\" loading=\"lazy\" \/><\/div><p>\r\n\r\n\r\n\r\n\r\n\r\n\r\nCo-author: Brian Douglas\r\n\r\nBrian is a Technical Content Creator at MathWorks and control engineer with over 20 years of experience in the field, and a passion for sharing his knowledge... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2026\/01\/13\/reinforcement-learning-on-hardware-explained-by-brian-douglas\/\">read more 
>><\/a><\/p>","protected":false},"author":230,"featured_media":19066,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[68,76],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/19069"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/users\/230"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/comments?post=19069"}],"version-history":[{"count":8,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/19069\/revisions"}],"predecessor-version":[{"id":19112,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/19069\/revisions\/19112"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media\/19066"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media?parent=19069"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/categories?post=19069"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/tags?post=19069"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}