{"id":5660,"date":"2021-05-04T20:57:32","date_gmt":"2021-05-04T18:57:32","guid":{"rendered":"https:\/\/blogs.mathworks.com\/student-lounge\/?p=5660"},"modified":"2025-02-20T14:14:59","modified_gmt":"2025-02-20T19:14:59","slug":"deploying-algorithms-from-matlab-and-simulink-to-nvidia-drive-agx","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/student-lounge\/2021\/05\/04\/deploying-algorithms-from-matlab-and-simulink-to-nvidia-drive-agx\/","title":{"rendered":"Deploying Algorithms from MATLAB and Simulink to NVIDIA DRIVE AGX"},"content":{"rendered":"<p><span class=\"TextRun SCXW206000522 BCX0\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW206000522 BCX0\">This is the second post of our two-part series on how MathWorks platforms support autonomous vehicle (AV) 
developers who use NVIDIA DRIVE Sim and deploy algorithms to NVIDIA hardware (<a href=\"https:\/\/blogs.mathworks.com\/racing-lounge\/2021\/04\/26\/building-an-autonomous-vehicle-av-simulation-toolchain-with-simulink-roadrunner-and-nvidia-drive-sim\/\">see Part 1 here<\/a>). In this post, we\u2019ll cover how to deploy algorithms created in MATLAB and Simulink to NVIDIA DRIVE AGX.<\/span><\/span><span class=\"EOP SCXW206000522 BCX0\" data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" width=\"1588\" height=\"568\" class=\"aligncenter size-full wp-image-5656\" src=\"https:\/\/blogs.mathworks.com\/racing-lounge\/files\/2021\/05\/Part-2-Image-1-3-MW-technologies.png\" alt=\"\" \/><\/p>\n<p><span data-contrast=\"auto\">Simulink provides an environment for integrating the control logic with the vehicle dynamics and environment models and simulating them together. 
This makes it possible to test the entire system early in the design process. GPU Coder and Embedded Coder can then be used to deploy the algorithms to modern NVIDIA GPUs, including the <\/span><a href=\"https:\/\/www.nvidia.com\/en-us\/self-driving-cars\/drive-platform\/hardware\/\"><span data-contrast=\"none\">NVIDIA DRIVE platform<\/span><\/a><span data-contrast=\"auto\">.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">To illustrate this workflow, consider a highway lane-following system that steers a vehicle to keep it within a marked lane. The system typically uses vision processing algorithms to detect lanes and vehicles in camera images. 
The controller uses the lane detections, vehicle detections, and set speed to control steering and acceleration.<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-5654\" src=\"https:\/\/blogs.mathworks.com\/racing-lounge\/files\/2021\/05\/Part-2-Image-2-Diagram-for-Lane-Follwing-Controller.png\" alt=\"\" width=\"1007\" height=\"568\" \/><\/p>\n<p><span class=\"TextRun SCXW190071473 BCX0\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW190071473 BCX0\">The system-level simulation can be run to verify that the system correctly identifies the lane markers and vehicles on the road.<\/span><\/span><span class=\"EOP SCXW190071473 BCX0\" data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" width=\"1280\" height=\"720\" class=\"aligncenter wp-image-5652\" src=\"https:\/\/blogs.mathworks.com\/racing-lounge\/files\/2021\/05\/Part-2-Image-3-animation-of-3-perspectives-of-lane-following.gif\" alt=\"\" \/><\/p>\n<p><span class=\"TextRun SCXW167879641 BCX0\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW167879641 BCX0\">Inside the vision detector subsystem, the input video is fed to two deep learning networks that run in parallel to detect the left and right lane markers and other vehicles. 
Pre- and post-processing subsystems prepare the input video for the two networks, then annotate the lane markers and draw bounding boxes around detected vehicles before the output video is displayed.<\/span><\/span><span class=\"EOP SCXW167879641 BCX0\" data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" width=\"780\" height=\"440\" class=\"aligncenter size-full wp-image-5650\" src=\"https:\/\/blogs.mathworks.com\/racing-lounge\/files\/2021\/05\/Part-2-Image-4-Diagram-for-Lane-Detector.png\" alt=\"\" \/><\/p>\n<p><span class=\"TextRun SCXW242793224 BCX0\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW242793224 BCX0\">Using video captured from a test vehicle, the vision detector subsystem can be simulated on the host machine\u2019s CPU to verify that it correctly identifies the lane markers and other vehicles.<\/span><\/span><span class=\"EOP SCXW242793224 BCX0\" data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" width=\"1280\" height=\"720\" class=\"aligncenter size-full wp-image-5648\" src=\"https:\/\/blogs.mathworks.com\/racing-lounge\/files\/2021\/05\/Part-2-Image-5-Animiatons-and-Diagram-for-Lane-Detector.gif\" alt=\"\" \/><\/p>\n<p><span class=\"TextRun SCXW224233855 BCX0\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW224233855 BCX0\">In this example, the video playback appears choppy when the simulation runs on the CPU. Developers can switch to a desktop NVIDIA GPU to speed up the simulation. 
The detection results are unchanged, and the frame rate improves significantly.<\/span><\/span><span class=\"EOP SCXW224233855 BCX0\" data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" width=\"1280\" height=\"720\" class=\"aligncenter size-full wp-image-5662\" src=\"https:\/\/blogs.mathworks.com\/racing-lounge\/files\/2021\/05\/Deployment-3-Vehicle-lane-simulation-GPU-8MB.gif\" alt=\"\" \/><\/p>\n<p><span class=\"TextRun SCXW103882644 BCX0\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW103882644 BCX0\">Once satisfied with the simulation results, developers can generate code for NVIDIA DRIVE from the same Simulink model. Embedded Coder generates optimized C\/C++ code that runs on the ARM processors, while GPU Coder generates CUDA kernels that run on the CUDA cores. 
GPU Coder takes care of allocating GPU memory (using cudaMalloc calls), copying data between CPU and GPU memory (using cudaMemcpy calls), and launching the CUDA kernels, all at the appropriate points in the generated code.<\/span><\/span><span class=\"EOP SCXW103882644 BCX0\" data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" width=\"1280\" height=\"720\" class=\"aligncenter size-full wp-image-5646\" src=\"https:\/\/blogs.mathworks.com\/racing-lounge\/files\/2021\/05\/Part-2-Image-7-Code-example-animiation.gif\" alt=\"\" \/><\/p>\n<p><span class=\"TextRun SCXW184752045 BCX0\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW184752045 BCX0\">GPU Coder then invokes the NVIDIA toolchain to compile the complete application and deploy it to the NVIDIA DRIVE. When the application is started on the board from Simulink, the processed video from the NVIDIA DRIVE is shown in the SDL video display window. 
The frame rate is somewhat lower than in the desktop GPU simulation, which is expected given the more resource-constrained embedded GPU.<\/span><\/span><span class=\"EOP SCXW184752045 BCX0\" data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" width=\"1280\" height=\"720\" class=\"aligncenter size-full wp-image-5644\" src=\"https:\/\/blogs.mathworks.com\/racing-lounge\/files\/2021\/05\/Part-2-Image-8-Diagram-and-animation-of-detector.gif\" alt=\"\" \/><\/p>\n<p><span data-contrast=\"auto\">With this workflow in place, developers can keep refining the Simulink model and see the changes running on NVIDIA GPUs within a few minutes. Simulation makes it possible to find and fix bugs earlier in the process, and GPU Coder and Embedded Coder automate the path to running the entire application on NVIDIA DRIVE.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img src=\"https:\/\/blogs.mathworks.com\/student-lounge\/files\/2021\/04\/DRIVESim_KV1.jpg\" class=\"img-responsive attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" decoding=\"async\" loading=\"lazy\" \/><\/div>\n<p>This is\u00a0the second\u00a0post\u00a0of our two-part series on how MathWorks platforms\u00a0support AV developers who use NVIDIA\u00a0DRIVE\u00a0Sim and deploy algorithms to NVIDIA hardware\u00a0(See Part 1 here).\u00a0 In this&#8230; <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/student-lounge\/2021\/05\/04\/deploying-algorithms-from-matlab-and-simulink-to-nvidia-drive-agx\/\">read more 
>><\/a><\/p>\n","protected":false},"author":174,"featured_media":5576,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[287,8],"tags":[488,490,494,492],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/posts\/5660"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/users\/174"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/comments?post=5660"}],"version-history":[{"count":6,"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/posts\/5660\/revisions"}],"predecessor-version":[{"id":11891,"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/posts\/5660\/revisions\/11891"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/media\/5576"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/media?parent=5660"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/categories?post=5660"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/tags?post=5660"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}