The following post is from Bill Chou, Product Manager for AI Deployment with GPU Coder.
The newest Jetson AGX Orin packs some incredible processing power in a small package and opens doors for running more computationally intensive AI algorithms outside the lab. As NVIDIA has pointed out, the Jetson AGX Orin is capable of delivering up to 8 times the AI performance of the previous Jetson AGX Xavier. We were eager to try out some AI applications developed in Simulink and see how quickly we could get the AI algorithms onto the board and test them on the go.
Showing the lane following example we'll put onto Jetson AGX Orin
Users like Airbus have been using Simulink and GPU Coder to deploy AI applications onto various generations of Jetson boards to quickly prototype and test them. They can test the AI application on their desktop developer machine first, then migrate it onto Jetson boards for use outside the lab under a variety of conditions: inside an aircraft, on the road in a vehicle, or in an autonomous underwater vehicle.
To illustrate this approach, we'll use a highway lane following example that processes video from a dashcam. Once we verify the AI application with the test video input, we can unhook the Jetson from our desktop developer machine, switch out the input test video for live video feeds, and take the Jetson out of the lab for additional testing.
Running lane and vehicle detection Simulink model on desktop developer GPU
The Simulink model that we’re using takes an input video stream and detects the left and right lane markers as well as vehicles in each video frame. It uses two deep learning networks based on YOLO v2 and AlexNet to achieve this. Some pre- and postprocessing, including drawing annotations for the left and right lanes and bounding boxes around vehicles, completes the application.
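To make the postprocessing idea more concrete, here is a minimal Python sketch of one step that commonly follows a YOLO-style detector before boxes are drawn: dropping low-confidence detections and suppressing heavily overlapping boxes (non-maximum suppression). This is not taken from the Simulink model. The box layout, the function names, and both thresholds are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical box layout for illustration: (x_min, y_min, x_max, y_max, score).
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Return the intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: List[Box],
    score_threshold: float = 0.5,  # assumed value, not from the post
    iou_threshold: float = 0.5,    # assumed value, not from the post
) -> List[Box]:
    """Keep confident boxes, suppressing any box that overlaps a better one."""
    # Visit candidates from highest to lowest score.
    candidates = sorted(
        (b for b in boxes if b[4] >= score_threshold),
        key=lambda b: b[4],
        reverse=True,
    )
    kept: List[Box] = []
    for box in candidates:
        # Keep the box only if it does not overlap an already kept box too much.
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept
```

Calling `filter_detections` on the raw detections for a frame would return the subset worth annotating; the surviving boxes are what a drawing step would then overlay on the video frame.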
We were able to quickly prototype this application by starting with two out-of-the-box examples described in more detail here. Running the Simulink model on our desktop developer machine outfitted with a powerful NVIDIA desktop-class GPU, we see the AI application run smoothly, correctly identifying lane markers and vehicles. Under the hood, Simulink automatically identifies compute-intensive parts of the model and, together with the NVIDIA CUDA toolkit, offloads these computations from the CPU onto the desktop GPU cores to give us the smooth processing seen in the output video.
Next, let's focus on the deployment portion of the workflow to see how we can embed this onto the newest Jetson AGX Orin.
Generating CUDA code from Simulink model
To generate CUDA code and deploy the AI application onto the Jetson AGX Orin, we can use GPU Coder. Using the same Simulink model from the desktop simulations, we need to replace the output Viewer block with an SDL Video Output block so that video will appear on the Jetson board desktop for us to see.
We will also need to set the code generation configurations for the Jetson AGX Orin. In the configuration parameters for code generation, we can choose between using NVIDIA’s cuDNN or TensorRT for the deep learning networks. For the non-deep learning portions of our Simulink model, GPU Coder will automatically integrate calls to CUDA optimized libraries like cuBLAS and cuFFT.
We can also set the hardware configuration settings for the Jetson board, including the NVIDIA toolchain, board login/password, and build options.
Once configured, we can start generating code. GPU Coder will first automatically identify compute-intensive parts of the Simulink model and translate them into CUDA kernels that will execute on the GPU cores for best performance. The rest of the AI application will run as C/C++ code on the ARM cores of the Jetson board.
Looking at snippets of the generated CUDA code, we can see cudaMalloc() calls that allocate memory on the GPU in preparation for running kernels on the GPU cores. We can also spot cudaMemcpy() calls that move data between the CPU and GPU at the appropriate parts of the algorithms, and several CUDA kernel launches, such as the laneAndVehicleD_Outputs_kernel1() call.
We can also poke into the code that represents the two deep learning networks. Looking inside the setup function of the YOLO v2 network, which is executed once at the beginning of our AI application, we can see that it initializes each layer into memory sequentially, including all the weights and biases that are stored as binary files on disk.
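The one-time setup pattern described here, walking the layers in order and reading each layer's weights and biases from binary files, can be sketched in Python as follows. This is only an illustration of the pattern, not the generated code: the file naming scheme (`<layer>_w.bin`, `<layer>_b.bin`), the little-endian float32 layout, and all function names are assumptions.

```python
import os
import struct
from typing import Dict, List, Tuple

FLOAT_SIZE = 4  # bytes per float32 value


def write_float32_file(path: str, values: List[float]) -> None:
    """Write values as little-endian float32 (helper for creating sample files)."""
    with open(path, "wb") as f:
        f.write(struct.pack("<%df" % len(values), *values))


def read_float32_file(path: str, count: int) -> List[float]:
    """Read exactly `count` little-endian float32 values from a binary file."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) != count * FLOAT_SIZE:
        raise ValueError(
            "%s: expected %d bytes, found %d" % (path, count * FLOAT_SIZE, len(data))
        )
    return list(struct.unpack("<%df" % count, data))


def setup_network(
    layer_specs: List[Tuple[str, int, int]], weight_dir: str
) -> Dict[str, Dict[str, List[float]]]:
    """Initialize layers in order; each spec is (name, n_weights, n_biases)."""
    layers: Dict[str, Dict[str, List[float]]] = {}
    for name, n_weights, n_biases in layer_specs:
        # Hypothetical naming scheme: one weights file and one bias file per layer.
        layers[name] = {
            "weights": read_float32_file(
                os.path.join(weight_dir, name + "_w.bin"), n_weights
            ),
            "biases": read_float32_file(
                os.path.join(weight_dir, name + "_b.bin"), n_biases
            ),
        }
    return layers
```

Checking the file size against the expected parameter count is a design choice in this sketch: it surfaces a mismatch between the network definition and the files on disk at startup rather than during inference.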
Finally, while the Simulink model and CUDA code generation settings are configured for the Jetson AGX Orin, it’s worth noting that the generated CUDA code is portable and can run on all modern NVIDIA GPUs including the Jetson & DRIVE platforms, not to mention desktop and server class GPUs.
Once the CUDA code is generated, GPU Coder will automatically call the CUDA toolchain to compile, download, and start the executable on the Jetson AGX Orin. For our application, we've also copied the input video file onto the Jetson board to serve as the input video to the AI application. As we are using the SDL video block, the processed output video will appear in an SDL window on the Jetson board, and we can see that the output is the same as in our desktop GPU simulations, though with the expected lower frame rates given the difference in processing power.
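If you want to put a number on the frame rate difference rather than judging it by eye, one generic approach is to time a fixed batch of frames through the processing step. The Python sketch below illustrates the measurement only; it is not part of the generated application, and `measure_fps`, `frames`, and `process_frame` are hypothetical names.

```python
import time
from typing import Callable, Iterable


def measure_fps(
    frames: Iterable[object],
    process_frame: Callable[[object], object],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return the average frames per second achieved while processing `frames`."""
    start = clock()
    count = 0
    for frame in frames:
        process_frame(frame)  # stand-in for the per-frame detection work
        count += 1
    elapsed = clock() - start
    # Guard against a zero-length interval (for example, an empty input).
    return count / elapsed if elapsed > 0 else 0.0
```

Running the same batch of frames on the desktop and on the board, then comparing the two results, gives a like-for-like figure. The `clock` parameter exists so the timing source can be swapped out, which also makes the function easy to check with a fake clock.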
At this point, we can unplug the Jetson AGX Orin from our host developer machine and move it out of our lab for further testing in the field. We can also take the generated CUDA code and manually integrate it into a larger application in another project by using the packngo function to neatly zip up all the necessary source code. Given the way CUDA is architected, the generated CUDA code is portable and can run on all modern NVIDIA platforms, from desktop and server class GPUs to the embedded Jetson and DRIVE boards.
It's been interesting to run various AI applications on the Jetson AGX Orin and see the boost in performance over the previous Jetson AGX Xavier. The workflow we described above has helped various users move more quickly when exploring and prototyping AI applications in the field. Take the new Jetson AGX Orin for a spin and see what types of AI applications you can bring to your designs in the field.
We'll be presenting this demo on the Jetson AGX Orin and going through more details of this workflow at our upcoming MATLAB Expo 2022 talk, Machine Learning with Simulink and NVIDIA Jetson, on May 17, 2022. Join the session to see the workflow in action, and visit the NVIDIA booth to ask more questions about everything NVIDIA, including their newest board, the Jetson AGX Orin.
Here is the link to the lane and vehicle detection example:
To run this and other AI applications on the Jetson, you need the MATLAB Coder Support Package for NVIDIA Jetson and NVIDIA DRIVE Platforms. Finally, the example runs on any of the recent Jetson boards, though for best performance, you'll want to grab the latest Jetson AGX Orin.