{"id":4365,"date":"2020-07-22T17:25:00","date_gmt":"2020-07-22T15:25:00","guid":{"rendered":"https:\/\/blogs.mathworks.com\/student-lounge\/?p=4365"},"modified":"2020-07-22T18:15:10","modified_gmt":"2020-07-22T16:15:10","slug":"yolov2-object-detection-deploy-trained-neural-networks-to-nvidia-embedded-gpus","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/student-lounge\/2020\/07\/22\/yolov2-object-detection-deploy-trained-neural-networks-to-nvidia-embedded-gpus\/","title":{"rendered":"YOLOv2 Object Detection: Deploy Trained Neural Networks to NVIDIA Embedded GPUs"},"content":{"rendered":"<p>Our <a href=\"https:\/\/blogs.mathworks.com\/racing-lounge\/2020\/07\/07\/yolov2-object-detection-data-labelling-to-neural-networks-in-matlab\/\">previous blog post<\/a>, walked us through using MATLAB to label data, and design deep neural networks, as well as importing third-party pre-trained networks. We trained a YOLOv2 network to identify different competition elements from <a href=\"https:\/\/robosub.org\/\">RoboSub<\/a>\u2013an autonomous underwater vehicle (AUV) competition. See our trained network identifying buoys and a navigation gate in a test dataset.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-4343 size-full aligncenter\" src=\"https:\/\/blogs.mathworks.com\/racing-lounge\/files\/2020\/07\/yolo_small_final.gif\" alt=\"\" width=\"400\" height=\"400\" \/><\/p>\n<p>But what next? When building an autonomous, untethered system like an AUV for RoboSub, the challenge is transferring these algorithms from a desktop\/development environment onto an embedded computer, characterized by a low power requirement, but also lower memory and compute capabilities. Remember: whether it&#8217;s life or autonomous systems, everything is a compromise ?. Let\u2019s dive into using MATLAB to deploy this network to an <a href=\"https:\/\/www.nvidia.com\/en-us\/autonomous-machines\/embedded-systems\/\">NVIDIA Jetson<\/a>. 
NVIDIA Jetson is a power-efficient System-on-Module (SOM) with CPU, GPU, PMIC, DRAM, and flash storage for edge AI applications that comes in a variety of configurations. While this workflow is for the NVIDIA Jetson TX2, the same can be applied to <a href=\"https:\/\/www.nvidia.com\/en-us\/autonomous-machines\/embedded-systems\/\">other NVIDIA embedded products<\/a> as well.<\/p>\n<h1>I. Set Up the Jetson and the Host Computer<\/h1>\n<p>First, let\u2019s look at some setup steps needed on the MATLAB host and the Jetson.<\/p>\n<p><strong><u>MATLAB Host<\/u><\/strong><\/p>\n<p>GPU Coder helps convert MATLAB code into CUDA code for the NVIDIA Jetson. Here is a list of required products for this example.<\/p>\n<p><strong><u>MathWorks Products<\/u><\/strong><\/p>\n<ul>\n<li>MATLAB\u00ae\u00a0(required)<\/li>\n<li>MATLAB Coder\u2122\u00a0(required)<\/li>\n<li>Parallel Computing Toolbox\u2122 (required)<\/li>\n<li>Deep Learning Toolbox\u2122 (required for deep learning)<\/li>\n<li>GPU Coder Interface for Deep Learning Libraries\u00a0(required for deep learning)<\/li>\n<li>GPU Coder Support Package for NVIDIA GPUs\u00a0(required to deploy code to NVIDIA GPUs)<\/li>\n<li>Image Processing Toolbox\u2122 (recommended)<\/li>\n<li>Computer Vision Toolbox\u2122 (recommended)<\/li>\n<li>Embedded Coder\u00ae\u00a0(recommended)<\/li>\n<li>Simulink\u00ae\u00a0(recommended)<\/li>\n<\/ul>\n<p><strong><u>Third-Party Libraries<\/u><\/strong><\/p>\n<p><em>As a reference, we are using MATLAB R2020a; please refer to the documentation for the appropriate libraries for other versions.<\/em><\/p>\n<ul>\n<li>C\/C++ Compiler &#8211; <strong>Microsoft Visual Studio 2013 \u2013 2019<\/strong> can be used for <strong>Windows<\/strong>. For <strong>Linux<\/strong>, use the <strong>GCC C\/C++ compiler 6.3.x<\/strong>. 
To check if the compiler has been set up correctly, use the following command:<\/li>\n<\/ul>\n<pre style=\"padding-left: 40px;\"> <strong>mex -setup C++<\/strong><\/pre>\n<ul>\n<li><a href=\"https:\/\/developer.nvidia.com\/cuda-toolkit-archive\">CUDA Toolkit and driver<\/a>: GPU Coder has been tested with CUDA toolkit v10.1. The default installation comes with the <em>nvcc<\/em> compiler and the <em>cuFFT<\/em>, <em>cuBLAS<\/em>, <em>cuSOLVER<\/em>, and Thrust libraries.<\/li>\n<li><a href=\"https:\/\/developer.nvidia.com\/cudnn\">CUDA Deep Neural Network library (cuDNN)<\/a>: The NVIDIA CUDA\u00ae Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for\u00a0deep neural networks. GPU Coder has been tested with v7.5.x. Download the files and run the installer.<\/li>\n<li><a href=\"https:\/\/developer.nvidia.com\/tensorrt\">NVIDIA TensorRT<\/a>: A high-performance deep learning inference optimizer and runtime library. GPU Coder has been tested with v5.1.x. Supported in R2019b and later for CUDA code generation from a Windows machine.<\/li>\n<li><a href=\"https:\/\/opencv.org\/releases\/\">OpenCV Libraries<\/a>: Open Source Computer Vision Library (OpenCV) v3.1.0 is required. The <em>OpenCV<\/em> library that ships with Computer Vision Toolbox does not include all the required libraries, and the OpenCV installer does not install them. 
Therefore, you must download the OpenCV source, build the libraries, and add them to the path.<\/li>\n<li>CUDA toolkit for ARM\u00ae\u00a0and\u00a0<a href=\"https:\/\/releases.linaro.org\/components\/toolchain\/binaries\/4.9-2016.02\/aarch64-linux-gnu\/\">Linaro GCC 4.9<\/a>: Use the <em>gcc-linaro-4.9-2016.02-x86_64_aarch64-linux-gnu<\/em> release tarball.<\/li>\n<\/ul>\n<p><span style=\"text-decoration: underline;\"><strong>Environment Variables<\/strong><\/span><\/p>\n<p>Once all the libraries are installed, set the environment variables to point MATLAB to them. These environment variables are cleared when a MATLAB session ends. <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/startup.html\">Add this code to a startup script to ensure these variables are set when MATLAB is launched<\/a>.<\/p>\n<pre>setenv('CUDA_PATH','C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.0');\r\n\r\nsetenv('NVIDIA_CUDNN','C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\cuDNN');\r\n\r\nsetenv('NVIDIA_TENSORRT','C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\TensorRT');\r\n\r\nsetenv('OPENCV_DIR','C:\\Program Files\\opencv\\build');\r\n\r\nsetenv('PATH', ...\r\n    ['C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.0\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\cuDNN\\bin;C:\\Program Files\\opencv\\build\\x64\\vc15\\bin;' ...\r\n    getenv('PATH')]);<\/pre>\n<h3>NVIDIA Jetson<\/h3>\n<p>Once you have installed <a href=\"https:\/\/developer.nvidia.com\/embedded\/jetpack\">JetPack<\/a> on the Jetson, install the <a href=\"https:\/\/www.libsdl.org\/\">Simple DirectMedia Layer (SDL v1.2) library<\/a> needed for this example by running the following commands in a terminal on the Jetson.<\/p>\n<p><strong>$ sudo apt-get install libsdl1.2debian<\/strong><\/p>\n<p><strong>$ sudo apt-get install libsdl1.2-dev<\/strong><\/p>\n<p>Next, set environment variables on the Jetson by adding the 
following commands to the <strong>$HOME\/.bashrc<\/strong> file for bash profiles. Use <strong>sudo gedit $HOME\/.bashrc<\/strong> to open the file:<\/p>\n<p><strong>export PATH=\/usr\/local\/cuda\/bin:$PATH<\/strong><\/p>\n<p><strong>export LD_LIBRARY_PATH=\/usr\/local\/cuda\/lib64:$LD_LIBRARY_PATH<\/strong><\/p>\n<h1>II. Establish a Connection with the Jetson and Verify the Setup<\/h1>\n<p>Once all the required libraries are downloaded, use the <strong>coder.checkGpuInstallApp<\/strong> function to interactively test if the system variables are set up correctly and if the MATLAB host is ready to generate and deploy code for the Jetson.<\/p>\n<pre><strong>coder.checkGpuInstallApp;<\/strong><\/pre>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-4379 size-full aligncenter\" src=\"https:\/\/blogs.mathworks.com\/racing-lounge\/files\/2020\/07\/checkGPU2-e1595349736265.png\" alt=\"screenshot\" width=\"500\" height=\"588\" \/><\/p>\n<p>Enter the appropriate device parameters and choose the checks to run. Based on the combination of checks selected, MATLAB will:<\/p>\n<ul>\n<li>Verify if the environment variables have been set up correctly<\/li>\n<li>Generate code for an example function with a deep learning network<\/li>\n<li>Compile and deploy it to the Jetson<\/li>\n<li>Produce an HTML report with the results of the check<\/li>\n<\/ul>\n<p><a href=\"https:\/\/www.mathworks.com\/help\/gpucoder\/gs\/using-the-check-gpu-install-app.html\">Check out this documentation page to learn more.<\/a><\/p>\n<h1>III. Prepare MATLAB Code for Code Generation<\/h1>\n<p>MATLAB code and Simulink models can be converted into low-level code such as C\/C++, CUDA, and HDL. This enables you to get the best of both worlds\u2013develop code in a high-level language, and then implement it as efficient low-level code optimized for embedded devices. Never used code generation before? 
<a href=\"https:\/\/www.mathworks.com\/videos\/series\/student-competition-code-generation-training.html\">Check out this tutorial series on the basics of code generation and hardware support.<\/a><\/p>\n<p>Our AUV relies on a camera to see its surroundings, so our object detection system must consist of a camera feeding a video stream into the Jetson. Our network will process the image stream and return the pixel location, confidence score, and classification label of the object of interest. When converting MATLAB Code to C\/C++ or in this case, CUDA, wrap the functionality you want to generate code for in a function as shown below. GPU Coder will convert this MATLAB function into a CUDA function and generate any other files needed to execute this function in a CUDA environment. Use the<span style=\"color: #008000;\"> %#codegen<\/span>\u00a0directive to check for code generation compatibility:<\/p>\n<pre><span class=\"function\">function roboSubPredict()<\/span>\r\n<span class=\"comment\">%#codegen <\/span>\r\n<span class=\"comment\">% Create a Jetson object to access the board and its peripherals <\/span>\r\nhwobj = jetson;\r\nw = webcam(hwobj,1);\r\nd = imageDisplay(hwobj); \r\n<span class=\"comment\">% Load the trained network \u00a0from a MAT file <\/span>\r\npersistent mynet\r\nif isempty(mynet)\r\n \u00a0\u00a0 mynet = coder.loadDeepLearningNetwork('detectorYoloV2.mat');\r\nend\r\n<span class=\"comment\">% Process loop<\/span>\r\n  for i = 1:1e5\r\n \u00a0\u00a0   img = snapshot(w);\r\n \u00a0\u00a0   [bboxes,scores,labels] = mynet.detect(img,'Threshold',0.6);\r\n \u00a0\u00a0   [~,idx] = max(scores);\r\n \u00a0\u00a0   <span class=\"comment\">% Annotate detections in the image.<\/span>\r\n \u00a0\u00a0   if ~isempty(bboxes)\r\n \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0   outImg = insertObjectAnnotation(img,'Rectangle', bboxes(idx), cellstr(labels(idx)));\r\n \u00a0\u00a0   else\r\n \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0   outImg = img;\r\n \u00a0\u00a0   end\r\n 
    image(d,outImg);\r\nend\r\nend<\/pre>\n<p>This function loads and constructs a trained detector object\u2014<strong>mynet<\/strong>\u2014from a MAT-file with the <strong>coder.loadDeepLearningNetwork<\/strong> function. Declaring <strong>mynet<\/strong> as persistent stores it in memory and avoids reconstructing the object at every function call. The <strong>detect<\/strong> method of the object returns the location, confidence score, and classification label. At student competitions, teams very often train new deep learning detectors during the event to counter natural lighting and weather conditions. Because the network is loaded at compile time, you can swap out the MAT-file and rebuild the code to use the most appropriate network for the given conditions within minutes.<\/p>\n<p>The next step is to acquire the image stream from a camera peripheral connected to the Jetson board. The GPU Coder Support Package for NVIDIA GPUs will help you do this. For this demonstration we are using an NVIDIA Jetson TX2, but as mentioned above, the support package can generate code for other NVIDIA GPUs as well.<\/p>\n<h3>GPU Coder Support Package for NVIDIA GPUs<\/h3>\n<p>The GPU Coder Support Package for NVIDIA GPUs automates the deployment of MATLAB algorithms on embedded NVIDIA GPUs by building and deploying the generated CUDA code onto the GPU and CPU of the target hardware board. It enables you to remotely communicate with the NVIDIA target and control the peripheral devices for prototyping. 
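<\/p>\n<p>As a quick illustration of that remote communication (the board address, username, and password below are placeholders for this sketch, not values from this example), the <strong>jetson<\/strong> function can also accept connection details explicitly, and the resulting object can run Linux commands on the board:<\/p>\n<pre><span class=\"comment\">% Connect to a specific board; replace the address and credentials with your own<\/span>\r\nhwobj = jetson('192.168.1.15','ubuntu','ubuntu');\r\n<span class=\"comment\">% Run a Linux command on the board to confirm the connection<\/span>\r\nsystem(hwobj,'uname -a');<\/pre>\n<p>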
Examine the code below:<\/p>\n<p>Create a Jetson object to access the board and its peripherals.<\/p>\n<pre>hwobj = jetson;\r\nw = webcam(hwobj,1);\r\nd = imageDisplay(hwobj);<\/pre>\n<p>Calling the <strong>jetson<\/strong> function creates a Jetson object that represents a connection to the Jetson board; use this object and its various methods to create objects for peripherals connected to the board, like the <strong>webcam<\/strong> and <strong>imageDisplay<\/strong> objects shown above. <a href=\"https:\/\/www.mathworks.com\/help\/supportpkg\/nvidia\/ref\/jetson.html\">For more information on the Jetson object, check out its documentation page<\/a>. Use these objects and their methods to stream in video data and to display the output on a display device, respectively.<\/p>\n<pre>img = snapshot(w); <span class=\"comment\">% w is the webcam object created earlier<\/span>\r\nimage(d, img); <span class=\"comment\">% d is the imageDisplay object<\/span><\/pre>\n<h1>IV. Generate Code and Deploy Application<\/h1>\n<p>Finally, we are ready to deploy our application to the Jetson. Use the coder configuration object to configure the build as shown below:<\/p>\n<pre>cfg = coder.gpuConfig('exe');\r\ncfg.Hardware = coder.hardware('NVIDIA Jetson');\r\ncfg.Hardware.BuildDir = '~\/remoteBuildTest';\r\ncfg.GenerateExampleMain = 'GenerateCodeAndCompile';<\/pre>\n<p>In this configuration, we instruct GPU Coder to auto-generate a main function and compile the code into an executable. This can be adjusted based on your application. Need to generate a library to integrate into another codebase? 
<a title=\"https:\/\/www.mathworks.com\/help\/gpucoder\/ref\/coder.gpuconfig.html (link no longer works)\">Choose the appropriate build type as shown here<\/a>.<\/p>\n<p>Call the codegen command and generate a report to view the generated code.<\/p>\n<pre>codegen('-config ', cfg,'roboSubPredict', '-report');<\/pre>\n<p>This will deploy a <strong>roboSubPredict.elf <\/strong>executable in the build directory on the Jetson. Navigate to that directory to see all the generated files on the Jetson.<\/p>\n<p>Use the Jetson object to launch and kill the application on the Jetson from MATLAB.<\/p>\n<pre>hwobj.runApplication('roboSubPredict');<\/pre>\n<pre>hwobj.killApplication('roboSubPredict');<\/pre>\n<p>Here is a video of our trained detector running as CUDA code on the Jetson, identifying buoys and navigation gates from a video playing on a computer screen.<\/p>\n<p><div style=\"width: 854px;\" class=\"wp-video\"><!--[if lt IE 9]><script>document.createElement('video');<\/script><![endif]-->\n<video class=\"wp-video-shortcode\" id=\"video-4365-1\" width=\"854\" height=\"480\" preload=\"metadata\" controls=\"controls\"><source type=\"video\/mp4\" src=\"https:\/\/blogs.mathworks.com\/racing-lounge\/files\/2020\/07\/Untitled-Project.mp4?_=1\" \/><a href=\"https:\/\/blogs.mathworks.com\/racing-lounge\/files\/2020\/07\/Untitled-Project.mp4\">https:\/\/blogs.mathworks.com\/racing-lounge\/files\/2020\/07\/Untitled-Project.mp4<\/a><\/video><\/div><\/p>\n<p>To learn more and download the code used in this example <a href=\"https:\/\/github.com\/mathworks-robotics\/deep-learning-for-object-detection-yolov2\">visit this GitHub repository<\/a> and <a href=\"https:\/\/www.mathworks.com\/videos\/deploy-yolov2-to-an-nvidia-jetson-1578035533852.html\">watch this video<\/a>.<\/p>\n<p><iframe loading=\"lazy\" title=\"Deploy YOLOv2 to an NVIDIA Jetson\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/fD-PKiqYNKo?feature=oembed\" frameborder=\"0\" 
allow=\"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe><\/p>\n<h1>Key Takeaways<\/h1>\n<p>I want to end on some key takeaways:<\/p>\n<ul>\n<li>Code Generation and Deployment helps you get production ready code in minutes without the hassle of having to debug low-level code<\/li>\n<li>The hardware support package helps automate the build and deployment onto the Jetson<\/li>\n<li>Swap out the detector MAT-files and rebuild to generate and deploy code for a new network within minutes<\/li>\n<li>Change the build-type to generate readable, editable code that can be integrated into other codebases<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img src=\"https:\/\/blogs.mathworks.com\/student-lounge\/files\/2020\/07\/Untitled-Project.gif\" class=\"img-responsive attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"Inference model running on an NVIDIA Jetson\" decoding=\"async\" loading=\"lazy\" \/><\/div>\n<p>Our previous blog post, walked us through using MATLAB to label data, and design deep neural networks, as well as importing third-party pre-trained networks. 
We trained a YOLOv2 network to identify&#8230; <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/student-lounge\/2020\/07\/22\/yolov2-object-detection-deploy-trained-neural-networks-to-nvidia-embedded-gpus\/\">read more >><\/a><\/p>\n","protected":false},"author":163,"featured_media":4391,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[145,365,14],"tags":[151,363,421,94,419],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/posts\/4365"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/users\/163"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/comments?post=4365"}],"version-history":[{"count":14,"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/posts\/4365\/revisions"}],"predecessor-version":[{"id":4403,"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/posts\/4365\/revisions\/4403"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/media\/4391"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/media?parent=4365"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/categories?post=4365"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/student-lounge\/wp-json\/wp\/v2\/tags?post=4365"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}