Accelerating Deep Learning Training with MATLAB and NVIDIA NGC

Posted by Johanna Pingel, May 3, 2021


This post is from Akhil Docca, Senior Product Marketing Manager at NVIDIA, and Andy The, AI Partner Manager at MathWorks.

Introduction

Data scientists, researchers, and developers need the right software tools to easily build, optimize, and test their AI applications without having to worry about the complex environments, interdependencies, and drivers required to run them. They also need the ability to scale up and scale out to reduce network training times and enable rapid iterations, with the added flexibility of running their workloads on-premises or in the cloud.

To simplify this entire process, MATLAB has partnered with NVIDIA NGC to containerize and deliver its latest software to enable GPU-accelerated AI workflows.

Containers and NVIDIA NGC

A container is a portable unit of software that combines the application and all its dependencies into a single package. It’s agnostic to the underlying host OS, removing the need to build complex environments and simplifying the application development-to-deployment process.

The NVIDIA NGC catalog is a GPU-optimized hub of AI and HPC software including containers, pre-trained models, SDKs, and Helm charts, designed to simplify and accelerate AI workflows. The containers on NGC are scanned for Common Vulnerabilities and Exposures (CVEs) and are tested on both Docker and Singularity runtimes. They are also tested for performance on single-GPU, multi-GPU, and multi-node systems. Additionally, the containers are portable: they can be run on-premises, in the cloud, or at the edge.

Training a Computer Vision Model using the MATLAB NGC Container

This guide will help you run the MATLAB desktop in the cloud on an Amazon EC2® P3 instance. However, you can run the container on a cloud service provider (CSP) of your choice or on your on-premises system.

The MATLAB Deep Learning Container is available in the NVIDIA NGC Catalog.

NOTE: MATLAB R2021a supports the latest NVIDIA Ampere GPUs and will soon be available in the NVIDIA NGC Catalog.

Requirements

Amazon® Web Services account
A MATLAB license valid for the products in the MATLAB Deep Learning Container. For more information about licensing for MathWorks Containers, see Configure License for MathWorks Containers (Licensing on Cloud Platforms).
- You can obtain a trial license for products in the MATLAB Deep Learning Container at MATLAB Trial for Deep Learning on the Cloud.

Create EC2 instance on AWS

Figure 1: AWS management console and list of services

Create a key pair using the Amazon EC2 Console. Make sure that you have access to your private key so you can log in to your instance.

Figure 2: EC2 dashboard and the location of key pairs

Figure 3: Dialog box for creating the key pair

NOTE: Make sure that you download and note the location of the private key when you create a pair, as it is the only way to connect to the instance as an administrator.
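If you prefer the command line to the console, the key pair step above can be sketched with the AWS CLI. This is a minimal sketch, assuming the AWS CLI is installed and configured with credentials and a default region. The function name create_key_pair and the key name in the usage line are hypothetical and not part of the original walkthrough.

```shell
# Create an EC2 key pair and save the private key locally.
# Assumes the AWS CLI is installed and configured (credentials and default region).
create_key_pair() {
  local name=$1   # hypothetical key pair name chosen by you
  # KeyMaterial is the private key; AWS returns it only at creation time.
  aws ec2 create-key-pair --key-name "${name}" \
    --query 'KeyMaterial' --output text > "${name}.pem" || return 1
  # Restrict permissions so that SSH clients accept the key file.
  chmod 400 "${name}.pem"
}

# Usage (not run here): create_key_pair my-matlab-key
```

The resulting .pem file works directly with OpenSSH clients. PuTTY expects a .ppk file, so convert the key with PuTTYgen before using it in the PuTTY steps later in this guide.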

Launch the Docker Host Instance

Figure 4: Launching an EC2 instance from the EC2 dashboard

On the Choose AMI page, navigate to the AWS Marketplace and search for the NVIDIA Deep Learning AMI. Select the NVIDIA Deep Learning AMI which is designed for use with NVIDIA NGC containers and the latest GPUs, including NVIDIA Ampere GPUs.

Figure 5: Finding the NVIDIA Deep Learning AMI in the AWS Marketplace

Figure 6: Choosing an NVIDIA GPU enabled instance

NOTE: Not all Availability Zones offer P3 instances. Your Availability Zone is defined during setup of your virtual private cloud (VPC).

On the Configure Instance, Add Storage, and Add Tags pages, configure your instance as needed. If necessary, choose or create appropriate Security Groups for your instance on the Configure Security Group page.

Once configured, select the appropriate key pair option and Launch your instance. Make sure that you have access to your private key so you can log in to your instance.

Figure 7: Selecting the public key pair for the EC2 instance

Click on View Instances and select the running instance once it has finished initializing. Copy the Public IPv4 DNS address by clicking on the copy icon.
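The same address can also be looked up from a terminal. The following is a minimal sketch, assuming the AWS CLI is installed and configured for the region where you launched the instance. The function name running_instance_dns is hypothetical.

```shell
# Print the Public IPv4 DNS name of every running instance in the default region.
# Assumes the AWS CLI is installed and configured.
running_instance_dns() {
  aws ec2 describe-instances \
    --filters "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].PublicDnsName' \
    --output text
}

# Usage (not run here): running_instance_dns
```

If several instances are running, the command prints one name per instance, so pick the one that matches the instance you just launched.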

Figure 8: Locating and copying the public IPv4 DNS from the EC2 dashboard

Use PuTTY to connect to the EC2 instance

SSH tunneling creates an encrypted channel between your client machine and the container session so that all communications are secure. You must do this to access the desktop of the container running in your EC2 instance.

Use PuTTY to connect to your Docker host instance: go to Category: Session > Host Name (or IP Address) and enter ‘ubuntu@[your Public IPv4 DNS address]’.

Figure 9: Configuring the host name for the PuTTY terminal

Go to Category: Connection > SSH > Auth and navigate to the location of the private key of the EC2 instance.

Figure 10: Selecting the private key pair for the PuTTY terminal

To connect via a web browser, set up a tunnel to the container port 6080.

In the Source port field, enter a free port on the client machine, for example 6080.
In the Destination field, enter the host port that you map to container port 6080 when you run the container, for example, localhost:6080. Note that you must use localhost and not the name of the host instance.

To connect via a VNC client, set up a tunnel to the container port 5901.

In the Source port field, enter a free port on the client machine starting at 5900, for example 5901.
In the Destination field, enter the host port that you map to container port 5901 when you run the container, for example, localhost:5901. Note that you must use localhost and not the name of the host instance.

Figure 11: Setting the source and destination ports for the PuTTY terminal

If you are using multiple containers or running a VNC server on the client machine, you must increment the source ports on the client machine until you find a free port, for example 5902 or 6081.

Click Open, then click Yes in the PuTTY Security Alert, which simply confirms that you want to connect to that host.
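On a macOS or Linux client, the same tunnels can be opened with the OpenSSH client instead of PuTTY. This is a minimal sketch, assuming the container ports are published on host ports 6080 and 5901 and that your private key is in .pem format. The function name open_tunnel and the values in the usage line are hypothetical.

```shell
# Open an SSH session to the Docker host with the two tunnels described above.
# Assumes the container ports are published on host ports 6080 and 5901.
open_tunnel() {
  local key_file=$1           # path to your private key (.pem)
  local host=$2               # Public IPv4 DNS of the EC2 instance
  local web_port=${3:-6080}   # free source port on the client for the browser
  local vnc_port=${4:-5901}   # free source port on the client for VNC
  # -L source:localhost:hostport forwards a client port to the Docker host.
  # "localhost" is resolved on the EC2 side, matching the PuTTY Destination field.
  ssh -i "${key_file}" \
    -L "${web_port}:localhost:6080" \
    -L "${vnc_port}:localhost:5901" \
    "ubuntu@${host}"
}

# Usage (not run here): open_tunnel ~/my-matlab-key.pem <your Public IPv4 DNS address>
```

Passing 6081 and 5902 as the third and fourth arguments corresponds to incrementing the source ports when the defaults are already in use on the client.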

Pull & Run the MATLAB container from the NGC catalog

Copy the pull command for the container image release from the MATLAB landing page in the NVIDIA NGC Catalog. In the Tags section, locate the container image release that you want to run. In the Pull column, click the icon to copy the docker pull command. The command is of the form:

docker pull nvcr.io/partners/matlab:r20XYz

where the tag r20XYz must be replaced with the specific MATLAB release name, for example r2021a. Ensure the last part of the pull command matches the MATLAB release you want to use.
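As a concrete illustration of the tag substitution, the sketch below wraps the pull command in a small helper that rejects an unsubstituted placeholder. The function name pull_matlab is hypothetical, and the tag pattern it checks (r, a four-digit year, then a or b) is an assumption based on the r2021a example, not a rule published on the NGC page.

```shell
# Pull the MATLAB Deep Learning Container for one release.
pull_matlab() {
  local release=$1
  # Assumed tag pattern: "r", a four-digit year, then "a" or "b" (for example r2021a).
  case "${release}" in
    r20[0-9][0-9][ab]) ;;
    *) echo "unexpected release tag: ${release}" >&2; return 1 ;;
  esac
  docker pull "nvcr.io/partners/matlab:${release}"
}

# Usage (not run here): pull_matlab r2021a
```

Calling the helper with the literal placeholder r20XYz fails before docker is invoked, which catches the most common copy-and-paste mistake.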

Figure 12: Copying the docker pull command from the MATLAB NGC catalog page

Paste the docker pull command into your SSH client, and run the command on your EC2 instance. You do not need to log in to the NVIDIA Container Registry to pull the container image.

Figure 13: Paste the docker pull command in the PuTTY terminal

Running the docker pull command downloads the MATLAB container image (~9GB) onto the host EC2 machine. It might take some time to download and extract the large container image. You only have to pull the container once per EC2 instance.

Run the MATLAB Deep Learning Container by copying the docker run command from the MATLAB NGC landing page:

Figure 14: The docker run command from the MATLAB NGC catalog page

Ensure the last part of the run command matches the MATLAB release you want to use.

The options -p hostport:containerport map ports from inside the container to ports on the Docker host so that you can connect to the container desktop. Ports used in the container are 5901 (for VNC connection) and 6080 (for web browser connection). If you are deploying multiple containers on the same host instance, you must increment the host ports until you find a free port. For example:

-p 5902:5901 -p 6081:6080
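The port incrementing described above can be automated on the Docker host. This is a sketch under stated assumptions: it relies on bash's /dev/tcp redirection to detect a listener, and the --gpus all and --shm-size=512M options are assumptions about the run command, so prefer the exact command copied from the NGC page. The function names next_free_port and run_matlab are hypothetical.

```shell
# Return the first port >= $1 with no listener on localhost.
# Uses bash's /dev/tcp redirection, which succeeds only if something accepts the connection.
next_free_port() {
  local port=$1
  while (: < "/dev/tcp/127.0.0.1/${port}") 2>/dev/null; do
    port=$((port + 1))
  done
  echo "${port}"
}

# Start the container, mapping free host ports to container ports 5901 and 6080.
run_matlab() {
  local release=$1
  local vnc_port web_port
  vnc_port=$(next_free_port 5901)
  web_port=$(next_free_port 6080)
  # Report the chosen host ports on stderr so you know where to point the tunnel.
  echo "VNC on host port ${vnc_port}, browser on host port ${web_port}" >&2
  # --gpus all and --shm-size are assumed options; check them against the NGC page.
  docker run --gpus all -it --rm \
    -p "${vnc_port}:5901" -p "${web_port}:6080" \
    --shm-size=512M \
    "nvcr.io/partners/matlab:${release}"
}

# Usage (not run here): run_matlab r2021a
```

If the helper reports host ports other than 5901 and 6080, use those values in the Destination field of the SSH tunnel.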

Paste the docker run command into PuTTY, and run the command on your EC2 instance.

Figure 15: Pasting the docker run command in the PuTTY terminal

The MATLAB Deep Learning Container is now running on your EC2 machine.

There are three ways to access MATLAB in the container, but we are going to use the web browser in this example. Please refer to documentation if you want to access the MATLAB container with either a command-line interface or VNC client.

To connect using a web browser, use the URL:

http://localhost:6080

Note that you must use localhost and not the name of the host instance.

If you incremented the source port on the client machine when you set up the SSH tunnel, use that port number instead, for example 6081.

You will see a login screen for noVNC. Click Connect. When you are prompted for a password to access the desktop, use the password:

matlab

Log in to your MathWorks account

Run MATLAB by using the desktop icon and log in using your MathWorks Account.

Figure 16: Launching MATLAB from the desktop in the web browser

If you cannot log in using your MathWorks Account, check that your account is connected to a license that is configured for cloud use. To check, visit License Center.

Figure 17: Entering your MathWorks Account credentials at the prompt

Figure 18: The MATLAB desktop in the web browser

Running a MATLAB deep learning example

To test your container, you can run the Create Simple Deep Learning Network for Classification (Deep Learning Toolbox) example. To try this example, double-click the file MNISTExample.mlx in the Current Folder pane in the MATLAB startup folder.

Figure 19: Running the MNIST example to test the deep learning setup

MATLAB supports training a single network in parallel using multiple GPUs. To enable multi-GPU training in the MATLAB Deep Learning Container, use the trainingOptions function to set 'ExecutionEnvironment' to 'multi-gpu'. For more training options using multiple GPUs, see Deep Learning with MATLAB on Multiple GPUs.

Observe the training progress of the network in the live plot along with the validation accuracy, losses, and time elapsed.

Figure 20: Training progress plot for MNIST example

Congratulations! You are ready to move on to more sophisticated AI examples using MATLAB, NGC, AWS, and NVIDIA GPUs.

Visit MATLAB documentation to find many deep learning examples and access to pretrained models to continue on your AI journey.

NOTE: You can follow the same steps to run the container on a CSP instance of your choice or on your on-premises systems.

What's Next

Get started by downloading the MATLAB container from the NGC catalog page.

See MATLAB container documentation for additional topics such as:

Running the MATLAB NGC container on NVIDIA DGX systems
Importing and exporting data to AWS S3
MATLAB licensing options
Installing toolboxes and Add-Ons in the MATLAB NGC Container
Using a VNC client to connect to MATLAB

Also, a new MATLAB R2021a NGC container will soon be available to support NVIDIA Ampere along with the latest CUDA 11.0 and TensorRT 7.2.x libraries, so stay tuned.