{"id":10563,"date":"2022-11-10T15:27:33","date_gmt":"2022-11-10T20:27:33","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=10563"},"modified":"2024-02-12T20:40:24","modified_gmt":"2024-02-13T01:40:24","slug":"style-transfer-and-cloud-computing-with-multiple-gpus","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2022\/11\/10\/style-transfer-and-cloud-computing-with-multiple-gpus\/","title":{"rendered":"Style Transfer and Cloud Computing with Multiple GPUs"},"content":{"rendered":"<em>The following post is from <a href=\"https:\/\/www.linkedin.com\/in\/nicholas-ide-6712bb9\/\">Nicholas Ide<\/a>, Product Manager at MathWorks.<\/em>\r\n<h6><\/h6>\r\nWe\u2019re headed to the SC22 supercomputing conference in Dallas next week. Thousands of people are expected to attend this year\u2019s Supercomputing event, marking a large-scale return to in-person conferences. If you're one of those people, stop by and say hello! MathWorks will be there, representing Artificial Intelligence, High Performance Computing, and Cloud Computing.\r\n<h6><\/h6>\r\nAt the conference, we\u2019ll be luring people to our booth with free goodies, including Rubik\u2019s Cubes, stickers, and live demos. Below, I\u2019ll walk you through one of the demos we\u2019ll be showing: an updated style transfer demo that runs on the cloud, applies AI to images captured by a webcam, uses a GPU to accelerate the underlying computationally intensive algorithm, and leverages multiple GPUs to increase the frame rate of processed results.\r\n<h6><\/h6>\r\nIf you\u2019ve ever wondered how you might use multiple GPUs to speed up a workflow, you should stop by. 
We\u2019ll show you how parallel constructs like parfeval can be used to leverage more of your CPU and GPU resources for independent tasks.\r\n<h6><\/h6>\r\n<img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-10758 size-medium\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2022\/11\/styled_image-300x234.png\" alt=\"styled image with style transfer\" width=\"300\" height=\"234\" \/>\r\n<h6><\/h6>\r\n&nbsp;\r\n<p style=\"font-size: 18px;\"><strong>What is style transfer?<\/strong><\/p>\r\nWith style transfer, you can apply the stylistic appearance of one image to the scene content of a second image. To learn more about style transfer, read the documentation example <a href=\"https:\/\/www.mathworks.com\/help\/images\/neural-style-transfer-using-deep-learning.html\">Neural Style Transfer Using Deep Learning<\/a>.\r\n<h6><\/h6>\r\n<h6><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-10608 size-full\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2022\/11\/style_transfer.png\" alt=\"image style transfer with deep learning\" width=\"421\" height=\"246\" \/><\/h6>\r\n<strong>Figure:<\/strong> Style transfer with deep learning\r\n<h6><\/h6>\r\nNow, some might argue that style transfer isn\u2019t exactly new, which is true. In fact, we presented a style transfer demo a few years back. Read more about our original demo in this blog post: <a href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2019\/03\/13\/gtc-here-we-come\/\">MATLAB Demos at GTC: Style Transfer and Celebrity Lookalikes<\/a>.\r\n<h6><\/h6>\r\nWhat is new is accelerating a computationally expensive demo simply by leveraging more hardware with the same core code, turning an algorithm that normally runs at just a few frames per second into a streamable one with 4 times that speed. 
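The queueing side of this multi-GPU pattern can be sketched in a few lines of MATLAB. This is a minimal, hypothetical sketch, not the demo's actual code: `createStyleNetwork`, `getNextFrame`, `stylizeFrame`, and `updateDisplay` are assumed helper functions introduced here for illustration.

```matlab
% Minimal sketch of the multi-GPU frame-queueing pattern (assumed helper
% names; not the demo's actual code).
pool = parpool("Threads");                               % thread-based workers in one MATLAB process
netConst = parallel.pool.Constant(@createStyleNetwork);  % build the style network once per worker

for k = 1:numFrames
    frame = getNextFrame();                              % e.g. a webcam snapshot
    % Queue the frame for asynchronous processing; workers pull tasks
    % as their GPUs become free.
    f(k) = parfeval(pool, @stylizeFrame, 1, netConst, frame);
end

% As each future completes, move its styled frame into the display buffer.
afterEach(f, @updateDisplay, 0);
```

Because each thread worker is handed one of the machine's GPUs, queueing frames faster than a single GPU can process them spills the extra work onto the other GPUs without any change to the core algorithm.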
In fact, our demo, which uses a high-end multi-GPU instance in the cloud, can process 15 frames per second.\r\n<h6><\/h6>\r\n&nbsp;\r\n<p style=\"font-size: 18px;\"><strong>Connection to Cloud Machine<\/strong><\/p>\r\nTo run the style transfer demo, we connect to a cloud Windows machine in AWS using <a href=\"https:\/\/www.mathworks.com\/help\/cloudcenter\/mathworks-cloud-center.html?s_tid=CRUX_lftnav\">MathWorks Cloud Center<\/a>.\r\n<h6><\/h6>\r\nIf you have a MathWorks Account, a license for MATLAB, and an AWS account, you can leverage MathWorks Cloud Center to get on-demand access to Windows or Linux instances in the cloud with hardware that far exceeds what you likely have on your desktop now. Getting set up the first time is straightforward, and restarting your instance is a breeze. The best part is that all the changes you make to the environment persist between restarts. The one-time effort for initial setup quickly pays dividends in reuse.\r\n<h6><\/h6>\r\n<h6><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-10611 size-full\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2022\/11\/cloud_machine.png\" alt=\"Windows machine on the AWS cloud\" width=\"494\" height=\"288\" \/><\/h6>\r\n<strong>Figure:\u00a0<\/strong>Windows machine on the cloud\r\n<h6><\/h6>\r\nIf you are new to creating, managing, and accessing machines on AWS with MATLAB, see the documentation for <a href=\"https:\/\/www.mathworks.com\/help\/cloudcenter\/getting-started-with-cloud-center.html?s_tid=CRUX_lftnav\">Getting Started with Cloud Center<\/a> and <a href=\"https:\/\/www.mathworks.com\/help\/cloudcenter\/ug\/start-matlab-on-amazon-web-services-aws-using-cloud-center.html\">Starting MATLAB on AWS Using Cloud Center<\/a>.\r\n<h6><\/h6>\r\n&nbsp;\r\n<p style=\"font-size: 18px;\"><strong>GPU-Accelerated Computing<\/strong><\/p>\r\nWe used <a href=\"https:\/\/www.mathworks.com\/products\/matlab\/app-designer.html\">App Designer<\/a> to easily build a 
professional-looking app that provides an integrated environment to load frames, perform style transfer using deep learning, leverage one or more GPUs, and display results.\r\n<h6><\/h6>\r\nThe key aspects and controls of the app (starting from the bottom) are:\r\n<h6><\/h6>\r\n<ul>\r\n \t<li><em>Styled output FPS<\/em> \u2013 frame rate (frames\/sec) for the styled output images<\/li>\r\n \t<li><em>Style network prediction time<\/em> \u2013 how long it takes on average to re-style an input frame<\/li>\r\n \t<li><em>Style network prediction rate<\/em> \u2013 desired frames per second for processing. When using a single GPU, the app should be able to process at a rate of approximately 1\/<em>t<\/em>, where\u00a0<em>t<\/em>\u00a0is the prediction time for the style network.<\/li>\r\n \t<li><em>NumWorkers <\/em>\u2013 number of parallel workers in our pool. Each worker can leverage one GPU. We have 4 GPUs on this cloud instance, so we chose 4 workers. With 4 GPUs, we can process up to 4 times as many frames per second.<\/li>\r\n<\/ul>\r\n<h6><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-10740 size-full\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2022\/11\/style_transfer_UI.png\" alt=\"User interface for style transfer app\" width=\"625\" height=\"376\" \/><\/h6>\r\n<strong>Figure:\u00a0<\/strong>User interface of the style transfer app\r\n<h6><\/h6>\r\nWe took advantage of MATLAB and Parallel Computing Toolbox features to accelerate the execution of the computationally intensive AI algorithm:\r\n<h6><\/h6>\r\n<ul>\r\n \t<li><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/matlab_prog\/run-functions-on-threads.html\">thread pool<\/a> creates multiple workers within a single MATLAB process to more efficiently share data between workers.<\/li>\r\n \t<li><a href=\"https:\/\/www.mathworks.com\/help\/parallel-computing\/parallel.pool.parfeval.html\">parfeval\u00a0<\/a>queues the frames for parallel processing on multiple 
GPUs.<\/li>\r\n \t<li><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/parallel.future.aftereach.html\">afterEach<\/a> moves completed frame data from the queue into the app\u2019s display buffer.<\/li>\r\n \t<li><a href=\"https:\/\/www.mathworks.com\/help\/parallel-computing\/parallel.pool.constant.html\">parallel.pool.Constant<\/a> efficiently manages construction and updating of networks on thread workers.<\/li>\r\n<\/ul>\r\n<h6><\/h6>\r\nWhen we run on a machine with multiple GPUs and use a pool of thread workers to execute the parfeval\u00a0queue, each worker in the pool is assigned a GPU in a round-robin fashion. That is, the work is evenly distributed among all available resources.\r\n<h6><\/h6>\r\nIn the following screenshots, you can observe the work distribution among the 4 GPUs of our machine and the performance of the GPUs when increasing the desired frame rate. We first set a conservative frame rate of 3 frames\/sec (based on the 0.27 sec prediction time). Then, we increased the rate to 6 frames\/sec, which is high enough to engage 2 GPUs. 
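These rate targets follow from simple arithmetic on the measured prediction time, sketched here in MATLAB (the 0.27 sec figure is the measurement quoted above; the variable names are illustrative):

```matlab
% Back-of-the-envelope frame rates from the measured prediction time.
t = 0.27;            % style-network prediction time, sec per frame
oneGPU  = 1 / t;     % ~3.7 frames/sec sustainable on a single GPU
fourGPU = 4 / t;     % ~14.8 frames/sec with all 4 GPUs engaged
```

So 3 frames\/sec leaves a single GPU with headroom, 6 frames\/sec requires a second GPU, and roughly 15 frames\/sec is the most this 4-GPU machine can sustain.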
Finally, we set the frame rate to 15 frames\/sec to see how far we can push our hardware and engage all 4 GPUs.\r\n<h6><\/h6>\r\nNote that, based on the last observation and the relative GPU utilization across the screenshots, we could likely have achieved close to 4 frames per second with just a single GPU.\r\n<h6><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-10770 size-full\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2022\/11\/style_transfer_fps.png\" alt=\"style transfer with increasing frame rate\" width=\"1488\" height=\"1946\" \/><\/h6>\r\n<h6><\/h6>\r\n&nbsp;\r\n<strong>Figure:<\/strong> Observe the work distribution among 4 GPUs (charts on right) and the styled output FPS (bottom number in the UI) when increasing the desired frame rate from 3 frames\/sec, to 6 frames\/sec, and finally to 15 frames\/sec.\r\n<h6><\/h6>\r\n&nbsp;\r\n<p style=\"font-size: 18px;\"><strong>Conclusion<\/strong><\/p>\r\nIf you\u2019re coming to SC22, stop by our booth to say hi and check out the demo. If you\u2019re not able to attend, leave a comment with anything you\u2019d like to chat about related to supercomputing.\r\n<h6><\/h6>","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2022\/11\/styled_image.png\" class=\"img-responsive attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" decoding=\"async\" loading=\"lazy\" \/><\/div><p>The following post is from Nicholas Ide, Product Manager at MathWorks.\r\n\r\nWe\u2019re headed to the SC22 supercomputing conference in Dallas next week. Thousands of people are expected to attend this... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2022\/11\/10\/style-transfer-and-cloud-computing-with-multiple-gpus\/\">read more >><\/a><\/p>","protected":false},"author":194,"featured_media":10758,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[48,9],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/10563"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/users\/194"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/comments?post=10563"}],"version-history":[{"count":61,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/10563\/revisions"}],"predecessor-version":[{"id":10779,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/10563\/revisions\/10779"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media\/10758"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media?parent=10563"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/categories?post=10563"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/tags?post=10563"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}