{"id":230,"date":"2022-05-19T09:40:07","date_gmt":"2022-05-19T13:40:07","guid":{"rendered":"https:\/\/blogs.mathworks.com\/matlab\/?p=230"},"modified":"2022-05-19T09:44:32","modified_gmt":"2022-05-19T13:44:32","slug":"how-to-make-a-gpu-version-of-this-matlab-program-by-changing-two-lines","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/matlab\/2022\/05\/19\/how-to-make-a-gpu-version-of-this-matlab-program-by-changing-two-lines\/","title":{"rendered":"How to make a GPU version of this MATLAB program by changing two lines"},"content":{"rendered":"<div class = rtcContent><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>In his article, <\/span><a href = \"https:\/\/blogs.mathworks.com\/steve\/2022\/02\/28\/a-short-game-of-life\/\"><span>A short game of Life<\/span><\/a><span>, Steve Eddins showed us the following few lines of code that implemented <\/span><a href = \"https:\/\/en.wikipedia.org\/wiki\/Conway%27s_Game_of_Life\"><span>Conway's Game of Life<\/span><\/a><span>.  
Steve's version used a 750 x 750 gameboard, whereas mine uses 2000 x 2000 because I want something meaty to compute.<\/span><\/div><div style=\"background-color: #F5F5F5; margin: 10px 0 10px 0;\"><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 1px solid rgb(191, 191, 191); border-bottom: 0px none rgb(33, 33, 33); border-radius: 4px 4px 0px 0px; padding: 6px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >clear <\/span><span style=\"color: rgb(0, 128, 19);\">% Clear all variables<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '>&nbsp;<\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >tic<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; 
line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >im = rand(2000,2000)&gt;0.8;<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(14, 0, 255);\">for <\/span><span >k=1:500<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >  t = conv2(im(:,:,k),[2,2,2;2,1,2;2,2,2],<\/span><span style=\"color: rgb(167, 9, 245);\">\"same\"<\/span><span >);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >  im(:,:,1,k+1) = (t &gt; 4) &amp; (t &lt; 8);<\/span><\/span><\/div><\/div><div 
class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(14, 0, 255);\">end<\/span><\/span><\/div><\/div><div class=\"inlineWrapper outputs\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 1px solid rgb(191, 191, 191); border-radius: 0px; padding: 0px 45px 4px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >OriginalTime = toc<\/span><\/span><\/div><div  style = 'color: rgb(33, 33, 33); padding: 10px 0px 6px 17px; background: rgb(255, 255, 255) none repeat scroll 0% 0% \/ auto padding-box border-box; font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; overflow-x: hidden; line-height: 17.234px; '><div class='variableElement' style='font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 12px; '>OriginalTime = 39.4715<\/div><\/div><\/div><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>As Steve showed us, this is extremely easy to turn into an animated gif.  
Since my gameboard is so big, I'll just zoom in on a small section of it <\/span><\/div><div style=\"background-color: #F5F5F5; margin: 10px 0 10px 0;\"><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 1px solid rgb(191, 191, 191); border-bottom: 1px solid rgb(191, 191, 191); border-radius: 4px; padding: 6px 45px 4px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >imwrite(~im(1:200,1:200,1,:),<\/span><span style=\"color: rgb(167, 9, 245);\">\"Life.gif\"<\/span><span >,<\/span><span style=\"color: rgb(167, 9, 245);\">\"DelayTime\"<\/span><span >,0.02,<\/span><span style=\"color: rgb(167, 9, 245);\">\"LoopCount\"<\/span><span >,Inf)<\/span><\/span><\/div><\/div><\/div><div  style = 'margin: 10px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><img class = \"imageNode\" src = \"https:\/\/blogs.mathworks.com\/matlab\/files\/2022\/05\/life.gif\" width = \"200\" height = \"200\" alt = \"life.gif\" style = \"vertical-align: baseline; width: 200px; height: 200px;\"><\/img><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span style=' font-weight: bold;'>The first step to faster code: Preallocation<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; 
font-weight: 400; text-align: left; '><span>I'm obsessed with speed in the MATLAB programming language and wondered if there is anything that can be done with these few lines to speed them up without ruining their elegance too much.<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>Let's start with something easy: preallocation of the output arrays, <\/span><span style=' font-family: monospace;'>im<\/span><\/div><div style=\"background-color: #F5F5F5; margin: 10px 0 10px 0;\"><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 1px solid rgb(191, 191, 191); border-bottom: 0px none rgb(33, 33, 33); border-radius: 4px 4px 0px 0px; padding: 6px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >clear <\/span><span style=\"color: rgb(0, 128, 19);\">% Clear all variables<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '>&nbsp;<\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 
13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >tic<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >im = zeros(2000,2000,1,501,<\/span><span style=\"color: rgb(167, 9, 245);\">\"logical\"<\/span><span >);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >im(:,:,1,1) = rand(2000,2000) &gt; 0.8;<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(14, 0, 255);\">for <\/span><span >k=1:500<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 
'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >  t = conv2(im(:,:,k),[2,2,2;2,1,2;2,2,2],<\/span><span style=\"color: rgb(167, 9, 245);\">\"same\"<\/span><span >);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >  im(:,:,1,k+1) = (t &gt; 4) &amp; (t &lt; 8);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(14, 0, 255);\">end<\/span><\/span><\/div><\/div><div class=\"inlineWrapper outputs\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 1px solid rgb(191, 191, 191); border-radius: 0px; padding: 0px 45px 4px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 
33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >PreallocatedTime = toc<\/span><\/span><\/div><div  style = 'color: rgb(33, 33, 33); padding: 10px 0px 6px 17px; background: rgb(255, 255, 255) none repeat scroll 0% 0% \/ auto padding-box border-box; font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; overflow-x: hidden; line-height: 17.234px; '><div class='variableElement' style='font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 12px; '>PreallocatedTime = 14.2942<\/div><\/div><\/div><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>Over 2.7x speed-up for one extra line and one modified line.  Not bad, but can I go any further?<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span style=' font-weight: bold;'>Moving from a CPU implementation to a GPU implementation of many MATLAB codes is very easy!<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>My computer has quite a nice NVIDIA GPU which I can access using <\/span><a href = \"https:\/\/se.mathworks.com\/products\/parallel-computing.html\"><span>Parallel Computing Toolbox<\/span><\/a><\/div><div style=\"background-color: #F5F5F5; margin: 10px 0 10px 0;\"><div class=\"inlineWrapper outputs\"><div  style = 'border-left: 1px 
solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 1px solid rgb(191, 191, 191); border-bottom: 1px solid rgb(191, 191, 191); border-radius: 4px 4px 0px 0px; padding: 6px 45px 4px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >gpuDevice()<\/span><\/span><\/div><div  style = 'color: rgb(33, 33, 33); padding: 10px 0px 6px 17px; background: rgb(255, 255, 255) none repeat scroll 0% 0% \/ auto padding-box border-box; font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; overflow-x: hidden; line-height: 17.234px; '><div class=\"inlineElement eoOutputWrapper embeddedOutputsVariableStringElement\" uid=\"2411CBAB\" prevent-scroll=\"true\" data-testid=\"output_2\" style=\"width: 1146px; white-space: pre; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\"><div class=\"textElement eoOutputContent\" data-width=\"1116\" data-height=\"355\" data-hashorizontaloverflow=\"false\" style=\"max-height: 366px; white-space: pre; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\"><div style=\"white-space: pre; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\"><span class=\"variableNameElement\" style=\"white-space: pre; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\">ans = <\/span><\/div><div style=\"white-space: pre; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\">  CUDADevice with properties:\r\n\r\n                      Name: 'NVIDIA GeForce RTX 3070'\r\n                     Index: 1\r\n         ComputeCapability: '8.6'\r\n            SupportsDouble: 1\r\n             DriverVersion: 11.6000\r\n            ToolkitVersion: 11.2000\r\n        MaxThreadsPerBlock: 1024\r\n          MaxShmemPerBlock: 49152\r\n        MaxThreadBlockSize: [1024 1024 64]\r\n               MaxGridSize: [2.1475e+09 65535 
65535]\r\n                 SIMDWidth: 32\r\n               TotalMemory: 8.5894e+09\r\n           AvailableMemory: 7.2939e+09\r\n       MultiprocessorCount: 46\r\n              ClockRateKHz: 1725000\r\n               ComputeMode: 'Default'\r\n      GPUOverlapsTransfers: 1\r\n    KernelExecutionTimeout: 1\r\n          CanMapHostMemory: 1\r\n           DeviceSupported: 1\r\n           DeviceAvailable: 1\r\n            DeviceSelected: 1\r\n<\/div><\/div><\/div><\/div><\/div><\/div><div  style = 'margin: 10px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>GPUs are perfectly suited to this kind of thing, but writing a GPU version of this would be difficult, right?  Late nights of hardcore CUDA coding await. <\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>But there's another way! <\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>Well over <\/span><a href = \"https:\/\/se.mathworks.com\/help\/matlab\/referencelist.html?type=function&amp;listtype=cat&amp;category=index&amp;blocktype=all&amp;capability=gpuarrays&amp;s_tid=CRUX_lftnav\"><span>1000 MATLAB functions<\/span><\/a><span> (including those in the official Toolboxes) are 'overloaded' with the <\/span><span style=' font-family: monospace;'>gpuArray<\/span><span> data type.  
What this means in practice is that whereas this code runs on the CPU:<\/span><\/div><div style=\"background-color: #F5F5F5; margin: 10px 0 10px 0;\"><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 1px solid rgb(191, 191, 191); border-bottom: 0px none rgb(33, 33, 33); border-radius: 4px 4px 0px 0px; padding: 6px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >A = rand(3);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >B = rand(4);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper outputs\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 1px solid rgb(191, 191, 191); border-radius: 0px; padding: 0px 45px 4px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >Cfull = conv2(A,B)<\/span><\/span><\/div><div  style = 'color: rgb(33, 33, 33); padding: 10px 0px 6px 17px; background: rgb(255, 255, 255) none repeat scroll 0% 0% \/ auto padding-box border-box; font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; overflow-x: hidden; line-height: 17.234px; '><div 
class=\"inlineElement eoOutputWrapper embeddedOutputsVariableMatrixElement\" uid=\"E22AF5B0\" prevent-scroll=\"true\" data-testid=\"output_3\" data-width=\"1116\" style=\"width: 1146px; white-space: normal; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\"><div class=\"matrixElement veSpecifier saveLoad eoOutputContent\" style=\"white-space: normal; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\"><div class=\"veVariableName variableNameElement double\" style=\"width: 1116px; white-space: normal; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\"><div class=\"headerElementClickToInteract\" style=\"white-space: normal; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\"><span style=\"white-space: normal; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\">Cfull = <\/span><span class=\"veVariableValueSummary veMetaSummary\" style=\"white-space: normal; font-style: normal; color: rgb(179, 179, 179); font-size: 12px;\">6\u00d76<\/span><\/div><\/div><div class=\"valueContainer\" data-layout=\"{&quot;columnWidth&quot;:66,&quot;totalColumns&quot;:&quot;6&quot;,&quot;totalRows&quot;:&quot;6&quot;,&quot;charsPerColumn&quot;:10}\" style=\"white-space: nowrap; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\"><div class=\"variableValue\" style=\"width: 398px; white-space: pre; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\">    0.2357    0.1997    0.7748    0.4111    0.5196    0.0080\r\n    0.2237    0.4145    1.3158    1.0064    1.2236    0.0201\r\n    0.6943    0.7811    2.3386    1.8777    1.5679    0.0412\r\n    0.3726    0.9340    2.2665    2.3125    1.7186    0.5522\r\n    0.5210    0.8183    1.7277    1.9970    1.4909    0.8490\r\n    0.2133    0.8088    1.2790    1.8410    1.2599    0.5397\r\n<\/div><div class=\"horizontalEllipsis hide\" style=\"white-space: nowrap; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\"><\/div><div class=\"verticalEllipsis hide\" 
style=\"white-space: nowrap; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\"><\/div><\/div><\/div><div class=\"outputLayer selectedOutputDecorationLayer doNotExport\" style=\"white-space: normal; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\"><\/div><div class=\"outputLayer activeOutputDecorationLayer doNotExport\" style=\"white-space: normal; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\"><\/div><div class=\"outputLayer scrollableOutputDecorationLayer doNotExport\" style=\"white-space: normal; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\"><\/div><\/div><\/div><\/div><\/div><div  style = 'margin: 10px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>    This code runs on the GPU:<\/span><\/div><div style=\"background-color: #F5F5F5; margin: 10px 0 10px 0;\"><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 1px solid rgb(191, 191, 191); border-bottom: 0px none rgb(33, 33, 33); border-radius: 4px 4px 0px 0px; padding: 6px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >gpuA = gpuArray(A);  <\/span><span style=\"color: rgb(0, 128, 19);\">% Transfer A to the GPU and call it gpuA<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: 
Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >gpuB = gpuArray(B);  <\/span><span style=\"color: rgb(0, 128, 19);\">% Transfer B to the GPU and call it gpuB<\/span><\/span><\/div><\/div><div class=\"inlineWrapper outputs\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 1px solid rgb(191, 191, 191); border-radius: 0px; padding: 0px 45px 4px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >Cfull_gpu = conv2(gpuA,gpuB) <\/span><span style=\"color: rgb(0, 128, 19);\">% This now runs on the GPU <\/span><\/span><\/div><div  style = 'color: rgb(33, 33, 33); padding: 10px 0px 6px 17px; background: rgb(255, 255, 255) none repeat scroll 0% 0% \/ auto padding-box border-box; font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; overflow-x: hidden; line-height: 17.234px; '><div class=\"inlineElement eoOutputWrapper embeddedOutputsTextElement\" uid=\"CB5E6FC3\" prevent-scroll=\"true\" data-testid=\"output_4\" style=\"width: 1146px; white-space: pre; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\"><div class=\"textElement eoOutputContent\" data-width=\"1116\" data-height=\"114\" data-hashorizontaloverflow=\"false\" style=\"max-height: 261px; white-space: pre; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\">Cfull_gpu =\r\n\r\n    0.2357    0.1997    0.7748    0.4111    0.5196    0.0080\r\n    0.2237    0.4145    1.3158    1.0064    1.2236    0.0201\r\n    0.6943    0.7811    2.3386    1.8777    1.5679    0.0412\r\n    0.3726    0.9340    2.2665    2.3125    1.7186    0.5522\r\n    0.5210    0.8183    1.7277    1.9970    1.4909    0.8490\r\n    0.2133    0.8088    1.2790    1.8410    
1.2599    0.5397<\/div><\/div><\/div><\/div><\/div><div  style = 'margin: 10px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>All you need in order to make over 1000 MATLAB functions work on an NVIDIA GPU is Parallel Computing Toolbox, which allows you to give them <\/span><span style=' font-family: monospace;'>gpuArray<\/span><span>s instead of normal arrays.  <\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>Whether or not you'll actually get a speed-up depends on many factors, but to get started and simply get things running on the GPU instead of the CPU, this is it!<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>Back to Steve's code.  What I need to do is change the initialisation of <\/span><span style=' font-family: monospace;'>im<\/span><span style=' font-weight: bold; font-family: monospace;'> <\/span><span>to a <\/span><span style=' font-family: monospace;'>gpuArray<\/span><span> and everything will automagically run on the GPU.  
That is, I change<\/span><\/div><div style=\"background-color: #F5F5F5; margin: 10px 0 10px 0;\"><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 1px solid rgb(191, 191, 191); border-bottom: 0px none rgb(33, 33, 33); border-radius: 4px 4px 0px 0px; padding: 6px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >im = zeros(2000,2000,1,501,<\/span><span style=\"color: rgb(167, 9, 245);\">\"logical\"<\/span><span >);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 1px solid rgb(191, 191, 191); border-radius: 0px 0px 4px 4px; padding: 0px 45px 4px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >im(:,:,1,1) = rand(2000,2000) &gt; 0.8;<\/span><\/span><\/div><\/div><\/div><div  style = 'margin: 10px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>to<\/span><\/div><div style=\"background-color: #F5F5F5; margin: 10px 0 10px 0;\"><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 1px solid rgb(191, 191, 191); border-bottom: 0px none rgb(33, 33, 33); border-radius: 4px 4px 0px 0px; padding: 6px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, 
Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >im = zeros(2000,2000,1,501,<\/span><span style=\"color: rgb(167, 9, 245);\">\"logical\"<\/span><span >,<\/span><span style=\"color: rgb(167, 9, 245);\">\"gpuArray\"<\/span><span >);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 1px solid rgb(191, 191, 191); border-radius: 0px 0px 4px 4px; padding: 0px 45px 4px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >im(:,:,1,1) = rand(2000,2000,<\/span><span style=\"color: rgb(167, 9, 245);\">\"gpuArray\"<\/span><span >) &gt; 0.8;<\/span><\/span><\/div><\/div><\/div><div  style = 'margin: 10px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>Let's give it a try<\/span><\/div><div style=\"background-color: #F5F5F5; margin: 10px 0 10px 0;\"><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 1px solid rgb(191, 191, 191); border-bottom: 0px none rgb(33, 33, 33); border-radius: 4px 4px 0px 0px; padding: 6px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >clear <\/span><span style=\"color: rgb(0, 128, 19);\">% Clear all variables<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); 
border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '>&nbsp;<\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >dev = gpuDevice();<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >tic<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >im = zeros(2000,2000,1,501,<\/span><span style=\"color: rgb(167, 9, 245);\">\"logical\"<\/span><span >,<\/span><span style=\"color: rgb(167, 9, 
245);\">\"gpuArray\"<\/span><span >);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >im(:,:,1,1) = rand(2000,2000,<\/span><span style=\"color: rgb(167, 9, 245);\">\"gpuArray\"<\/span><span >) &gt; 0.8;<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(14, 0, 255);\">for <\/span><span >k=1:500<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >  t = conv2(im(:,:,k),[2,2,2;2,1,2;2,2,2],<\/span><span style=\"color: rgb(167, 9, 245);\">\"same\"<\/span><span >);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px 
none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >  im(:,:,1,k+1) = (t &gt; 4) &amp; (t &lt; 8);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(14, 0, 255);\">end<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >wait(dev);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper outputs\"><div  style = 'border-left: 1px solid rgb(191, 191, 191); border-right: 1px solid rgb(191, 191, 191); border-top: 0px none rgb(33, 33, 33); border-bottom: 1px solid rgb(191, 191, 191); border-radius: 0px; padding: 0px 45px 4px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >GpuTime = toc<\/span><\/span><\/div><div  style = 'color: rgb(33, 33, 33); 
padding: 10px 0px 6px 17px; background: rgb(255, 255, 255) none repeat scroll 0% 0% \/ auto padding-box border-box; font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; overflow-x: hidden; line-height: 17.234px; '><div class='variableElement' style='font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 12px; '>GpuTime = 5.3013<\/div><\/div><\/div><\/div><div  style = 'margin: 10px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>Almost 3x faster than the CPU version that used preallocated arrays and around 7x faster than the original!  Not bad considering I only changed 2 lines of code.  Furthermore, this has to be the easiest GPU 'port' of a simulation I've ever written.<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>Now, I am sure that there are CUDA experts out there who could do better than this -- squeezing every last drop of performance possible from the poor overworked GPU -- but a 3x speedup for so little work is pretty good going, and there are several options in the MATLAB ecosystem that allow you to go deeper and explore other approaches.<\/span><\/div><h3  style = 'margin: 15px 10px 5px 4px; padding: 0px; line-height: 18px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 17px; font-weight: 700; text-align: left; '><span>What's going on with wait(dev)?<\/span><\/h3><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); 
font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>The eagle-eyed among you might have noticed that my GPU version has an extra couple of lines in it that I haven't mentioned yet.  As we speak, I can feel you reaching for the comment button to tell me that I lied to you... I changed four lines, not two!  Here are the two lines I conveniently chose not to mention to you:<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><img class = \"imageNode\" src = \"https:\/\/blogs.mathworks.com\/matlab\/files\/2022\/05\/StevesLife_gpu_2.png\" width = \"446\" height = \"240\" alt = \"\" style = \"vertical-align: baseline; width: 446px; height: 240px;\"><\/img><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>The reason for these lines is all about timing.  You see, when you ask for a computation to be done on the GPU, MATLAB kicks things off and moves to the next line <\/span><span style=' font-weight: bold;'>without waiting for the GPU to finish the calculation. <\/span><span>This can be used for some very nifty interleaving of code, where you have things running on the CPU and GPU simultaneously, but it can also mess up timing if you use <\/span><span style=' font-family: monospace;'>tic\/toc<\/span><span>.   
Timing GPU code can be tricky, which is why MathWorks also gives you the <\/span><a href = \"https:\/\/mathworks.com\/help\/parallel-computing\/gputimeit.html?s_tid=doc_ta\"><span style=' font-family: monospace;'>gputimeit<\/span><\/a><span> function.<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>If all I wanted to do was run the code, and not time it, then I wouldn't need to bother with these two extra lines.  So what I told you is true... from a certain point of view.<\/span><\/div><h3  style = 'margin: 15px 10px 5px 4px; padding: 0px; line-height: 18px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 17px; font-weight: 700; text-align: left; '><span style=' font-weight: bold;'>System details<\/span><\/h3><ul  style = 'margin: 10px 0px 20px; padding-left: 0px; font-family: Helvetica, Arial, sans-serif; font-size: 14px; '><li  style = 'margin-left: 56px; line-height: 21px; min-height: 0px; text-align: left; white-space: pre-wrap; '><span>MATLAB R2022a<\/span><\/li><li  style = 'margin-left: 56px; line-height: 21px; min-height: 0px; text-align: left; white-space: pre-wrap; '><span>CPU: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz<\/span><\/li><li  style = 'margin-left: 56px; line-height: 21px; min-height: 0px; text-align: left; white-space: pre-wrap; '><span>GPU: NVIDIA GeForce RTX 3070<\/span><\/li><li  style = 'margin-left: 56px; line-height: 21px; min-height: 0px; text-align: left; white-space: pre-wrap; '><span>OS: Windows 11<\/span><\/li><\/ul><h3  style = 'margin: 15px 10px 5px 4px; padding: 0px; line-height: 18px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; 
font-size: 17px; font-weight: 700; text-align: left; '><span style=' font-weight: bold;'>Over to you<\/span><\/h3><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span>Have you got some code that can be easily made to run on a GPU like this and show a performance boost? <\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><\/div>\r\n<\/div><script type=\"text\/javascript\">var css = '\/* Styling that is common to warnings and errors is in diagnosticOutput.css *\/.embeddedOutputsErrorElement {    min-height: 18px;    max-height: 550px;} .embeddedOutputsErrorElement .diagnosticMessage-errorType {    overflow: auto;} .embeddedOutputsErrorElement.inlineElement {} .embeddedOutputsErrorElement.rightPaneElement {} \/* Styling that is common to warnings and errors is in diagnosticOutput.css *\/.embeddedOutputsWarningElement {    min-height: 18px;    max-height: 550px;} .embeddedOutputsWarningElement .diagnosticMessage-warningType {    overflow: auto;} .embeddedOutputsWarningElement.inlineElement {} .embeddedOutputsWarningElement.rightPaneElement {} \/* Copyright 2015-2019 The MathWorks, Inc. 
*\/\/* In this file, styles are not scoped to rtcContainer since they could be in the Dojo Tooltip *\/.diagnosticMessage-wrapper {    font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace;    font-size: 12px;} .diagnosticMessage-wrapper.diagnosticMessage-warningType {    color: rgb(255,100,0);} .diagnosticMessage-wrapper.diagnosticMessage-warningType a {    color: rgb(255,100,0);    text-decoration: underline;} .diagnosticMessage-wrapper.diagnosticMessage-errorType {    color: rgb(230,0,0);} .diagnosticMessage-wrapper.diagnosticMessage-errorType a {    color: rgb(230,0,0);    text-decoration: underline;} .diagnosticMessage-wrapper .diagnosticMessage-messagePart,.diagnosticMessage-wrapper .diagnosticMessage-causePart {    white-space: pre-wrap;} .diagnosticMessage-wrapper .diagnosticMessage-stackPart {    white-space: pre;} .embeddedOutputsTextElement,.embeddedOutputsVariableStringElement {    white-space: pre;    word-wrap:  initial;    min-height: 18px;    max-height: 550px;} .embeddedOutputsTextElement .textElement,.embeddedOutputsVariableStringElement .textElement {    overflow: auto;} .textElement,.rtcDataTipElement .textElement {    padding-top: 2px;} .embeddedOutputsTextElement.inlineElement,.embeddedOutputsVariableStringElement.inlineElement {} .inlineElement .textElement {} .embeddedOutputsTextElement.rightPaneElement,.embeddedOutputsVariableStringElement.rightPaneElement {    min-height: 16px;} .rightPaneElement .textElement {    padding-top: 2px;    padding-left: 9px;} .variableValue { width: 100% !important; } .embeddedOutputsMatrixElement,.eoOutputWrapper .matrixElement {    min-height: 18px;    box-sizing: border-box;} .embeddedOutputsMatrixElement .matrixElement,.eoOutputWrapper  .matrixElement,.rtcDataTipElement .matrixElement {    position: relative;} .matrixElement .variableValue,.rtcDataTipElement .matrixElement .variableValue {    white-space: pre;    display: inline-block;    vertical-align: top;    overflow: hidden;} 
.embeddedOutputsMatrixElement.inlineElement {} .embeddedOutputsMatrixElement.inlineElement .topHeaderWrapper {    display: none;} .embeddedOutputsMatrixElement.inlineElement .veTable .body {    padding-top: 0 !important;    max-height: 100px;} .inlineElement .matrixElement {    max-height: 300px;} .embeddedOutputsMatrixElement.rightPaneElement {} .rightPaneElement .matrixElement,.rtcDataTipElement .matrixElement {    overflow: hidden;    padding-left: 9px;} .rightPaneElement .matrixElement {    margin-bottom: -1px;} .embeddedOutputsMatrixElement .matrixElement .valueContainer,.eoOutputWrapper .matrixElement .valueContainer,.rtcDataTipElement .matrixElement .valueContainer {    white-space: nowrap;    margin-bottom: 3px;} .embeddedOutputsMatrixElement .matrixElement .valueContainer .horizontalEllipsis.hide,.embeddedOutputsMatrixElement .matrixElement .verticalEllipsis.hide,.eoOutputWrapper .matrixElement .valueContainer .horizontalEllipsis.hide,.eoOutputWrapper .matrixElement .verticalEllipsis.hide,.rtcDataTipElement .matrixElement .valueContainer .horizontalEllipsis.hide,.rtcDataTipElement .matrixElement .verticalEllipsis.hide {    display: none;} .embeddedOutputsVariableMatrixElement .matrixElement .valueContainer.hideEllipses .verticalEllipsis, .embeddedOutputsVariableMatrixElement .matrixElement .valueContainer.hideEllipses .horizontalEllipsis {    display:none;} .embeddedOutputsMatrixElement .matrixElement .valueContainer .horizontalEllipsis,.eoOutputWrapper .matrixElement .valueContainer .horizontalEllipsis {    margin-bottom: -3px;} .eoOutputWrapper .embeddedOutputsVariableMatrixElement .matrixElement .valueContainer {    cursor: default !important;} .embeddedOutputsVariableElement {    white-space: pre-wrap;    word-wrap: break-word;    min-height: 18px;    max-height: 250px;    overflow: auto;} .variableElement {} .embeddedOutputsVariableElement.inlineElement {} .inlineElement .variableElement {} .embeddedOutputsVariableElement.rightPaneElement {    
min-height: 16px;} .rightPaneElement .variableElement {    padding-top: 2px;    padding-left: 9px;} .outputsOnRight .embeddedOutputsVariableElement.rightPaneElement .eoOutputContent {    \/* Remove extra space allocated for navigation border *\/    margin-top: 0;    margin-bottom: 0;} .variableNameElement {    margin-bottom: 3px;    display: inline-block;} \/* * Ellipses as base64 for HTML export. *\/.matrixElement .horizontalEllipsis,.rtcDataTipElement .matrixElement .horizontalEllipsis {    display: inline-block;    margin-top: 3px;    \/* base64 encoded version of images-liveeditor\/HEllipsis.png *\/    width: 30px;    height: 12px;    background-repeat: no-repeat;    background-image: url(\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAB0AAAAJCAYAAADO1CeCAAAAJUlEQVR42mP4\/\/8\/A70xw0i29BUDFPxnAEtTW37wWDqakIa4pQDvOOG89lHX2gAAAABJRU5ErkJggg==\");} .matrixElement .verticalEllipsis,.textElement .verticalEllipsis,.rtcDataTipElement .matrixElement .verticalEllipsis,.rtcDataTipElement .textElement .verticalEllipsis {    margin-left: 35px;    \/* base64 encoded version of images-liveeditor\/VEllipsis.png *\/    width: 12px;    height: 30px;    background-repeat: no-repeat;    background-image: url(\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAZCAYAAAAIcL+IAAAALklEQVR42mP4\/\/8\/AzGYgWyFMECMwv8QddRS+P\/\/KyimlmcGUOFoOI6GI\/UVAgDnd8Dd4+NCwgAAAABJRU5ErkJggg==\");}'; var head = document.head || document.getElementsByTagName('head')[0], style = document.createElement('style'); head.appendChild(style); style.type = 'text\/css'; if (style.styleSheet){ style.styleSheet.cssText = css; } else { style.appendChild(document.createTextNode(css)); }<\/script><a href=\"https:\/\/blogs.mathworks.com\/matlab\/files\/2022\/05\/StevesLife_gpu.mlx\"><button class=\"btn btn-sm btn_color_blue pull-right add_margin_10\">Download Live Script<\/button><\/a>","protected":false},"excerpt":{"rendered":"<p>In his article, A short game of Life, Steve Eddins showed us the following few 
lines of code that implemented Conway's game of life.  Steve's version used a 750 x 750 gameboard whereas mine is using... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/matlab\/2022\/05\/19\/how-to-make-a-gpu-version-of-this-matlab-program-by-changing-two-lines\/\">read more >><\/a><\/p>","protected":false},"author":176,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[23,17,14],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/posts\/230"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/users\/176"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/comments?post=230"}],"version-history":[{"count":4,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/posts\/230\/revisions"}],"predecessor-version":[{"id":254,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/posts\/230\/revisions\/254"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/media?parent=230"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/categories?post=230"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/tags?post=230"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}