{"id":927,"date":"2013-12-10T13:19:47","date_gmt":"2013-12-10T18:19:47","guid":{"rendered":"https:\/\/blogs.mathworks.com\/steve\/?p=927"},"modified":"2019-11-01T09:38:56","modified_gmt":"2019-11-01T13:38:56","slug":"image-processing-with-a-gpu","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/steve\/2013\/12\/10\/image-processing-with-a-gpu\/","title":{"rendered":"Image processing with a GPU"},"content":{"rendered":"<div class=\"content\"><!--introduction-->\r\n<p>\r\n<em>\r\nI'd like to welcome guest blogger Anand Raja for today's post. Anand is a developer on the Image Processing Toolbox team. -Steve\r\n<\/em>\r\n<\/p>\r\n<p>Many desktop computers and laptops now come with fairly powerful Graphics Processing Units (GPU's). Initially, GPU's were mostly used to power computations for graphics applications, but soon people realized that they are just as useful for many other kinds of numerical computing.<\/p><p>GPU's are made of a large number of processing units which by themselves aren't very powerful, but become formidable when used together. So, if you have processing to be done that is parallelizable, the GPU will be a great fit.<\/p><p>With that in mind, isn't it almost obvious that image processing is a great fit for GPU's? A lot of image processing algorithms are data-parallel, meaning the same task\/computation needs to be performed on many elements of the data. Lots of image processing algorithms either operate on pixels independently or rely only on a neighborhood around pixels (like image filtering).<\/p><!--\/introduction--><p>So, let's get down to it. My desktop computer has a GPU, and I want to do some image processing using my favorite software (no prizes for guessing), MATLAB. 
Note that in order to interact with the GPU from MATLAB, you require the <a href=\"https:\/\/www.mathworks.com\/products\/parallel-computing\/\">Parallel Computing Toolbox<\/a>.<\/p><p>I can use the <a title=\"https:\/\/www.mathworks.com\/help\/distcomp\/gpudevice.html (link no longer works)\">gpuDevice<\/a> function to get information about my GPU.<\/p><pre class=\"codeinput\">gpuDevice\r\n<\/pre><pre class=\"codeoutput\">\r\nans = \r\n\r\n  CUDADevice with properties:\r\n\r\n                      Name: 'Tesla C2075'\r\n                     Index: 1\r\n         ComputeCapability: '2.0'\r\n            SupportsDouble: 1\r\n             DriverVersion: 5.5000\r\n            ToolkitVersion: 5\r\n        MaxThreadsPerBlock: 1024\r\n          MaxShmemPerBlock: 49152\r\n        MaxThreadBlockSize: [1024 1024 64]\r\n               MaxGridSize: [65535 65535 65535]\r\n                 SIMDWidth: 32\r\n               TotalMemory: 5.6368e+09\r\n                FreeMemory: 5.5362e+09\r\n       MultiprocessorCount: 14\r\n              ClockRateKHz: 1147000\r\n               ComputeMode: 'Default'\r\n      GPUOverlapsTransfers: 1\r\n    KernelExecutionTimeout: 0\r\n          CanMapHostMemory: 1\r\n           DeviceSupported: 1\r\n            DeviceSelected: 1\r\n\r\n<\/pre><p>Seeing that I have a supported GPU, I can read an image and transfer the image data to my GPU using the constructor for the <a href=\"https:\/\/www.mathworks.com\/help\/distcomp\/gpuarray.html\">gpuArray<\/a> class. 
The gpuArray object is used to access and work with data on the GPU.<\/p><pre class=\"codeinput\">im = imread(<span class=\"string\">'concordaerial.png'<\/span>);\r\nimGPU = gpuArray(im);\r\nimshow(imGPU);\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/steve\/2013\/imageprocessingwithagpu_01.png\" alt=\"\"> <p>So imGPU is a gpuArray object containing data of type uint8.<\/p><pre class=\"codeinput\">class(imGPU)\r\nclassUnderlying(imGPU)\r\n<\/pre><pre class=\"codeoutput\">\r\nans =\r\n\r\ngpuArray\r\n\r\n\r\nans =\r\n\r\nuint8\r\n\r\n<\/pre><p>A number of the functions in the Image Processing Toolbox have support for GPU processing in R2013b. This means you can accelerate existing MATLAB scripts and functions with minimal changes. To find the list of functions that are supported for GPU processing in the Image Processing Toolbox, you can visit <a href=\"https:\/\/www.mathworks.com\/help\/images\/gpu-computing.html\">this<\/a> page. Some of the basic image processing algorithms like image filtering, morphology and edge detection have GPU support and this list is going to grow in the coming releases.<\/p><p>Let's look at a small example to set the ball rolling. Inspired by Brett Shoelson's guest <a href=\"https:\/\/blogs.mathworks.com\/steve\/2012\/11\/13\/image-effects-part-1\/\">post<\/a> a few months back about Photoshop-like effects in MATLAB, I thought I might do one of my own. I call it the <i>canvas effect<\/i>. The canvas effect gives an image the feel of a canvas painting. 
I wrote this little function that does it.<\/p><pre class=\"codeinput\">type <span class=\"string\">canvasEffect<\/span>\r\n<\/pre><pre class=\"codeoutput\">\r\nfunction out = canvasEffect(im)\r\n\r\n% Filter the image with a Gaussian kernel (fspecial default: 3x3, sigma 0.5).\r\nh = fspecial('gaussian');\r\nimf = imfilter(im,h);\r\n\r\n% Increase image contrast for each color channel of the filtered image.\r\nima = cat( 3, imadjust(imf(:,:,1)), imadjust(imf(:,:,2)), imadjust(imf(:,:,3)) );\r\n\r\n% Perform a morphological opening on the image with a disk-shaped\r\n% structuring element of radius 9.\r\nse = strel('disk',9);\r\nout = imopen(ima,se);\r\n<\/pre><p>It's fairly straightforward. I first smooth the image with a Gaussian kernel to round off some edges. Then, to give the effect of more vivid colors, I increase the contrast for each color channel, and finally a morphological opening gives it the canvas painting look. Of course, you could add more bells and whistles by providing additional inputs for the filter kernel size and structuring element, but I wanted to keep it simple.<\/p><p>The script below reads an aerial image and gives it that canvas painting effect.<\/p><pre class=\"codeinput\">type <span class=\"string\">canvasAerialCPU<\/span>\r\n<\/pre><pre class=\"codeoutput\">\r\n% Read the image.\r\nim = imread('concordaerial.png');\r\n\r\n% Produce canvas effect.\r\ncanvas = canvasEffect(im);\r\n\r\n% Display the canvas-ed image.\r\nfigure; imshow(canvas);\r\n<\/pre><p>All the processing in the script above was done on the CPU. To move the computation to the GPU, I need to transfer the image from the CPU to the GPU using the gpuArray constructor. 
So the new script would look like this:<\/p><pre class=\"codeinput\">type <span class=\"string\">canvasAerialGPU<\/span>\r\n\r\nrun <span class=\"string\">canvasAerialGPU<\/span>\r\n<\/pre><pre class=\"codeoutput\">\r\n% Read the image.\r\nim = imread('concordaerial.png');\r\n\r\n% Transfer data to the GPU.\r\nimGPU = gpuArray(im);\r\n\r\n% Produce canvas effect.\r\ncanvasGPU = canvasEffect(imGPU);\r\n\r\n% Gather data back from the GPU.\r\ncanvas = gather(canvasGPU);\r\n\r\n% Display the canvas-ed image.\r\nfigure; imshow(canvas);\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/steve\/2013\/imageprocessingwithagpu_02.png\" alt=\"\"> <p>Wasn't that easy? All I had to do was convert the image to a gpuArray and gather data back after all the computation was done. The function canvasEffect did not have to change at all. This is because all functions used in canvasEffect are supported for GPU computing.<\/p><p>Let's see how much of a win this is in terms of performance. For a few years now I've been using the timeit function that Steve put on the File Exchange. As of R2013b, the <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/timeit.html\">timeit<\/a> function is part of MATLAB.<\/p><pre class=\"codeinput\">cpuTime = timeit(@()canvasEffect(im), 1)\r\n<\/pre><pre class=\"codeoutput\">\r\ncpuTime =\r\n\r\n    3.1311\r\n\r\n<\/pre><p>This function, however, is only reliable for benchmarking computations undertaken by the CPU, because GPU operations can run asynchronously. For the GPU, a special benchmarking function <a href=\"https:\/\/www.mathworks.com\/help\/distcomp\/gputimeit.html\">gputimeit<\/a> has been provided. This function ensures that all computations have completed on the GPU before recording the finish time.<\/p><pre class=\"codeinput\">gpuTime = gputimeit(@()canvasEffect(imGPU), 1)\r\n<\/pre><pre class=\"codeoutput\">\r\ngpuTime =\r\n\r\n    0.2130\r\n\r\n<\/pre><p>So with these small changes, I was able to get a considerable speed-up. 
Imagine having to do this on an entire data set of images. Working with the GPU would save a lot of processing time.<\/p><pre class=\"codeinput\">speedup = cpuTime\/gpuTime\r\n<\/pre><pre class=\"codeoutput\">\r\nspeedup =\r\n\r\n   14.6990\r\n\r\n<\/pre><p>This is not the complete picture, though. I have not accounted for the time it takes to transfer data from the CPU to the GPU and back. This may or may not be significant, depending on how long the computations themselves take. As a rule of thumb, minimize data transfers to and from the device.<\/p><pre class=\"codeinput\">transferTimeToGPU = gputimeit(@()gpuArray(im), 1)\r\ntransferTimeToCPU = gputimeit(@()gather(canvasGPU), 1)\r\n\r\ngpuTime = transferTimeToGPU + gpuTime + transferTimeToCPU;\r\n\r\nspeedup = cpuTime\/gpuTime\r\n<\/pre><pre class=\"codeoutput\">\r\ntransferTimeToGPU =\r\n\r\n    0.0037\r\n\r\n\r\ntransferTimeToCPU =\r\n\r\n    0.0074\r\n\r\n\r\nspeedup =\r\n\r\n   13.9753\r\n\r\n<\/pre><p>I'm going to end with some pointers about the performance of GPU processing.<\/p><div><ol><li>We've seen in the simple example above that you can get a significant speed-up using the supported functions. However, this speed-up is highly dependent on your hardware. If you have a very capable CPU with multiple cores and a not-so-good GPU, the speed-up can appear to be poor because functions like imfilter and imopen are multi-threaded on the CPU. Similarly, if you have a reasonable GPU paired with a not-so-capable CPU, the speed-up can make your GPU execution look better than it really is.<\/li><li>The speed-up achieved depends on image size. At smaller image sizes, the overhead of parsing input arguments and moving data to and from the GPU contributes to lower speed-ups. 
Here's an example that demonstrates this.<\/li><\/ol><\/div><pre class=\"codeinput\"><span class=\"comment\">% Define image sizes over which to measure performance.<\/span>\r\nsizes = [100 500 2000 4000];\r\n\r\n<span class=\"comment\">% Preallocate timing arrays, one element per image size.<\/span>\r\n[cpuTime,gpuTime,transferTimeToGPU,transferTimeToCPU] = deal(zeros(1,numel(sizes)));\r\n\r\n<span class=\"keyword\">for<\/span> n = 1 : numel(sizes)\r\n    sz = sizes(n);\r\n\r\n    <span class=\"comment\">% Resize image to sz x sz.<\/span>\r\n    im_scaled = imresize(im,[sz sz]);\r\n\r\n    <span class=\"comment\">% Transfer resized image to GPU.<\/span>\r\n    imGPU_scaled = gpuArray(im_scaled);\r\n\r\n    <span class=\"comment\">% Process image on GPU.<\/span>\r\n    canvasGPU_scaled = canvasEffect(imGPU_scaled);\r\n\r\n    <span class=\"comment\">% Time CPU execution.<\/span>\r\n    cpuTime(n)           = timeit(@()canvasEffect(im_scaled), 1);\r\n\r\n    <span class=\"comment\">% Time GPU execution.<\/span>\r\n    transferTimeToGPU(n) = gputimeit(@()gpuArray(im_scaled)       , 1);\r\n    gpuTime(n)           = gputimeit(@()canvasEffect(imGPU_scaled), 1);\r\n    transferTimeToCPU(n) = gputimeit(@()gather(canvasGPU_scaled)  , 1);\r\n<span class=\"keyword\">end<\/span>\r\n\r\n<span class=\"comment\">% Total GPU time, including transfers in both directions.<\/span>\r\ngpuTotalTime = transferTimeToGPU+gpuTime+transferTimeToCPU;\r\n<span class=\"comment\">% Plot CPU vs GPU execution time.<\/span>\r\nfigure;\r\nplot(sizes, cpuTime, <span class=\"string\">'rx--'<\/span>,<span class=\"keyword\">...<\/span>\r\n     sizes, gpuTotalTime,<span class=\"string\">'bx--'<\/span>,<span class=\"keyword\">...<\/span>\r\n     <span class=\"string\">'LineWidth'<\/span>,2);\r\nlegend(<span class=\"string\">'cpu time'<\/span>,<span class=\"string\">'gpu time'<\/span>);\r\nxlabel(<span class=\"string\">'image size [n x n]'<\/span>);\r\nylabel(<span class=\"string\">'execution time (s)'<\/span>);\r\ntitle(<span class=\"string\">'cpu time vs gpu 
time'<\/span>);\r\n\r\nfigure;\r\nplot(sizes,cpuTime.\/gpuTotalTime,<span class=\"string\">'LineWidth'<\/span>,2);\r\nxlabel(<span class=\"string\">'image size [n x n]'<\/span>);\r\nylabel(<span class=\"string\">'speed up'<\/span>);\r\ntitle(<span class=\"string\">'Speed up'<\/span>);\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/steve\/2013\/imageprocessingwithagpu_03.png\" alt=\"\"> <img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/steve\/2013\/imageprocessingwithagpu_04.png\" alt=\"\"> <p>I hope this got you as excited about image processing with GPU's as it did me!<\/p><p class=\"footer\"><br>\r\n      Published with MATLAB&reg; R2013b<br><\/p><\/div>","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" 
src=\"https:\/\/blogs.mathworks.com\/images\/steve\/2013\/imageprocessingwithagpu_03.png\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction-->\r\n<p>\r\n<em>\r\nI'd like to welcome guest blogger Anand Raja for today's post. Anand is a developer on the Image Processing Toolbox team. -Steve\r\n<\/em>\r\n... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/steve\/2013\/12\/10\/image-processing-with-a-gpu\/\">read more >><\/a><\/p>","protected":false},"author":42,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[46,917,1047,1053,725,292,1049,1045,1043,1051,949,370,108,76,156,36,92,162,68,106,474,52,717,94,96,130],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/927"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/users\/42"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/comments?post=927"}],"version-history":[{"count":4,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/927\/revisions"}],"predecessor-version":[{"id":2626,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/927\/revisions\/2626"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/media?parent=927"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/categories?post=927"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/tags?post=927"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}