{"id":3372,"date":"2019-07-15T11:55:36","date_gmt":"2019-07-15T15:55:36","guid":{"rendered":"https:\/\/blogs.mathworks.com\/steve\/?p=3372"},"modified":"2019-11-01T22:46:22","modified_gmt":"2019-11-02T02:46:22","slug":"the-curious-case-of-ordfilt2-performance","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/steve\/2019\/07\/15\/the-curious-case-of-ordfilt2-performance\/","title":{"rendered":"The curious case of ordfilt2 performance"},"content":{"rendered":"<div class=\"content\"><p><i>For today's post, I'd like to welcome guest author Kushagr Gupta, a developer on the Image Processing Toolbox team. -Steve<\/i><\/p><p>The Image Processing Toolbox function <tt>ordfilt2<\/tt> can be used to perform various types of 2-D statistical filtering, including minimum or maximum filtering and median filtering. A straightforward implementation of this function is expected to scale with $O(r^2)$, where $r$ is the width of the window (assuming a square window). To compute each output pixel, we have to select the $k$-th largest value from the set of $r^2$ input image pixels, which can be done in $O(r^2)$ time with an algorithm such as quick-select.<\/p><p>The function <tt>ordfilt2<\/tt>, however, does better than that; it scales with $O\\left(r\\right)$complexity. That's because <tt>ordfilt2<\/tt> uses a sliding window-based histogram approach to filter the data. When the algorithm moves from one input image pixel to the next, only part of the histogram needs to be updated. (Reference: Huang, T.S., Yang, G.J., and Yang, G.Y., \"A fast two-dimensional median filtering algorithm,\" <i>IEEE transactions on Acoustics, Speech and Signal Processing<\/i>, Vol ASSP 27, No. 1, February 1979.)<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/steve\/files\/sliding-window-diagram.png\" alt=\"\"> <\/p><p>As shown in the figure above, r additions and r subtractions need to be carried out for updating the histogram. Computing the output pixel from the histogram doesn't depend on the window size, and so the overall complexity is $O(r)$.<\/p><p>We recently received an interesting tech support case related to performance of <tt>ordfilt2<\/tt> for uint16 random input data. One of our customers reported the time taken to filter an image with a smaller window size was <b>higher<\/b> than the time taken to process the same image with a relatively larger window size.<\/p><p>Was this just a noisy observation? Was this reproducible? Was this a bug in our implementation?<\/p><p>We investigated?<\/p><p>The customer was generous enough to give us reproduction code. And, the behavior was indeed reproducible! (Many, many times!). What was the explanation?<\/p><p>Here are the reproduction steps:<\/p><pre class=\"codeinput\">Im = im2uint16(rand(1e3));\r\nn = [7 9 11 13 15 21 25 31 35 41 49 51 57 65 73];\r\nt = zeros(length(n),1);\r\n<span class=\"keyword\">for<\/span> id=1:length(n)\r\n   t(id) = timeit(@() ordfilt2(Im,ceil(n(id)^2*0.5),<span class=\"keyword\">...<\/span>\r\n       true(n(id))));\r\n<span class=\"keyword\">end<\/span>\r\n\r\nplot(n,t)\r\ngrid <span class=\"string\">on<\/span>\r\ntitle(<span class=\"string\">'ordfilt2 performance curve (uint16)'<\/span>)\r\nxlabel(<span class=\"string\">'window size'<\/span>)\r\nylabel(<span class=\"string\">'Time (s)'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/steve\/files\/ordfilt2PerformanceUint16_v20190715_01.png\" alt=\"\"> <p>A puzzling aspect of this case was that the performance behavior was observed only for random input data and not for real-world images. Here is a comparison:<\/p><pre class=\"codeinput\">Im1 = im2uint16(imresize(imread(<span class=\"string\">'coins.png'<\/span>),[500 500]));\r\nIm2 = im2uint16(rand(500));\r\nn = [11 15 21 27 31 35 41 49 51];\r\ntreal = zeros(length(n),1);\r\ntrand = zeros(length(n),1);\r\n<span class=\"keyword\">for<\/span> id=1:length(n)\r\n   treal(id) = timeit(@() ordfilt2(Im1,ceil(n(id)^2*0.5),<span class=\"keyword\">...<\/span>\r\n       true(n(id))));\r\n   trand(id) = timeit(@() ordfilt2(Im2,ceil(n(id)^2*0.5),<span class=\"keyword\">...<\/span>\r\n       true(n(id))));\r\n<span class=\"keyword\">end<\/span>\r\n\r\nplot(n,treal)\r\ngrid <span class=\"string\">on<\/span>\r\nhold <span class=\"string\">on<\/span>\r\nplot(n,trand)\r\nlegend(<span class=\"string\">'Real World Image'<\/span>,<span class=\"string\">'Random Image'<\/span>)\r\ntitle(<span class=\"string\">'ordfilt2 performance comparison'<\/span>)\r\nxlabel(<span class=\"string\">'window size'<\/span>)\r\nylabel(<span class=\"string\">'Time (s)'<\/span>)\r\nhold <span class=\"string\">off<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/steve\/files\/ordfilt2PerformanceUint16_v20190715_02.png\" alt=\"\"> <p>We also observed that the the performance behavior was not reproducible for uint8 random images. Instead, the algorithm's performance scaled linearly with the window size, as expected.<\/p><pre class=\"codeinput\">Im = im2uint8(rand(1e3));\r\nn = [7 9 11 13 15 21 25 31 35 41 49 51 57 65 73];\r\nt = zeros(length(n),1);\r\n<span class=\"keyword\">for<\/span> id=1:length(n)\r\n   t(id) = timeit(@() ordfilt2(Im,ceil(n(id)^2*0.5),<span class=\"keyword\">...<\/span>\r\n       true(n(id))));\r\n<span class=\"keyword\">end<\/span>\r\n\r\nplot(n,t)\r\ngrid <span class=\"string\">on<\/span>\r\ntitle(<span class=\"string\">'ordfilt2 performance curve (uint8)'<\/span>)\r\nxlabel(<span class=\"string\">'window size'<\/span>)\r\nylabel(<span class=\"string\">'Time (s)'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/steve\/files\/ordfilt2PerformanceUint16_v20190715_03.png\" alt=\"\"> <p>To understand what was going on, it became essential to dig deeper into the inner workings of the underlying algorithm.<\/p><p>As it turns out, while using the sliding window histogram approach, the algorithm also keeps track of the running sum of the histogram so that it does not need to be computed for each pixel in the row. It can be updated accordingly as the window slides by traversing the histogram in the right direction towards the element of interest. This running-sum computation makes the algorithm data dependent. Moreover, the worst case scenario for this data-dependent algorithm happens when the input image contains random data.<\/p><p>But why is the behavior prominently present for small window sizes? It can be argued that, as the window size increases, we draw greater number of samples from the underling distribution resulting in smaller variance. As the variance goes down, the code path becomes more predictable and the algorithm obeys linear time complexity. However, when the window size reduces, the data dependency of the algorithm causes the variance to increase (which is a lot more for random input) making the code path less predictable. This causes the behavior reported by the customer of smaller windows taking more time to process as compared to larger windows.<\/p><p>After some consideration, we have decided to leave the implementation of <tt>ordfilt2<\/tt> as it is. This decision was made based on the fact that the behavior is primarily visible for random input data, which is not usually what customers use <tt>ordfilt2<\/tt> for.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_314fe35005f543b8b30cd24477b7d56f() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='314fe35005f543b8b30cd24477b7d56f ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 314fe35005f543b8b30cd24477b7d56f';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2019 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_314fe35005f543b8b30cd24477b7d56f()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2019a<br><\/p><\/div><!--\r\n314fe35005f543b8b30cd24477b7d56f ##### SOURCE BEGIN #####\r\n%% The curious case of ordfilt2 performance\r\n% _For today's post, I'd like to welcome guest author Kushagr Gupta, a developer \r\n% on the Image Processing Toolbox team. -Steve_\r\n% \r\n% The Image Processing Toolbox function |ordfilt2| can be used to perform various \r\n% types of 2-D statistical filtering, including minimum or maximum filtering and \r\n% median filtering. A straightforward implementation of this function is expected \r\n% to scale with $O(r^2)$, where $r$ is the width of the window (assuming \r\n% a square window). To compute each output pixel, we have to select the $k$-th \r\n% largest value from the set of $r^2$ input image pixels, which can be done in \r\n% $O(r^2)$ time with an algorithm such as quick-select.      \r\n% \r\n% The function |ordfilt2|, however, does better than that; it scales with\r\n% $O\\left(r\\right)$complexity. That's because |ordfilt2| uses a sliding\r\n% window-based histogram approach to filter the data. When the algorithm\r\n% moves from one input image pixel to the next, only part of the histogram\r\n% needs to be updated. (Reference: Huang, T.S., Yang, G.J., and Yang, G.Y.,\r\n% \"A fast two-dimensional median filtering algorithm,\" _IEEE transactions\r\n% on Acoustics, Speech and Signal Processing_, Vol ASSP 27, No. 1, February\r\n% 1979.)\r\n%\r\n% <<https:\/\/blogs.mathworks.com\/steve\/files\/sliding-window-diagram.png>>\r\n%  \r\n% As shown in the figure above, r additions and r subtractions need to be carried \r\n% out for updating the histogram. Computing the output pixel from the histogram \r\n% doesn't depend on the window size, and so the overall complexity is $O(r)$.\r\n%\r\n% We recently received an interesting tech support case related to performance \r\n% of |ordfilt2| for uint16 random input data. One of our customers reported the \r\n% time taken to filter an image with a smaller window size was *higher* than the \r\n% time taken to process the same image with a relatively larger window size. \r\n% \r\n% Was this just a noisy observation? Was this reproducible? Was this a bug in \r\n% our implementation? \r\n% \r\n% We investigated?\r\n% \r\n% The customer was generous enough to give us reproduction code. And, the behavior \r\n% was indeed reproducible! (Many, many times!). What was the explanation? \r\n% \r\n% Here are the reproduction steps:\r\n\r\nIm = im2uint16(rand(1e3)); \r\nn = [7 9 11 13 15 21 25 31 35 41 49 51 57 65 73];\r\nt = zeros(length(n),1);\r\nfor id=1:length(n)\r\n   t(id) = timeit(@() ordfilt2(Im,ceil(n(id)^2*0.5),...\r\n       true(n(id))));\r\nend\r\n\r\nplot(n,t)\r\ngrid on\r\ntitle('ordfilt2 performance curve (uint16)')\r\nxlabel('window size')\r\nylabel('Time (s)')\r\n%% \r\n% A puzzling aspect of this case was that the performance behavior was observed \r\n% only for random input data and not for real-world images. Here is a comparison:\r\n\r\nIm1 = im2uint16(imresize(imread('coins.png'),[500 500]));\r\nIm2 = im2uint16(rand(500));\r\nn = [11 15 21 27 31 35 41 49 51];\r\ntreal = zeros(length(n),1);\r\ntrand = zeros(length(n),1);\r\nfor id=1:length(n)\r\n   treal(id) = timeit(@() ordfilt2(Im1,ceil(n(id)^2*0.5),...\r\n       true(n(id))));\r\n   trand(id) = timeit(@() ordfilt2(Im2,ceil(n(id)^2*0.5),...\r\n       true(n(id))));\r\nend\r\n\r\nplot(n,treal)\r\ngrid on\r\nhold on\r\nplot(n,trand)\r\nlegend('Real World Image','Random Image')\r\ntitle('ordfilt2 performance comparison')\r\nxlabel('window size')\r\nylabel('Time (s)')\r\nhold off\r\n%% \r\n% We also observed that the the performance behavior was not reproducible for \r\n% uint8 random images. Instead, the algorithm's performance scaled linearly with \r\n% the window size, as expected.\r\n\r\nIm = im2uint8(rand(1e3)); \r\nn = [7 9 11 13 15 21 25 31 35 41 49 51 57 65 73]; \r\nt = zeros(length(n),1);\r\nfor id=1:length(n)\r\n   t(id) = timeit(@() ordfilt2(Im,ceil(n(id)^2*0.5),...\r\n       true(n(id))));\r\nend\r\n\r\nplot(n,t)\r\ngrid on\r\ntitle('ordfilt2 performance curve (uint8)')\r\nxlabel('window size')\r\nylabel('Time (s)')\r\n%% \r\n%\r\n% To understand what was going on, it became essential to dig deeper into the \r\n% inner workings of the underlying algorithm.\r\n% \r\n% As it turns out, while using the sliding window histogram approach, the algorithm \r\n% also keeps track of the running sum of the histogram so that it does not need \r\n% to be computed for each pixel in the row. It can be updated accordingly as the \r\n% window slides by traversing the histogram in the right direction towards the \r\n% element of interest. This running-sum computation makes the algorithm data dependent. \r\n% Moreover, the worst case scenario for this data-dependent algorithm happens \r\n% when the input image contains random data.\r\n% \r\n% But why is the behavior prominently present for small window sizes? It can \r\n% be argued that, as the window size increases, we draw greater number of samples \r\n% from the underling distribution resulting in smaller variance. As the variance \r\n% goes down, the code path becomes more predictable and the algorithm obeys linear \r\n% time complexity. However, when the window size reduces, the data dependency \r\n% of the algorithm causes the variance to increase (which is a lot more for random \r\n% input) making the code path less predictable. This causes the behavior reported \r\n% by the customer of smaller windows taking more time to process as compared to \r\n% larger windows.    \r\n% \r\n% After some consideration, we have decided to leave the implementation of |ordfilt2| \r\n% as it is. This decision was made based on the fact that the behavior is primarily \r\n% visible for random input data, which is not usually what customers use |ordfilt2| \r\n% for. \r\n\r\n##### SOURCE END ##### 314fe35005f543b8b30cd24477b7d56f\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img src=\"https:\/\/blogs.mathworks.com\/steve\/files\/sliding-window-diagram.png\" class=\"img-responsive attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" decoding=\"async\" loading=\"lazy\" \/><\/div><p>For today's post, I'd like to welcome guest author Kushagr Gupta, a developer on the Image Processing Toolbox team. -SteveThe Image Processing Toolbox function ordfilt2 can be used to perform various... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/steve\/2019\/07\/15\/the-curious-case-of-ordfilt2-performance\/\">read more >><\/a><\/p>","protected":false},"author":42,"featured_media":3370,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[539,70,90,1255,452,76,156,92,705,120,68,460,474,52,100,94,96,130],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/3372"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/users\/42"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/comments?post=3372"}],"version-history":[{"count":2,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/3372\/revisions"}],"predecessor-version":[{"id":3392,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/3372\/revisions\/3392"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/media\/3370"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/media?parent=3372"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/categories?post=3372"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/tags?post=3372"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}