{"id":66,"date":"2017-12-15T07:00:21","date_gmt":"2017-12-15T07:00:21","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=66"},"modified":"2021-04-06T15:52:36","modified_gmt":"2021-04-06T19:52:36","slug":"network-visualization-based-on-occlusion-sensitivity","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2017\/12\/15\/network-visualization-based-on-occlusion-sensitivity\/","title":{"rendered":"Network Visualization Based on Occlusion Sensitivity"},"content":{"rendered":"<div class=\"content\"><p>Have you ever wondered what your favorite deep learning network is looking at? For example, if a network classifies this image as \"French horn,\" what part of the image matters most for the classification?<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/steve\/files\/steve-horn-600x600.jpg\" alt=\"\"> <\/p><p>Birju Patel, a developer on the Computer Vision System Toolbox team, helped me with the main idea and code for today's post. Birju has focused on deep learning for the last couple of years. Before that, he worked on feature extraction methods and on optimizing feature matching.<\/p><p>Let's use the pretrained ResNet-50 network for this experiment. (He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, Sun, Jian. \"Deep Residual Learning for Image Recognition.\" In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778. 2016.) An easy way to get the ResNet-50 network for MATLAB is to launch the Add-On Explorer (from the HOME tab in MATLAB) and search for resnet.<\/p><pre class=\"codeinput\">net = resnet50;\r\n<\/pre><p>We need to be aware that ResNet-50 expects the input images to be a particular size. The network's initial layer has this information.<\/p><pre class=\"codeinput\">sz = net.Layers(1).InputSize(1:2)\r\n<\/pre><pre class=\"codeoutput\">\r\nsz =\r\n\r\n   224   224\r\n\r\n<\/pre><p>The required image size can be passed directly to the <tt>imresize<\/tt> function.<\/p><pre class=\"codeinput\">url = <span class=\"string\">'https:\/\/blogs.mathworks.com\/steve\/files\/steve-horn.jpg'<\/span>;\r\nrgb = imread(url);\r\nrgb = imresize(rgb,sz);\r\nimshow(rgb)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2017\/12\/occlusion_sensitivity_resnet_01.png\" alt=\"\"> <p>Call <tt>classify<\/tt> with the network and the image to see what category the network thinks is most probable.<\/p><pre class=\"codeinput\">classify(net,rgb)\r\n<\/pre><pre class=\"codeoutput\">\r\nans = \r\n\r\n  categorical\r\n\r\n     French horn \r\n\r\n<\/pre><p>ResNet-50 thinks I am playing the French horn.<\/p><p>Birju was reading <a href=\"https:\/\/arxiv.org\/pdf\/1311.2901.pdf\">a paper by Zeiler and Fergus<\/a> about visualization techniques for convolutional neural networks, and in it he came across the idea of <i>occlusion sensitivity<\/i>. If you block out, or occlude, a portion of the image, how does that affect the probability score of the network? 
Birju was reading a paper by Zeiler and Fergus (https://arxiv.org/pdf/1311.2901.pdf) about visualization techniques for convolutional neural networks, and in it he came across the idea of occlusion sensitivity. If you block out, or occlude, a portion of the image, how does that affect the probability score of the network? And how does the result vary depending on which portion you occlude?

Let's try it.

```matlab
rgb2 = rgb;
rgb2((1:71)+77,(1:71)+108,:) = 128;
imshow(rgb2)
```

[Image: https://blogs.mathworks.com/deep-learning/files/2017/12/occlusion_sensitivity_resnet_02.png]

```matlab
classify(net,rgb2)
```

```
ans = 

  categorical

     notebook 
```

Hmm. I guess the network "thinks" that gray square looks like a notebook. That region must be important for classifying the image. Now let's try the occlusion in a different spot.

```matlab
rgb3 = rgb;
rgb3((1:71)+15,(1:71)+80,:) = 128;
imshow(rgb3)
```

[Image: https://blogs.mathworks.com/deep-learning/files/2017/12/occlusion_sensitivity_resnet_03.png]

```matlab
classify(net,rgb3)
```

```
ans = 

  categorical

     French horn 
```

Hmm. I guess my head is not as important.

Anyway, Birju wrote some MATLAB code to systematically quantify the relative importance of different image regions to the classification result. His code builds up a large batch of images. For each image in the batch, a different region is occluded. For each location of the occlusion mask, the prediction score of the expected class ("French horn," in this case) is recorded.

Let's make a batch of images with 71x71 regions masked out. Start by computing the corners of all the masks, represented as (X1,Y1) and (X2,Y2).

```matlab
mask_size = [71 71];
[H,W,~] = size(rgb);

X = 1:W;
Y = 1:H;

[X1, Y1] = meshgrid(X, Y);

X1 = X1(:) - (mask_size(2)-1)/2;
Y1 = Y1(:) - (mask_size(1)-1)/2;

X2 = X1 + mask_size(2) - 1;
Y2 = Y1 + mask_size(1) - 1;
```

Don't let the mask corners stray outside the image boundaries.

```matlab
X1 = max(1, X1);
Y1 = max(1, Y1);

X2 = min(W, X2);
Y2 = min(H, Y2);
```

Make the batch.

```matlab
batch = repmat(rgb,[1 1 1 size(X1,1)]);

for i = 1:size(X1,1)
   c = X1(i):X2(i);
   r = Y1(i):Y2(i);
   batch(r,c,:,i) = 128; % gray mask
end
```

[Note: This batch has more than 50,000 images in it. You'll need a lot of RAM to create and process such a large batch of images all at once. If that's a problem, see the chunked variation sketched below.]

Here are a few of the masked images.

```matlab
montage(batch(:,:,:,randperm(size(X1,1),9)))
```

[Image: https://blogs.mathworks.com/deep-learning/files/2017/12/occlusion_sensitivity_resnet_04.png]
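If you don't have enough RAM for the full batch, here's a minimal sketch of a chunked variation (my own workaround, not the code from the post): build and score a few hundred masked images at a time, so the complete batch never has to exist in memory at once. It assumes the net, rgb, X1, Y1, X2, and Y2 variables defined above.

```matlab
% A sketch, assuming net, rgb, X1, Y1, X2, Y2 from above. Scores only
% the original top category, in chunks, instead of materializing the
% whole 50,176-image batch.
[~,horn_idx] = max(predict(net,rgb));   % index of the original top class
nMasks = numel(X1);
s_horn = zeros(nMasks,1,'single');
chunkSize = 512;
for k = 1:chunkSize:nMasks
    idx = k:min(k+chunkSize-1,nMasks);
    chunk = repmat(rgb,[1 1 1 numel(idx)]);
    for j = 1:numel(idx)
        i = idx(j);
        chunk(Y1(i):Y2(i),X1(i):X2(i),:,j) = 128;   % gray mask
    end
    p = predict(net,chunk,'MiniBatchSize',64);
    s_horn(idx) = p(:,horn_idx);
end
```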
Now I'll use predict (instead of classify) to get the prediction scores for each category and for each image in the batch. The 'MiniBatchSize' parameter is used to keep the GPU memory use down. It means that the predict function will send 64 images at a time to the GPU for processing.

```matlab
s = predict(net, batch, 'MiniBatchSize',64);
```

```matlab
size(s)
```

```
ans =

       50176        1000
```

That's a lot of prediction scores! There are 50,176 images in the batch (one mask position centered on each of the 224x224 pixels), and there are 1,000 categories. The matrix s has a score for each category and for each image.

We are specifically interested in the prediction scores for the category predicted for the original image. Let's figure out the category index for that.

```matlab
scores = predict(net,rgb);
[~,horn_idx] = max(scores);
```

So, here are the French horn scores for every image in the batch:

```matlab
s_horn = s(:,horn_idx);
```

Reshape the set of horn scores to be an image and display it.

```matlab
S_horn = reshape(s_horn,H,W);
imshow(-S_horn,[])
colormap(gca,'parula')
```

[Image: https://blogs.mathworks.com/deep-learning/files/2017/12/occlusion_sensitivity_resnet_05.png]

The brightest regions indicate the locations where the occlusion had the biggest effect on the probability score.
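The heatmap is easier to interpret when it's draped over the photo. Here's a minimal sketch of one way to do that (my addition, not from the original post): display the image, then stack a semi-transparent sensitivity map on top of it.

```matlab
% A sketch, assuming rgb and S_horn from above: overlay the (negated)
% score map on the original image with 50% transparency.
sensitivity = mat2gray(-S_horn);   % bright = occlusion hurt the score
imshow(rgb)
hold on
h = imagesc(sensitivity);          % pseudocolor map on top of the photo
h.AlphaData = 0.5;                 % semi-transparent overlay
colormap(gca,'parula')
hold off
```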
Let's find the location that minimizes the "French horn" probability score.

```matlab
[min_score,min_idx] = min(s_horn);
rgb_min_score = batch(:,:,:,min_idx);
imshow(rgb_min_score)
```

[Image: https://blogs.mathworks.com/deep-learning/files/2017/12/occlusion_sensitivity_resnet_06.png]

There you go. To recognize a French horn, it's all about the valves and valve slides. It's not about the bell.
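If you'd rather see that location in context instead of grayed out, here's a minimal sketch (my addition) that outlines the most damaging mask position on the unmasked image.

```matlab
% A sketch, assuming rgb, min_idx, X1, Y1, X2, Y2 from above: draw the
% most damaging 71x71 mask location as a red rectangle.
imshow(rgb)
w = X2(min_idx) - X1(min_idx) + 1;
h = Y2(min_idx) - Y1(min_idx) + 1;
rectangle('Position',[X1(min_idx) Y1(min_idx) w h], ...
          'EdgeColor','r','LineWidth',2)
```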
A final note on terminology: Some of my horn-playing friends might give me a hard time about calling my instrument a "French horn." According to the International Horn Society, the instrument should just be called "horn" (https://en.wikipedia.org/wiki/French_horn#Name). However, the label stored in ResNet-50 is "French horn," and that is the most commonly used term in the United States, where I live.

Published with MATLAB R2017b