{"id":256,"date":"2018-03-26T19:25:14","date_gmt":"2018-03-26T19:25:14","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=256"},"modified":"2021-04-06T15:52:16","modified_gmt":"2021-04-06T19:52:16","slug":"visualizing-activations-in-googlenet","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2018\/03\/26\/visualizing-activations-in-googlenet\/","title":{"rendered":"Visualizing Activations in GoogLeNet"},"content":{"rendered":"<p>The R2018a release has been available for almost two weeks now. One of the new features that caught my eye is that computing layer activations has been extended to GoogLeNet and Inception-v3. Today I want to experiment with GoogLeNet. <\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">net = googlenet\r\n<\/pre><pre class=\"output\" style=\"font-family:monospace;border:none;background-color:white;color:rgba(64, 64, 64, 1);\">net = \r\n  DAGNetwork with properties:\r\n\r\n         Layers: [144\u00d71 nnet.cnn.layer.Layer]\r\n    Connections: [170\u00d72 table]\r\n\r\n<\/pre><p>Let's look at just the first few layers.<\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">net.Layers(1:5)\r\n<\/pre><pre class=\"output\" style=\"font-family:monospace;border:none;background-color:white;color:rgba(64, 64, 64, 1);\">ans = \r\n  5x1 Layer array with layers:\r\n\r\n     1   'data'             Image Input                   224x224x3 images with 'zerocenter' normalization\r\n     2   'conv1-7x7_s2'     Convolution                   64 7x7x3 convolutions with stride [2  2] and padding [3  3  3  3]\r\n     3   'conv1-relu_7x7'   
ReLU                          ReLU\r\n     4   'pool1-3x3_s2'     Max Pooling                   3x3 max pooling with stride [2  2] and padding [0  1  0  1]\r\n     5   'pool1-norm1'      Cross Channel Normalization   cross channel normalization with 5 channels per element\r\n<\/pre><p>The first layer tells us how big input images should be.<\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">net.Layers(1)\r\n<\/pre><pre class=\"output\" style=\"font-family:monospace;border:none;background-color:white;color:rgba(64, 64, 64, 1);\">ans = \r\n  ImageInputLayer with properties:\r\n\r\n                Name: 'data'\r\n           InputSize: [224 224 3]\r\n\r\n   Hyperparameters\r\n    DataAugmentation: 'none'\r\n       Normalization: 'zerocenter'\r\n\r\n<\/pre><p>The second layer performs 2D convolution.<\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">net.Layers(2)\r\n<\/pre><pre class=\"output\" style=\"font-family:monospace;border:none;background-color:white;color:rgba(64, 64, 64, 1);\">ans = \r\n  Convolution2DLayer with properties:\r\n\r\n           Name: 'conv1-7x7_s2'\r\n\r\n   Hyperparameters\r\n     FilterSize: [7 7]\r\n    NumChannels: 3\r\n     NumFilters: 64\r\n         Stride: [2 2]\r\n    PaddingMode: 'manual'\r\n    PaddingSize: [3 3 3 3]\r\n\r\n   Learnable Parameters\r\n        Weights: [7\u00d77\u00d73\u00d764 single]\r\n           Bias: [1\u00d71\u00d764 single]\r\n\r\n  Show all properties\r\n\r\n<\/pre><p>The hyperparameters tell us that this layer performs 64 different filtering operations on the input channels, and each filter is 7x7x3. 
The <inline style=\"font-family: monospace, monospace; font-size: inherit;\">[2 2]<\/inline> stride value tells us that the filter output is downsampled by a factor of 2 in each direction. <\/p>\r\n      <p>To experiment with this network, I'll use a picture that I took of myself just now. I will go ahead and resize it to the size expected by the network. <\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">im = imread(<span style=\"color:rgb(160, 32, 240);\">'steve.jpg'<\/span>);\r\nim = imresize(im,net.Layers(1).InputSize(1:2));\r\nimshow(im)\r\n<\/pre><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2018\/03\/VisualizingActivationsInGoogLeNetExample_1-1.png\"><p>Use the <inline style=\"font-family: monospace, monospace; font-size: inherit;\">activations<\/inline> function to compute the neuron activations from the <inline style=\"font-family: monospace, monospace; font-size: inherit;\">conv1-7x7_s2<\/inline> layer. <\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">act = activations(net,im,<span style=\"color:rgb(160, 32, 240);\">'conv1-7x7_s2'<\/span>,<span style=\"color:rgb(160, 32, 240);\">'OutputAs'<\/span>,<span style=\"color:rgb(160, 32, 240);\">'channels'<\/span>);\r\nsize(act)\r\n<\/pre><pre class=\"output\" style=\"font-family:monospace;border:none;background-color:white;color:rgba(64, 64, 64, 1);\">ans = <em>1\u00d73<\/em>\r\n\r\n   112   112    64\r\n\r\n<\/pre><p>I interpret the size of <inline style=\"font-family: monospace, monospace; font-size: inherit;\">act<\/inline> as saying that this layer's output includes 64 different 112x112 images. 
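<\/p>\r\n      <p>That 112 is consistent with the standard output-size formula for a strided convolution, floor((inputSize + totalPadding - filterSize)\/stride) + 1. This is just arithmetic on the hyperparameters we saw for <inline style=\"font-family: monospace, monospace; font-size: inherit;\">conv1-7x7_s2<\/inline> (224-pixel input, padding of 3 on each side, 7x7 filters, stride 2), not anything pulled from the network itself: <\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">floor((224 + 3 + 3 - 7)\/2) + 1\r\n<\/pre><pre class=\"output\" style=\"font-family:monospace;border:none;background-color:white;color:rgba(64, 64, 64, 1);\">ans = 112\r\n<\/pre><p>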
The range of activation values is roughly -3,000 to 3,000. <\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">min(act(:))\r\n<\/pre><pre class=\"output\" style=\"font-family:monospace;border:none;background-color:white;color:rgba(64, 64, 64, 1);\">ans = <em>single<\/em>\r\n    -2.9851e+03\r\n<\/pre><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">max(act(:))\r\n<\/pre><pre class=\"output\" style=\"font-family:monospace;border:none;background-color:white;color:rgba(64, 64, 64, 1);\">ans = <em>single<\/em>\r\n    2.7232e+03\r\n<\/pre><p>The functions <inline style=\"font-family: monospace, monospace; font-size: inherit;\">mat2gray<\/inline> and <inline style=\"font-family: monospace, monospace; font-size: inherit;\">montage<\/inline> are useful for rescaling these images to the range [0,1] and then displaying them together. <\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">act = reshape(act,size(act,1),size(act,2),1,size(act,3));\r\nact_scaled = mat2gray(act);\r\nmontage(act_scaled)\r\n<\/pre><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2018\/03\/VisualizingActivationsInGoogLeNetExample_2-1.png\"><p>I think it will be easier to see and compare the individual activation images if we apply a contrast stretch. 
I'll use the Image Processing Toolbox functions <inline style=\"font-family: monospace, monospace; font-size: inherit;\">imadjust<\/inline> and <inline style=\"font-family: monospace, monospace; font-size: inherit;\">stretchlim<\/inline>. (There's a bit of extra code to handle the fact that <inline style=\"font-family: monospace, monospace; font-size: inherit;\">stretchlim<\/inline> and <inline style=\"font-family: monospace, monospace; font-size: inherit;\">imadjust<\/inline> don't support multidimensional inputs.) <\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">tmp = act_scaled(:);\r\ntmp = imadjust(tmp,stretchlim(tmp));\r\nact_stretched = reshape(tmp,size(act_scaled));\r\nmontage(act_stretched)\r\ntitle(<span style=\"color:rgb(160, 32, 240);\">'Activations from the conv1-7x7_s2 layer'<\/span>,<span style=\"color:rgb(160, 32, 240);\">'Interpreter'<\/span>,<span style=\"color:rgb(160, 32, 240);\">'none'<\/span>)\r\n<\/pre><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2018\/03\/VisualizingActivationsInGoogLeNetExample_3-1.png\"><p>Wow. That's a little bit too much of me all at once. 
Let's zoom in on just a couple of the activation images.<\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">subplot(1,2,1)\r\nimshow(act_stretched(:,:,:,33))\r\ntitle(<span style=\"color:rgb(160, 32, 240);\">'Channel 33'<\/span>)\r\nsubplot(1,2,2)\r\nimshow(act_stretched(:,:,:,34))\r\ntitle(<span style=\"color:rgb(160, 32, 240);\">'Channel 34'<\/span>)\r\n<\/pre><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2018\/03\/VisualizingActivationsInGoogLeNetExample_4-1.png\"><p>From my background in traditional image processing, I kind of recognize these. They are like gradient component images. One \"detects\" horizontal edges, and the other detects vertical edges. <\/p>\r\n      <p>We can see that if we look at the frequency responses of the weights for those filters.<\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">w = net.Layers(2).Weights;\r\nw33 = w(:,:,:,33);\r\nclf\r\nmesh(abs(freqz2(w33(:,:,2))),<span style=\"color:rgb(160, 32, 240);\">'EdgeColor'<\/span>,[0 .4470 .7410]);\r\ntitle(<span style=\"color:rgb(160, 32, 240);\">'Frequency response of the 33rd filter (green channel)'<\/span>)\r\n<\/pre><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2018\/03\/VisualizingActivationsInGoogLeNetExample_5-1.png\"><p>Roughly speaking, that's a bandpass filter in one direction and a lowpass filter in the other. In both directions, the filter cuts off most of the signal in the upper half of the frequency range, which is what I expect from an antialiasing filter designed for use with a factor-of-two downsampling. 
<\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">w34 = w(:,:,:,34);\r\nmesh(abs(freqz2(w34(:,:,2))),<span style=\"color:rgb(160, 32, 240);\">'EdgeColor'<\/span>,[0 .4470 .7410]);\r\ntitle(<span style=\"color:rgb(160, 32, 240);\">'Frequency response of the 34th filter (green channel)'<\/span>)\r\n<\/pre><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2018\/03\/VisualizingActivationsInGoogLeNetExample_6-1.png\"><p>For this channel, the bandpass and lowpass directions are reversed. <\/p>\r\n      <p>I assume that these filter weights are derived from the training procedure used to create GoogLeNet in the first place. But it does seem like at least some parts of the network can be loosely interpreted in terms of traditional image processing operations. <\/p>\r\n      <p>Now let's look at the output of filter 43.<\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">imshow(act_stretched(:,:,:,43))\r\n<\/pre><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2018\/03\/VisualizingActivationsInGoogLeNetExample_7-1.png\"><p>I would guess that this filter output is serving a kind of color detection function. You can see a relatively high response for my green shirt, and a relatively low response for the skin tones in my face. Here are the weights for three channels of filter 43. 
<\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">w43 = w(:,:,:,43);\r\nsubplot(2,2,1)\r\nsurf(w43(:,:,2),<span style=\"color:rgb(160, 32, 240);\">'EdgeColor'<\/span>,[0 .4470 .7410])\r\nzlim([-0.3 0.3])\r\ntitle(<span style=\"color:rgb(160, 32, 240);\">'Green channel weights'<\/span>)\r\nsubplot(2,2,2)\r\nsurf(w43(:,:,1),<span style=\"color:rgb(160, 32, 240);\">'EdgeColor'<\/span>,[0 .4470 .7410])\r\nzlim([-0.3 0.3])\r\ntitle(<span style=\"color:rgb(160, 32, 240);\">'Red channel weights'<\/span>)\r\nsubplot(2,2,3)\r\nsurf(w43(:,:,3),<span style=\"color:rgb(160, 32, 240);\">'EdgeColor'<\/span>,[0 .4470 .7410])\r\nzlim([-0.3 0.3])\r\ntitle(<span style=\"color:rgb(160, 32, 240);\">'Blue channel weights'<\/span>)\r\n<\/pre><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2018\/03\/VisualizingActivationsInGoogLeNetExample_8-1.png\"><p>I don't quite have a good interpretation for everything I see here, but I have noticed that the inner portion of the weights for each channel looks fairly flat. Let me zoom in along the <em>x<\/em> and <em>y<\/em> directions. <\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">subplot(2,2,1)\r\nxlim([3 5])\r\nylim([3 5])\r\nsubplot(2,2,2)\r\nxlim([3 5])\r\nylim([3 5])\r\nsubplot(2,2,3)\r\nxlim([3 5])\r\nylim([3 5])\r\n<\/pre><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2018\/03\/VisualizingActivationsInGoogLeNetExample_9-1.png\"><p>So, very roughly speaking, what's being computed is the difference between the local average of the green channel and the local average of the red channel. 
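<\/p>\r\n      <p>As a rough sanity check of that interpretation (my own approximation, not anything the network computes directly), we can filter the difference of the green and red channels with a simple 7x7 averaging kernel. This ignores the stride-2 downsampling, the blue channel, and the exact learned weights, so it only supports a qualitative comparison with the channel 43 activation image: <\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">g = im2single(im(:,:,2));\r\nr = im2single(im(:,:,1));\r\napprox43 = imfilter(g - r,fspecial(<span style=\"color:rgb(160, 32, 240);\">'average'<\/span>,7));\r\nclf\r\nimshow(mat2gray(approx43))\r\ntitle(<span style=\"color:rgb(160, 32, 240);\">'Green minus red, locally averaged'<\/span>)\r\n<\/pre><p>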
<\/p>\r\n      <p>Let's look at just one more layer, the one immediately following.<\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">net.Layers(3)\r\n<\/pre><pre class=\"output\" style=\"font-family:monospace;border:none;background-color:white;color:rgba(64, 64, 64, 1);\">ans = \r\n  ReLULayer with properties:\r\n\r\n    Name: 'conv1-relu_7x7'\r\n\r\n<\/pre><p> This is a <em>rectified linear unit<\/em> layer. Such a layer just clips any negative number to 0. That means all the variation in the negative values from the output of the previous layer gets removed. What does that look like? <\/p><pre class=\"matlab-code\" id=\"matlabcode\" style=\"background-color: #F7F7F7;font-family: monospace;font-weight:normal;border-style: solid; border-width: 1px ;border-color:#E9E9E9;padding-top:5px;padding-bottom:5px;line-height:150%;\">act2 = activations(net,im,<span style=\"color:rgb(160, 32, 240);\">'conv1-relu_7x7'<\/span>);\r\nact2 = reshape(act2,size(act2,1),size(act2,2),1,size(act2,3));\r\nact2_scaled = mat2gray(act2);\r\ntmp = act2_scaled(:);\r\nlim = stretchlim(tmp);\r\nlim(1) = 0;\r\ntmp = imadjust(tmp,lim);\r\nact2_stretched = reshape(tmp,size(act2_scaled));\r\nclf\r\nmontage(act2_stretched)\r\ntitle(<span style=\"color:rgb(160, 32, 240);\">'Activations from the conv1-relu_7x7 layer'<\/span>,<span style=\"color:rgb(160, 32, 240);\">'Interpreter'<\/span>,<span style=\"color:rgb(160, 32, 240);\">'none'<\/span>)\r\n<\/pre><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2018\/03\/VisualizingActivationsInGoogLeNetExample_10-1.png\"><p>That's two layers down, and 142 still to go! 
I think I'll save those for another time.<\/p>\r\n      <p>I encourage you to take a look at the documentation example <a href=\"https:\/\/www.mathworks.com\/help\/nnet\/examples\/visualize-activations-of-a-convolutional-neural-network.html\">Visualize Activations of a Convolutional Neural Network<\/a> for another peek at the dreams of a deep learning neural network. <\/p>","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2018\/03\/VisualizingActivationsInGoogLeNetExample_3-1.png\" class=\"img-responsive attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" decoding=\"async\" loading=\"lazy\" \/><\/div><p>The R2018a release has been available for almost two weeks now. One of the new features that caught my eye is that computing layer activations has been extended to GoogLeNet and Inception-v3. Today I... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2018\/03\/26\/visualizing-activations-in-googlenet\/\">read more 
>><\/a><\/p>","protected":false},"author":42,"featured_media":248,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/256"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/users\/42"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/comments?post=256"}],"version-history":[{"count":1,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/256\/revisions"}],"predecessor-version":[{"id":258,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/256\/revisions\/258"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media\/248"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media?parent=256"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/categories?post=256"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/tags?post=256"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}