{"id":912,"date":"2019-01-18T06:00:36","date_gmt":"2019-01-18T06:00:36","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=912"},"modified":"2021-04-06T15:51:14","modified_gmt":"2021-04-06T19:51:14","slug":"neural-network-feature-visualization","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2019\/01\/18\/neural-network-feature-visualization\/","title":{"rendered":"Neural Network Feature Visualization"},"content":{"rendered":"<h4>Visualization of the data and the semantic content learned by a network<\/h4>\r\n<span style=\"font-family: courier;\">This post comes from Maria Duarte Rosa, who is going to talk about different ways to visualize features learned by networks.<\/span>\r\n<h6><\/h6>\r\nToday, we'll look at two methods for gaining insight into a network:<strong>\u00a0k-nearest neighbors<\/strong> and<strong> t-SNE<\/strong>, which we'll describe in detail below.\r\n<h6><\/h6>\r\n<div id=\"attachment_962\" style=\"width: 570px\" class=\"wp-caption alignnone\"><img aria-describedby=\"caption-attachment-962\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-962 size-full\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/clustering1.png\" alt=\"Semantic Clustering using t-SNE\" width=\"560\" height=\"420\" \/><p id=\"caption-attachment-962\" class=\"wp-caption-text\">Visualization of a trained network using t-SNE<\/p><\/div>\r\n<h6><\/h6>\r\n<h3>Dataset and Model<\/h3>\r\n<h6><\/h6>\r\nFor both of these exercises, we'll be using <a href=\"https:\/\/www.mathworks.com\/solutions\/deep-learning\/models.html\">ResNet-18<\/a>,\u00a0and our favorite food dataset, which you can <a href=\"\">download here<\/a>. (Please be aware this is a very large download. We're using this for example purposes only, since food is relevant to everyone! 
This code should work with any other dataset you wish).\r\n<h6><\/h6>\r\nThe network has been <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/examples\/train-deep-learning-network-to-classify-new-images.html\">retrained<\/a> to identify the 5 categories of objects from the data:\r\n<h6><\/h6>\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td style=\"text-align: center;\"><strong>Salad<\/strong><\/td>\r\n<td style=\"text-align: center;\"><strong>Pizza<\/strong><\/td>\r\n<td style=\"text-align: center;\"><strong>Fries<\/strong><\/td>\r\n<td style=\"text-align: center;\"><strong>Burger<\/strong><\/td>\r\n<td style=\"text-align: center;\"><strong>Sushi<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"text-align: center;\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-954 size-thumbnail\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/SaladFinalImage1058_resized-150x150.jpg\" alt=\"\" width=\"150\" height=\"150\" \/><\/td>\r\n<td style=\"text-align: center;\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-952 size-thumbnail\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/PizzaFinalImage419_resized-150x150.jpg\" alt=\"\" width=\"150\" height=\"150\" \/><\/td>\r\n<td style=\"text-align: center;\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-950 size-thumbnail\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/FriesFinalImage238_resized-150x150.jpg\" alt=\"\" width=\"150\" height=\"150\" \/><\/td>\r\n<td style=\"text-align: center;\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-948 size-thumbnail\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/BurgersFinalImage023_resized-150x150.jpg\" alt=\"\" width=\"150\" height=\"150\" \/><\/td>\r\n<td style=\"text-align: center;\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-956 size-thumbnail\" 
src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/SushiFinalImage1117_resized-150x150.jpg\" alt=\"\" width=\"150\" height=\"150\" \/><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<h6><\/h6>\r\nNext, we want to visualize the network and understand the features it uses to classify data. The following are two ways to visualize high-level features of a network, to gain insight into a network beyond accuracy.\r\n<h6><\/h6>\r\n<h6><\/h6>\r\n<h3>k-nearest neighbors search<\/h3>\r\nA nearest neighbor search is a type of optimization problem where the goal is to find the closest (or most similar) points in space to a given point.\u00a0 A k-nearest neighbors search identifies the k closest neighbors to a point in feature space. Closeness in metric spaces is generally defined using a distance metric such as the Euclidean distance or Minkowski distance. The more similar the points are, the smaller this distance should be. This technique is often used as a machine learning classification method, but can also be used for visualization of data and high-level features of a neural network, which is what we're going to do.\r\n<h6><\/h6>\r\nLet's start with 5 test images from the food dataset:\r\n<h6><\/h6>\r\n<pre>idxTest = [394 97 996 460 737];\r\nim = imtile(string({imdsTest.Files{idxTest}}),'ThumbnailSize',[100 100], 'GridSize', [5 1]);<\/pre>\r\n<h6><img decoding=\"async\" loading=\"lazy\" width=\"274\" height=\"593\" class=\"alignnone size-full wp-image-972\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/imtile1.png\" alt=\"\" \/><\/h6>\r\n<h6><\/h6>\r\nand look for the 10 nearest neighbors of these images in the training data in pixel space. The code below gets the features (i.e. 
\"activations\") of the input layer, 'data', for all training and test images, and finds which training images are element-wise closest to our chosen test images.\r\n<h6><\/h6>\r\nGet the features, aka activations\r\n<h6><\/h6>\r\n<pre>dataTrainFS = activations(netFood, imdsTrainAu, 'data', 'OutputAs', 'rows');\r\nimgFeatSpaceTest = activations(netFood, imdsTestAu,'data', 'OutputAs', 'rows');\r\ndataTestFS = imgFeatSpaceTest(idxTest,:);\r\n<\/pre>\r\nCreate KNN model and search for nearest neighbours\r\n<pre>Mdl = createns(dataTrainFS,'Distance','euclidean');\r\nidxKnn = knnsearch(Mdl,dataTestFS, 'k', 10);\r\n<\/pre>\r\n<img decoding=\"async\" loading=\"lazy\" class=\"alignleft wp-image-1036 size-large\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/knn1_highlighted2-1024x498.png\" alt=\"\" width=\"1024\" height=\"498\" \/>\r\n\r\n\r\nSearching for similarities in pixel space generally returns no meaningful information about the semantic content of the image, only similarities in pixel intensity and color distribution. The 10 nearest neighbors in the data (pixel) space do not necessarily correspond to the same class as the test image. 
There is no \"learning\" taking place.\r\n<h6><\/h6>\r\nTake a look at the 4th row:\r\n<h6><img decoding=\"async\" loading=\"lazy\" width=\"1210\" height=\"176\" class=\"alignnone size-full wp-image-978\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/knn1_row4.png\" alt=\"\" \/><\/h6>\r\n<h6><\/h6>\r\nThe image of the fries is yellow and brighter at the top, and dark at the bottom.\u00a0Most of the nearest neighbors in pixel space seem to be images of any class that contains the same pixel intensity and color pattern (they are somewhat brighter at the top and dark at the bottom).\r\n\r\nLet's compare this with images passed through the network and search for the 10 nearest neighbors in feature space,\u00a0where the features are the output of the final average pooling layer of the network, pool5.\r\n<pre>dataTrainFS = activations(netFood, imdsTrainAu, 'pool5', 'OutputAs', 'rows');\r\nimgFeatSpaceTest = activations(netFood, imdsTestAu,'pool5', 'OutputAs', 'rows');\r\ndataTestFS = imgFeatSpaceTest(idxTest,:);\r\n<\/pre>\r\nCreate KNN model and search for nearest neighbours\r\n<pre>Mdl = createns(dataTrainFS,'Distance','euclidean');\r\nidxKnn(:,:) = knnsearch(Mdl,dataTestFS, 'k', 10);<\/pre>\r\n<h6><\/h6>\r\n<div id=\"attachment_1038\" style=\"width: 1034px\" class=\"wp-caption alignleft\"><img aria-describedby=\"caption-attachment-1038\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-1038 size-large\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/knn2_highlighted2-1024x493.png\" alt=\"\" width=\"1024\" height=\"493\" \/><p id=\"caption-attachment-1038\" class=\"wp-caption-text\">The first column (highlighted) is the test image, the remaining columns are the 10 nearest neighbors<\/p><\/div>\r\n\r\nNow we can see the color and intensity no longer matter, but rather the higher level features of the objects in the image. The nearest neighbors are now images of the same class. 
These results show that the features from the deep neural network contain information about the semantic content of the images. In other words, the network learned to discriminate between classes by learning high-level, object-specific features, similar to what allows humans to distinguish hamburgers from pizzas or Caesar salads from sushi.\r\n<h6><\/h6>\r\n<h5>K-NN: What can we learn from this?<\/h5>\r\nThis can confirm what we expect to see from the network, or simply provide another way to visualize it. If the training accuracy of the network is high but the nearest neighbors in feature space (assuming the features are the output of one of the final layers of the network) are not objects from the same class, this may indicate that the network has not captured any semantic knowledge related to the classes but might have learned to classify based on some artifact of the training data.\r\n<h6><\/h6>\r\n<h6><\/h6>\r\n&nbsp;\r\n<h3>Semantic clustering with t-SNE<\/h3>\r\nt-Distributed Stochastic Neighbor Embedding (t-SNE) is a non-linear dimensionality reduction technique that embeds high-dimensional data in a lower-dimensional space. (Typically we choose two or three dimensions for the lower-dimensional space, since this makes the data easy to plot and visualize.) This lower-dimensional space is estimated in such a way that it preserves similarities from the high-dimensional space. <strong>In other words, two similar objects have a high probability of being nearby in the lower-dimensional space, while two dissimilar objects should be represented by distant points. 
<\/strong> This technique can be used to visualize deep neural network features.\r\n<h6><\/h6>\r\nLet's apply this technique to the training images of the dataset and get a two-dimensional and a three-dimensional embedding of the data.\r\n<h6><\/h6>\r\nSimilar to the k-NN example, we'll start by visualizing the original data (pixel space) and the output of the final average pooling layer.\r\n<pre>layers = {'data', 'pool5'};\r\nfor k = 1:length(layers)\r\n   % Get the activations of this layer, one row per training image\r\n   dataTrainFS = activations(netFood, imdsTrainAu, layers{k}, 'OutputAs', 'rows');\r\n   % Embed the activations in two and three dimensions\r\n   AlltSNE2dim(:,:,k) = tsne(dataTrainFS);\r\n   AlltSNE3dim(:,:,k) = tsne(dataTrainFS, 'NumDimensions', 3);\r\nend\r\n\r\nfigure;\r\nsubplot(1,2,1);gscatter(AlltSNE2dim(:,1,1), AlltSNE2dim(:,2,1), labels);\r\ntitle(sprintf('Semantic clustering - %s layer', layers{1}));\r\nsubplot(1,2,2);gscatter(AlltSNE2dim(:,1,end), AlltSNE2dim(:,2,end), labels);\r\ntitle(sprintf('Semantic clustering - %s layer', layers{end}));\r\nfigure;\r\nsubplot(1,2,1);scatter3(AlltSNE3dim(:,1,1),AlltSNE3dim(:,2,1),AlltSNE3dim(:,3,1), 20*ones(3500,1),  labels)\r\ntitle(sprintf('Semantic clustering - %s layer', layers{1}));\r\nsubplot(1,2,2);scatter3(AlltSNE3dim(:,1,end),AlltSNE3dim(:,2,end),AlltSNE3dim(:,3,end), 20*ones(3500,1),  labels)\r\ntitle(sprintf('Semantic clustering - %s layer', layers{end}));<\/pre>\r\n<h6><img decoding=\"async\" loading=\"lazy\" width=\"560\" height=\"420\" class=\"alignnone size-full wp-image-962\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/clustering1.png\" alt=\"\" \/> <img decoding=\"async\" loading=\"lazy\" width=\"560\" height=\"420\" class=\"alignnone size-full wp-image-982\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/clustering2.png\" alt=\"\" \/><\/h6>\r\n<h6><\/h6>\r\nIn both the two- and three-dimensional plots of the 'data' layer, the points are scattered all over the space in a seemingly random pattern. 
But when we plot the embedding for the output of 'pool5', the pattern is very different. Now we can clearly see clusters of points according to the semantic content of the image. The clusters correspond to the 5 different classes available in the data. This means that the <em>high-level<\/em> representations learned by the network contain discriminative information about the objects in the images, which allows the network to accurately predict the class of the object.\r\n<h6><\/h6>\r\nIn addition to the information that these visualizations provide about the network, they can also be useful to inspect the data itself. For example, let's look at a few images that landed in the wrong cluster, and see if we can get some insight into why the network placed them there.\r\n<h6><\/h6>\r\n<h3>Examples of images in the wrong semantic cluster<\/h3>\r\n<h6><\/h6>\r\nLet's take a closer look at the 2D plot of the pool5 layer, and zoom in on a few of the images that fall in the wrong cluster.\r\n<h6><\/h6>\r\n\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td style=\"text-align: center; padding:10px\"><img decoding=\"async\" loading=\"lazy\" width=\"491\" height=\"381\" class=\"alignnone size-full wp-image-986\" style=\"font-size: 16px;\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/2019-01-16_14-24-46.png\" alt=\"\" \/><\/td>\r\n<td style=\"text-align: center;\"><img decoding=\"async\" loading=\"lazy\" width=\"478\" height=\"392\" class=\"alignnone size-full wp-image-984\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/cluster1b.png\" alt=\"\" \/><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n\r\n<h6><\/h6>\r\n<h6><\/h6>\r\n<pre>im = imread(imdsTrain.Files{1619});\r\nfigure;imshow(im);title('Hamburger that looks like a salad');<\/pre>\r\n<img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-996 size-medium\" style=\"font-weight: bold;\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/finalim2-300x209.png\" 
alt=\"\" width=\"300\" height=\"209\" \/>\r\n<h6><\/h6>\r\nA hamburger in the salad cluster. Unlike other hamburger images, there is a significant amount of salad in the photo and no bread\/bun.\r\n<h6><\/h6>\r\n<img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-990\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/cluster2b.png\" alt=\"\" width=\"542\" height=\"405\" \/>\r\n<h6><\/h6>\r\n<pre>im = imread(imdsTrain.Files{125});\r\nfigure;imshow(im);title('Caesar salad that looks like a hamburger');<\/pre>\r\n<img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-994 size-medium\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/finalim1-300x190.png\" alt=\"\" width=\"300\" height=\"190\" \/>\r\n<h6><\/h6>\r\nA salad in the hamburger cluster. This may be because the image contains a bun or bread in the background.\r\n<h6><\/h6>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"518\" height=\"398\" class=\"alignnone size-full wp-image-992\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/cluster23b.png\" alt=\"\" \/>\r\n<h6><\/h6>\r\n<pre>im = imread(imdsTrain.Files{3000});\r\nfigure;imshow(im);title('Sushi that looks like a hamburger');<\/pre>\r\n<h6><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-998 size-medium\" style=\"font-size: 16px;\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/finalim3-276x300.png\" alt=\"\" width=\"276\" height=\"300\" \/><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\nSushi in the hamburger cluster. Maybe because it has some features that look like something one could find in a burger, such as the sauce on top?\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\nFinally, I think it is interesting to visualize the t-SNE embedding for all the layers of the network: the data starts as seemingly random points and gradually becomes clustered by class.\r\n<h6><\/h6>\r\n<h6><img 
decoding=\"async\" loading=\"lazy\" width=\"484\" height=\"391\" class=\"alignnone size-full wp-image-1016\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/testRS6.gif\" alt=\"\" \/><\/h6>\r\n<h6><\/h6>\r\n<span style=\"font-family: courier;\">You can download the code using the small \"Get the MATLAB code\" link below. You'll need to bring your own pretrained network and dataset, since that is not included.<\/span>\r\n<h6><\/h6>\r\n<span style=\"font-family: courier;\">Hopefully you find these visualizations interesting and useful! Have a question for Maria? Leave a comment below!<\/span>\r\n<h6><\/h6>\r\n\r\n<script language=\"JavaScript\"> <!-- \r\n    function grabCode_a710f144b80042c592b9fe35aab1fc59() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='a710f144b80042c592b9fe35aab1fc59 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' a710f144b80042c592b9fe35aab1fc59';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2018 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\">Copyright 2018 The MathWorks, Inc.<br><a href=\"javascript:grabCode_a710f144b80042c592b9fe35aab1fc59()\"><span class=\"get_ml_code\">Get the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      <br><\/p><!--\r\na710f144b80042c592b9fe35aab1fc59 ##### SOURCE BEGIN #####\r\n\r\n\r\n%% Visualizing the data and the semantic content (high-level representations) learned by a deep neural network\r\n% This demo presents two simple techniques that one can use to vizualize the \r\n% data and the semantic content of deep neural features. These techniques are: \r\n% *K-nearest neighbors search* and *t-Distributed Stochastic Neighbor Embedding \r\n% (t-SNE)*. The demo uses the Mathworks food dataset. This dataset comprises 3500 \r\n% training images and 1258 test images of the following 5 classes: pizza, hamburger, \r\n% sushi, french fries and ceaser salad. 
We use a ResNet-18 model pretrained on \r\n% ImageNet and re-trained using the food dataset by following the usual transfer \r\n% learning workflow.\r\n%% Load data and trained network\r\n\r\nclear all\r\nrng(0, 'twister')\r\n\r\n% Load deep learning network\r\nload trainedResNet18Food\r\nimageSize = netFood.Layers(1).InputSize;\r\n\r\n% \"Load\" data\r\npathToImgsTrain = '\\\\mathworks\\public\\Maria_Duarte_Rosa\\FoodDataVisualizations\\foodData\\train';\r\nimdsTrain = imageDatastore(pathToImgsTrain, 'IncludeSubfolders', 1, 'LabelSource', 'foldernames');\r\npathToImgsTest = '\\\\mathworks\\public\\Maria_Duarte_Rosa\\FoodDataVisualizations\\foodData\\test';\r\nimdsTest = imageDatastore(pathToImgsTest, 'IncludeSubfolders', 1, 'LabelSource', 'foldernames');\r\nnumClasses = numel(categories(imdsTrain.Labels));\r\n\r\n% Rename labels to remove hyphens\r\ntmpLabels = char(imdsTrain.Labels);\r\ntmpLabels(tmpLabels=='_') = ' ';\r\nimdsTrain.Labels = categorical(cellstr(tmpLabels));\r\ntmpLabels = char(imdsTest.Labels);\r\ntmpLabels(tmpLabels=='_') = ' ';\r\nimdsTest.Labels = categorical(cellstr(tmpLabels));\r\n\r\n% Split data into train and validation (validation data was used for\r\n% transfer learning)\r\n[imdsTrain, imdsVal] = splitEachLabel(imdsTrain, 0.7, 0.3, 'randomized');\r\nimdsTrainAu = augmentedImageDatastore(imageSize,imdsTrain);\r\nimdsTestAu = augmentedImageDatastore(imageSize,imdsTest);\r\n\r\n% Save labels\r\nlabels = imdsTrain.Labels;\r\n%% k-Nearest neighbor search\r\n% Nearest neighbor search is a type of optimisation problem where the goal is \r\n% to find the closest (most similar) points in space to a given point. Closeness \r\n% in metric spaces is generally defined using a distance metric such as the euclidean \r\n% distance or the Minkowski distance for example. The more similar the points \r\n% are, the smaller this distance should be. k-nearest neighbor search identifies \r\n% the top k closest neighbors to a point in space. 
This technique can be used \r\n% for classification, but can also be used for visualization of data and high-level \r\n% features of a neural network. \r\n% \r\n% In this demo we selected 5 test images from the Mathworks food dataset \r\n% and searched for the 10 nearest neighbours of these 5 images in the training \r\n% data. In addition, we repeated this procedure for the same 5 test images but \r\n% instead of searching in the raw data space (pixel space) we passed these images \r\n% through the network and searched for the 10 nearest neighbors in feature space \r\n% (where the features are the output of the final average pooling layer, pool5, \r\n% of the network).\r\n% \r\n% As can be seen in the first figure, the 10 nearest neighbors in the data \r\n% (pixel) space do not necessarily correspond to the same class as the test image. \r\n% Searching for similarities in pixel space does not in general return any meaningful \r\n% information about the semantic content of the image but only similarities in \r\n% pixel intensity and color distribution. This is clear in the 4th row where the \r\n% image of the french fries is yellow and brighter at the top and dark at the \r\n% bottom. Most of the nearest neighbors in pixel space seem to be images of all \r\n% classes (pizza, sushi, hamburger, etc.) that contain the same pixel intensity \r\n% and color pattern (they are somewhat yellow or brighter at the top and dark \r\n% at the bottom).\r\n% \r\n% The next figure shows the same 5 test images but their 10 nearest neighbors \r\n% in feature space (the output of the final average pooling layer, pool5, in the \r\n% network). In this figure one can see that now it does not matter so much the \r\n% color and intensity pattern of the pixels but what is the object in the image. 
\r\n% The nearest neighbors are now images of the same class, for example for the \r\n% last 3 test images only pizzas, french fries and hamburgers were retrieved, \r\n% respectively, since these are the classes of the test images themselves. These \r\n% results show that the features from the deep neural network contain information \r\n% about the semantic content of the images. In other words, the network learned \r\n% to discriminate between classes by learning high-level object specific features \r\n% similarly to what allows humans to distinguish hamburgers from pizzas or Caesar \r\n% salads from sushi. If the training accuracy of the network is high but the nearest \r\n% neighbors in feature space (assuming the features are the output of one of the \r\n% final layers of the network) are not objects from the same class, this may indicate \r\n% that the network has not captured any semantic knowledge related to the classes \r\n% but might have learned to classify based on some artifact of the training data. 
\r\n\r\n% Choose two layers (first and one of the final layers)\r\nlayers = {'data', 'pool5'};\r\n\r\n% Choose 5 test images\r\nidxTest = [394    97   996   460   737];\r\nfor i = 1:length(layers)\r\n    nameLayer = layers{i};\r\n    doNN = 0; % Set to 1 to recompute; 0 loads saved results for speed\r\n    if doNN\r\n        % Get features (activations)\r\n        dataTrainFS{i} = activations(netFood, imdsTrainAu, nameLayer, 'OutputAs', 'rows');\r\n        imgFeatSpaceTest = activations(netFood, imdsTestAu,nameLayer, 'OutputAs', 'rows');\r\n        dataTestFS{i} = imgFeatSpaceTest(idxTest,:);\r\n        \r\n        % Create KNN model and search for nearest neighbours\r\n        Mdl = createns(dataTrainFS{i},'Distance','euclidean');\r\n        idxKnn(:,:,i) = knnsearch(Mdl,dataTestFS{i}, 'k', 10);\r\n    else \r\n        load KNNresultsResNet18.mat\r\n    end\r\n    % Plot nearest neighbours\r\n    plotKnn(imdsTrain, imdsTest, idxTest, idxKnn, i, nameLayer);\r\nend\r\n%% Semantic clustering with t-SNE\r\n% t-Distributed Stochastic Neighbor Embedding (t-SNE) is a non-linear dimensionality \r\n% reduction technique that allows one to embed high-dimensional data in a lower-dimensional \r\n% space of two or three dimensions. This lower dimensional space is estimated \r\n% in such a way that it preserves similarities from the high dimensional space. \r\n% In other words, two similar objects have a high probability of being nearby in \r\n% the lower dimensional space, while two dissimilar objects should be represented \r\n% by distant points. This technique is often used to visualize deep neural network \r\n% features (i.e. the output of the layers in a deep neural network). \r\n% \r\n% Here we applied this technique to the training images of the food dataset \r\n% and obtained both a two dimensional and a three dimensional embedding of the \r\n% data. Again we chose to embed the original data (pixel space) and the output \r\n% of the final average pooling layer, pool5, in the network. 
In both the two \r\n% and three dimensional embeddings of the pixel space, the data is scattered \r\n% all over the space. Images of similar objects are not represented by nearby \r\n% points, while objects from different classes are close to each other. But when \r\n% we plot the embedding for the output of the average pooling layer the pattern \r\n% is very different. Now we can clearly see clusters of points according to the \r\n% semantic content of the original image. The clusters correspond to the 5 different \r\n% classes available in the data. This means that the high-level representations \r\n% learned by the network contain discriminative information about the objects \r\n% in the images, which allows the network to accurately predict the class of the \r\n% object. In addition to the information that these visualizations provide about \r\n% the network, they can also be useful to inspect the data itself, in particular \r\n% data points whose features were placed in the wrong cluster (see below).\r\n\r\n% Show t-SNE\r\ndoLoop = 0; % Set to 1 to recompute; 0 loads saved results for speed\r\nif doLoop\r\n    for k = 1:length(layers)\r\n        AlltSNE2dim(:,:,k) = tsne(dataTrainFS{k});\r\n        AlltSNE3dim(:,:,k) = tsne(dataTrainFS{k}, 'NumDimensions', 3);\r\n    end\r\nelse\r\n    load tSNEresultsResNet18\r\nend\r\nfigure;\r\nsubplot(1,2,1);gscatter(AlltSNE2dim(:,1,1), AlltSNE2dim(:,2,1), labels);\r\ntitle(sprintf('Semantic clustering - %s layer', layers{1}));\r\nsubplot(1,2,2);gscatter(AlltSNE2dim(:,1,end), AlltSNE2dim(:,2,end), labels);\r\ntitle(sprintf('Semantic clustering - %s layer', layers{end}));\r\nfigure;\r\nsubplot(1,2,1);scatter3(AlltSNE3dim(:,1,1),AlltSNE3dim(:,2,1),AlltSNE3dim(:,3,1), 20*ones(3500,1),  labels)\r\ntitle(sprintf('Semantic clustering - %s layer', layers{1}));\r\nsubplot(1,2,2);scatter3(AlltSNE3dim(:,1,end),AlltSNE3dim(:,2,end),AlltSNE3dim(:,3,end), 20*ones(3500,1),  labels)\r\ntitle(sprintf('Semantic clustering - %s layer', layers{end}));\r\n%% 
Examples of images in the wrong semantic cluster\r\n% If we visualize some of the images for which the pool5 features were clustered \r\n% with other classes in the two dimensional space, i.e. they were placed in the \r\n% wrong cluster, we can find the following examples:\r\n% \r\n% * An image of sushi in the hamburger cluster. Maybe because it has some features \r\n% that look like something one could find in a burger, such as the sauce on top.\r\n% * A ceaser salad in the hamburger cluster. This may be because the image contains \r\n% a bun of bread in the background.\r\n% * A hamburger in the ceaser salad cluster. Unlike other hamburger images for \r\n% this image one cannot see any bread and there is a significant amount of salad \r\n% in the photo.\r\n\r\n% Sushi in the hamburguer cluster\r\nim = imread(imdsTrain.Files{2817});\r\nfigure;imshow(im);title('Sushi that looks like a hamburger');\r\n% Ceaser salad in the hamburguer cluster\r\nim = imread(imdsTrain.Files{125});\r\nfigure;imshow(im);title('Ceaser salad that looks like a hamburger');\r\n% Hamburger in the ceaser salad cluster\r\nim = imread(imdsTrain.Files{1619});\r\nfigure;imshow(im);title('Hamburger that looks like a salad'); \r\n%% Helper functions\r\n%%\r\nfunction plotKnn(imdsTrain, imdsTest, idxTest, idxKnn, l, nameLayer)\r\nj = 1;\r\nfor t = 1:length(idxTest)\r\n    im = imread(imdsTest.Files{idxTest(t)});\r\n    imgsTile(:,:,:,j) = imresize(im, [100,100]);\r\n    j = j + 1;\r\n    for k = 1:size(idxKnn,2)\r\n        imknn = imread(imdsTrain.Files{idxKnn(t,k,l)});\r\n        imgsTile(:,:,:,j) = imresize(imknn, [100,100]);\r\n        j = j + 1;\r\n    end\r\nend\r\nI = imtile(imgsTile,'ThumbnailSize',[100 100], 'GridSize', [5 11]);\r\nfigure;imshow(I); title(sprintf('10 nearest neighbours of images in the first column - %s layer',nameLayer))\r\nend\r\n\r\n\r\n\r\n##### SOURCE END ##### a710f144b80042c592b9fe35aab1fc59\r\n-->","protected":false},"excerpt":{"rendered":"<div 
class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/01\/clustering1.png\" onError=\"this.style.display ='none';\" \/><\/div><p>Visualization of the data and the semantic content learned by a network\r\nThis post comes from Maria Duarte Rosa, who is going to talk about different ways to visualize features learned by... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2019\/01\/18\/neural-network-feature-visualization\/\">read more >><\/a><\/p>","protected":false},"author":156,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/912"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/users\/156"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/comments?post=912"}],"version-history":[{"count":39,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/912\/revisions"}],"predecessor-version":[{"id":2411,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/912\/revisions\/2411"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media?parent=912"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/categories?post=912"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/tags?post=912"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}
]}}