{"id":6520,"date":"2021-03-26T09:31:25","date_gmt":"2021-03-26T13:31:25","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=6520"},"modified":"2021-04-06T15:45:12","modified_gmt":"2021-04-06T19:45:12","slug":"finding-information-in-a-sea-of-noise","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2021\/03\/26\/finding-information-in-a-sea-of-noise\/","title":{"rendered":"Finding Information in a Sea of Noise"},"content":{"rendered":"<em>The following is a guest post by Dr. Brett Shoelson, Principal Application Engineer at MathWorks<\/em>\r\n<h6><\/h6>\r\n<div class=\"content\"><!--introduction--><p>For today's blog, I would like to pose a problem:<\/p><p>Suppose that you had a lot of data representing something very random, but that had a very small \"tell\" that differentiated the data into two different classes--a single nucleotide defect in a noisy genome population, for instance. (Okay, maybe that's a stretch.) But given a single informative \"bit\" in a noisy dataset, how would you find that tell?<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#7764e3a2-3fed-46c2-b509-dc2f390b4178\">Let's create (and visualize) a dataset that sets up the question<\/a><\/li><li><a href=\"#9dfb3b09-dd5a-4632-bdc2-55861c2e8ff7\">Create a \"Tell\":<\/a><\/li><li><a href=\"#c27ad943-f1e0-48d3-9a20-28fcf1dabc56\">Let's take a look at three of each class...can you spot the tell?<\/a><\/li><li><a href=\"#550bfb92-ff6c-4594-a0f5-ecaca48d87a2\">Obfuscating<\/a><\/li><li><a href=\"#f1ab0e6f-b537-41d9-b7cd-1ceb6b29be75\">Enter Deep Learning<\/a><\/li><li><a href=\"#4e48e657-5a89-4844-9636-585272f232d2\">Great! 
But can we determine the <i>location<\/i> of the tell?<\/a><\/li><li><a href=\"#0e522571-0eb3-48e9-b774-ca30700f07a8\">A final comment<\/a><\/li><\/ul><\/div><h4>Let's create (and visualize) a dataset that sets up the question<a name=\"7764e3a2-3fed-46c2-b509-dc2f390b4178\"><\/a><\/h4><p>First, create 10,000 random 20 x 20 matrices, with elements drawn uniformly from (0, 1) and stored as a 20 x 20 x 1 x 10000 array.<\/p><pre class=\"codeinput\">rng(0);\r\nn = 10000;\r\nsz = [20 20];\r\na = rand(sz(1), sz(2), 1, n);\r\n<\/pre><h4>Create a \"Tell\":<a name=\"9dfb3b09-dd5a-4632-bdc2-55861c2e8ff7\"><\/a><\/h4><p>Now we will modify one randomly selected pixel so that it serves as a class1\/class2 \"tell,\" or \"indicator\":<\/p><pre class=\"codeinput\">randomInformationalElement = randi(sz(1) * sz(2))\r\n[rowIndTrue, colIndTrue] = ind2sub(sz, randomInformationalElement);\r\n\r\n<span class=\"comment\">% At that randomly selected location, we will set half of the images to<\/span>\r\n<span class=\"comment\">% have one random value, and the other half to have a different random<\/span>\r\n<span class=\"comment\">% value:<\/span>\r\n\r\n<span class=\"comment\">%Class 1:<\/span>\r\nclass1Val = rand(1);\r\n<span class=\"keyword\">for<\/span> ii = 1:n\/2\r\n a(rowIndTrue, colIndTrue, 1, ii) = class1Val;\r\n<span class=\"keyword\">end<\/span>\r\n\r\n<span class=\"comment\">%Class 2:<\/span>\r\nclass2Val = rand(1);\r\n<span class=\"keyword\">for<\/span> ii = n\/2 + 1:n\r\n a(rowIndTrue, colIndTrue, 1, ii) = class2Val;\r\n<span class=\"keyword\">end<\/span>\r\n\r\n<span class=\"comment\">% And we will create categorical labels to keep track of the \"Class\":<\/span>\r\nlabels = [repmat(categorical(<span class=\"string\">\"class1\"<\/span>), n\/2, 1);\r\n repmat(categorical(<span class=\"string\">\"class2\"<\/span>), n\/2, 1)];\r\nsummary(labels)\r\n<\/pre><pre class=\"codeoutput\">randomInformationalElement =\r\n   193\r\n     class1      5000 \r\n     class2      5000 \r\n<\/pre><h4>Let's take a 
look at three of each class...can you spot the tell?<a name=\"c27ad943-f1e0-48d3-9a20-28fcf1dabc56\"><\/a><\/h4><pre class=\"codeinput\">figure(<span class=\"string\">'Name'<\/span>, <span class=\"string\">'Samples'<\/span>);\r\ninds = [1:3, n-2:n];\r\nlayout = tiledlayout(2, 3, <span class=\"string\">'TileSpacing'<\/span>, <span class=\"string\">'compact'<\/span>);\r\nax = gobjects(2, 3);\r\nind = 1;\r\n<span class=\"keyword\">for<\/span> ii = inds\r\n ax(ind) = nexttile(layout);\r\n imshow(a(:, :, ii))\r\n hold <span class=\"string\">on<\/span>\r\n <span class=\"keyword\">if<\/span> ind == 2\r\n title(<span class=\"string\">'CLASS 1'<\/span>, <span class=\"string\">'color'<\/span>, <span class=\"string\">'r'<\/span>, <span class=\"string\">'fontsize'<\/span>, 18);\r\n <span class=\"keyword\">elseif<\/span> ind == 5\r\n title(<span class=\"string\">'CLASS 2'<\/span>, <span class=\"string\">'color'<\/span>, <span class=\"string\">'r'<\/span>, <span class=\"string\">'fontsize'<\/span>, 18);\r\n <span class=\"keyword\">end<\/span>\r\n ind = ind + 1;\r\n<span class=\"keyword\">end<\/span>\r\n\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/03\/DLBlogPost1_01.png\" alt=\"\"> <h4>How about now?<a name=\"8768c128-046b-4948-abd7-71d1e4c51f30\"><\/a><\/h4><pre class=\"codeinput\"><span class=\"keyword\">for<\/span> ii = 1:6\r\n plot(ax(ii), colIndTrue, rowIndTrue, <span class=\"string\">'gs'<\/span>, <span class=\"string\">'MarkerSize'<\/span>, 12, <span class=\"string\">'LineWidth'<\/span>, 2)\r\n<span class=\"keyword\">end<\/span>\r\nlinkaxes(ax)\r\nset(ax, <span class=\"string\">'xlim'<\/span>, [0.85*colIndTrue, 1.15*colIndTrue], <span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'ylim'<\/span>, [0.85*rowIndTrue, 1.15*rowIndTrue])\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/03\/DLBlogPost1_02.png\" 
alt=\"\"> <h4>Where is the tell?<a name=\"8bda2b14-7207-400f-99ad-2ec113111392\"><\/a><\/h4><p>The goal here is to find a model to detect the informative bit--thereby separating the matrices into two classes. The perceptive reader might realize that one could simply compute the standard deviation of each pixel across all of the matrices and look for the minimum, for example, to find the informative bit. The tell takes only two distinct values, so it usually varies less than the uniformly random pixels around it:<\/p><pre class=\"codeinput\">figure(<span class=\"string\">'Name'<\/span>, <span class=\"string\">'Found It!'<\/span>)\r\nstdA = std(a, 1, ndims(a));\r\nimshow(stdA, []);\r\ntitle(<span class=\"string\">'Standard Deviation'<\/span>)\r\ndetection = find(stdA == min(stdA(:)));\r\n[rowIndDetected, colIndDetected] = ind2sub(size(a), detection);\r\nhold <span class=\"string\">on<\/span>\r\nplot(colIndTrue, rowIndTrue, <span class=\"string\">'gs'<\/span>, <span class=\"string\">'MarkerSize'<\/span>, 12, <span class=\"string\">'LineWidth'<\/span>, 4)\r\nplot(colIndDetected, rowIndDetected, <span class=\"string\">'rs'<\/span>, <span class=\"string\">'MarkerSize'<\/span>, 12, <span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'LineWidth'<\/span>, 1.5);\r\n<span class=\"keyword\">if<\/span> detection == randomInformationalElement\r\n    detected = <span class=\"string\">\"true\"<\/span>;\r\n<span class=\"keyword\">else<\/span>\r\n    detected = <span class=\"string\">\"false\"<\/span>;\r\n<span class=\"keyword\">end<\/span>\r\ntitle(<span class=\"string\">\"Informative Bit = \"<\/span> + detection + <span class=\"string\">\"? (\"<\/span> + detected + <span class=\"string\">\")\"<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/03\/DLBlogPost1_03.png\" alt=\"\"> <h4>Obfuscating<a name=\"550bfb92-ff6c-4594-a0f5-ecaca48d87a2\"><\/a><\/h4><p>So clearly this is a bit contrived. We could further obfuscate it by changing some values in confounding ways. 
For instance:<\/p><pre class=\"codeinput\"><span class=\"keyword\">for<\/span> jj = 1:10\r\n    confounder = randi(sz(1) * sz(2));\r\n    [rowInd, colInd] = ind2sub(size(a), confounder);\r\n    R = rand(1);\r\n    <span class=\"keyword\">for<\/span> ii = 1:2:n\r\n        a(rowInd, colInd, ii) = R;\r\n    <span class=\"keyword\">end<\/span>\r\n    R = rand(1);\r\n    <span class=\"keyword\">for<\/span> ii = 2:2:n\r\n        a(rowInd, colInd, ii) = R;\r\n    <span class=\"keyword\">end<\/span>\r\n<span class=\"keyword\">end<\/span>\r\n\r\nfigure(<span class=\"string\">'Name'<\/span>, <span class=\"string\">'Obfuscated'<\/span>)\r\nstdA = std(a, 1, ndims(a));\r\nimshow(stdA, []);\r\ntitle(<span class=\"string\">'Standard Deviation'<\/span>)\r\ndetection = find(stdA == min(stdA(:)));\r\n[rowIndDetected, colIndDetected] = ind2sub(size(a), detection);\r\nhold <span class=\"string\">on<\/span>\r\nplot(colIndTrue, rowIndTrue, <span class=\"string\">'gs'<\/span>, <span class=\"string\">'MarkerSize'<\/span>, 12, <span class=\"string\">'LineWidth'<\/span>, 4)\r\nplot(colIndDetected, rowIndDetected, <span class=\"string\">'rs'<\/span>, <span class=\"string\">'MarkerSize'<\/span>, 12, <span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'LineWidth'<\/span>, 1.5);\r\n<span class=\"keyword\">if<\/span> detection == randomInformationalElement\r\n    detected = <span class=\"string\">\"true\"<\/span>;\r\n<span class=\"keyword\">else<\/span>\r\n    detected = <span class=\"string\">\"false\"<\/span>;\r\n<span class=\"keyword\">end<\/span>\r\ntitle(<span class=\"string\">\"Informative Bit = \"<\/span> + detection + <span class=\"string\">\"? 
(\"<\/span> + detected + <span class=\"string\">\")\"<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/03\/DLBlogPost1_04.png\" alt=\"\"> <p>Now the information is more obscure!<\/p><h4>A challenge:<a name=\"fd5892ca-faab-47ca-a819-b1ac94590782\"><\/a><\/h4><p>Try to find this informative bit using \"classical machine learning\" (CML) models, leveraging tools like the <a href=\"https:\/\/www.mathworks.com\/help\/stats\/classificationlearner-app.html\">Classification Learner app<\/a>. In my experience, the models offered by Classification Learner will churn for a very long time (hours, even!), and none of the models will converge to anything better than 50% accuracy, which is chance for two balanced classes. That is, the models will be 100% useless! (I leave that trial to the reader. But I'll send a MATLAB T-shirt to the first person who shares with me a model trained with that app that reliably solves this problem!)<\/p><h4>Constraints!<a name=\"89ba8b49-f7db-47e0-a6f2-1eefa0e58668\"><\/a><\/h4><p>Why do CML models fail? In a word: constraints! Typically, to create an image classifier, we might first aggregate features using a <a href=\"https:\/\/www.mathworks.com\/help\/vision\/ref\/bagoffeatures.html\">\"bag of features\"<\/a>. Then, using those aggregated features, we could train an \"image category classifier.\" (The <a href=\"https:\/\/www.mathworks.com\/help\/vision\/ref\/trainimagecategoryclassifier.html\">trainImageCategoryClassifier<\/a> function makes trivial work of that.) Note that using <tt>bagOfFeatures<\/tt> <em>implicitly calculates <a href=\"https:\/\/www.mathworks.com\/help\/vision\/ref\/detectsurffeatures.html\">SURF Features<\/a><\/em>, and that <tt>trainImageCategoryClassifier<\/tt> <em>implicitly trains a multiclass Support Vector Machine (SVM)<\/em>. Features characterize relationships between pixels, whereas our tell is a single isolated pixel, so it's not clear that either SURF or an SVM is appropriate for the task at hand. 
And even if you used non-default detectors, extractors, and classifiers, you would still have a constrained model!<\/p><h4>Enter Deep Learning<a name=\"f1ab0e6f-b537-41d9-b7cd-1ceb6b29be75\"><\/a><\/h4><p>Deep learning is, in contrast, <i>relatively unconstrained<\/i>; we don't have to tell the model what relationships to look at. Rather, we can specify a \"network architecture\" and provide a bunch of \"ground truth,\" and let the computer figure out what to look for!<\/p><p>For instance, here we create just about the simplest \"typical\" network architecture for classifying images:<\/p><pre class=\"codeinput\">sizeOfKernel = [5, 5];\r\nnumberOfFilters = 20;\r\nnClasses = 2;\r\nlayers = [\r\n imageInputLayer([sz(1) sz(2) 1])\r\n convolution2dLayer(sizeOfKernel, numberOfFilters, <span class=\"string\">'Name'<\/span>, <span class=\"string\">'conv'<\/span>)\r\n reluLayer\r\n maxPooling2dLayer(2, <span class=\"string\">'Stride'<\/span>, 2)\r\n fullyConnectedLayer(nClasses, <span class=\"string\">'Name'<\/span>, <span class=\"string\">'fc'<\/span>)\r\n softmaxLayer\r\n classificationLayer()\r\n ];\r\n<\/pre><p>That \"triad\" of \"convolution, relu, and pooling\" layers is very common in deep learning networks designed for image analysis. But note that we haven't overly constrained the model to consider only a specific feature- or model-type; we've simply told the model to calculate 20 5x5 convolutions. 
And more to the point, we haven't even specified what patterns (convolution kernels) to look for.<\/p><h4>So let's create validation and test sets, and train the model<a name=\"1cf568ae-2cc6-4806-a830-73fdab844190\"><\/a><\/h4><p>Creating a validation set will help us ensure that the model is not overfitting, and a test set will help us to evaluate the model after training.<\/p><pre class=\"codeinput\"><span class=\"comment\">% First, the validation set (every 100th image, 100 in total):<\/span>\r\ninds = 1:100:size(a, 4);\r\nvalidationData = a(:, :, :, inds);\r\nvalidationLabels = labels(inds);\r\n<span class=\"comment\">% Remove the validation data and labels from the training set:<\/span>\r\na(:, :, :, inds) = [];\r\nlabels(inds) = [];\r\n<span class=\"comment\">% Now the test set (every 100th remaining image, 99 in total):<\/span>\r\ninds = 1:100:size(a, 4);\r\ntestSet = a(:, :, :, inds);\r\ntestLabels = labels(inds);\r\n<span class=\"comment\">% Remove the test data and labels from the training set, too:<\/span>\r\na(:, :, :, inds) = [];\r\nlabels(inds) = [];\r\n\r\n<span class=\"comment\">% ...Specify some training options:<\/span>\r\n<span class=\"comment\">% (stopIfAccuracyNotImproving is a helper function that is not defined in<\/span>\r\n<span class=\"comment\">% this post; it must be on the MATLAB path for this call to run.)<\/span>\r\n\r\nminiBatchSize = 100;\r\noptions = trainingOptions( <span class=\"string\">'adam'<\/span>, <span class=\"keyword\">...<\/span>\r\n <span class=\"string\">'InitialLearnRate'<\/span>, 0.005, <span class=\"keyword\">...<\/span>\r\n <span class=\"string\">'MaxEpochs'<\/span>, 1000, <span class=\"keyword\">...<\/span>\r\n <span class=\"string\">'MiniBatchSize'<\/span>, miniBatchSize, <span class=\"keyword\">...<\/span>\r\n <span class=\"string\">'Plots'<\/span>, <span class=\"string\">'training-progress'<\/span>, <span class=\"keyword\">...<\/span>\r\n <span class=\"string\">'ValidationData'<\/span>, {validationData, validationLabels}, <span class=\"keyword\">...<\/span>\r\n <span class=\"string\">'ValidationFrequency'<\/span>, 10, <span class=\"keyword\">...<\/span>\r\n <span class=\"string\">'ValidationPatience'<\/span>, 30, <span class=\"keyword\">...<\/span>\r\n <span class=\"string\">'OutputFcn'<\/span>, @(info)stopIfAccuracyNotImproving(info, 50));\r\n\r\n<span class=\"comment\">% ... 
and Train!<\/span>\r\nnet = trainNetwork(a, labels, layers, options);\r\n<\/pre><pre class=\"codeoutput\">Training on single GPU.\r\n\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/03\/DLBlogPost1_05.png\" alt=\"\"> <h4>Wow...<a name=\"b439a998-6c8e-47c4-a46d-fa1183dea2ee\"><\/a><\/h4><p>In just under half a minute, this simple \"deep\" learning model appears to have converged to 95% accuracy!<\/p><pre class=\"codeinput\">predictedLabels = net.classify(testSet);\r\nind = randi(size(testSet, ndims(a)));\r\nnet.classify(testSet(:, :, :, ind));\r\ntogglefig(<span class=\"string\">'Confusion Matrix'<\/span>)\r\nm = confusionchart(testLabels, predictedLabels);\r\ntestAccuracy = sum(predictedLabels == testLabels) \/ numel(testLabels)\r\n<\/pre><pre class=\"codeoutput\">testAccuracy =\r\n      0.94949\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/03\/DLBlogPost1_06.png\" alt=\"\"> <h4>What's going on?<a name=\"d93b69b2-0839-4fa6-aeb6-79ae915cad3f\"><\/a><\/h4><p>That's useful, but it's also puzzling: if the model has figured out where the \"tell\" is (which it must have done, or it couldn't have gotten above 50%), why then isn't it 100% accurate?<\/p><p>The answer to that lies in the network architecture. The \"triad\" of typical convolutional neural network (CNN) layers that we used includes a pooling layer. Since our information is a single bit, the pooling is \"smearing\" the information! 
What if we removed that layer?<\/p><pre class=\"codeinput\">layers = [\r\n imageInputLayer([sz(1) sz(2) 1])\r\n convolution2dLayer(sizeOfKernel, numberOfFilters, <span class=\"string\">'Name'<\/span>, <span class=\"string\">'conv'<\/span>)\r\n reluLayer\r\n fullyConnectedLayer(2, <span class=\"string\">'Name'<\/span>, <span class=\"string\">'fc'<\/span>)\r\n softmaxLayer\r\n classificationLayer()\r\n ];\r\nnet = trainNetwork(a, labels, layers, options);\r\npredictedLabels = net.classify(testSet);\r\ntestAccuracy = sum(predictedLabels == testLabels) \/ numel(testLabels)\r\n<\/pre><pre class=\"codeoutput\">Training on single GPU.\r\n\r\n<\/pre><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"588\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/03\/DLBlogPost1CODE_07-1024x588.png\" alt=\"\" class=\"alignnone size-large wp-image-6564\" \/> <h4>Sweet!<a name=\"753a309e-3d49-476a-8025-4a9d452e247d\"><\/a><\/h4><p>About 10 seconds to 100% accuracy! That helps us to understand what some of the layers are doing, and why we need to tailor the network to the task at hand! (Note that we could also remove the relu layer; it neither helps nor hinders the model in this particular case.)<\/p><h4>Great! But can we determine the <i>location<\/i> of the tell?<a name=\"4e48e657-5a89-4844-9636-585272f232d2\"><\/a><\/h4><p>Yes! 
<a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/ref\/deepdreamimage.html\">\"Deep Dream\"<\/a> is your friend!<\/p><pre class=\"codeinput\">channels = [1, 2];\r\nlayer = 4; <span class=\"comment\">%Fully Connected<\/span>\r\nI = deepDreamImage(net, layer, channels, <span class=\"string\">'PyramidLevels'<\/span>, 1);\r\ntogglefig(<span class=\"string\">'Deep Dream'<\/span>);\r\nsubplot(1, 2, 1)\r\nchannel1Image = I(:, :, :, 1);\r\nimshow(channel1Image);\r\ntitle(<span class=\"string\">'Deep Dream Channel 1 (1-Level)'<\/span>)\r\nsubplot(1, 2, 2)\r\nchannel2Image = I(:, :, :, 2);\r\nimshow(channel2Image);\r\ntitle(<span class=\"string\">'Deep Dream Channel 2 (1-Level)'<\/span>)\r\n[rmax, cmax] = find(channel1Image == min(channel1Image(:)));\r\n<span class=\"comment\">% Or [rmax, cmax] = find(channel2Image == max(channel2Image(:)));<\/span>\r\nfprintf(<span class=\"string\">'TARGET:\\t\\tRowInd = %i;\\tColInd = %i;\\nDETECTION:\\tRow = %i;\\t\\tCol = %i\\n'<\/span>, rowIndTrue, colIndTrue, rmax, cmax)\r\n<\/pre><pre class=\"codeoutput\">|==============================================|\r\n|  Iteration  |  Activation  |  Pyramid Level  |\r\n|             |   Strength   |                 |\r\n|==============================================|\r\n|           1 |         1.55 |               1 |\r\n|           2 |       265.54 |               1 |\r\n|           3 |       533.66 |               1 |\r\n|           4 |       804.17 |               1 |\r\n|           5 |      1075.66 |               1 |\r\n|           6 |      1347.60 |               1 |\r\n|           7 |      1619.17 |               1 |\r\n|           8 |      1891.03 |               1 |\r\n|           9 |      2163.16 |               1 |\r\n|          10 |      2435.12 |               1 |\r\n|==============================================|\r\nTARGET:\t\tRowInd = 13;\tColInd = 10;\r\nDETECTION:\tRow = 13;\t\tCol = 10\r\n<\/pre><img decoding=\"async\" loading=\"lazy\" width=\"585\" height=\"282\" 
src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/03\/SNRBlogImage5.png\" alt=\"\" class=\"alignnone size-full wp-image-6838\" \/> <h4>A final comment<a name=\"0e522571-0eb3-48e9-b774-ca30700f07a8\"><\/a><\/h4><p>When we talk about deep learning, \"deep\" refers typically to the number of layers in the network architecture. This model isn't really deep in that regard. But we <i>did<\/i> implement end-to-end learning (i.e., learning directly from data)--and that is another hallmark of deep learning.<\/p>\r\n<p>I hope you found this interesting, even if it is a bit contrived. Your comments are welcome!<\/p>\r\n\r\n<script language=\"JavaScript\"> <!-- \r\n    function grabCode_5ba588cb17fa4daa9efac0cb1b2643be() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='5ba588cb17fa4daa9efac0cb1b2643be ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 5ba588cb17fa4daa9efac0cb1b2643be';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2021 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_5ba588cb17fa4daa9efac0cb1b2643be()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2020b<br><\/p><\/div><!--\r\n5ba588cb17fa4daa9efac0cb1b2643be ##### SOURCE BEGIN #####\r\n%% Detecting a Single Informational Bit in a Sea of Noise\r\n%\r\n% For today's blog, I would like to pose a problem:\r\n%\r\n% Suppose that you had a lot of data representing something very random,\r\n% but that had a very small \"tell\" that differentiated the data into two\r\n% different classesREPLACE_WITH_DASH_DASHa single nucleotide defect in a noisy genome\r\n% population, for instance. (Okay, maybe that's a stretch.) 
But given a\r\n% single informative \"bit\" in a noisy dataset, how would you find that\r\n% tell?\r\n\r\n%% Let's create (and visualize) a dataset that sets up the question\r\n%\r\n% First, create 10,000 random 20 x 20 matrices, the\r\n% elements of which have values of {0.25, 0.5, 0.75, 1}:\r\nrng(0);\r\nn = 10000;\r\nsz = [20 20];\r\na = rand(sz(1), sz(2), 1, n);\r\n\r\n%% Create a \"Tell\":\r\n%\r\n% Now we will convert 1 randomly selected pixel to be modified as a\r\n% class1\/class2 \"tell,\" or \"indicator\":\r\nrandomInformationalElement = randi(sz(1) * sz(2))\r\n[rowIndTrue, colIndTrue] = ind2sub(size(a), randomInformationalElement);\r\n\r\n% At that randomly selected location, we will set half of the images to\r\n% have one random value, and the other half to have a different random\r\n% value: \r\n\r\n%Class 1:\r\nclass1Val = rand(1);\r\nfor ii = 1:n\/2\r\n a(rowIndTrue, colIndTrue, ii) = class1Val;%1\r\nend\r\n\r\n%Class 2:\r\nclass2Val = rand(1);\r\nfor ii = n\/2 + 1:n\r\n a(rowIndTrue, colIndTrue, ii) = class2Val;%0.5\r\nend\r\n\r\n% And we will create categorical labels to keep track of the \"Class\":\r\nlabels = [repmat(categorical(\"class1\"), n\/2, 1);\r\n repmat(categorical(\"class2\"), n\/2, 1)];\r\nsummary(labels)\r\n\r\n%% Let's take a look at three of each class...can you spot the tell?\r\nfigure('Name', 'Samples');\r\ninds = [1:3, n-2:n];\r\nlayout = tiledlayout(2, 3, 'TileSpacing', 'compact');\r\nax = gobjects(2, 3);\r\nind = 1;\r\nfor ii = inds\r\n ax(ind) = nexttile(layout);\r\n imshow(a(:, :, ii))\r\n hold on\r\n if ind == 2\r\n title('CLASS 1', 'color', 'r', 'fontsize', 18);\r\n elseif ind == 5\r\n title('CLASS 2', 'color', 'r', 'fontsize', 18);\r\n end\r\n ind = ind + 1;\r\nend\r\n\r\n%% How about now?\r\nfor ii = 1:6\r\n plot(ax(ii), colIndTrue, rowIndTrue, 'gs', 'MarkerSize', 12, 'LineWidth', 2)\r\nend\r\nlinkaxes(ax)\r\nset(ax, 'xlim', [0.85*colIndTrue, 1.15*colIndTrue], ...\r\n    'ylim', [0.85*rowIndTrue, 
1.15*rowIndTrue])\r\n\r\n%% Where is the tell?\r\n% The goal here is to find a model to detect the informative bitREPLACE_WITH_DASH_DASHthereby\r\n% separating the matrices into two classes. The perceptive reader might\r\n% realize that one could simply look at the minimum standard deviation of\r\n% the matrices to find the informative bit:\r\n\r\nfigure('Name', 'Found It!')\r\nstdA = std(a, 1, ndims(a));\r\nimshow(stdA, []);\r\ntitle('Standard Deviation')\r\ndetection = find(stdA == min(stdA(:)));\r\n[rowIndDetected, colIndDetected] = ind2sub(size(a), detection);\r\nhold on\r\nplot(colIndTrue, rowIndTrue, 'gs', 'MarkerSize', 12, 'LineWidth', 4)\r\nplot(colIndDetected, rowIndDetected, 'rs', 'MarkerSize', 12, ...\r\n    'LineWidth', 1.5);\r\nif detection == randomInformationalElement\r\n    detected = \"true\";\r\nelse\r\n    detected = \"false\";\r\nend\r\ntitle(\"Informative Bit = \" + detection + \"? (\" + detected + \")\")\r\n\r\n%% Obfuscating\r\n% So clearly this is a bit contrived. We could further obfuscate it by\r\n% changing some values in confounding ways. For instance:\r\nfor jj = 1:10\r\n    confounder = randi(sz(1) * sz(2));\r\n    [rowInd, colInd] = ind2sub(size(a), confounder);\r\n    R = rand(1);\r\n    for ii = 1:2:n\r\n        a(rowInd, colInd, ii) = R;\r\n    end\r\n    R = rand(1);\r\n    for ii = 2:2:n\r\n        a(rowInd, colInd, ii) = R;\r\n    end\r\nend\r\n\r\nfigure('Name', 'Obfuscated')\r\nstdA = std(a, 1, ndims(a));\r\nimshow(stdA, []);\r\ntitle('Standard Deviation')\r\ndetection = find(stdA == min(stdA(:)));\r\n[rowIndDetected, colIndDetected] = ind2sub(size(a), detection);\r\nhold on\r\nplot(colIndTrue, rowIndTrue, 'gs', 'MarkerSize', 12, 'LineWidth', 4)\r\nplot(colIndDetected, rowIndDetected, 'rs', 'MarkerSize', 12, ...\r\n    'LineWidth', 1.5);\r\nif detection == randomInformationalElement\r\n    detected = \"true\";\r\nelse\r\n    detected = \"false\";\r\nend\r\ntitle(\"Informative Bit = \" + detection + \"? 
(\" + detected + \")\")\r\n\r\n%%\r\n% Now the information is more obscure!\r\n\r\n%% A challenge:\r\n% Try to find this informative bit using \"classical machine learning\" (CML)\r\n% models, leveraging tools like the\r\n% <https:\/\/www.mathworks.com\/help\/stats\/classificationlearner-app.html\r\n% classificationLearner App>. In my experience, the models afforded by the\r\n% classificationLearner will churn for a very long time (hours, even!), and\r\n% none of the models will converge to anything better than 50%. That is,\r\n% the models will be 100% useless! (I leave that trial to the reader. But\r\n% I'll send a MATLAB T-shirt to the first person who shares with me a\r\n% model trained with that app that reliably solves this problem!)\r\n\r\n%% Constraints!\r\n% Why do CML models fail? In a word: constraints! Typically, to create an\r\n% image classifier, we might first aggregate features using a\r\n% <https:\/\/www.mathworks.com\/help\/vision\/ref\/bagoffeatures.html \"bag of features\">. \r\n% Then, using those aggregated features, we could train an\r\n% \"image category classifier.\" (The\r\n% <https:\/\/www.mathworks.com\/help\/vision\/ref\/trainimagecategoryclassifier.html trainImageCategoryClassifier> \r\n% function makes trivial work of that.) Note\r\n% that using |bagOfFeatures| _implicitly calculates\r\n% <https:\/\/www.mathworks.com\/help\/vision\/ref\/detectsurffeatures.html SURF\r\n% Features>, _ and that |trainImageCategoryClassifier| _implicitly trains a\r\n% multiclass Support Vector Machine (SVM)_. Features characterize relationships \r\n% between pixels, and it's not clear that either SURF or SVM are \r\n% appropriate for the task at hand. And even if you used\r\n% non-default detectors, extractors, and classifiers, you would still have a\r\n% constrained model!\r\n\r\n%% Enter Deep Learning\r\n% Deep learning is, in contrast, _relatively unconstrained_; we don't have \r\n% to tell the model what relationships to look at. 
Rather, we can specify a \r\n% \"network architecture\" and provide a bunch of \"ground truth,\" and let the \r\n% computer figure out what to look for!\r\n%\r\n% For instance, here we create just about the simplest \"typical\"\r\n% network architecture for classifying images:\r\n\r\nsizeOfKernel = [5, 5];\r\nnumberOfFilters = 20;\r\nnClasses = 2;\r\nlayers = [\r\n imageInputLayer([sz(1) sz(2) 1])\r\n convolution2dLayer(sizeOfKernel, numberOfFilters, 'Name', 'conv')\r\n reluLayer\r\n maxPooling2dLayer(2, 'Stride', 2)\r\n fullyConnectedLayer(nClasses, 'Name', 'fc')\r\n softmaxLayer\r\n classificationLayer()\r\n ];\r\n\r\n%%\r\n% That \"triad\" of \"convolution, relu, and pooling\" layers is very common in\r\n% deep learning networks designed for image analysis. But note that we\r\n% haven't overly constrained the model to considering only a specific\r\n% feature- or model-type; we've simply told the model to calculate 20 5x5\r\n% convolutions. And more to the point, we haven't even specified what\r\n% patterns (convolution kernels) to look for.\r\n\r\n%% So let's create validation and test sets, and train the model\r\n% Creating a validation set will help us ensure that the model is not\r\n% overfitting, and a test set will help us to evaluate the model after\r\n% training.\r\n\r\n% First, the validation set:\r\ninds = 1:100:size(a, 4);\r\nvalidationData = a(:, :, :, inds);\r\nvalidationLabels = labels(inds);\r\n% Remove the validation labels from the training set:\r\na(:, :, :, inds) = [];\r\nlabels(inds) = [];\r\n% Now the test set:\r\ninds = 1:100:size(a, 4);\r\ntestSet = a(:, :, :, inds);\r\ntestLabels = labels(inds);\r\na(:, :, :, inds) = [];\r\nlabels(inds) = [];\r\n\r\n% ...Specify some training options:\r\n\r\nminiBatchSize = 100;\r\noptions = trainingOptions( 'adam', ...\r\n 'InitialLearnRate', 0.005, ...\r\n 'MaxEpochs', 1000, ...\r\n 'MiniBatchSize', miniBatchSize, ...\r\n 'Plots', 'training-progress', ...\r\n 'ValidationData', {validationData, 
validationLabels}, ...\r\n 'ValidationFrequency', 10, ...\r\n 'ValidationPatience', 30, ...\r\n 'OutputFcn', @(info)stopIfAccuracyNotImproving(info, 50));\r\n\r\n% ... and Train!\r\nnet = trainNetwork(a, labels, layers, options);\r\n\r\n%% Wow...\r\n% In just under half a minute, this simple \"deep\" learning model appears to have\r\n% converged to 95% accuracy!\r\n\r\npredictedLabels = net.classify(testSet);\r\nind = randi(size(testSet, ndims(a)));\r\nnet.classify(testSet(:, :, :, ind));\r\ntogglefig('Confusion Matrix')\r\nm = confusionchart(testLabels, predictedLabels);\r\ntestAccuracy = sum(predictedLabels == testLabels) \/ numel(testLabels)\r\n\r\n%% What's going on?\r\n% That's useful, but it's also puzzling: if the model has figured out where\r\n% the \"tell\" is (which it must have done, or it couldn't have gotten above\r\n% 50%), why then isn't it 100% accurate?\r\n%\r\n% The answer to that lies in the network architecture. The \"triad\" of\r\n% typical convolutional neural network (CNN) layers that we used includes a\r\n% pooling layer. Since our information is a single bit, the pooling is\r\n% \"smearing\" the information! What if we removed that layer?\r\n\r\nlayers = [\r\n imageInputLayer([sz(1) sz(2) 1])\r\n convolution2dLayer(sizeOfKernel, numberOfFilters, 'Name', 'conv')\r\n reluLayer\r\n fullyConnectedLayer(2, 'Name', 'fc')\r\n softmaxLayer\r\n classificationLayer()\r\n ];\r\nnet = trainNetwork(a, labels, layers, options);\r\npredictedLabels = net.classify(testSet);\r\ntestAccuracy = sum(predictedLabels == testLabels) \/ numel(testLabels)\r\n\r\n%% Sweet!\r\n% About 10 seconds to 100% accuracy! That helps us to understand what\r\n% some of the layers are doing, and why we need to tailor the network to\r\n% the task at hand! (Note that we could also remove the relu layer; it\r\n% neither helps nor hinders the model in this particular case.)\r\n\r\n%% Great! But can we determine the _location_ of the tell?\r\n% Yes! 
<https:\/\/www.mathworks.com\/help\/deeplearning\/ref\/deepdreamimage.html \"Deep dream\"> \r\n% is your friend!\r\n\r\nchannels = [1, 2];\r\nlayer = 4; %Fully Connected\r\nI = deepDreamImage(net, layer, channels, 'PyramidLevels', 1);\r\ntogglefig('Deep Dream');\r\nsubplot(1, 2, 1)\r\nchannel1Image = I(:, :, :, 1);\r\nimshow(channel1Image);\r\ntitle('Deep Dream Channel 1 (1-Level)')\r\nsubplot(1, 2, 2)\r\nchannel2Image = I(:, :, :, 2);\r\nimshow(channel2Image);\r\ntitle('Deep Dream Channel 2 (1-Level)')\r\n[rmax, cmax] = find(channel1Image == min(channel1Image(:)));\r\n% Or [rmax, cmax] = find(channel2Image == max(channel2Image(:)));\r\nfprintf('TARGET:\\t\\tRowInd = %i;\\tColInd = %i;\\nDETECTION:\\tRow = %i;\\t\\tCol = %i\\n', rowIndTrue, colIndTrue, rmax, cmax)\r\n\r\n%% A final comment\r\n% When we talk about deep learning, \"deep\" typically refers to the number\r\n% of layers in the network architecture. This model isn't really deep in\r\n% that regard. But we _did_ implement end-to-end learning (i.e., learning\r\n% directly from data), and that is another hallmark of deep learning.\r\n##### SOURCE END ##### 5ba588cb17fa4daa9efac0cb1b2643be\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/03\/DLBlogPost1_01.png\" onError=\"this.style.display ='none';\" \/><\/div><p>The following is a guest post by Dr. Brett Shoelson, Principal Application Engineer at MathWorks\r\n\r\nFor today's blog, I would like to pose a problem: Suppose that you had a lot of data representing... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2021\/03\/26\/finding-information-in-a-sea-of-noise\/\">read more >><\/a><\/p>","protected":false},"author":156,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/6520"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/users\/156"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/comments?post=6520"}],"version-history":[{"count":14,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/6520\/revisions"}],"predecessor-version":[{"id":9424,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/6520\/revisions\/9424"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media?parent=6520"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/categories?post=6520"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/tags?post=6520"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}