{"id":3593,"date":"2018-08-06T12:00:22","date_gmt":"2018-08-06T17:00:22","guid":{"rendered":"https:\/\/blogs.mathworks.com\/cleve\/?p=3593"},"modified":"2018-08-02T19:57:39","modified_gmt":"2018-08-03T00:57:39","slug":"teaching-calculus-to-a-deep-learner","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/cleve\/2018\/08\/06\/teaching-calculus-to-a-deep-learner\/","title":{"rendered":"Teaching Calculus to a Deep Learner"},"content":{"rendered":"\r\n\r\n<div class=\"content\"><!--introduction--><p>MIT's Professor Gil Strang gave two talks in one morning recently at the SIAM annual meeting.  Both talks derived from his experience teaching a new course at MIT on linear algebra and neural nets. His first talk, \"The Structure of a Deep Neural Net\", was in a minisymposium titled \"Deep Learning and Deep Teaching\", which he organized.  Another talk in that minisymposium was by Drexel's Professor Pavel Grinfeld on \"An Informal Approach to Teaching Calculus.\" An hour later, Gil gave his second talk, \"Teaching About Learning.\" It was an invited talk at the SIAM Conference on Applied Mathematics Education.<\/p><p>Inspired by Pavel's talk about teaching calculus, Gil began his second talk with some spontaneous remarks.  \"Can we teach a deep learner calculus?\" he wondered.  The system might be trained with samples of functions and their derivatives and then be asked to find derivatives of other functions that were not in the training set.<\/p><p>Immediately after Gil's second talk, I asked the other MathWorkers attending the meeting if we could take up Gil's challenge.  
Within a few hours, Mary Fenelon, Christine Tobler and Razvan Carbunescu had the essential portions of the following demonstration of the Neural Network Toolbox&reg; working at the MathWorks booth.<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#596189be-232a-480e-b131-4a8987155227\">Functions<\/a><\/li><li><a href=\"#6b2f76e3-5f60-4862-8abe-75d482d5515f\">Definitions<\/a><\/li><li><a href=\"#03642fbc-0ca2-43a7-868e-b42005c05a92\">Generate training set<\/a><\/li><li><a href=\"#e1b70097-8f67-4c18-9509-1e882127b57b\">Representative curves<\/a><\/li><li><a href=\"#c658e50a-72f3-41cb-853e-1648699a8aed\">Neural network layers<\/a><\/li><li><a href=\"#d14a40b2-5123-4b86-87c8-4585c28c6deb\">Neural network options<\/a><\/li><li><a href=\"#d70d2595-2044-492a-ac1d-42e1bbbb4fa6\">Train network<\/a><\/li><li><a href=\"#78e2e354-ece8-4aef-8546-33d90e407a99\">Generate test set<\/a><\/li><li><a href=\"#10031d4b-e7ec-4cef-9c0d-7fd608700738\">Classify<\/a><\/li><li><a href=\"#29190998-bb7f-4725-915c-edb50eccc302\">Plot results<\/a><\/li><li><a href=\"#75c1a1bc-3204-4951-bf3a-9c7480633caf\">Typical mismatches<\/a><\/li><li><a href=\"#731544b2-4b52-485b-9830-ebf970a6f812\">Thanks<\/a><\/li><\/ul><\/div><h4>Functions<a name=\"596189be-232a-480e-b131-4a8987155227\"><\/a><\/h4><p>Our first task is less ambitious than differentiation. It is simply to recognize the shapes of functions. By a function we mean the MATLAB vector obtained by sampling a familiar elementary function at a finite set of ordered random points drawn uniformly from the interval $[-1, 1]$.  Derivatives, which we have not done yet, would be divided differences.  We use six functions, $x$, $x^2$, $x^3$, $x^4$, $\\sin{\\pi x}$, and $\\cos{\\pi x}$. 
We attach a random sign and add white noise to the samples.<\/p><pre class=\"codeinput\">   F = {@(x) x, @(x) x.^2, @(x) x.^3, @(x) x.^4, <span class=\"keyword\">...<\/span>\r\n        @(x) sin(pi*x), @(x) cos(pi*x)};\r\n   labels = [<span class=\"string\">\"x\"<\/span>, <span class=\"string\">\"x^2\"<\/span>, <span class=\"string\">\"x^3\"<\/span>, <span class=\"string\">\"x^4\"<\/span>, <span class=\"string\">\"sin(pi*x)\"<\/span>, <span class=\"string\">\"cos(pi*x)\"<\/span>];\r\n<\/pre><h4>Definitions<a name=\"6b2f76e3-5f60-4862-8abe-75d482d5515f\"><\/a><\/h4><p>Here are a few definitions and parameters.<\/p><p>Set random number generator state.<\/p><pre class=\"codeinput\">   rng(2)\r\n<\/pre><p>Generate uniform random variable on [-1, 1].<\/p><pre class=\"codeinput\">   randu = @(m,n) (2*rand(m,n)-1);\r\n<\/pre><p>Generate random +1 or -1.<\/p><pre class=\"codeinput\">   randsign = @() sign(randu(1,1));\r\n<\/pre><p>Number of functions.<\/p><pre class=\"codeinput\">   m = length(F);\r\n<\/pre><p>Number of repetitions.<\/p><pre class=\"codeinput\">   n = 1000;\r\n<\/pre><p>Number of samples in the interval.<\/p><pre class=\"codeinput\">   nx = 100;\r\n<\/pre><p>Noise level.<\/p><pre class=\"codeinput\">   noise = .0001;\r\n<\/pre><h4>Generate training set<a name=\"03642fbc-0ca2-43a7-868e-b42005c05a92\"><\/a><\/h4><pre class=\"codeinput\">C = cell(m,n);\r\n<span class=\"keyword\">for<\/span> j = 1:n\r\n    x = sort(randu(nx,1));\r\n    <span class=\"keyword\">for<\/span> i = 1:m\r\n        C{i,j} = randsign()*F{i}(x) + noise*randn(nx,1);\r\n    <span class=\"keyword\">end<\/span>\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><h4>Representative curves<a name=\"e1b70097-8f67-4c18-9509-1e882127b57b\"><\/a><\/h4><p>Let's plot instance one of each function.  
(With this initialization of <tt>rng<\/tt>, the $x^2$ and $\\sin(\\pi x)$ curves have negative signs.)<\/p><pre class=\"codeinput\">    set(gcf,<span class=\"string\">'position'<\/span>,[300 300 300 300])\r\n    <span class=\"keyword\">for<\/span> i = 1:m\r\n        plot(x,C{i,1},<span class=\"string\">'.'<\/span>)\r\n        axis([-1 1 -1 1])\r\n        title(labels(i))\r\n        snapnow\r\n    <span class=\"keyword\">end<\/span>\r\n    close\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"http:\/\/blogs.mathworks.com\/cleve\/files\/Calculus_blog_01.png\" alt=\"\"> <img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"http:\/\/blogs.mathworks.com\/cleve\/files\/Calculus_blog_02.png\" alt=\"\"> <img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"http:\/\/blogs.mathworks.com\/cleve\/files\/Calculus_blog_03.png\" alt=\"\"> <img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"http:\/\/blogs.mathworks.com\/cleve\/files\/Calculus_blog_04.png\" alt=\"\"> <img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"http:\/\/blogs.mathworks.com\/cleve\/files\/Calculus_blog_05.png\" alt=\"\"> <img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"http:\/\/blogs.mathworks.com\/cleve\/files\/Calculus_blog_06.png\" alt=\"\"> <h4>Neural network layers<a name=\"c658e50a-72f3-41cb-853e-1648699a8aed\"><\/a><\/h4><p>Our deep learning network is one that has proved to be successful in signal processing, text, and other applications with sequential data.  There are six layers. The nonlinear activation layer, <tt>relu<\/tt>, for REctified Linear Unit, is essential.  <tt>ReLU(x)<\/tt> is simply <tt>max(0,x)<\/tt>. LSTM stands for Long Short-Term Memory. 
Softmax is a generalization of the logistic function used to compute probabilities.<\/p><pre class=\"codeinput\">    inputSize = nx;\r\n    numClasses = m;\r\n    numHiddenUnits = 100;\r\n    layers = [ <span class=\"keyword\">...<\/span>\r\n        sequenceInputLayer(inputSize)\r\n        reluLayer\r\n        lstmLayer(numHiddenUnits,<span class=\"string\">'OutputMode'<\/span>,<span class=\"string\">'last'<\/span>)\r\n        fullyConnectedLayer(numClasses)\r\n        softmaxLayer\r\n        classificationLayer];\r\n<\/pre><h4>Neural network options<a name=\"d14a40b2-5123-4b86-87c8-4585c28c6deb\"><\/a><\/h4><p>The first option, <tt>'adam'<\/tt>, is the stochastic optimization algorithm, adaptive moment estimation. An epoch is one forward pass and one backward pass over all of the training vectors, updating the weights.  In our experience with this network, six passes is enough.<\/p><pre class=\"codeinput\">    maxEpochs = 6;\r\n    miniBatchSize = 27;\r\n    options = trainingOptions(<span class=\"string\">'adam'<\/span>, <span class=\"keyword\">...<\/span>\r\n        <span class=\"string\">'ExecutionEnvironment'<\/span>,<span class=\"string\">'cpu'<\/span>, <span class=\"keyword\">...<\/span>\r\n        <span class=\"string\">'MaxEpochs'<\/span>,maxEpochs, <span class=\"keyword\">...<\/span>\r\n        <span class=\"string\">'MiniBatchSize'<\/span>,miniBatchSize, <span class=\"keyword\">...<\/span>\r\n        <span class=\"string\">'GradientThreshold'<\/span>,1, <span class=\"keyword\">...<\/span>\r\n        <span class=\"string\">'Verbose'<\/span>,0, <span class=\"keyword\">...<\/span>\r\n        <span class=\"string\">'Plots'<\/span>,<span class=\"string\">'training-progress'<\/span>);\r\n<\/pre><h4>Train network<a name=\"d70d2595-2044-492a-ac1d-42e1bbbb4fa6\"><\/a><\/h4><p>With our setting of the <tt>Plots<\/tt> option, <tt>trainNetwork<\/tt> opens a custom figure window that dynamically shows the progress of the optimization.<\/p><p>It takes 15 or 20 seconds to 
train this network on my laptop. Big-time neural nets with more layers and more epochs can make use of GPUs and pools of parallel workers.<\/p><pre class=\"codeinput\">    C = reshape(C',1,[]);\r\n    Y = repelem(categorical(labels'), n);\r\n    net = trainNetwork(C,Y,layers,options);\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"http:\/\/blogs.mathworks.com\/cleve\/files\/Calculus_blog_07.png\" alt=\"\"> <h4>Generate test set<a name=\"78e2e354-ece8-4aef-8546-33d90e407a99\"><\/a><\/h4><p>Generate more functions to form a test set.<\/p><pre class=\"codeinput\">    nt = 100;\r\n    Ctest = cell(m,nt);\r\n    <span class=\"keyword\">for<\/span> j = 1:nt\r\n        x = sort(randu(nx,1));\r\n        <span class=\"keyword\">for<\/span> i = 1:m\r\n            Ctest{i,j} = randsign()*F{i}(x) + noise*randn(nx,1);\r\n        <span class=\"keyword\">end<\/span>\r\n    <span class=\"keyword\">end<\/span>\r\n<\/pre><h4>Classify<a name=\"10031d4b-e7ec-4cef-9c0d-7fd608700738\"><\/a><\/h4><p>Classify the functions in the test set.<\/p><pre class=\"codeinput\">    miniBatchSize = 27;\r\n    Ctest = reshape(Ctest',1,[]);\r\n    Ytest = repelem(categorical(labels'), nt);\r\n    Ypred = classify(net,Ctest,<span class=\"string\">'MiniBatchSize'<\/span>,miniBatchSize);\r\n<\/pre><h4>Plot results<a name=\"29190998-bb7f-4725-915c-edb50eccc302\"><\/a><\/h4><p>Here are the results.  We see scores above 95 percent, except for learning to distinguish between plots of $x^2$ and $x^4$. That's understandable.<\/p><pre class=\"codeinput\">    T = table(Ypred, Ytest);\r\n    heatmap(T, <span class=\"string\">'Ypred'<\/span>, <span class=\"string\">'Ytest'<\/span>);\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"http:\/\/blogs.mathworks.com\/cleve\/files\/Calculus_blog_08.png\" alt=\"\"> <h4>Typical mismatches<a name=\"75c1a1bc-3204-4951-bf3a-9c7480633caf\"><\/a><\/h4><p>Here are typical failed tests.  
It's not hard to see why the network is having trouble with these.<\/p><pre class=\"codeinput\">    set(gcf,<span class=\"string\">'position'<\/span>,[300 300 300 300])\r\n    <span class=\"keyword\">for<\/span> j = 1:m\r\n        <span class=\"keyword\">for<\/span> i = [1:j-1 j+1:m]\r\n            mismatch = find(Ytest == labels(i) &amp; Ypred == labels(j));\r\n            <span class=\"keyword\">if<\/span> ~isempty(mismatch)\r\n                <span class=\"comment\">% Plot one of each type of mismatch<\/span>\r\n                <span class=\"keyword\">for<\/span> k = 1\r\n                    plot(linspace(-1, 1), Ctest{mismatch(k)},<span class=\"string\">'.'<\/span>)\r\n                    title(labels(i)+<span class=\"string\">\" or \"<\/span>+labels(j))\r\n                    snapnow\r\n                <span class=\"keyword\">end<\/span>\r\n            <span class=\"keyword\">end<\/span>\r\n        <span class=\"keyword\">end<\/span>\r\n    <span class=\"keyword\">end<\/span>\r\n    close\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"http:\/\/blogs.mathworks.com\/cleve\/files\/Calculus_blog_09.png\" alt=\"\"> <img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"http:\/\/blogs.mathworks.com\/cleve\/files\/Calculus_blog_10.png\" alt=\"\"> <img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"http:\/\/blogs.mathworks.com\/cleve\/files\/Calculus_blog_11.png\" alt=\"\"> <img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"http:\/\/blogs.mathworks.com\/cleve\/files\/Calculus_blog_12.png\" alt=\"\"> <img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"http:\/\/blogs.mathworks.com\/cleve\/files\/Calculus_blog_13.png\" alt=\"\"> <h4>Thanks<a name=\"731544b2-4b52-485b-9830-ebf970a6f812\"><\/a><\/h4><p>Thanks to my colleagues attending the SIAM meeting -- Mary, Christine and Razvan.  
And thanks to Gil for his spontaneous idea.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_3de5698b950b4a27ace421f8009f3461() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='3de5698b950b4a27ace421f8009f3461 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 3de5698b950b4a27ace421f8009f3461';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2018 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_3de5698b950b4a27ace421f8009f3461()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2018a<br><\/p><\/div><!--\r\n3de5698b950b4a27ace421f8009f3461 ##### SOURCE BEGIN #####\r\n%% Teaching Calculus to a Deep Learner\r\n% MIT's Professor Gil Strang gave two talks in one morning recently\r\n% at the SIAM annual meeting.  Both talks derived from his experience\r\n% teaching a new course at MIT on linear algebra and neural nets.\r\n% His first talk, \"The Structure of a Deep Neural Net\", was in a\r\n% minisymposium titled \"Deep Learning and Deep Teaching\", which he\r\n% organized.  Another talk in that minisymposium was by Drexel's Professor\r\n% Pavel Grinfeld on \"An Informal Approach to Teaching Calculus.\"\r\n% An hour later, Gil's gave his second talk, \"Teaching About Learning.\"\r\n% It was an invited talk at the SIAM Conference on Applied Mathematics\r\n% Education.\r\n%\r\n% Inspired by Pavel's talk about teaching calculus, Gil began his second\r\n% talk with some spontaneous remarks.  
\"Can we teach a deep learner\r\n% calculus?\" he wondered.  The system might be trained with samples\r\n% of functions and their derivatives and then be asked to find\r\n% derivatives of other functions that were not in the training set.\r\n%\r\n% Immediately after Gil's second talk, I asked the other MathWorkers\r\n% attending the meeting if we could take up Gil's challenge.  Within a few\r\n% hours, Mary Fenelon, Christine Tobler and Razvan Carbunescu had the\r\n% essential portions of the following demonstration of the Neural\r\n% Network Toolbox(R) working at the MathWorks booth.\r\n\r\n%% Functions\r\n% Our first task is less ambitious than differentiation.\r\n% It is simply to recognize the shapes of functions.\r\n% By a function we mean the MATLAB vector obtained by sampling\r\n% a familiar elementary function at a finite set of ordered random points\r\n% drawn uniformly from the interval $[-1, 1]$.  Derivatives, which we\r\n% have not done yet, would be divided differences.  We use six functions, \r\n% $x$, $x^2$, $x^3$, $x^4$, $\\sin{\\pi x}$, and $\\cos{\\pi x}$.\r\n% We attach a random sign and add white noise to the samples.\r\n\r\n   F = {@(x) x, @(x) x.^2, @(x) x.^3, @(x) x.^4, ...\r\n        @(x) sin(pi*x), @(x) cos(pi*x)};\r\n   labels = [\"x\", \"x^2\", \"x^3\", \"x^4\", \"sin(pi*x)\", \"cos(pi*x)\"];\r\n\r\n%% Definitions\r\n% Here are a few definitions and parameters.\r\n%\r\n% Set random number generator state.\r\n\r\n   rng(2)\r\n\r\n%%\r\n% Generate uniform random variable on [-1, 1].\r\n\r\n   randu = @(m,n) (2*rand(m,n)-1);\r\n\r\n%%\r\n% Generate random +1 or -1.\r\n\r\n   randsign = @() sign(randu(1,1));\r\n\r\n%%\r\n% Number of functions.\r\n\r\n   m = length(F);\r\n\r\n%%\r\n% Number of repetitions.\r\n\r\n   n = 1000;\r\n\r\n%%\r\n% Number of samples in the interval.\r\n\r\n   nx = 100;\r\n\r\n%%\r\n% Noise level.\r\n\r\n   noise = .0001;\r\n\r\n%% Generate training set\r\n\r\nC = cell(m,n);\r\nfor j = 1:n\r\n    x = 
sort(randu(nx,1));\r\n    for i = 1:m\r\n        C{i,j} = randsign()*F{i}(x) + noise*randn(nx,1);\r\n    end\r\nend\r\n\r\n%% Representative curves\r\n% Let's plot instance one of each function.  (With this initialization\r\n% of |rng|, the $x^2$ and $\\sin(\\pi x)$ curves have negative signs.)\r\n\r\n    set(gcf,'position',[300 300 300 300])  \r\n    for i = 1:m\r\n        plot(x,C{i,1},'.')\r\n        axis([-1 1 -1 1])\r\n        title(labels(i))\r\n        snapnow\r\n    end\r\n    close\r\n\r\n%% Neural network layers\r\n% Our deep learning network is one that has proved to be\r\n% successful in signal processing, text, and other applications with\r\n% sequential data.  There are six layers.\r\n% The nonlinear activation layer, |relu|, for REctified Linear\r\n% Unit, is essential.  |ReLU(x)| is simply |max(0,x)|.\r\n% LSTM stands for Long Short-Term Memory.\r\n% Softmax is a generalization of the logistic function used to\r\n% compute probabilities.\r\n\r\n\r\n\r\n    inputSize = nx;\r\n    numClasses = m;\r\n    numHiddenUnits = 100;\r\n    layers = [ ...\r\n        sequenceInputLayer(inputSize)\r\n        reluLayer\r\n        lstmLayer(numHiddenUnits,'OutputMode','last')\r\n        fullyConnectedLayer(numClasses)\r\n        softmaxLayer\r\n        classificationLayer];\r\n\r\n%% Neural network options\r\n% The first option, |'adam'|, is the stochastic optimization algorithm,\r\n% adaptive moment estimation.\r\n% An epoch is one forward pass and one backward pass over all of the\r\n% training vectors, updating the weights.  
In our experience with this\r\n% network, six passes is enough.\r\n\r\n    maxEpochs = 6;\r\n    miniBatchSize = 27;\r\n    options = trainingOptions('adam', ...\r\n        'ExecutionEnvironment','cpu', ...\r\n        'MaxEpochs',maxEpochs, ...\r\n        'MiniBatchSize',miniBatchSize, ...\r\n        'GradientThreshold',1, ...\r\n        'Verbose',0, ...\r\n        'Plots','training-progress');\r\n\r\n%% Train network\r\n% With our setting of the |Plots| option, |trainNetwork| opens a\r\n% custom figure window that dynamically shows the progress of the\r\n% optimization.\r\n%\r\n% It takes 15 or 20 seconds to train this network on my laptop.\r\n% Big-time neural nets with more layers and more epochs can make use\r\n% of GPUs and pools of parallel workers.\r\n\r\n    C = reshape(C',1,[]);\r\n    Y = repelem(categorical(labels'), n);\r\n    net = trainNetwork(C,Y,layers,options);\r\n\r\n%% Generate test set\r\n% Generate more functions to form a test set.\r\n\r\n    nt = 100;\r\n    Ctest = cell(m,nt);\r\n    for j = 1:nt\r\n        x = sort(randu(nx,1));\r\n        for i = 1:m\r\n            Ctest{i,j} = randsign()*F{i}(x) + noise*randn(nx,1);\r\n        end\r\n    end\r\n\r\n%% Classify\r\n% Classify the functions in the test set.\r\n\r\n    miniBatchSize = 27;\r\n    Ctest = reshape(Ctest',1,[]);\r\n    Ytest = repelem(categorical(labels'), nt);\r\n    Ypred = classify(net,Ctest,'MiniBatchSize',miniBatchSize);\r\n\r\n%% Plot results\r\n% Here are the results.  We see scores above 95 percent, except\r\n% for learning to distinguish between plots of $x^2$ and $x^4$.\r\n% That's understandable.\r\n\r\n    T = table(Ypred, Ytest);\r\n    heatmap(T, 'Ypred', 'Ytest');\r\n\r\n%% Typical mismatches\r\n% Here are typical failed tests.  
It's not hard to see why the\r\n% network is having trouble with these.\r\n\r\n    set(gcf,'position',[300 300 300 300])\r\n    for j = 1:m\r\n        for i = [1:j-1 j+1:m]\r\n            mismatch = find(Ytest == labels(i) & Ypred == labels(j));\r\n            if ~isempty(mismatch)\r\n                % Plot one of each type of mismatch\r\n                for k = 1\r\n                    plot(linspace(-1, 1), Ctest{mismatch(k)},'.')\r\n                    title(labels(i)+\" or \"+labels(j))\r\n                    snapnow\r\n                end\r\n            end\r\n        end\r\n    end\r\n    close\r\n    \r\n%% Thanks\r\n% Thanks to my colleagues attending the SIAM meeting REPLACE_WITH_DASH_DASH\r\n% Mary, Christine and Razvan.  And thanks to Gil for his\r\n% spontaneous idea.\r\n##### SOURCE END ##### 3de5698b950b4a27ace421f8009f3461\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img src=\"https:\/\/blogs.mathworks.com\/cleve\/files\/heatmap_small.png\" class=\"img-responsive attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" decoding=\"async\" loading=\"lazy\" \/><\/div><!--introduction--><p>MIT's Professor Gil Strang gave two talks in one morning recently at the SIAM annual meeting.  Both talks derived from his experience teaching a new course at MIT on linear algebra and neural nets. His first talk, \"The Structure of a Deep Neural Net\", was in a minisymposium titled \"Deep Learning and Deep Teaching\", which he organized.  Another talk in that minisymposium was by Drexel's Professor Pavel Grinfeld on \"An Informal Approach to Teaching Calculus.\" An hour later, Gil gave his second talk, \"Teaching About Learning.\" It was an invited talk at the SIAM Conference on Applied Mathematics Education.... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/cleve\/2018\/08\/06\/teaching-calculus-to-a-deep-learner\/\">read more >><\/a><\/p>","protected":false},"author":78,"featured_media":3641,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[12,8],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/3593"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/users\/78"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/comments?post=3593"}],"version-history":[{"count":3,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/3593\/revisions"}],"predecessor-version":[{"id":3643,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/3593\/revisions\/3643"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/media\/3641"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/media?parent=3593"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/categories?post=3593"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/tags?post=3593"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}