{"id":122,"date":"2018-01-19T14:47:15","date_gmt":"2018-01-19T14:47:15","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=122"},"modified":"2021-04-06T15:52:28","modified_gmt":"2021-04-06T19:52:28","slug":"defining-your-own-network-layer-revisited","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2018\/01\/19\/defining-your-own-network-layer-revisited\/","title":{"rendered":"Defining Your Own Network Layer (Revisited)"},"content":{"rendered":"<div class=\"content\"><p>Today I want to follow up on my previous post, <a href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2018\/01\/05\/defining-your-own-network-layer\/\">Defining Your Own Network Layer<\/a>. There were two reader comments that caught my attention.<\/p><p>The first comment, from Eric Shields, points out a key conclusion from the <a href=\"https:\/\/arxiv.org\/pdf\/1511.07289.pdf\">Clevert, Unterthiner, and Hichreiter paper<\/a> that I overlooked. I initially focused just on the definition of the exponential linear unit function, but Eric pointed out that the authors concluded that batch normalization, which I used in my simple network, might not be needed when using an ELU layer.<\/p><p>Here's a reminder (from the previous post) about what the ELU curve looks like.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2017\/12\/defining_elu_layer_01.png\" alt=\"\"> <\/p><p>And here's the simple network that used last time.<\/p><pre class=\"codeinput\">layers = [ <span class=\"keyword\">...<\/span>\r\n    imageInputLayer([28 28 1])\r\n    convolution2dLayer(5,20)\r\n    batchNormalizationLayer\r\n    eluLayer(20)\r\n    fullyConnectedLayer(10)\r\n    softmaxLayer\r\n    classificationLayer];\r\n<\/pre><p>I used the sample digits training set.<\/p><pre class=\"codeinput\">[XTrain, YTrain] = digitTrain4DArrayData;\r\nimshow(XTrain(:,:,:,1010),<span class=\"string\">'InitialMagnification'<\/span>,<span class=\"string\">'fit'<\/span>)\r\nYTrain(1010)\r\n<\/pre><pre class=\"codeoutput\">\r\nans = \r\n\r\n  categorical\r\n\r\n     2 \r\n\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2018\/01\/defining_elu_layer_2_01.png\" alt=\"\"> <p>Now I'll train the network again, using the same options as last time.<\/p><pre class=\"codeinput\">options = trainingOptions(<span class=\"string\">'sgdm'<\/span>);\r\nnet = trainNetwork(XTrain,YTrain,layers,options);\r\n<\/pre><pre class=\"codeoutput\">Training on single GPU.\r\nInitializing image normalization.\r\n|=========================================================================================|\r\n|     Epoch    |   Iteration  | Time Elapsed |  Mini-batch  |  Mini-batch  | Base Learning|\r\n|              |              |  (seconds)   |     Loss     |   Accuracy   |     Rate     |\r\n|=========================================================================================|\r\n|            1 |            1 |         0.01 |       2.5899 |       10.16% |       0.0100 |\r\n|            2 |           50 |         0.61 |       0.4156 |       85.94% |       0.0100 |\r\n|            3 |          100 |         1.20 |       0.1340 |       96.88% |       0.0100 |\r\n|            4 |          150 |         1.80 |       0.0847 |       98.44% |       0.0100 |\r\n|            6 |          200 |         2.41 |       0.0454 |      100.00% |       0.0100 |\r\n|            7 |          250 |         3.01 |       0.0253 
|      100.00% |       0.0100 |\r\n|            8 |          300 |         3.60 |       0.0219 |      100.00% |       0.0100 |\r\n|            9 |          350 |         4.19 |       0.0141 |      100.00% |       0.0100 |\r\n|           11 |          400 |         4.85 |       0.0128 |      100.00% |       0.0100 |\r\n|           12 |          450 |         5.46 |       0.0126 |      100.00% |       0.0100 |\r\n|           13 |          500 |         6.05 |       0.0099 |      100.00% |       0.0100 |\r\n|           15 |          550 |         6.66 |       0.0079 |      100.00% |       0.0100 |\r\n|           16 |          600 |         7.27 |       0.0084 |      100.00% |       0.0100 |\r\n|           17 |          650 |         7.86 |       0.0075 |      100.00% |       0.0100 |\r\n|           18 |          700 |         8.45 |       0.0081 |      100.00% |       0.0100 |\r\n|           20 |          750 |         9.05 |       0.0066 |      100.00% |       0.0100 |\r\n|           21 |          800 |         9.64 |       0.0057 |      100.00% |       0.0100 |\r\n|           22 |          850 |        10.24 |       0.0054 |      100.00% |       0.0100 |\r\n|           24 |          900 |        10.83 |       0.0049 |      100.00% |       0.0100 |\r\n|           25 |          950 |        11.44 |       0.0055 |      100.00% |       0.0100 |\r\n|           26 |         1000 |        12.04 |       0.0046 |      100.00% |       0.0100 |\r\n|           27 |         1050 |        12.66 |       0.0041 |      100.00% |       0.0100 |\r\n|           29 |         1100 |        13.25 |       0.0044 |      100.00% |       0.0100 |\r\n|           30 |         1150 |        13.84 |       0.0038 |      100.00% |       0.0100 |\r\n|           30 |         1170 |        14.08 |       0.0042 |      100.00% |       0.0100 |\r\n|=========================================================================================|\r\n<\/pre><p>Note that the training took about 14.1 seconds.<\/p><p>Check the accuracy of the trained network.<\/p><pre class=\"codeinput\">[XTest, YTest] = digitTest4DArrayData;\r\nYPred = classify(net, XTest);\r\naccuracy = sum(YTest==YPred)\/numel(YTest)\r\n<\/pre><pre class=\"codeoutput\">\r\naccuracy =\r\n\r\n    0.9878\r\n\r\n<\/pre><p>Now let's make another network without the batch normalization layer.<\/p><pre class=\"codeinput\">layers2 = [ <span class=\"keyword\">...<\/span>\r\n    imageInputLayer([28 28 1])\r\n    convolution2dLayer(5,20)\r\n    eluLayer(20)\r\n    fullyConnectedLayer(10)\r\n    softmaxLayer\r\n    classificationLayer];\r\n<\/pre><p>Train it up again.<\/p><pre class=\"codeinput\">net2 = trainNetwork(XTrain,YTrain,layers2,options);\r\n<\/pre><pre class=\"codeoutput\">Training on single GPU.\r\nInitializing image normalization.\r\n|=========================================================================================|\r\n|     Epoch    |   Iteration  | Time Elapsed |  Mini-batch  |  Mini-batch  | Base Learning|\r\n|              |              |  (seconds)   |     Loss     |   Accuracy   |     Rate     |\r\n|=========================================================================================|\r\n|            1 |            1 |         0.01 |       2.3022 |        7.81% |       0.0100 |\r\n|            2 |           50 |         0.52 |       1.6631 |       51.56% |       0.0100 |\r\n|            3 |          100 |         1.04 |       1.4368 |       52.34% |       0.0100 |\r\n|            4 |          150 |         1.58 |       1.0426 |       61.72% |       
0.0100 |\r\n|            6 |          200 |         2.12 |       0.8223 |       72.66% |       0.0100 |\r\n|            7 |          250 |         2.67 |       0.6842 |       80.47% |       0.0100 |\r\n|            8 |          300 |         3.21 |       0.6461 |       78.13% |       0.0100 |\r\n|            9 |          350 |         3.79 |       0.4181 |       85.94% |       0.0100 |\r\n|           11 |          400 |         4.33 |       0.4163 |       86.72% |       0.0100 |\r\n|           12 |          450 |         4.88 |       0.2115 |       96.09% |       0.0100 |\r\n|           13 |          500 |         5.42 |       0.1817 |       97.66% |       0.0100 |\r\n|           15 |          550 |         5.96 |       0.1809 |       96.09% |       0.0100 |\r\n|           16 |          600 |         6.53 |       0.1001 |      100.00% |       0.0100 |\r\n|           17 |          650 |         7.07 |       0.0899 |      100.00% |       0.0100 |\r\n|           18 |          700 |         7.61 |       0.0934 |       99.22% |       0.0100 |\r\n|           20 |          750 |         8.14 |       0.0739 |       99.22% |       0.0100 |\r\n|           21 |          800 |         8.68 |       0.0617 |      100.00% |       0.0100 |\r\n|           22 |          850 |         9.22 |       0.0462 |      100.00% |       0.0100 |\r\n|           24 |          900 |         9.76 |       0.0641 |      100.00% |       0.0100 |\r\n|           25 |          950 |        10.29 |       0.0332 |      100.00% |       0.0100 |\r\n|           26 |         1000 |        10.86 |       0.0317 |      100.00% |       0.0100 |\r\n|           27 |         1050 |        11.41 |       0.0378 |       99.22% |       0.0100 |\r\n|           29 |         1100 |        11.96 |       0.0235 |      100.00% |       0.0100 |\r\n|           30 |         1150 |        12.51 |       0.0280 |      100.00% |       0.0100 |\r\n|           30 |         1170 |        12.73 |       0.0307 |      100.00% |       0.0100 |\r\n|=========================================================================================|\r\n<\/pre><p>That took about 12.7 seconds to train, about a 10% reduction. Check the accuracy.<\/p><pre class=\"codeinput\">[XTest, YTest] = digitTest4DArrayData;\r\nYPred = classify(net2, XTest);\r\naccuracy = sum(YTest==YPred)\/numel(YTest)\r\n<\/pre><pre class=\"codeoutput\">\r\naccuracy =\r\n\r\n    0.9808\r\n\r\n<\/pre><p>Eric said he got the same accuracy, whereas I am seeing a slightly lower accuracy. But I haven't really explored this further, and so I wouldn't draw any conclusions. I just wanted to take the opportunity to go back and mention one of the important points of the paper that I overlooked last time.<\/p><p>A second reader, another Eric, wanted to know if alpha could be specified as a learnable or non-learnable parameter at run time.<\/p><p>The answer: Yes, but not without defining a second class. Recall this portion of the template for defining a layer with learnable properties:<\/p><pre class=\"language-matlab\">\r\n    properties (Learnable)\r\n        <span class=\"comment\">% (Optional) Layer learnable parameters<\/span>\r\n\r\n        <span class=\"comment\">% Layer learnable parameters go here<\/span>\r\n    <span class=\"keyword\">end<\/span>\r\n\r\n<\/pre><p>That <tt>Learnable<\/tt> attribute of the properties block is a fixed part of the class definition. It can't be changed dynamically. So, you need to define a second class. I'll call mine <tt>eluLayerFixedAlpha<\/tt>. 
Here's the properties block:<\/p><pre class=\"language-matlab\">\r\n    properties\r\n        alpha\r\n    <span class=\"keyword\">end<\/span>\r\n\r\n<\/pre><p>And here's a constructor that takes <tt>alpha<\/tt> as an input argument.<\/p><pre class=\"language-matlab\">\r\n    methods\r\n        <span class=\"keyword\">function<\/span> layer = eluLayerFixedAlpha(alpha,name)\r\n            layer.Type = <span class=\"string\">'Exponential Linear Unit'<\/span>;\r\n            layer.alpha = alpha;\r\n            \r\n            <span class=\"comment\">% Assign layer name if it is passed in.<\/span>\r\n            <span class=\"keyword\">if<\/span> nargin &gt; 1\r\n                layer.Name = name;\r\n            <span class=\"keyword\">end<\/span>\r\n            \r\n            <span class=\"comment\">% Give the layer a meaningful description.<\/span>\r\n            layer.Description = <span class=\"string\">\"Exponential linear unit with alpha: \"<\/span> + <span class=\"keyword\">...<\/span>\r\n                alpha;\r\n        <span class=\"keyword\">end<\/span>\r\n\r\n<\/pre><p>I also modified the <tt>backward<\/tt> method to remove the computation and output argument associated with the derivative of the loss function with respect to alpha.<\/p><pre class=\"language-matlab\">\r\n        <span class=\"keyword\">function<\/span> dLdX = backward(layer, X, Z, dLdZ, ~)\r\n            <span class=\"comment\">% Backward propagate the derivative of the loss function through <\/span>\r\n            <span class=\"comment\">% the layer<\/span>\r\n            <span class=\"comment\">%<\/span>\r\n            <span class=\"comment\">% Inputs:<\/span>\r\n            <span class=\"comment\">%         layer             - Layer to backward propagate through<\/span>\r\n            <span class=\"comment\">%         X                 - Input data<\/span>\r\n            <span class=\"comment\">%         Z                 - Output of layer forward function            <\/span>\r\n            <span class=\"comment\">%         dLdZ              - Gradient propagated from the deeper layer<\/span>\r\n            <span class=\"comment\">%         memory            - Memory value which can be used in<\/span>\r\n            <span class=\"comment\">%                             backward propagation [unused]<\/span>\r\n            <span class=\"comment\">% Output:<\/span>\r\n            <span class=\"comment\">%         dLdX              - Derivative of the loss with<\/span>\r\n            <span class=\"comment\">%                             respect to the input data<\/span>\r\n            \r\n            dLdX = dLdZ .* ((X &gt; 0) + <span class=\"keyword\">...<\/span>\r\n                ((layer.alpha + Z) .* (X &lt;= 0)));            \r\n        <span class=\"keyword\">end<\/span>\r\n\r\n<\/pre>
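<p>Before trying it out, here's a sketch of what the corresponding <tt>predict<\/tt> method can look like in <tt>eluLayerFixedAlpha<\/tt>. This is just the ELU definition written out (not code copied from the previous post), and it produces the output <tt>Z<\/tt> that <tt>backward<\/tt> reuses: for inputs less than or equal to zero, the derivative of alpha*(exp(x)-1) is alpha*exp(x), which equals <tt>layer.alpha + Z<\/tt>, and that is exactly the expression in <tt>backward<\/tt> above.<\/p><pre class=\"language-matlab\">\r\n        <span class=\"keyword\">function<\/span> Z = predict(layer, X)\r\n            <span class=\"comment\">% Forward propagate the input data through the layer.<\/span>\r\n            <span class=\"comment\">% For X &gt; 0 the output is X; for X &lt;= 0 it is alpha*(exp(X)-1).<\/span>\r\n            <span class=\"comment\">% (A minimal sketch, assuming the alpha property defined above.)<\/span>\r\n            Z = max(X,0) + layer.alpha .* (exp(min(X,0)) - 1);\r\n        <span class=\"keyword\">end<\/span>\r\n\r\n<\/pre><p>Let's try it. 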
I'm just going to make up a value for <tt>alpha<\/tt>.<\/p><pre class=\"codeinput\">alpha = 1.0;\r\nlayers3 = [ <span class=\"keyword\">...<\/span>\r\n    imageInputLayer([28 28 1])\r\n    convolution2dLayer(5,20)\r\n    eluLayerFixedAlpha(alpha)\r\n    fullyConnectedLayer(10)\r\n    softmaxLayer\r\n    classificationLayer];\r\nnet3 = trainNetwork(XTrain,YTrain,layers3,options);\r\nYPred = classify(net3, XTest);\r\naccuracy = sum(YTest==YPred)\/numel(YTest)\r\n<\/pre><pre class=\"codeoutput\">Training on single GPU.\r\nInitializing image normalization.\r\n|=========================================================================================|\r\n|     Epoch    |   Iteration  | Time Elapsed |  Mini-batch  |  Mini-batch  | Base Learning|\r\n|              |              |  (seconds)   |     Loss     |   Accuracy   |     Rate     |\r\n|=========================================================================================|\r\n|            1 |            1 |         0.01 |       2.3005 |       14.06% |       0.0100 |\r\n|            2 |           50 |         0.48 |       1.4979 |       53.13% |       0.0100 |\r\n|            3 |          100 |         0.97 |       1.2162 |       57.81% |       0.0100 |\r\n|            4 |          150 |         1.48 |       1.1427 |       67.97% |       0.0100 |\r\n|            6 |          200 |         1.99 |       0.9837 |       67.19% |       0.0100 |\r\n|            7 |          250 |         2.50 |       0.8110 |       70.31% |       0.0100 |\r\n|            8 |          300 |         3.04 |       0.7347 |       75.00% |       0.0100 |\r\n|            9 |          350 |         3.55 |       0.5937 |       81.25% |       0.0100 |\r\n|           11 |          400 |         4.05 |       0.5686 |       78.13% |       0.0100 |\r\n|           12 |          450 |         4.56 |       0.4678 |       85.94% |       0.0100 |\r\n|           13 |          500 |         5.06 |       0.3461 |       88.28% |       0.0100 |\r\n|           15 |          550 |         5.57 |       0.3515 |       87.50% |       0.0100 |\r\n|           16 |          600 |         6.07 |       0.2582 |       92.97% |       0.0100 |\r\n|           17 |          650 |         6.58 |       0.2216 |       92.97% |       0.0100 |\r\n|           18 |          700 |         7.08 |       0.1705 |       96.09% |       0.0100 |\r\n|           20 |          750 |         7.59 |       0.1212 |       98.44% |       0.0100 |\r\n|           21 |          800 |         8.09 |       0.0925 |       98.44% |       0.0100 |\r\n|           22 |          850 |         8.59 |       0.1045 |       97.66% |       0.0100 |\r\n|           24 |          900 |         9.10 |       0.1289 |       96.09% |       0.0100 |\r\n|           25 |          950 |         9.60 |       0.0710 |       99.22% |       0.0100 |\r\n|           26 |         1000 |        10.10 |       0.0722 |       99.22% |       0.0100 |\r\n|           27 |         1050 |        10.60 |       0.0600 |       99.22% |       0.0100 |\r\n|           29 |         1100 |        11.10 |       0.0688 |       99.22% |       0.0100 |\r\n|           30 |         1150 |        11.61 |       0.0519 |      100.00% |       0.0100 |\r\n|           30 |         1170 |        11.82 |       0.0649 |       99.22% |       0.0100 |\r\n|=========================================================================================|\r\n\r\naccuracy =\r\n\r\n    0.9702\r\n\r\n<\/pre><p>Thanks for your comments and questions, Eric and Eric.<\/p><script language=\"JavaScript\"> 
<\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\">Published with MATLAB&reg; R2017b<br><\/p><\/div>","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2018\/01\/defining_elu_layer_2_01.png\" class=\"img-responsive attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" decoding=\"async\" loading=\"lazy\" \/><\/div><p>Today I want to follow up on my previous post, Defining Your Own Network Layer. There were two reader comments that caught my attention.The first comment, from Eric Shields, points out a key... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2018\/01\/19\/defining-your-own-network-layer-revisited\/\">read more >><\/a><\/p>","protected":false},"author":42,"featured_media":126,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/122"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/users\/42"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/comments?post=122"}],"version-history":[{"count":4,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/122\/revisions"}],"predecessor-version":[{"id":132,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/122\/revisions\/132"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media\/126"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media?parent=122"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/categories?post=122"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/tags?post=122"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}