{"id":88,"date":"2018-01-05T07:00:54","date_gmt":"2018-01-05T07:00:54","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=88"},"modified":"2021-04-06T15:52:32","modified_gmt":"2021-04-06T19:52:32","slug":"defining-your-own-network-layer","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2018\/01\/05\/defining-your-own-network-layer\/","title":{"rendered":"Defining Your Own Network Layer"},"content":{"rendered":"<div class=\"content\"><!--introduction-->\r\n<p><i>Note: Post updated 27-Sep-2018 to correct a typo in the implementation of the backward function.<\/i><\/p>\r\n<p>One of the new Neural Network Toolbox features of R2017b is the ability to define your own network layer. Today I'll show you how to make an <i>exponential linear unit<\/i> (ELU) layer.<\/p><p><a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/692126-joe-hicklin\">Joe<\/a> helped me with today's post. Joe is one of the few developers who have been around MathWorks longer than I have. In fact, he's one of the people who interviewed me when I applied for a job here. I've had the pleasure of working closely with Joe for the past several years on many aspects of MATLAB design. 
He really loves tinkering with deep learning networks.<\/p><p>Joe came across the paper <a href=\"https:\/\/arxiv.org\/pdf\/1511.07289.pdf\">\"Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs),\"<\/a> by Clevert, Unterthiner, and Hochreiter, and he wanted to make an ELU layer using R2017b.<\/p><p>$f(x) = \\left\\{\\begin{array}{ll}    x &amp; x &gt; 0\\\\    \\alpha(e^x - 1) &amp; x \\leq 0 \\end{array} \\right.$<\/p><!--\/introduction--><p>Let's compare the ELU shape with a couple of other commonly used activation functions.<\/p><pre class=\"codeinput\">alpha1 = 1;\r\nelu_fcn = @(x) x.*(x &gt; 0) + alpha1*(exp(x) - 1).*(x &lt;= 0);\r\n\r\nalpha2 = 0.1;\r\nleaky_relu_fcn = @(x) alpha2*x.*(x &lt;= 0) + x.*(x &gt; 0);\r\n\r\nrelu_fcn = @(x) x.*(x &gt; 0);\r\n\r\nfplot(elu_fcn,[-10 3],<span class=\"string\">'LineWidth'<\/span>,2)\r\nhold <span class=\"string\">on<\/span>\r\nfplot(leaky_relu_fcn,[-10 3],<span class=\"string\">'LineWidth'<\/span>,2)\r\nfplot(relu_fcn,[-10 3],<span class=\"string\">'LineWidth'<\/span>,2)\r\nhold <span class=\"string\">off<\/span>\r\nax = gca;\r\nax.XAxisLocation = <span class=\"string\">'origin'<\/span>;\r\nax.YAxisLocation = <span class=\"string\">'origin'<\/span>;\r\nbox <span class=\"string\">off<\/span>\r\nlegend({<span class=\"string\">'ELU'<\/span>,<span class=\"string\">'Leaky ReLU'<\/span>,<span class=\"string\">'ReLU'<\/span>},<span class=\"string\">'Location'<\/span>,<span class=\"string\">'northwest'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2017\/12\/defining_elu_layer_01.png\" alt=\"\"> <p>Joe wanted to make an ELU layer with one learned alpha value per channel. 
He followed the procedure outlined in <a href=\"https:\/\/www.mathworks.com\/help\/nnet\/ug\/define-layer-with-learnable-parameters.html\">Define a Layer with Learnable Parameters<\/a> to make an ELU layer that works with the Neural Network Toolbox.<\/p><p>Below is the template for a layer with learnable parameters. We'll explore how to fill in this template to make an ELU layer.<\/p><pre class=\"language-matlab\">\r\n<span class=\"keyword\">classdef<\/span> myLayer &lt; nnet.layer.Layer\r\n\r\n    <span class=\"keyword\">properties<\/span>\r\n        <span class=\"comment\">% (Optional) Layer properties<\/span>\r\n\r\n        <span class=\"comment\">% Layer properties go here<\/span>\r\n    <span class=\"keyword\">end<\/span>\r\n\r\n    <span class=\"keyword\">properties<\/span> (Learnable)\r\n        <span class=\"comment\">% (Optional) Layer learnable parameters<\/span>\r\n\r\n        <span class=\"comment\">% Layer learnable parameters go here<\/span>\r\n    <span class=\"keyword\">end<\/span>\r\n    \r\n    <span class=\"keyword\">methods<\/span>\r\n        <span class=\"keyword\">function<\/span> layer = myLayer()\r\n            <span class=\"comment\">% (Optional) Create a myLayer<\/span>\r\n            <span class=\"comment\">% This function must have the same name as the layer<\/span>\r\n\r\n            <span class=\"comment\">% Layer constructor function goes here<\/span>\r\n        <span class=\"keyword\">end<\/span>\r\n        \r\n        <span class=\"keyword\">function<\/span> Z = predict(layer, X)\r\n            <span class=\"comment\">% Forward input data through the layer at prediction time and<\/span>\r\n            <span class=\"comment\">% output the result<\/span>\r\n            <span class=\"comment\">%<\/span>\r\n            <span class=\"comment\">% Inputs:<\/span>\r\n            <span class=\"comment\">%         layer    -    Layer to forward propagate through<\/span>\r\n            <span class=\"comment\">%         X        -    Input 
data<\/span>\r\n            <span class=\"comment\">% Output:<\/span>\r\n            <span class=\"comment\">%         Z        -    Output of layer forward function<\/span>\r\n            \r\n            <span class=\"comment\">% Layer forward function for prediction goes here<\/span>\r\n        <span class=\"keyword\">end<\/span>\r\n\r\n        <span class=\"keyword\">function<\/span> [Z, memory] = forward(layer, X)\r\n            <span class=\"comment\">% (Optional) Forward input data through the layer at training<\/span>\r\n            <span class=\"comment\">% time and output the result and a memory value<\/span>\r\n            <span class=\"comment\">%<\/span>\r\n            <span class=\"comment\">% Inputs:<\/span>\r\n            <span class=\"comment\">%         layer  - Layer to forward propagate through<\/span>\r\n            <span class=\"comment\">%         X      - Input data<\/span>\r\n            <span class=\"comment\">% Output:<\/span>\r\n            <span class=\"comment\">%         Z      - Output of layer forward function<\/span>\r\n            <span class=\"comment\">%         memory - Memory value which can be used for<\/span>\r\n            <span class=\"comment\">%                  backward propagation<\/span>\r\n\r\n            <span class=\"comment\">% Layer forward function for training goes here<\/span>\r\n        <span class=\"keyword\">end<\/span>\r\n\r\n        <span class=\"keyword\">function<\/span> [dLdX, dLdW1, <span class=\"keyword\">...<\/span><span class=\"comment\">, dLdWn] = backward(layer, X, Z, dLdZ, memory)<\/span>\r\n            <span class=\"comment\">% Backward propagate the derivative of the loss function through <\/span>\r\n            <span class=\"comment\">% the layer<\/span>\r\n            <span class=\"comment\">%<\/span>\r\n            <span class=\"comment\">% Inputs:<\/span>\r\n            <span class=\"comment\">%         layer             - Layer to backward propagate through<\/span>\r\n            <span 
class=\"comment\">%         X                 - Input data<\/span>\r\n            <span class=\"comment\">%         Z                 - Output of layer forward function            <\/span>\r\n            <span class=\"comment\">%         dLdZ              - Gradient propagated from the deeper layer<\/span>\r\n            <span class=\"comment\">%         memory            - Memory value which can be used in<\/span>\r\n            <span class=\"comment\">%                             backward propagation<\/span>\r\n            <span class=\"comment\">% Output:<\/span>\r\n            <span class=\"comment\">%         dLdX              - Derivative of the loss with respect to the<\/span>\r\n            <span class=\"comment\">%                             input data<\/span>\r\n            <span class=\"comment\">%         dLdW1, ..., dLdWn - Derivatives of the loss with respect to each<\/span>\r\n            <span class=\"comment\">%                             learnable parameter<\/span>\r\n            \r\n            <span class=\"comment\">% Layer backward function goes here<\/span>\r\n        end\r\n    end\r\nend\r\n\r\n<\/pre><p>For our ELU layer with a learnable alpha parameter, here's one way to write the constructor and the <tt>Learnable<\/tt> property block.<\/p><pre class=\"language-matlab\">\r\n<span class=\"keyword\">classdef<\/span> eluLayer &lt; nnet.layer.Layer\r\n\r\n    <span class=\"keyword\">properties<\/span> (Learnable)\r\n        alpha\r\n    <span class=\"keyword\">end<\/span>\r\n    \r\n    <span class=\"keyword\">methods<\/span>\r\n        <span class=\"keyword\">function<\/span> layer = eluLayer(num_channels,name)\r\n            layer.Type = <span class=\"string\">'Exponential Linear Unit'<\/span>;\r\n            \r\n            <span class=\"comment\">% Assign layer name if it is passed in.<\/span>\r\n            <span class=\"keyword\">if<\/span> nargin &gt; 1\r\n                layer.Name = name;\r\n            <span 
class=\"keyword\">end<\/span>\r\n            \r\n            <span class=\"comment\">% Give the layer a meaningful description.<\/span>\r\n            layer.Description = <span class=\"string\">\"Exponential linear unit with \"<\/span> + <span class=\"keyword\">...<\/span>\r\n                num_channels + <span class=\"string\">\" channels\"<\/span>;\r\n            \r\n            <span class=\"comment\">% Initialize the learnable alpha parameter.<\/span>\r\n            layer.alpha = rand(1,1,num_channels);\r\n        <span class=\"keyword\">end<\/span>\r\n\r\n<\/pre><p>The <tt>predict<\/tt> function is where we implement the activation function. Remember its mathematical form:<\/p><p>$f(x) = \\left\\{\\begin{array}{ll}    x &amp; x &gt; 0\\\\    \\alpha(e^x - 1) &amp; x \\leq 0 \\end{array} \\right.$<\/p><p>Note: The expression <tt>(exp(min(X,0)) - 1)<\/tt> in the predict function is written that way to avoid computing the exponential of large positive numbers, which could result in infinities and NaNs popping up.<\/p><pre class=\"language-matlab\">\r\n        <span class=\"keyword\">function<\/span> Z = predict(layer,X)\r\n            <span class=\"comment\">% Forward input data through the layer at prediction time and<\/span>\r\n            <span class=\"comment\">% output the result<\/span>\r\n            <span class=\"comment\">%<\/span>\r\n            <span class=\"comment\">% Inputs:<\/span>\r\n            <span class=\"comment\">%         layer    -    Layer to forward propagate through<\/span>\r\n            <span class=\"comment\">%         X        -    Input data<\/span>\r\n            <span class=\"comment\">% Output:<\/span>\r\n            <span class=\"comment\">%         Z        -    Output of layer forward function<\/span>\r\n            \r\n            <span class=\"comment\">% Expressing the computation in vectorized form allows it to<\/span>\r\n            <span class=\"comment\">% execute directly on the GPU.<\/span>\r\n            Z = (X .* 
(X &gt; 0)) + <span class=\"keyword\">...<\/span>\r\n                (layer.alpha.*(exp(min(X,0)) - 1) .* (X &lt;= 0));\r\n        <span class=\"keyword\">end<\/span>\r\n\r\n<\/pre><p>The <tt>backward<\/tt> function implements the derivatives of the loss function, which are needed for training. The <a href=\"https:\/\/www.mathworks.com\/help\/nnet\/ug\/define-layer-with-learnable-parameters.html\">Define a Layer with Learnable Parameters<\/a> documentation page explains how to derive the needed quantities.<\/p><pre class=\"language-matlab\">\r\n        <span class=\"keyword\">function<\/span> [dLdX, dLdAlpha] = backward(layer, X, Z, dLdZ, ~)\r\n            <span class=\"comment\">% Backward propagate the derivative of the loss function through <\/span>\r\n            <span class=\"comment\">% the layer<\/span>\r\n            <span class=\"comment\">%<\/span>\r\n            <span class=\"comment\">% Inputs:<\/span>\r\n            <span class=\"comment\">%         layer             - Layer to backward propagate through<\/span>\r\n            <span class=\"comment\">%         X                 - Input data<\/span>\r\n            <span class=\"comment\">%         Z                 - Output of layer forward function            <\/span>\r\n            <span class=\"comment\">%         dLdZ              - Gradient propagated from the deeper layer<\/span>\r\n            <span class=\"comment\">%         memory            - Memory value which can be used in<\/span>\r\n            <span class=\"comment\">%                             backward propagation [unused]<\/span>\r\n            <span class=\"comment\">% Output:<\/span>\r\n            <span class=\"comment\">%         dLdX              - Derivative of the loss with<\/span>\r\n            <span class=\"comment\">%                             respect to the input data<\/span>\r\n            <span class=\"comment\">%         dLdAlpha          - Derivatives of the loss with<\/span>\r\n            <span class=\"comment\">% 
                            respect to alpha<\/span>\r\n            \r\n            <span class=\"comment\">% Original expression:<\/span>\r\n            <span class=\"comment\">% dLdX = (dLdZ .* (X &gt; 0)) + ...<\/span>\r\n            <span class=\"comment\">%     (dLdZ .* (layer.alpha + Z) .* (X &lt;= 0));<\/span>\r\n            <span class=\"comment\">%<\/span>\r\n            <span class=\"comment\">% Optimized expression:<\/span>\r\n            dLdX = dLdZ .* ((X &gt; 0) + <span class=\"keyword\">...<\/span>\r\n                ((layer.alpha + Z) .* (X &lt;= 0)));            \r\n            \r\n            dLdAlpha = (exp(min(X,0)) - 1) .* dLdZ;\r\n            <span class=\"comment\">% Sum over the image rows and columns.<\/span>\r\n            dLdAlpha = sum(sum(dLdAlpha,1),2);\r\n            <span class=\"comment\">% Sum over all the observations in the mini-batch.<\/span>\r\n            dLdAlpha = sum(dLdAlpha,4);\r\n        <span class=\"keyword\">end<\/span>\r\n\r\n<\/pre><p>That's all we need for our layer. 
We don't need to implement the <tt>forward<\/tt> function because our layer doesn't have memory and doesn't need to do anything special for training.<\/p><p>Load in the sample digits training set, and show one of the images from it.<\/p><pre class=\"codeinput\">[XTrain, YTrain] = digitTrain4DArrayData;\r\nimshow(XTrain(:,:,:,1010),<span class=\"string\">'InitialMagnification'<\/span>,<span class=\"string\">'fit'<\/span>)\r\nYTrain(1010)\r\n<\/pre><pre class=\"codeoutput\">\r\nans = \r\n\r\n  categorical\r\n\r\n     2 \r\n\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2017\/12\/defining_elu_layer_02.png\" alt=\"\"> <p>Make a network that uses our new ELU layer.<\/p><pre class=\"codeinput\">layers = [ <span class=\"keyword\">...<\/span>\r\n    imageInputLayer([28 28 1])\r\n    convolution2dLayer(5,20)\r\n    batchNormalizationLayer\r\n    eluLayer(20)\r\n    fullyConnectedLayer(10)\r\n    softmaxLayer\r\n    classificationLayer];\r\n<\/pre><p>Train the network.<\/p><pre class=\"codeinput\">options = trainingOptions(<span class=\"string\">'sgdm'<\/span>);\r\nnet = trainNetwork(XTrain,YTrain,layers,options);\r\n<\/pre><pre class=\"codeoutput\">Training on single GPU.\r\nInitializing image normalization.\r\n|=========================================================================================|\r\n|     Epoch    |   Iteration  | Time Elapsed |  Mini-batch  |  Mini-batch  | Base Learning|\r\n|              |              |  (seconds)   |     Loss     |   Accuracy   |     Rate     |\r\n|=========================================================================================|\r\n|            1 |            1 |         0.03 |       2.5173 |        5.47% |       0.0100 |\r\n|            2 |           50 |         0.63 |       0.4548 |       85.16% |       0.0100 |\r\n|            3 |          100 |         1.20 |       0.1550 |       96.88% |       0.0100 |\r\n|            4 |          150 |         
1.78 |       0.0951 |       99.22% |       0.0100 |\r\n|            6 |          200 |         2.37 |       0.0499 |       99.22% |       0.0100 |\r\n|            7 |          250 |         2.96 |       0.0356 |      100.00% |       0.0100 |\r\n|            8 |          300 |         3.55 |       0.0270 |      100.00% |       0.0100 |\r\n|            9 |          350 |         4.13 |       0.0168 |      100.00% |       0.0100 |\r\n|           11 |          400 |         4.74 |       0.0145 |      100.00% |       0.0100 |\r\n|           12 |          450 |         5.32 |       0.0118 |      100.00% |       0.0100 |\r\n|           13 |          500 |         5.89 |       0.0119 |      100.00% |       0.0100 |\r\n|           15 |          550 |         6.45 |       0.0074 |      100.00% |       0.0100 |\r\n|           16 |          600 |         7.03 |       0.0079 |      100.00% |       0.0100 |\r\n|           17 |          650 |         7.60 |       0.0086 |      100.00% |       0.0100 |\r\n|           18 |          700 |         8.18 |       0.0065 |      100.00% |       0.0100 |\r\n|           20 |          750 |         8.76 |       0.0066 |      100.00% |       0.0100 |\r\n|           21 |          800 |         9.34 |       0.0052 |      100.00% |       0.0100 |\r\n|           22 |          850 |         9.92 |       0.0054 |      100.00% |       0.0100 |\r\n|           24 |          900 |        10.51 |       0.0051 |      100.00% |       0.0100 |\r\n|           25 |          950 |        11.12 |       0.0044 |      100.00% |       0.0100 |\r\n|           26 |         1000 |        11.73 |       0.0049 |      100.00% |       0.0100 |\r\n|           27 |         1050 |        12.31 |       0.0040 |      100.00% |       0.0100 |\r\n|           29 |         1100 |        12.93 |       0.0041 |      100.00% |       0.0100 |\r\n|           30 |         1150 |        13.56 |       0.0040 |      100.00% |       0.0100 |\r\n|           30 |         1170 |        13.80 
|       0.0043 |      100.00% |       0.0100 |\r\n|=========================================================================================|\r\n<\/pre><p>Check the accuracy of the network on our test set.<\/p><pre class=\"codeinput\">[XTest, YTest] = digitTest4DArrayData;\r\nYPred = classify(net, XTest);\r\naccuracy = sum(YTest==YPred)\/numel(YTest)\r\n<\/pre><pre class=\"codeoutput\">\r\naccuracy =\r\n\r\n    0.9872\r\n\r\n<\/pre><p>Look at one of the images in the test set and see how it was classified by the network.<\/p><pre class=\"codeinput\">k = 1500;\r\nimshow(XTest(:,:,:,k),<span class=\"string\">'InitialMagnification'<\/span>,<span class=\"string\">'fit'<\/span>)\r\nYPred(k)\r\n<\/pre><pre class=\"codeoutput\">\r\nans = \r\n\r\n  categorical\r\n\r\n     2 \r\n\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2017\/12\/defining_elu_layer_03.png\" alt=\"\"> <p>Now you've seen how to define your own layer, include it in a network, and train it up.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_1dec022b4d404e119a202bf79c99690b() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='1dec022b4d404e119a202bf79c99690b ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 1dec022b4d404e119a202bf79c99690b';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML 
parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2017 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_1dec022b4d404e119a202bf79c99690b()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2017b<br><\/p><\/div><!--\r\n1dec022b4d404e119a202bf79c99690b ##### SOURCE BEGIN #####\r\n%% Defining Your Own Network Layer\r\n% One of the new Neural Network Toolbox features of R2017b is the ability\r\n% to define your own network layer. Today I'll show you how to make an\r\n% _exponential linear unit_ (ELU) layer.\r\n%\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/692126-joe-hicklin\r\n% Joe> helped me with today's post. Joe is one of the few developers who\r\n% have been around MathWorks longer than I have. In fact, he's one of the\r\n% people who interviewed me when I applied for a job here. I've had the\r\n% pleasure of working closely with Joe for the past several years on many\r\n% aspects of MATLAB design. 
He really loves tinkering with deep learning\r\n% networks.\r\n%\r\n% <<https:\/\/www.mathworks.com\/responsive_image\/150\/0\/0\/0\/0\/cache\/matlabcentral\/profiles\/692126_1506088706195.jpg>>\r\n%\r\n% Joe came across the paper \r\n% <https:\/\/arxiv.org\/pdf\/1511.07289.pdf \r\n% \"Fast and Accurate Deep Network\r\n% Learning by Exponential Linear Units (ELUs),\"> by Clevert, Unterthiner,\r\n% and Hochreiter, and he wanted to make an ELU layer using R2017b. \r\n%\r\n% $f(x) = \\left\\{\\begin{array}{ll}\r\n%    x & x > 0\\\\\r\n%    \\alpha(e^x - 1) & x \\leq 0\r\n% \\end{array} \\right.$\r\n%\r\n\r\n%%\r\n% Let's compare the ELU shape with a couple of other commonly used\r\n% activation functions.\r\nalpha1 = 1;\r\nelu_fcn = @(x) x.*(x > 0) + alpha1*(exp(x) - 1).*(x <= 0);\r\n\r\nalpha2 = 0.1;\r\nleaky_relu_fcn = @(x) alpha2*x.*(x <= 0) + x.*(x > 0);\r\n\r\nrelu_fcn = @(x) x.*(x > 0);\r\n\r\nfplot(elu_fcn,[-10 3],'LineWidth',2)\r\nhold on\r\nfplot(leaky_relu_fcn,[-10 3],'LineWidth',2)\r\nfplot(relu_fcn,[-10 3],'LineWidth',2)\r\nhold off\r\nax = gca;\r\nax.XAxisLocation = 'origin';\r\nax.YAxisLocation = 'origin';\r\nbox off\r\nlegend({'ELU','Leaky ReLU','ReLU'},'Location','northwest')\r\n\r\n%%\r\n% Joe wanted to make an ELU layer with one learned alpha value per channel.\r\n% He followed the procedure outlined in <https:\/\/www.mathworks.com\/help\/nnet\/ug\/define-layer-with-learnable-parameters.html\r\n% Define a Layer with\r\n% Learnable Parameters> to make an ELU layer that works with the\r\n% Neural Network Toolbox.\r\n%\r\n% Below is the template for a layer with learnable parameters. 
We'll\r\n% explore how to fill in this template to make an ELU layer.\r\n%\r\n% <include>learnable_parameter_template<\/include>\r\n\r\n%%\r\n% For our ELU layer with a learnable alpha parameter, here's one way to\r\n% write the constructor and the |Learnable| property block.\r\n%\r\n% <include>eluLayer_alpha_and_constructor.m<\/include>\r\n\r\n%%\r\n% The |predict| function is where we implement the activation function.\r\n% Remember its mathematical form:\r\n%\r\n% $f(x) = \\left\\{\\begin{array}{ll}\r\n%    x & x > 0\\\\\r\n%    \\alpha(e^x - 1) & x \\leq 0\r\n% \\end{array} \\right.$\r\n%\r\n% Note: The expression |(exp(min(X,0)) - 1)| in the predict function is written\r\n% that way to avoid computing the exponential of large positive numbers,\r\n% which could result in infinities and NaNs popping up.\r\n%\r\n% <include>eluLayer_predict.m<\/include>\r\n\r\n%%\r\n% The |backward| function implements the derivatives of the loss function,\r\n% which are needed for training. The <https:\/\/www.mathworks.com\/help\/nnet\/ug\/define-layer-with-learnable-parameters.html\r\n% Define a Layer with\r\n% Learnable Parameters> documentation page explains how\r\n% to derive the needed quantities.\r\n%\r\n% <include>eluLayer_backward.m<\/include>\r\n\r\n%%\r\n% That's all we need for our layer. 
We don't need to implement the\r\n% |forward| function because our layer doesn't have memory and doesn't need\r\n% to do anything special for training.\r\n%\r\n% Load in the sample digits training set, and show one of the images from\r\n% it.\r\n[XTrain, YTrain] = digitTrain4DArrayData;\r\nimshow(XTrain(:,:,:,1010),'InitialMagnification','fit')\r\nYTrain(1010)\r\n\r\n%%\r\n% Make a network that uses our new ELU layer.\r\n\r\nlayers = [ ...\r\n    imageInputLayer([28 28 1])\r\n    convolution2dLayer(5,20)\r\n    batchNormalizationLayer\r\n    eluLayer(20)\r\n    fullyConnectedLayer(10)\r\n    softmaxLayer\r\n    classificationLayer];\r\n\r\n%%\r\n% Train the network.\r\noptions = trainingOptions('sgdm');\r\nnet = trainNetwork(XTrain,YTrain,layers,options);\r\n\r\n%%\r\n% Check the accuracy of the network on our test set.\r\n[XTest, YTest] = digitTest4DArrayData;\r\nYPred = classify(net, XTest);\r\naccuracy = sum(YTest==YPred)\/numel(YTest)\r\n\r\n%%\r\n% Look at one of the images in the test set and see how it was classified\r\n% by the network.\r\n\r\nk = 1500;\r\nimshow(XTest(:,:,:,k),'InitialMagnification','fit')\r\nYPred(k)\r\n\r\n%%\r\n% Now you've seen how to define your own layer, include it in a network,\r\n% and train it up.\r\n\r\n##### SOURCE END ##### 1dec022b4d404e119a202bf79c99690b\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2017\/12\/defining_elu_layer_01.png\" class=\"img-responsive attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" decoding=\"async\" loading=\"lazy\" \/><\/div><p>\r\nNote: Post updated 27-Sep-2018 to correct a typo in the implementation of the backward function.\r\nOne of the new Neural Network Toolbox features of R2017b is the ability to define your own network... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2018\/01\/05\/defining-your-own-network-layer\/\">read more >><\/a><\/p>","protected":false},"author":42,"featured_media":90,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/88"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/users\/42"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/comments?post=88"}],"version-history":[{"count":5,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/88\/revisions"}],"predecessor-version":[{"id":597,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/88\/revisions\/597"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media\/90"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media?parent=88"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/categories?post=88"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/tags?post=88"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}