Comments on: Defining Your Own Network Layer https://blogs.mathworks.com/deep-learning/2018/01/05/defining-your-own-network-layer/?s_tid=feedtopost
Johanna specializes in deep learning and computer vision. Her goal is to give insight into deep learning through code examples, developer Q&As, and tips and tricks using MATLAB.

By: Kookmin University https://blogs.mathworks.com/deep-learning/2018/01/05/defining-your-own-network-layer/#comment-531 Thu, 18 Oct 2018 12:02:36 +0000 https://blogs.mathworks.com/deep-learning/?p=88#comment-531 Hi Steve. Thank you so much for your explanation.
I am trying to use this to build my own fully connected layer.
But I have run into a problem: how can I retrieve the output of the previous layer? In Keras, a flatten layer follows the convolution layer.
In this case, how does MATLAB determine the shape of the previous layer's output?
Could you please help me?
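A rough sketch of the mechanics, under illustrative assumptions (the reshape below mimics a Keras-style flatten; nothing here is taken from the post): a custom layer's predict method receives the previous layer's output directly as its X argument, so its shape can be queried at run time with size.

    function Z = predict(layer, X)
        % X is the previous layer's output, H-by-W-by-C-by-N for image data
        [h, w, c, n] = size(X);
        % Flatten the spatial and channel dimensions into one, Keras-flatten style
        Z = reshape(X, 1, 1, h*w*c, n);
    end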

By: Sunny Arokia Swamy Bellary https://blogs.mathworks.com/deep-learning/2018/01/05/defining-your-own-network-layer/#comment-471 Mon, 24 Sep 2018 23:08:54 +0000 https://blogs.mathworks.com/deep-learning/?p=88#comment-471 Hi Steve… Thanks for the explanation… I tried to implement a prediction problem using an LSTM… How can I add a custom-defined layer to a model imported from Keras?

Thanks and Regards,
Sunny
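A hedged sketch of one way this can be wired together, assuming a sufficiently recent release (the file name, the placeholder index, and myCustomLayer are illustrative; importKerasLayers, findPlaceholderLayers, and replaceLayer are the relevant Deep Learning Toolbox functions):

    % Import the Keras architecture; unsupported layers arrive as placeholders
    lgraph = importKerasLayers('myModel.h5', 'ImportWeights', true);
    % Locate the placeholder layers that need a MATLAB implementation
    placeholders = findPlaceholderLayers(lgraph);
    % Swap one of them (or any named layer) for your custom layer
    lgraph = replaceLayer(lgraph, placeholders(1).Name, myCustomLayer);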

By: Jack Xiao https://blogs.mathworks.com/deep-learning/2018/01/05/defining-your-own-network-layer/#comment-323 Thu, 19 Jul 2018 09:02:04 +0000 https://blogs.mathworks.com/deep-learning/?p=88#comment-323 Hi Steve,
Why is the backward function (the derivative of the loss function) defined this way?
I think the backward used in this example is the derivative of the activation function, not the derivative of the loss function.
Maybe we should choose the loss (such as MAE or MSE) first, and then we can get the final backward in terms of the activation function and the loss function. Is that right?
Another question:
Why does the example at https://ww2.mathworks.cn/help/nnet/ug/define-custom-regression-output-layer.html use backwardLoss and forwardLoss rather than backward and forward? Is there a difference?
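A quick sketch of the relationship in question (a general chain-rule statement, not a quote from the post): an intermediate layer's backward method receives dLdZ from the layer above and propagates it through the layer's own local derivative,

    dL/dX = dL/dZ .* dZ/dX

so only dZ/dX, the activation derivative, appears inside the layer; the choice of loss enters only through the incoming dLdZ. An output layer is different: it defines the loss itself, so it implements forwardLoss (compute the loss from predictions and targets) and backwardLoss (compute the derivative of the loss with respect to the predictions), and that is where a choice such as MAE or MSE would live.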

By: Steve Eddins https://blogs.mathworks.com/deep-learning/2018/01/05/defining-your-own-network-layer/#comment-142 Wed, 24 Jan 2018 15:53:03 +0000 https://blogs.mathworks.com/deep-learning/?p=88#comment-142 Guillaume—Thanks for the idea!

By: guillaume godin https://blogs.mathworks.com/deep-learning/2018/01/05/defining-your-own-network-layer/#comment-140 Wed, 24 Jan 2018 09:10:15 +0000 https://blogs.mathworks.com/deep-learning/?p=88#comment-140 Hi Steve,

I implemented the ELU like this, to optimize the computation by reusing a memory value from forward:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        function Z = predict(layer, X)
            % Forward input data through the layer and output the result
            Z = max(0, X) + layer.Alpha .* (exp(min(0, X)) - 1);
        end

        function [Z, memory] = forward(layer, X)
            % (Optional) Forward input data through the layer at training
            % time and output the result and a memory value
            %
            % Inputs:
            %         layer  - Layer to forward propagate through
            %         X      - Input data
            % Outputs:
            %         Z      - Output of layer forward function
            %         memory - Memory value which can be used for
            %                  backward propagation

            % Store exp(min(0,X)) - 1 so that backward can reuse it
            memory = exp(min(0, X)) - 1;
            Z = max(0, X) + layer.Alpha .* memory;
        end

        function [dLdX, dLdAlpha] = backward(layer, X, Z, dLdZ, memory)
            % Backward propagate the derivative of the loss function
            % through the layer; dLdZ comes from the next layer

            % Negative part: dZ/dX = Alpha*exp(X) = Alpha + Z,
            % so dL/dX = (Alpha + Z) .* dLdZ
            dLdX = (layer.Alpha + Z) .* dLdZ;
            % Positive part: dZ/dX = 1, so dL/dX = dLdZ
            dLdX(X > 0) = dLdZ(X > 0);

            % Derivative with respect to Alpha: dZ/dAlpha = exp(X) - 1
            % for the negative part (zero elsewhere), which is exactly
            % the stored memory value
            dLdAlpha = memory .* dLdZ;
            % Sum over the spatial dimensions
            dLdAlpha = sum(sum(dLdAlpha, 1), 2);

            % Sum over all observations in mini-batch
            dLdAlpha = sum(dLdAlpha, 4);
        end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
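For anyone assembling these methods into a complete layer, a minimal class skeleton they could live in might look like the following (the class name, constructor, and per-channel Alpha initialization are illustrative assumptions in the spirit of the post, not Guillaume's actual file):

    classdef eluLayer < nnet.layer.Layer
        properties (Learnable)
            % Scaling coefficient for the negative part of the ELU
            Alpha
        end
        methods
            function layer = eluLayer(numChannels, name)
                % Name the layer and initialize one Alpha per channel
                layer.Name = name;
                layer.Description = "Exponential linear unit (ELU)";
                layer.Alpha = rand(1, 1, numChannels);
            end
            % The predict, forward, and backward methods above go here
        end
    end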

My results are similar to yours:

|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:00 |        8.59% |       2.6340 |          0.0100 |
|       2 |          50 |       00:00:04 |       80.47% |       0.5700 |          0.0100 |
|       3 |         100 |       00:00:07 |       96.88% |       0.1470 |          0.0100 |
|       4 |         150 |       00:00:11 |       97.66% |       0.1322 |          0.0100 |
|       6 |         200 |       00:00:14 |       99.22% |       0.0621 |          0.0100 |
|       7 |         250 |       00:00:18 |       99.22% |       0.0395 |          0.0100 |
|       8 |         300 |       00:00:21 |      100.00% |       0.0212 |          0.0100 |
|       9 |         350 |       00:00:24 |      100.00% |       0.0191 |          0.0100 |
|      11 |         400 |       00:00:28 |      100.00% |       0.0170 |          0.0100 |
|      12 |         450 |       00:00:31 |      100.00% |       0.0119 |          0.0100 |
|      13 |         500 |       00:00:35 |      100.00% |       0.0116 |          0.0100 |
|      15 |         550 |       00:00:38 |      100.00% |       0.0056 |          0.0100 |
|      16 |         600 |       00:00:42 |      100.00% |       0.0099 |          0.0100 |
|      17 |         650 |       00:00:45 |      100.00% |       0.0080 |          0.0100 |
|      18 |         700 |       00:00:49 |      100.00% |       0.0058 |          0.0100 |
|      20 |         750 |       00:00:52 |      100.00% |       0.0063 |          0.0100 |
|      21 |         800 |       00:00:56 |      100.00% |       0.0055 |          0.0100 |
|      22 |         850 |       00:01:00 |      100.00% |       0.0060 |          0.0100 |
|      24 |         900 |       00:01:03 |      100.00% |       0.0045 |          0.0100 |
|      25 |         950 |       00:01:06 |      100.00% |       0.0039 |          0.0100 |
|      26 |        1000 |       00:01:10 |      100.00% |       0.0033 |          0.0100 |
|      27 |        1050 |       00:01:13 |      100.00% |       0.0046 |          0.0100 |
|      29 |        1100 |       00:01:17 |      100.00% |       0.0042 |          0.0100 |
|      30 |        1150 |       00:01:20 |      100.00% |       0.0040 |          0.0100 |
|      30 |        1170 |       00:01:21 |      100.00% |       0.0042 |          0.0100 |
|========================================================================================|
>> [XTest, YTest] = digitTest4DArrayData;
YPred = classify(net, XTest);
accuracy = sum(YTest==YPred)/numel(YTest)

accuracy =

    0.9896

BR,

Guillaume

By: Binbin Qi https://blogs.mathworks.com/deep-learning/2018/01/05/defining-your-own-network-layer/#comment-138 Wed, 17 Jan 2018 02:58:51 +0000 https://blogs.mathworks.com/deep-learning/?p=88#comment-138 Sorry, it was my fault; I set a wrong parameter.

By: Binbin Qi https://blogs.mathworks.com/deep-learning/2018/01/05/defining-your-own-network-layer/#comment-136 Wed, 17 Jan 2018 01:32:24 +0000 https://blogs.mathworks.com/deep-learning/?p=88#comment-136 When I use this layer with images that have 3 channels, it does not work.

By: Eric Shields https://blogs.mathworks.com/deep-learning/2018/01/05/defining-your-own-network-layer/#comment-132 Fri, 05 Jan 2018 15:37:04 +0000 https://blogs.mathworks.com/deep-learning/?p=88#comment-132 Batch normalization may not be necessary with ELUs. Clevert et al. indicate that “Batch normalization improved ReLU and LReLU networks, but did not improve ELU and SReLU networks.” On the example code, I get 10% faster performance for the same accuracy by removing the batch normalization layer.
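To make the modification concrete, here is an illustrative layer array without batchNormalizationLayer; the layer sizes and the eluLayer constructor follow the hypothetical sketch above and the 28x28x1 digit data used elsewhere in this thread, not the post's exact architecture:

    layers = [
        imageInputLayer([28 28 1])
        convolution2dLayer(5, 20)
        eluLayer(20, 'elu')          % custom layer; no batchNormalizationLayer around it
        fullyConnectedLayer(10)
        softmaxLayer
        classificationLayer];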

By: Eric https://blogs.mathworks.com/deep-learning/2018/01/05/defining-your-own-network-layer/#comment-130 Fri, 05 Jan 2018 15:16:08 +0000 https://blogs.mathworks.com/deep-learning/?p=88#comment-130 Thanks for a great blog post. Is there an easy way to modify this code so that the user can determine at run-time whether alpha is learned or fixed? Or does a separate class need to be defined with alpha outside of the Learnable properties block?
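One possible workaround, sketched under the assumption that setLearnRateFactor accepts the learnable parameters of custom layers in your release (worth checking in the documentation for your version): keep Alpha in the Learnable block and freeze it at network-assembly time by zeroing its learn-rate factor. The learnAlpha flag and the eluLayer constructor below are illustrative:

    layer = eluLayer(20, 'elu');
    if ~learnAlpha
        % A zero learn-rate factor means Alpha receives no updates during training
        layer = setLearnRateFactor(layer, 'Alpha', 0);
    end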
