Deep Learning

Understanding and using deep learning networks


Defining Your Own Network Layer (Revisited)

Posted by Steve Eddins

Today I want to follow up on my previous post, Defining Your Own Network Layer. There were two reader comments that caught my attention.

The first comment, from Eric Shields, points out a key conclusion from the Clevert, Unterthiner, and Hochreiter paper that I overlooked. I initially focused just on the definition of the exponential linear unit (ELU) function, but Eric pointed out that the authors concluded that batch normalization, which I used in my simple network, might not be needed when using an ELU layer.

Here's a reminder (from the previous post) about what the ELU curve looks like.
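If you don't have the previous post handy, the curve is easy to re-create: the ELU is just the identity for positive inputs and alpha*(exp(x) - 1) for negative inputs, saturating at -alpha. Here's a quick sketch (with alpha = 1, as in the paper).

alpha = 1;
elu = @(x) max(x,0) + alpha*(exp(min(x,0)) - 1);
fplot(elu,[-5 5])
grid on
title('Exponential linear unit, \alpha = 1')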

And here's the simple network that I used last time.

layers = [ ...
    imageInputLayer([28 28 1])
    convolution2dLayer(5,20)
    batchNormalizationLayer
    eluLayer(20)
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

I used the sample digits training set.

[XTrain, YTrain] = digitTrain4DArrayData;
imshow(XTrain(:,:,:,1010),'InitialMagnification','fit')
YTrain(1010)
ans = 

  categorical

     2 

Now I'll train the network again, using the same options as last time.

options = trainingOptions('sgdm');
net = trainNetwork(XTrain,YTrain,layers,options);
Training on single GPU.
Initializing image normalization.
|=========================================================================================|
|     Epoch    |   Iteration  | Time Elapsed |  Mini-batch  |  Mini-batch  | Base Learning|
|              |              |  (seconds)   |     Loss     |   Accuracy   |     Rate     |
|=========================================================================================|
|            1 |            1 |         0.01 |       2.5899 |       10.16% |       0.0100 |
|            2 |           50 |         0.61 |       0.4156 |       85.94% |       0.0100 |
|            3 |          100 |         1.20 |       0.1340 |       96.88% |       0.0100 |
|            4 |          150 |         1.80 |       0.0847 |       98.44% |       0.0100 |
|            6 |          200 |         2.41 |       0.0454 |      100.00% |       0.0100 |
|            7 |          250 |         3.01 |       0.0253 |      100.00% |       0.0100 |
|            8 |          300 |         3.60 |       0.0219 |      100.00% |       0.0100 |
|            9 |          350 |         4.19 |       0.0141 |      100.00% |       0.0100 |
|           11 |          400 |         4.85 |       0.0128 |      100.00% |       0.0100 |
|           12 |          450 |         5.46 |       0.0126 |      100.00% |       0.0100 |
|           13 |          500 |         6.05 |       0.0099 |      100.00% |       0.0100 |
|           15 |          550 |         6.66 |       0.0079 |      100.00% |       0.0100 |
|           16 |          600 |         7.27 |       0.0084 |      100.00% |       0.0100 |
|           17 |          650 |         7.86 |       0.0075 |      100.00% |       0.0100 |
|           18 |          700 |         8.45 |       0.0081 |      100.00% |       0.0100 |
|           20 |          750 |         9.05 |       0.0066 |      100.00% |       0.0100 |
|           21 |          800 |         9.64 |       0.0057 |      100.00% |       0.0100 |
|           22 |          850 |        10.24 |       0.0054 |      100.00% |       0.0100 |
|           24 |          900 |        10.83 |       0.0049 |      100.00% |       0.0100 |
|           25 |          950 |        11.44 |       0.0055 |      100.00% |       0.0100 |
|           26 |         1000 |        12.04 |       0.0046 |      100.00% |       0.0100 |
|           27 |         1050 |        12.66 |       0.0041 |      100.00% |       0.0100 |
|           29 |         1100 |        13.25 |       0.0044 |      100.00% |       0.0100 |
|           30 |         1150 |        13.84 |       0.0038 |      100.00% |       0.0100 |
|           30 |         1170 |        14.08 |       0.0042 |      100.00% |       0.0100 |
|=========================================================================================|

Note that the training took about 14.1 seconds.

Check the accuracy of the trained network.

[XTest, YTest] = digitTest4DArrayData;
YPred = classify(net, XTest);
accuracy = sum(YTest==YPred)/numel(YTest)
accuracy =

    0.9878
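If you're curious about the roughly 1% of test digits that were missed, here's one quick (and unpolished) way to eyeball a few of them.

% Look at up to nine of the misclassified test digits.
wrong = find(YPred ~= YTest);
idx = wrong(1:min(9,numel(wrong)));
montage(XTest(:,:,:,idx))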

Now let's make another network without the batch normalization layer.

layers2 = [ ...
    imageInputLayer([28 28 1])
    convolution2dLayer(5,20)
    eluLayer(20)
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

Train it up again.

net2 = trainNetwork(XTrain,YTrain,layers2,options);
Training on single GPU.
Initializing image normalization.
|=========================================================================================|
|     Epoch    |   Iteration  | Time Elapsed |  Mini-batch  |  Mini-batch  | Base Learning|
|              |              |  (seconds)   |     Loss     |   Accuracy   |     Rate     |
|=========================================================================================|
|            1 |            1 |         0.01 |       2.3022 |        7.81% |       0.0100 |
|            2 |           50 |         0.52 |       1.6631 |       51.56% |       0.0100 |
|            3 |          100 |         1.04 |       1.4368 |       52.34% |       0.0100 |
|            4 |          150 |         1.58 |       1.0426 |       61.72% |       0.0100 |
|            6 |          200 |         2.12 |       0.8223 |       72.66% |       0.0100 |
|            7 |          250 |         2.67 |       0.6842 |       80.47% |       0.0100 |
|            8 |          300 |         3.21 |       0.6461 |       78.13% |       0.0100 |
|            9 |          350 |         3.79 |       0.4181 |       85.94% |       0.0100 |
|           11 |          400 |         4.33 |       0.4163 |       86.72% |       0.0100 |
|           12 |          450 |         4.88 |       0.2115 |       96.09% |       0.0100 |
|           13 |          500 |         5.42 |       0.1817 |       97.66% |       0.0100 |
|           15 |          550 |         5.96 |       0.1809 |       96.09% |       0.0100 |
|           16 |          600 |         6.53 |       0.1001 |      100.00% |       0.0100 |
|           17 |          650 |         7.07 |       0.0899 |      100.00% |       0.0100 |
|           18 |          700 |         7.61 |       0.0934 |       99.22% |       0.0100 |
|           20 |          750 |         8.14 |       0.0739 |       99.22% |       0.0100 |
|           21 |          800 |         8.68 |       0.0617 |      100.00% |       0.0100 |
|           22 |          850 |         9.22 |       0.0462 |      100.00% |       0.0100 |
|           24 |          900 |         9.76 |       0.0641 |      100.00% |       0.0100 |
|           25 |          950 |        10.29 |       0.0332 |      100.00% |       0.0100 |
|           26 |         1000 |        10.86 |       0.0317 |      100.00% |       0.0100 |
|           27 |         1050 |        11.41 |       0.0378 |       99.22% |       0.0100 |
|           29 |         1100 |        11.96 |       0.0235 |      100.00% |       0.0100 |
|           30 |         1150 |        12.51 |       0.0280 |      100.00% |       0.0100 |
|           30 |         1170 |        12.73 |       0.0307 |      100.00% |       0.0100 |
|=========================================================================================|

That took about 12.7 seconds to train, about a 10% reduction. Check the accuracy.

[XTest, YTest] = digitTest4DArrayData;
YPred = classify(net2, XTest);
accuracy = sum(YTest==YPred)/numel(YTest)
accuracy =

    0.9808

Eric said he got the same accuracy, whereas I am seeing a slightly lower accuracy. But I haven't really explored this further, and so I wouldn't draw any conclusions. I just wanted to take the opportunity to go back and mention one of the important points of the paper that I overlooked last time.
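If you did want to chase this down, one approach would be to average the accuracy over several training runs instead of relying on a single one. Here's a rough sketch; the 'Verbose' option just suppresses the progress table.

% Average test accuracy of the no-batch-norm network over 5 runs.
quietOptions = trainingOptions('sgdm','Verbose',false);
acc = zeros(1,5);
for k = 1:5
    rng(k)   % vary the random weight initialization per run
    netk = trainNetwork(XTrain,YTrain,layers2,quietOptions);
    acc(k) = mean(classify(netk,XTest) == YTest);
end
[mean(acc) std(acc)]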

A second reader, another Eric, wanted to know whether alpha could be specified as a learnable or non-learnable parameter at run time.

The answer: Yes, but not without defining a second class. Recall this portion of the template for defining a layer with learnable properties:

    properties (Learnable)
        % (Optional) Layer learnable parameters

        % Layer learnable parameters go here
    end
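In the eluLayer class from the previous post, that block is where alpha lives as a learnable parameter. From memory (so treat the exact declaration as approximate), it looks something like this:

    properties (Learnable)
        % ELU scaling parameter, learned during training.
        alpha
    end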

That Learnable attribute of the properties block is a fixed part of the class definition. It can't be changed dynamically. So, you need to define a second class. I'll call mine eluLayerFixedAlpha. Here's the properties block:

    properties
        % Fixed (non-learnable) ELU scaling parameter.
        alpha
    end

And here's a constructor that takes alpha as an input argument.

    methods
        function layer = eluLayerFixedAlpha(alpha,name)
            layer.Type = 'Exponential Linear Unit';
            layer.alpha = alpha;
            
            % Assign layer name if it is passed in.
            if nargin > 1
                layer.Name = name;
            end
            
            % Give the layer a meaningful description.
            layer.Description = "Exponential linear unit with alpha: " + ...
                alpha;
        end
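The predict method carries over from the previous post essentially unchanged; the only difference is that alpha now comes from the fixed property. Roughly:

        function Z = predict(layer, X)
            % Apply the ELU elementwise: Z = X for X > 0, and
            % Z = alpha*(exp(X) - 1) for X <= 0.
            Z = max(X, 0) + layer.alpha .* (exp(min(X, 0)) - 1);
        end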

I also modified the backward method to remove the computation and output argument associated with the derivative of the loss function with respect to alpha. (For X <= 0, Z = alpha*(exp(X) - 1), so dZ/dX = alpha*exp(X) = alpha + Z; that identity is why the backward pass below can reuse Z instead of recomputing the exponential.)

        function dLdX = backward(layer, X, Z, dLdZ, ~)
            % Backward propagate the derivative of the loss function through 
            % the layer
            %
            % Inputs:
            %         layer             - Layer to backward propagate through
            %         X                 - Input data
            %         Z                 - Output of layer forward function            
            %         dLdZ              - Gradient propagated from the deeper layer
            %         memory            - Memory value which can be used in
            %                             backward propagation [unused]
            % Output:
            %         dLdX              - Derivative of the loss with
            %                             respect to the input data
            
            dLdX = dLdZ .* ((X > 0) + ...
                ((layer.alpha + Z) .* (X <= 0)));            
        end

Let's try it. I'm just going to make up a value for alpha.

alpha = 1.0;
layers3 = [ ...
    imageInputLayer([28 28 1])
    convolution2dLayer(5,20)
    eluLayerFixedAlpha(alpha)
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];
net3 = trainNetwork(XTrain,YTrain,layers3,options);
YPred = classify(net3, XTest);
accuracy = sum(YTest==YPred)/numel(YTest)
Training on single GPU.
Initializing image normalization.
|=========================================================================================|
|     Epoch    |   Iteration  | Time Elapsed |  Mini-batch  |  Mini-batch  | Base Learning|
|              |              |  (seconds)   |     Loss     |   Accuracy   |     Rate     |
|=========================================================================================|
|            1 |            1 |         0.01 |       2.3005 |       14.06% |       0.0100 |
|            2 |           50 |         0.48 |       1.4979 |       53.13% |       0.0100 |
|            3 |          100 |         0.97 |       1.2162 |       57.81% |       0.0100 |
|            4 |          150 |         1.48 |       1.1427 |       67.97% |       0.0100 |
|            6 |          200 |         1.99 |       0.9837 |       67.19% |       0.0100 |
|            7 |          250 |         2.50 |       0.8110 |       70.31% |       0.0100 |
|            8 |          300 |         3.04 |       0.7347 |       75.00% |       0.0100 |
|            9 |          350 |         3.55 |       0.5937 |       81.25% |       0.0100 |
|           11 |          400 |         4.05 |       0.5686 |       78.13% |       0.0100 |
|           12 |          450 |         4.56 |       0.4678 |       85.94% |       0.0100 |
|           13 |          500 |         5.06 |       0.3461 |       88.28% |       0.0100 |
|           15 |          550 |         5.57 |       0.3515 |       87.50% |       0.0100 |
|           16 |          600 |         6.07 |       0.2582 |       92.97% |       0.0100 |
|           17 |          650 |         6.58 |       0.2216 |       92.97% |       0.0100 |
|           18 |          700 |         7.08 |       0.1705 |       96.09% |       0.0100 |
|           20 |          750 |         7.59 |       0.1212 |       98.44% |       0.0100 |
|           21 |          800 |         8.09 |       0.0925 |       98.44% |       0.0100 |
|           22 |          850 |         8.59 |       0.1045 |       97.66% |       0.0100 |
|           24 |          900 |         9.10 |       0.1289 |       96.09% |       0.0100 |
|           25 |          950 |         9.60 |       0.0710 |       99.22% |       0.0100 |
|           26 |         1000 |        10.10 |       0.0722 |       99.22% |       0.0100 |
|           27 |         1050 |        10.60 |       0.0600 |       99.22% |       0.0100 |
|           29 |         1100 |        11.10 |       0.0688 |       99.22% |       0.0100 |
|           30 |         1150 |        11.61 |       0.0519 |      100.00% |       0.0100 |
|           30 |         1170 |        11.82 |       0.0649 |       99.22% |       0.0100 |
|=========================================================================================|

accuracy =

    0.9702
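So, for these single runs: batch normalization plus ELU reached 0.9878 accuracy in about 14.1 seconds of training, ELU alone reached 0.9808 in about 12.7 seconds, and the fixed-alpha ELU reached 0.9702 in about 11.8 seconds. Again, one run each, so don't read too much into the exact numbers.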

Thanks for your comments and questions, Eric and Eric.


Get the MATLAB code

Published with MATLAB® R2017b

1 Comment

Jack Xiao replied:
Thanks for sharing. I see that XTrain and YTrain are read into memory ahead of time, so if they are large scale, how should they be handled? I know imageDatastore can be applied to classification problems, but how can it be used for regression problems? Can it support reading the data on the fly when XTrain and YTrain are large scale? Thanks!
