# Advanced Deep Learning: Part 1

#### Build Any Deep Learning Network

For the next few posts, I would like us all to step out of our comfort zone. I will be exploring and featuring more advanced deep learning topics. Release 19b introduced many new and exciting features that I had been hesitant to try, because people start throwing around terms like custom training loops and automatic differentiation (or even "autodiff" if you're really in the know). But I think it's time to dive in and explore these concepts, not just to understand them but also to understand where and why to use them.

There is a lot to digest beyond the basics of deep learning, so I've decided to create a series of posts. The post you are reading now serves as a gentle introduction to lay the groundwork and key terms, followed by a series of posts that look at individual network types (autoencoders, Siamese networks, GANs, and attention mechanisms).

#### The advanced deep learning basics

First, let's start with the why: "Why should I bother using the extended deep learning framework? I've gotten by just fine until now." The short answer is that you get a flexible training structure that allows you to create any network in MATLAB. The more complicated architectures featured in the next posts require the extended framework for features like:
• Multiple Inputs and Outputs
• Custom loss functions
• Weight sharing
• Automatic Differentiation
• Special visualizations during training
I'll show a simple deep learning example and then rewrite it to use the extended framework, even though it doesn't need it. Why? Because then, when the more complicated examples come, we'll already know the structure and what to do. Let's start with a simple example we all know and love: MNIST. This handwritten digit example has various spinoffs (like my Pictionary example) and is easy to implement in minimal lines of code.

#### Basic MNIST Example

The steps for each version (simple framework and extended framework) are the same:
1. Define Network Layers
2. Specify Training Options
3. Train Network
You can follow along with the full example in the documentation, which offers more description and explanation of each line of code. First, load the training and test data:

[XTrain,YTrain] = digitTrain4DArrayData;
[XTest, YTest] = digitTest4DArrayData;
classes = categories(YTrain);
numClasses = numel(classes);
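The digit data set ships with MATLAB, so as a quick sanity check on what was just loaded (sizes follow the doc example):

```matlab
% XTrain is a 28x28x1x5000 array of grayscale digit images (H x W x C x N);
% YTrain is a 5000x1 categorical vector of labels.
size(XTrain)
```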


#### 1. Define Network Layers

Create a network consisting of a simple series of layers.
layers = [
imageInputLayer([28 28 1])
convolution2dLayer(5,20)
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
fullyConnectedLayer(10)
softmaxLayer
classificationLayer];


#### 2. Specify Training Options

options = trainingOptions('sgdm', ...
'InitialLearnRate',0.01, ...
'MaxEpochs',4, ...
'Plots','training-progress');

These are simple training options, and not necessarily intended to give the best results. In fact, trainingOptions only requires you to set the optimizer, and the rest can use default values.
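To make that concrete, the minimal call is just the solver name; everything else falls back to its default value:

```matlab
% Only the optimizer is required; learn rate, epochs, plots, etc.
% all use their defaults unless you override them.
options = trainingOptions('sgdm');
```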

#### 3. Train Network

net = trainNetwork(XTrain,YTrain,layers,options);

Simple enough! Now let's do the same thing in the extended framework.

#### Extended Framework Example

Same example, just using the extended framework, or the "dlnetwork" approach as I'll refer to it moving forward. This is a modified version of the code; to follow along with the complete example, the full code is in the doc example.

The data loading step is exactly the same, so there's no need to repeat that code. Let's compare each of the remaining steps side by side to highlight the differences between the simple approach and the dlnetwork approach.

#### 1. Define Network Layers

Layers are almost the same: we just need to add names for each of the layers. Naming is handled automatically in the simple framework, but here we're required to do a little more pre-work.
layers = [...
imageInputLayer([28 28 1], 'Name', 'input', 'Mean', mean(XTrain,4))
convolution2dLayer(5, 20, 'Name', 'conv1')
reluLayer('Name', 'relu1')
maxPooling2dLayer(2, 'Stride', 2, 'Name', 'pool1')
fullyConnectedLayer(10, 'Name', 'fc')
softmaxLayer('Name','softmax')
];
Notice that in the layers there is no classification layer anymore. The loss will be handled in the training loop, since that is what we want to customize. Then convert the layers into a layerGraph, which makes them usable in a custom training loop, and create the dlnetwork object containing the network.
lgraph = layerGraph(layers);
dlnet = dlnetwork(lgraph);
A dlnetwork has properties such as layers and connections (which can handle Series or DAG networks) and also a place to store 'Learnables'. More on this later.
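To get a feel for what's stored there, you can display the Learnables table directly. As a sketch (the exact rows depend on your layers):

```matlab
% Learnables is a table with Layer, Parameter, and Value columns;
% each Value is a dlarray holding the current parameter values.
dlnet.Learnables

% For example, pull out the weights of the 'conv1' layer:
idx = dlnet.Learnables.Layer == "conv1" & ...
      dlnet.Learnables.Parameter == "Weights";
w = dlnet.Learnables.Value{idx};
size(w)   % 5x5 filters, 1 input channel, 20 filters
```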

#### 2. Specify Training Options

You'll notice quite a few more parameters explicitly defined: these are values you will use in the custom training loop. Also, we no longer get the pretty training plot of the basic framework for free.
miniBatchSize = 128;
numEpochs = 30;
numObservations = numel(YTrain);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);
initialLearnRate = 0.01;
decay = 0.01;
momentum = 0.9;
executionEnvironment = "auto";
vel = [];
You are now responsible for your own visualization, but this also means you can create your own visualizations throughout training, customized to your liking to show anything that helps you understand the network as it trains. For now, let's set up a plot to display the loss as the network trains.
plots = "training-progress";
if plots == "training-progress"
figure
lineLossTrain = animatedline;
xlabel("Total Iterations")
ylabel("Loss")
end
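Then, inside the training loop, each iteration appends the latest loss value to the animated line. A sketch following the doc example (`iteration` and `loss` come from the loop body):

```matlab
if plots == "training-progress"
    % extractdata pulls the numeric value out of the dlarray,
    % and gather brings it back from the GPU if needed
    addpoints(lineLossTrain, iteration, double(gather(extractdata(loss))))
    drawnow
end
```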


#### 3. Train Network Using a Custom Training Loop

Basics you need to know before going into the training loop:
• An epoch is one full pass through the entire dataset. So if you have 10 epochs, you run through the entire dataset 10 times.
• A mini-batch is a smaller chunk of the dataset. Datasets are often too big to fit in memory or on a GPU all at once, so we process the data in batches.
So according to the parameters defined above, our custom training loop will pass through the entire dataset 30 times; since our mini-batch size is 128 and our total number of images is 5000, it takes floor(5000/128) = 39 iterations to loop through the data once. Here's the structure of the custom training loop. The full code is in the doc example, and I'll warn you the full script is quite a few lines of code, but a lot of it is straightforward once you understand the overall structure.
for epoch = 1:numEpochs
...
for ii = 1:numIterationsPerEpoch
% Setup: read data, convert to dlarray, pass to GPU
...

% Evaluate the model gradients and loss via automatic differentiation
[gradients, loss] = dlfeval(@modelGradients, dlnet, dlX, Y);

% Update custom learn rate
learnRate = initialLearnRate/(1 + decay*iteration);

% Update network parameters using SGDM optimizer
[dlnet.Learnables, vel] = sgdmupdate(dlnet.Learnables, gradients, vel, learnRate, momentum);

% Update training plot
...
end
end
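The elided setup step reads the next mini-batch and converts it into the form a dlnetwork expects. A sketch following the doc example's approach (`ii` is the iteration index from the loop above):

```matlab
% Read a mini-batch of images and one-hot encode the labels
idx = (ii-1)*miniBatchSize+1:ii*miniBatchSize;
X = XTrain(:,:,:,idx);
Y = zeros(numClasses, miniBatchSize, 'single');
for c = 1:numClasses
    Y(c, YTrain(idx) == classes(c)) = 1;
end

% Wrap in a dlarray with dimension labels: spatial, spatial, channel, batch
dlX = dlarray(single(X), 'SSCB');

% Move to the GPU if one is available
if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"
    dlX = gpuArray(dlX);
end
```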

For completeness, you create the function modelGradients, where you define the gradients and the loss function. Note the call to dlgradient, which uses automatic differentiation to compute the gradients of the loss with respect to the learnable parameters. More on the specifics of this in the next post.
function [gradients, loss] = modelGradients(dlnet, dlX, Y)

dlYPred = forward(dlnet, dlX);

loss = crossentropy(dlYPred, Y);

gradients = dlgradient(loss, dlnet.Learnables);

end