{"id":3563,"date":"2020-02-28T07:01:42","date_gmt":"2020-02-28T07:01:42","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=3563"},"modified":"2021-04-06T15:48:58","modified_gmt":"2021-04-06T19:48:58","slug":"advanced-deep-learning-part-1","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2020\/02\/28\/advanced-deep-learning-part-1\/","title":{"rendered":"Advanced Deep Learning: Part 1"},"content":{"rendered":"<span style=\"font-size: 15px; color: #f27d1d;\"><strong>Build any Deep Learning Network<\/strong><\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">For the next few posts, I would like us all to step out of our comfort zone. I will be exploring and featuring more advanced deep learning topics. Release 19b introduced many new and exciting features that I have been hesitant to try because people start throwing around terms like, custom training loops, automatic differentiation (or even \u201cautodiff\u201d if you\u2019re really in the know). But I think it\u2019s time to dive in and explore new concepts, not just to understand them but understand where and why to use them. <\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">There is a lot to digest beyond the basics of deep learning, so I\u2019ve decided to create a series of posts. The post you are reading now will serve as a gentle introduction to lay groundwork and key terms, followed by a series of posts that look at individual network types (Autoencoders, Siamese networks, GANs and Attention mechanisms). <\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 15px; color: #f27d1d;\"><strong>The advanced deep learning basics<\/strong><\/span>\r\n\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">First, let\u2019s start with the <em>why<\/em>: \"why should I bother using the extended deep learning framework? I've gotten by just fine until now.\" First, you get a flexible training structure which allows you to create any network in MATLAB. 
The more complicated structures featured in the next posts require the extended framework to support features like:<\/span>\r\n<h6><\/h6>\r\n<ul>\r\n \t<li>Multiple inputs and outputs<\/li>\r\n<li>Custom loss functions<\/li>\r\n<li>Weight sharing<\/li>\r\n<li>Automatic differentiation<\/li>\r\n<li>Special visualizations during training<\/li>\r\n<\/ul>\r\n\r\n\r\n<span style=\"font-size: 14px;\">I'll show a simple deep learning example and then rewrite it to use the extended framework, even though it doesn\u2019t need it. Why? Because when the more complicated examples come, we\u2019ll already know the structure and what to do. <\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Let's start with a simple example we all know and love: MNIST. This handwritten digit example has various spinoffs (like my <a href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2018\/06\/22\/deep-learning-in-action-part-1\">Pictionary example<\/a>) and is easy to implement in just a few lines of code.<\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 15px; color: #f27d1d;\"><strong>Basic MNIST Example<\/strong><\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">The steps for each version (simple framework and advanced framework) will be the same:<\/span>\r\n<h6><\/h6>\r\n<ol>\r\n<li>Define Network Layers<\/li>\r\n<li>Specify Training Options<\/li>\r\n<li>Train Network<\/li>\r\n<\/ol>\r\n\r\n<em>You can follow along with the <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/examples\/create-simple-deep-learning-network-for-classification.html\">full example in the documentation<\/a>, which offers more description and explanation of each line of code. 
<\/em>\r\n<h6><\/h6>\r\n<h4>Load the data <\/h4>\r\n<pre>[XTrain,YTrain] = digitTrain4DArrayData;\r\n[XTest, YTest] = digitTest4DArrayData; \r\nclasses = categories(YTrain);\r\nnumClasses = numel(classes);\r\n<\/pre>\r\n<h6><\/h6>\r\n\r\n<img decoding=\"async\" loading=\"lazy\" width=\"560\" height=\"420\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/02\/TrainABasicConvolutionalNeuralNetworkForClassificationExample_01.png\" alt=\"\" class=\"alignnone size-full wp-image-3583\" \/>\r\n\r\n<h6><\/h6>\r\n<h4>1. Define Network Layers<\/h4>\r\n<span style=\"font-size: 14px;\">Create a network consisting of a simple series of layers.<\/span>\r\n<h6><\/h6>\r\n<pre>layers = [\r\n    imageInputLayer([28 28 1])\r\n    \r\n    convolution2dLayer(5,20,'Padding','same')\r\n    batchNormalizationLayer\r\n    reluLayer\r\n     \r\n    maxPooling2dLayer(2,'Stride',2)\r\n            \r\n    fullyConnectedLayer(10)\r\n    softmaxLayer\r\n    classificationLayer];\r\n<\/pre>\r\n<h6><\/h6>\r\n<h4>2. Specify Training Options<\/h4>\r\n<h6><\/h6>\r\n<pre>options = trainingOptions('sgdm', ...\r\n    'InitialLearnRate',0.01, ...\r\n    'MaxEpochs',4, ...\r\n    'Plots','training-progress');\r\n<\/pre>\r\n<span style=\"font-size: 14px;\">These are simple training options, not necessarily intended to give the best results. In fact, <span style =\"font-family:courier\">trainingOptions<\/span> only requires you to set the optimizer; the rest can use default values. <\/span>\r\n<h6><\/h6>\r\n<h4>3. Train Network<\/h4>\r\n<pre>net = trainNetwork(XTrain,YTrain,layers,options);\r\n<\/pre>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Simple enough! Now let's do the same thing in the extended framework. 
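<\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">(One quick aside first: to sanity-check the trained network, you can classify the held-out test set and compute the accuracy. This snippet isn't shown above; it's the standard <span style =\"font-family:courier\">classify<\/span> pattern from the documentation.)<\/span>\r\n<h6><\/h6>\r\n<pre>YPred = classify(net,XTest);\r\naccuracy = sum(YPred == YTest)\/numel(YTest);\r\n<\/pre>\r\n<span style=\"font-size: 14px;\">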
<\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 15px; color: #f27d1d;\"><strong>Extended Framework Example<\/strong><\/span>\r\n<h6><\/h6>\r\n\r\n<span style=\"font-size: 14px;\">This is the same example, just using the extended framework, or \"DLNetwork\" as I'll refer to this approach moving forward. The code here is a condensed version; to follow along with the complete example, the full code is in the <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/ug\/train-network-using-custom-training-loop.html\"><u>doc example<\/u><\/a>.<\/span>\r\n<h6><\/h6>\r\n\r\n<h6><\/h6>\r\n<h4>Load data<\/h4>\r\nThis is exactly the same, so no need to repeat the code.\r\n<h6><\/h6>\r\nNow we can look at the differences between the simple approach and the DLNetwork approach: let's compare each of the following steps side by side to highlight the differences. \r\n\r\n<h6><\/h6>\r\n\r\n\r\n<a href=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/03\/step1a.png\"><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"375\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/03\/step1a-1024x375.png\" alt=\"\" class=\"alignnone size-large wp-image-3801\" \/><\/a>\r\n\r\n\r\n\r\n<h4>1. Define Network Layers<\/h4>\r\n<span style=\"font-size: 14px;\">The layers are <em>almost <\/em>the same: we just need to add names for each of the layers. This is handled automatically in the simple framework; here we're required to do a little more pre-work. <\/span>\r\n<h6><\/h6>\r\n<pre>layers = [...\r\n    imageInputLayer([28 28 1], 'Name', 'input','Mean',mean(XTrain,4))\r\n    convolution2dLayer(5, 20, 'Name', 'conv1')\r\n    reluLayer('Name', 'relu1')\r\n    maxPooling2dLayer(2, 'Stride', 2, 'Name', 'pool1')\r\n    fullyConnectedLayer(10, 'Name', 'fc')\r\n    softmaxLayer('Name','softmax')\r\n];<\/pre>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Notice that there is no classification layer in the layers anymore. 
This will be handled in the training loop, since this is what we want to customize.<\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Then convert the layers into a layerGraph, which makes them usable in a custom training loop, and create the dlnet object containing the network.<\/span>\r\n<pre>lgraph = layerGraph(layers);\r\ndlnet = dlnetwork(lgraph);<\/pre>\r\n<h6><\/h6>\r\n\r\n<img decoding=\"async\" loading=\"lazy\" width=\"397\" height=\"221\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/03\/dlnet_command.png\" alt=\"\" class=\"alignnone size-full wp-image-3593\" \/>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">A dlnetwork has properties such as Layers and Connections (which can handle series or DAG networks) and also a place to store the network's 'Learnables'. More on this later.<\/span>\r\n<h6><\/h6>\r\n<h4>2. Specify Training Options<\/h4>\r\n\r\n\r\n<a href=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/03\/step2a.png\"><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"380\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/03\/step2a-1024x380.png\" alt=\"\" class=\"alignnone size-large wp-image-3803\" \/><\/a>\r\n\r\n\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">You'll notice quite a few more parameters that must be explicitly defined: these are parameters you will use in the custom training loop. 
Also, we no longer have the option of a pretty training plot like in the basic framework.<\/span> \r\n<h6><\/h6>\r\n\r\n<h6><\/h6>\r\n<pre>miniBatchSize = 128;\r\nnumEpochs = 30;\r\nnumObservations = numel(YTrain);\r\nnumIterationsPerEpoch = floor(numObservations.\/miniBatchSize);\r\ninitialLearnRate = 0.01;\r\ndecay = 0.01;\r\nmomentum = 0.9;\r\nexecutionEnvironment = \"auto\";\r\nvel = [];<\/pre>\r\n\r\n<span style=\"font-size: 14px;\">You are now responsible for your own visualization, but this also means you can create custom visualizations throughout training, showing anything that helps you understand the network as it trains. <\/span>\r\n\r\n<span style=\"font-size: 14px;\">For now, let's set up a plot to display the loss as the network trains.<\/span>\r\n\r\n<pre>plots = \"training-progress\";\r\nif plots == \"training-progress\"\r\n    figure\r\n    lineLossTrain = animatedline;\r\n    xlabel(\"Total Iterations\")\r\n    ylabel(\"Loss\")\r\nend\r\n<\/pre>\r\n<h4>3. Train the network using a custom training loop<\/h4>\r\n\r\n\r\n\r\n<a href=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/03\/step3a.png\"><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"368\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/03\/step3a-1024x368.png\" alt=\"\" class=\"alignnone size-large wp-image-3805\" \/><\/a>\r\n\r\n\r\n\r\n<span style=\"font-size: 14px;\">Basics you need to know before going into the training loop:<\/span>\r\n<h6><\/h6>\r\n<ul>\r\n\t<li>An <strong>Epoch <\/strong>is one full pass through the entire dataset. So if you have 10 epochs, you are running through all of the data 10 times.<\/li>\r\n\t<li>A <strong>Mini-batch<\/strong> is a smaller chunk of the dataset. 
Datasets are often too big to fit in memory or on a GPU all at once, so we process the data in batches.<\/li>\r\n<\/ul>\r\n\r\n\r\n<span style=\"font-size: 14px;\">I like to visualize it like this:<\/span>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"995\" height=\"390\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/03\/minibatch_figure.png\" alt=\"\" class=\"alignnone size-full wp-image-3623\" \/>\r\n<h6><\/h6>\r\n\r\n<span style=\"font-size: 14px;\">So according to the parameters defined above, our custom training loop will pass through the entire dataset 30 times, and since our mini-batch size is 128 and our total number of images is 5000, one pass through the data takes floor(5000\/128) = 39 iterations. <\/span>\r\n<h6><\/h6>\r\n\r\n<span style=\"font-size: 14px;\">Here's the structure of the custom training loop. The full code is in the <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/ug\/train-network-using-custom-training-loop.html#TrainNetworkUsingCustomTrainingLoopExample-5\"><u>doc example<\/u><\/a>, and I'll warn you the full script is quite a few lines of code, but a lot of it is straightforward once you understand the overall structure.<\/span>\r\n<h6><\/h6>\r\n\r\n\r\n<pre>for epoch = 1:numEpochs\r\n     ... \r\n  for ii = 1:numIterationsPerEpoch\r\n    <span class=\"comment\">% Setup: read data, convert to dlarray, pass to GPU <\/span>\r\n           ... 
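\r\n    <span class=\"comment\">% For example, the setup might look like this (a sketch following the doc<\/span>\r\n    <span class=\"comment\">% example; assumes YTrain is categorical, so Y becomes a one-hot matrix):<\/span>\r\n    idx = (ii-1)*miniBatchSize+1:ii*miniBatchSize;\r\n    X = XTrain(:,:,:,idx);\r\n    Y = zeros(numClasses, miniBatchSize, 'single');\r\n    for c = 1:numClasses\r\n        Y(c,YTrain(idx)==classes(c)) = 1;\r\n    end\r\n    dlX = dlarray(single(X),'SSCB');   <span class=\"comment\">% 'SSCB' = spatial, spatial, channel, batch<\/span>\r\n    if (executionEnvironment == \"auto\" && canUseGPU) || executionEnvironment == \"gpu\"\r\n        dlX = gpuArray(dlX);\r\n    end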
\r\n    \r\n    <span class=\"comment\">% Evaluate the model gradients and loss<\/span>\r\n    [gradients, loss] = dlfeval(@modelGradients, dlnet, dlX, Y);\r\n\r\n    <span class=\"comment\">% Update the custom learn rate<\/span>\r\n    learnRate = initialLearnRate\/(1 + decay*iteration);\r\n    \r\n    <span class=\"comment\">% Update network parameters using the SGDM optimizer<\/span>\r\n    [dlnet.Learnables, vel] = sgdmupdate(dlnet.Learnables, gradients, vel, learnRate, momentum);\r\n\r\n    <span class=\"comment\">% Update training plot <\/span>\r\n        ...\r\n  end\r\nend\r\n<\/pre>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">For completeness, you create the function <span style =\"font-family:courier\">modelGradients<\/span>, where you compute the loss and the gradients. More on the specifics of this in the next post. <\/span>\r\n<h6><\/h6>\r\n<pre>function [gradients, loss] = modelGradients(dlnet, dlX, Y)\r\n\r\n  dlYPred = forward(dlnet, dlX);\r\n\r\n  loss = crossentropy(dlYPred, Y);\r\n  gradients = dlgradient(loss, dlnet.Learnables);\r\n\r\nend<\/pre>\r\n\r\n<span style=\"font-size: 14px;\">In the simple example, the single function <span style =\"font-family:courier\">trainNetwork<\/span> has expanded into a series of loops and supporting code. We're doing this so we have more flexibility when networks require it, and we can revert to the simpler method when it's overkill. 
The good news is, this is as complicated as it gets: once you understand this structure, it's all about putting the right information into it!<\/span>\r\n\r\n<h6><\/h6>\r\n\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">For those who want to visualize what's happening in the loop, I see it like this:<\/span>\r\n\r\n<img decoding=\"async\" loading=\"lazy\" width=\"500\" height=\"328\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/03\/trainingLoopVisual2-3.png\" alt=\"\" class=\"alignnone size-full wp-image-3711\" \/>\r\n<h6><\/h6>\r\n\r\n\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">And as you may have guessed based on what's highlighted in the visualization above, the next post in this series will go into more detail on the inner workings of the loop, and what you need to know to understand what's happening with loss, gradients, learning rate, and updating network parameters. <\/span>\r\n<h6><\/h6>\r\n\r\n\r\n<h6><\/h6>\r\n<span style=\"font-size: 15px; color: #f27d1d;\"><strong>Three model approaches<\/strong><\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">One final point to keep in mind: while I used the extended framework with the <strong>DLNetwork approach<\/strong>, there is also a <strong>Model Function<\/strong> approach for when you want control over initializing and explicitly defining the network weights and biases. The same example can be written with the model function approach; you can follow along with this <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/ug\/train-network-using-custom-training-loop.html\"><u>doc example<\/u><\/a> to learn more. This approach gives you the most control of the three approaches, but it is also the most complex. 
<\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">The entire landscape looks like this:<\/span>\r\n<a href=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/03\/Step4a.png\"><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"442\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/03\/Step4a-1024x442.png\" alt=\"\" class=\"alignnone size-large wp-image-3807\" \/><\/a>\r\n\r\n<h6><\/h6>\r\n\r\n<span style=\"font-size: 14px;\">That's it for this post. It was a lot of information, but hopefully you found something useful in it. If you have any questions or would like clarification on anything, please leave a comment below!<\/span>\r\n","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/02\/TrainABasicConvolutionalNeuralNetworkForClassificationExample_01.png\" onError=\"this.style.display ='none';\" \/><\/div><p>Build any Deep Learning Network\r\n\r\nFor the next few posts, I would like us all to step out of our comfort zone. I will be exploring and featuring more advanced deep learning topics. Release 19b... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2020\/02\/28\/advanced-deep-learning-part-1\/\">read more >><\/a><\/p>","protected":false},"author":156,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/3563"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/users\/156"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/comments?post=3563"}],"version-history":[{"count":81,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/3563\/revisions"}],"predecessor-version":[{"id":6121,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/3563\/revisions\/6121"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media?parent=3563"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/categories?post=3563"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/tags?post=3563"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}