{"id":8433,"date":"2021-11-16T09:00:48","date_gmt":"2021-11-16T14:00:48","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=8433"},"modified":"2021-12-03T19:46:18","modified_gmt":"2021-12-04T00:46:18","slug":"matlabs-best-model-deep-learning-basics","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2021\/11\/16\/matlabs-best-model-deep-learning-basics\/","title":{"rendered":"MATLAB&#8217;s Best Model: Deep Learning Basics"},"content":{"rendered":"<em>This post is from Heather Gorr, MATLAB product marketing. You can follow her on social media: <\/em><a href=\"https:\/\/www.instagram.com\/heather.codes\/\" target=\"_blank\" rel=\"noopener\"><em>@heather.codes<\/em><\/a><em>,\u00a0<\/em><a href=\"https:\/\/www.tiktok.com\/@heather.codes\" target=\"_blank\" rel=\"noopener\"><em>@heather.codes<\/em><\/a><em>,\u00a0<\/em><a href=\"https:\/\/twitter.com\/HeatherGorr\" target=\"_blank\" rel=\"noopener\"><em>@HeatherGorr<\/em><\/a><em>, and\u00a0<\/em><a href=\"https:\/\/www.linkedin.com\/in\/heather-gorr-phd\/\" target=\"_blank\" rel=\"noopener\"><em>@heather-gorr-phd<\/em><\/a><em>.\u00a0This blog post follows the\u00a0fabulous modeling competition LIVE on YouTube, <\/em><a href=\"https:\/\/youtu.be\/HILyfTwNwBo\" target=\"_blank\" rel=\"noopener\"><em>MATLAB's Best Model: Deep Learning Basics<\/em><\/a><em>\u00a0to guide you in how to choose the best model<\/em><em>. For deep learning models, there are different ways to assess what is the \u201cbest\u201d model. It could be a) comparing different networks (problem 1) or b) finding the right parameters for a particular network (problem 2).<\/em>\r\n\r\n<em>How can this be managed efficiently and quickly? Using a low code tool in MATLAB, the <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/ref\/experimentmanager-app.html\" target=\"_blank\" rel=\"noopener\">Experiment Manager app<\/a>! 
<\/em>\r\n<h6><\/h6>\r\n<h6><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-8655 size-full\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/11\/Picture12.png\" alt=\"\" width=\"618\" height=\"127\" \/><\/h6>\r\n<h1>Approach<\/h1>\r\nWe created two <em>problems<\/em> for image classification and timeseries regression. Based on the data sets, we considered two types of models: Convolutional (CNN) and Long Short-Term Memory (LSTM) networks. The image below shows some common networks used for different data types.\r\n<h6><\/h6>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"482\" height=\"253\" class=\"alignnone size-full wp-image-8439\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/11\/Picture1.png\" alt=\"\" \/>\r\n<h6><\/h6>\r\n<em>Fig 1: Common data sets and networks<\/em>\r\n<h6><\/h6>\r\nWe used doc examples for repeatability (plus, reasonably sized data sets for a livestream!) and used apps in MATLAB to explore, train, and compare the models quickly. We'll discuss more as we get into the details!\r\n<h6><\/h6>\r\n<h1>Problem 1: Image classification<\/h1>\r\nFor our first problem, we compared CNN models to classify types of flowers. CNNs are very common as they involve a series of operations, which we can generally understand: convolutions, mathematical operations, and aggregations.\r\n<h6><\/h6>\r\n<h6><img decoding=\"async\" loading=\"lazy\" width=\"525\" height=\"88\" class=\"alignnone size-full wp-image-8610\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/11\/flowers.png\" alt=\"\" \/><\/h6>\r\n<h6><\/h6>\r\n<em>Fig 2: Convolutional Neural Network (CNN) diagram<\/em>\r\n<h6><\/h6>\r\nAs you may recall from previous posts, we have some great starting points in this field! 
We used <em>transfer learning<\/em>, where you update a pretrained network with your data.\r\n<h6><\/h6>\r\n<h2>Choosing networks<\/h2>\r\nWe started by exploring pretrained models using the <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/deep-network-designer-app.html?searchHighlight=Deep%20Network%20Designer%20app&amp;s_tid=srchtitle_Deep%20Network%20Designer%20app_3\" target=\"_blank\" rel=\"noopener\">Deep Network Designer app<\/a>, which provides a sense of the overall network architecture and helps us select a model before investigating the details.\r\n<h6><\/h6>\r\n<h6><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"922\" class=\"alignnone size-large wp-image-8742\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/11\/DND-1-1024x922.jpg\" alt=\"\" \/><\/h6>\r\n<em>Fig 3: Pretrained models in Deep Network Designer <\/em>\r\n<h6><\/h6>\r\nWe wanted varying levels of complexity for our competition, so we decided on <a href=\"https:\/\/uk.mathworks.com\/help\/deeplearning\/ref\/squeezenet.html\" target=\"_blank\" rel=\"noopener\">squeezenet<\/a>, <a href=\"https:\/\/uk.mathworks.com\/help\/deeplearning\/ref\/googlenet.html\" target=\"_blank\" rel=\"noopener\">googlenet<\/a>, and <a href=\"https:\/\/uk.mathworks.com\/help\/deeplearning\/ref\/inceptionv3.html\" target=\"_blank\" rel=\"noopener\">inceptionv3<\/a>.\r\n<h6><\/h6>\r\n<h2>Comparing networks<\/h2>\r\nNext, we needed to train and validate all three networks and compare the results! 
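Under the hood, each trial retrains a pretrained network on the flower images via transfer learning. As a minimal sketch (assuming googlenet and hypothetical image datastores imdsTrain and imdsValidation; the replaced layer names are specific to googlenet):\r\n<h6><\/h6>\r\n<pre>net = googlenet;                       % requires the googlenet support package\r\nlgraph = layerGraph(net);\r\nnumClasses = numel(categories(imdsTrain.Labels));\r\n% Swap the final learnable and output layers for the new classes\r\nnewFC = fullyConnectedLayer(numClasses, 'Name', 'new_fc', ...\r\n    'WeightLearnRateFactor', 10, 'BiasLearnRateFactor', 10);\r\nlgraph = replaceLayer(lgraph, 'loss3-classifier', newFC);\r\nlgraph = replaceLayer(lgraph, 'output', classificationLayer('Name', 'new_out'));\r\nopts = trainingOptions('sgdm', 'InitialLearnRate', 1e-4, ...\r\n    'MaxEpochs', 6, 'ValidationData', imdsValidation);\r\ntrainedNet = trainNetwork(imdsTrain, lgraph, opts);<\/pre>\r\n<h6><\/h6>\r\n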
The <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/experiment-manager.html?searchHighlight=Experiment%20Manager%20app&amp;s_tid=srchtitle_Experiment%20Manager%20app_2\" target=\"_blank\" rel=\"noopener\">Experiment Manager app<\/a> is super helpful to stay organized and automate this part.\r\n\r\n&nbsp;\r\n<h6><\/h6>\r\nThis doc example walks through setting up and running the experiment:\r\n<h6><\/h6>\r\n<pre>cd(setupExample('nnet\/ExpMgrTransferLearningExample'));setupExpMgr('FlowerTransferLearningProject');<\/pre>\r\n&nbsp;\r\n<h6><img decoding=\"async\" loading=\"lazy\" width=\"888\" height=\"682\" class=\"alignnone size-full wp-image-8745\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/11\/fig4cropped.png\" alt=\"\" \/><\/h6>\r\n<em>Fig 4: Setting network parameters in Experiment Manager App<\/em>\r\n<h6><\/h6>\r\nAs you probably know, training networks can take some time! Here we are training three of them, so you want to consider your hardware and problem before hitting <em>run<\/em>. You can adjust settings to use GPUs and easily run experiments in parallel through the app.\r\n<h6><\/h6>\r\nI started the experiment a bit early to ensure we had time to compare and ran it on my Linux machine for multi-GPU action!\r\n<h6><\/h6>\r\n<h2>The judges' scores<\/h2>\r\nHow did our models perform? 
We used a few criteria to assess the models:\r\n<ul>\r\n \t<li>Accuracy<\/li>\r\n \t<li>Speed<\/li>\r\n \t<li>Overall quality<\/li>\r\n \t<li>Explainability<\/li>\r\n<\/ul>\r\nMost of these measures can be quickly found in the app - more on explainability below, as it's much more nuanced!\r\n\r\n&nbsp;\r\n<h6><img decoding=\"async\" loading=\"lazy\" width=\"829\" height=\"480\" class=\"alignnone size-full wp-image-8697\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/11\/expmgr2-Small.png\" alt=\"\" \/><\/h6>\r\n<em>Fig 5: Classification results in Experiment Manager App<\/em>\r\n<h6><\/h6>\r\nWe found that in this example, inceptionv3 performed <em>best<\/em>\u00a0in terms of accuracy (91.9%) but took <em>much <\/em>longer to train, as its architecture is more complicated than the others. Looking at the runners-up, googlenet might be a better compromise since it was much faster and still had similarly good validation accuracy (91%). The squeezenet model trained the fastest but had worse accuracy, though I wouldn't rule it out! Every problem is different when it comes to what\u2019s most important! Finally, we checked the confusion matrices, which looked quite similar and balanced. This is a very important visual to help ensure you don\u2019t have imbalanced accuracies amongst classes... which leads us to our last criterion.\r\n<h6><\/h6>\r\n<h2>Explainability<\/h2>\r\nBeing able to interpret the models is increasingly important, and <em>model explainability<\/em>\u00a0is an area of active research in the field of deep learning. We'll keep this section brief as we have a lot more to come on these topics. Put simply, you need to understand what the model is doing, especially if something goes wrong; the developer, the team, and even the users all need that understanding. 
There are some good techniques such as <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/ug\/visualize-activations-of-a-convolutional-neural-network.html\" target=\"_blank\" rel=\"noopener\">Network Activations and Visualizations<\/a>, and other strategies.\r\n<h6><\/h6>\r\nA few last tips - be sure to document well, and if you've used a pretrained model, make sure the training data and model info are transparent and unbiased.\r\n<h6><\/h6>\r\n<h2>Tuning<\/h2>\r\nA huge part of deep learning is tuning the networks once you are satisfied with the approach. There are many parameters to adjust for improvements in the layer architecture, solvers, and data representation. Again, the apps will help with this as you can examine and adjust the parameters easily in Deep Network Designer, then perform a parameter sweep using the Experiment Manager.\r\n<h6><\/h6>\r\nWe followed a <a href=\"https:\/\/mathworks.com\/help\/deeplearning\/ug\/exp-mgr-classification-example.html\" target=\"_blank\" rel=\"noopener\">doc example<\/a> which compares three solvers with googlenet and a simple 'default' network:\r\n<h6><\/h6>\r\n<pre>cd(setupExample('nnet\/ExperimentManagerClassificationExample'));setupExpMgr('MerchandiseClassificationProject');<\/pre>\r\nWe won\u2019t get into the details of available solvers in this post, but this is a great way to explore if you don\u2019t remember the difference between Stochastic Gradient Descent with Momentum (sgdm) and Root Mean Square Propagation (RMSProp) offhand! 
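For context, the solver is simply the first argument to trainingOptions; a hedged sketch of how each trial's options might be built (the learning rate and epoch values here are illustrative, not the doc example's settings):\r\n<h6><\/h6>\r\n<pre>solver = 'rmsprop';   % or 'sgdm', 'adam'\r\nopts = trainingOptions(solver, ...\r\n    'InitialLearnRate', 1e-3, ...\r\n    'MaxEpochs', 8, ...\r\n    'Shuffle', 'every-epoch', ...\r\n    'Plots', 'training-progress');<\/pre>\r\n<h6><\/h6>\r\n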
There's a lot more in the <a href=\"https:\/\/mathworks.com\/help\/deeplearning\/ref\/trainingoptions.html\" target=\"_blank\" rel=\"noopener\">doc<\/a> including a quick overview of all the parameters available to tune.\r\n<h6><\/h6>\r\n<h6><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"436\" class=\"alignnone size-large wp-image-8736\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/11\/fig6cropped-1024x436.png\" alt=\"\" \/><\/h6>\r\n<em>Fig 6: Exploring results in Experiment Manager App<\/em>\r\n<h6><\/h6>\r\nWe ran the experiment, and googlenet performed much better here (though it obviously took longer to train). It's interesting that there is no clear difference in accuracy when comparing the solvers - more data would likely help examine this. However, the solvers made a big difference for the default network with minimal layers (70% vs. 80%). This is the type of situation worth checking into if you see such variation!\r\n<h6><\/h6>\r\n<h1>Problem 2: Time series regression<\/h1>\r\nNext, we focused on the timeseries regression problem. First, let\u2019s think about the overall architecture.\r\n<h6><\/h6>\r\nCNNs are broadly useful for many problems, but there are times when the model needs to know info from previous time steps. This is where Recurrent Neural Networks (<a href=\"https:\/\/mathworks.com\/discovery\/rnn.html\" target=\"_blank\" rel=\"noopener\">RNN<\/a>) come in handy: they retain memory through the system, which makes them well-suited to timeseries, video, text, and other sequential problems. In deep learning terminology, CNNs are feedforward, while RNNs include feedback connections that carry some memory through the inputs and outputs of the layers.\r\n<h6><\/h6>\r\nIn this case, we looked specifically at the <a href=\"https:\/\/mathworks.com\/help\/deeplearning\/ug\/long-short-term-memory-networks.html\" target=\"_blank\" rel=\"noopener\">LSTM<\/a>, which is an RNN with extra gates for inputs and outputs. 
This facilitates retaining longer-term trends in the data, which is important for time series problems. The illustration below compares the two networks.\r\n<h6><\/h6>\r\n<h6><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-8613 size-full\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/11\/LSTM.png\" alt=\"\" width=\"560\" height=\"150\" \/><\/h6>\r\n<em>Fig 7: Comparison of RNN (left) and LSTM network (right)<\/em>\r\n<h6><\/h6>\r\nWith LSTMs, you often don\u2019t need as many layers as with CNNs - the art is in choosing parameters to best represent the data and trends. While I\u2019ve encountered very deep LSTMs, most often the network can learn well with very few layers. For example, the Deep Network Designer has a template with six layers: input, lstm, dropout, fullyConnected, softmax, and a classification or regression layer. This is a straightforward architecture where the data prep and layer parameters have a lot of influence.\r\n<h6><\/h6>\r\n<h6><img decoding=\"async\" loading=\"lazy\" width=\"494\" height=\"480\" class=\"alignnone size-full wp-image-8700\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/11\/DNDLSTM-Small.jpg\" alt=\"\" \/><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<em>Fig 8: Deep Network Designer six-layer template<\/em>\r\n<h6><\/h6>\r\nWe used the same approach as above to compare different network parameters using the Experiment Manager and a <a href=\"https:\/\/mathworks.com\/help\/deeplearning\/ug\/exp-mgr-sequence-regression-example.html\" target=\"_blank\" rel=\"noopener\">doc example<\/a> predicting the remaining useful life (RUL) of an engine:\r\n<h6><\/h6>\r\n<pre>cd(setupExample('nnet\/ExperimentManagerSequenceRegressionExample'));setupExpMgr('TurbofanSequenceRegressionProject');<\/pre>\r\n<h6><\/h6>\r\n<h2>Selecting parameters<\/h2>\r\nWe compared two main network parameters: the <em>threshold<\/em>\u00a0and <em>LSTM depth<\/em>. 
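As a side note, the six-layer template shown in Fig 8 can be written directly in code; a sketch for a classification problem, where numFeatures, numHiddenUnits, and numClasses are placeholders you would set for your data:\r\n<h6><\/h6>\r\n<pre>layers = [\r\n    sequenceInputLayer(numFeatures)\r\n    lstmLayer(numHiddenUnits, 'OutputMode', 'last')\r\n    dropoutLayer(0.2)\r\n    fullyConnectedLayer(numClasses)\r\n    softmaxLayer\r\n    classificationLayer];   % swap the last two for regressionLayer in a regression problem<\/pre>\r\n<h6><\/h6>\r\n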
The threshold represents a cutoff value for the response data, and the LSTMDepth is the number of LSTM layers.\r\n<h6><\/h6>\r\n<h6><img decoding=\"async\" loading=\"lazy\" width=\"549\" height=\"480\" class=\"alignnone size-full wp-image-8706\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/11\/expmgr4-Small.png\" alt=\"\" \/><\/h6>\r\n<em>Fig 9: Comparing main network parameters<\/em>\r\n<h6><\/h6>\r\nA custom metric, MeanMaxAbsoluteError, was used; custom metrics are helpful because you can include any method you like to judge goodness-of-fit. We checked the setup function, ran the experiment, and anxiously awaited the results!\r\n<h6><\/h6>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"413\" class=\"alignnone size-large wp-image-8679\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/11\/cpScreenshot-from-2021-09-13-10-40-24-1-1024x413.png\" alt=\"\" \/>\r\n\r\n&nbsp;\r\n<h6><\/h6>\r\n<em>Fig 10: Running the experiment and comparing results<\/em>\r\n<h6><\/h6>\r\n<h2>The judges' scores<\/h2>\r\nWith regression problems, where a numeric value is predicted, the common measure of accuracy is RMSE (root mean squared error) between known and predicted data. Ideally, the RMSE is as close to zero as possible.\r\n<h6><\/h6>\r\n<h6><img decoding=\"async\" loading=\"lazy\" width=\"913\" height=\"583\" class=\"alignnone size-full wp-image-8733\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/11\/fig11cropped.png\" alt=\"\" \/><\/h6>\r\n<em>Fig 11: RMSE results<\/em>\r\n<h6><\/h6>\r\nThe <em>best<\/em>\u00a0model (with minimal RMSE) is the network with the smallest threshold (150) and smallest depth (1). 
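For reference, RMSE is simple to compute yourself from predictions; a sketch with hypothetical prediction and test vectors YPred and YTest:\r\n<h6><\/h6>\r\n<pre>% Root mean squared error between predicted and known responses\r\nrmse = sqrt(mean((YPred - YTest).^2));<\/pre>\r\n<h6><\/h6>\r\n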
In this case, there wasn't any improvement in the results based on the depth of the network, so, again, simplicity is something to consider when setting up your LSTMs, and it will help with explainability as noted above.\r\n<h6><\/h6>\r\nThere are excellent examples in the doc showing LSTM training and assessment in more detail for several problems, including video, audio, and text. Sadly, we couldn\u2019t do more comparisons in an hour, but maybe next time we can get into more complicated problems now that we've covered the basics! Check out more examples <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/examples.html?category=experiment-manager&amp;s_tid=CRUX_topnav\" target=\"_blank\" rel=\"noopener\">here<\/a>.\r\n<h6><\/h6>\r\n<h1>Summary<\/h1>\r\nWe were able to train, compare, and assess these beautiful models in under an hour! Hopefully this gives you a sense of how to choose networks for your data and how to set up experiments to tune and compare them. Using the apps and carefully thinking about the criteria are super helpful during this process.\u00a0If you\u2019d like to learn more about setting up your own experiments, visit these <a href=\"https:\/\/www.mathworks.com\/videos\/series\/deep-neural-networks.html#experiment-management\" target=\"_blank\" rel=\"noopener\">two video tutorials from Joe Hicklin<\/a>.\r\n<h6><\/h6>\r\nWe'll be back again for our modeling competition series - subscribe to the <a href=\"https:\/\/www.youtube.com\/channel\/UCgdHSFcXvkN6O3NXvif0-pA\" target=\"_blank\" rel=\"noopener\">@matlab YouTube channel <\/a>to stay tuned for more and stay connected on social media and in the comments. 
Let us know what you'd like to see next!","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/11\/Picture12.png\" onError=\"this.style.display ='none';\" \/><\/div><p>This post is from Heather Gorr, MATLAB product marketing. You can follow her on social media: @heather.codes,\u00a0@heather.codes,\u00a0@HeatherGorr, and\u00a0@heather-gorr-phd.\u00a0This blog post follows the\u00a0fabulous... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2021\/11\/16\/matlabs-best-model-deep-learning-basics\/\">read more >><\/a><\/p>","protected":false},"author":156,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/8433"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/users\/156"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/comments?post=8433"}],"version-history":[{"count":70,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/8433\/revisions"}],"predecessor-version":[{"id":8748,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/8433\/revisions\/8748"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media?parent=8433"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/categories?post=8433"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.
com\/deep-learning\/wp-json\/wp\/v2\/tags?post=8433"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}