{"id":2319,"date":"2019-06-03T16:30:07","date_gmt":"2019-06-03T16:30:07","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=2319"},"modified":"2021-04-06T15:50:40","modified_gmt":"2021-04-06T19:50:40","slug":"ensemble-learning","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2019\/06\/03\/ensemble-learning\/","title":{"rendered":"Ensemble Learning"},"content":{"rendered":"<strong>Combining Deep Learning networks to increase prediction accuracy.<\/strong>\r\n<h6><\/h6>\r\n<em>The following post is from Maria Duarte Rosa, who wrote a great post on <a href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2019\/01\/18\/neural-network-feature-visualization\/\">neural network feature visualization<\/a>, talking about ways to increase your model prediction accuracy. <\/em>\r\n<h6><\/h6>\r\n<ul>\r\n \t<li>Have you tried training different architectures from scratch?<\/li>\r\n \t<li>Have you tried different weight initializations?<\/li>\r\n \t<li>Have you tried transfer learning using different pretrained models?<\/li>\r\n \t<li>Have you run cross-validation to find the best hyperparameters?<\/li>\r\n<\/ul>\r\nIf you answered Yes to any of these questions, this post will show you how to take advantage of your trained models to increase the accuracy of your predictions. Even if you answered No to all 4 questions, the simple techniques below may still help to increase your prediction accuracy.\r\n<h6><\/h6>\r\nFirst, let's talk about ensemble learning.\r\n<h6><\/h6>\r\n<h3><strong>What is ensemble learning?<\/strong><\/h3>\r\n<h6><\/h6>\r\nEnsemble learning or model ensembling, is a well-established set of machine learning and statistical techniques [LINK: <a href=\"https:\/\/doi.org\/10.1002\/widm.1249\">https:\/\/doi.org\/10.1002\/widm.1249<\/a>] for improving predictive performance through the combination of different learning algorithms. 
The combined predictions from several models are generally more accurate than those of any individual model making up the ensemble. Ensemble methods come in different flavours and levels of complexity (for a review see <a href=\"https:\/\/arxiv.org\/pdf\/1106.0257.pdf\">https:\/\/arxiv.org\/pdf\/1106.0257.pdf<\/a>), but here we focus on combining the predictions of multiple deep learning networks that have been previously trained.\r\n<h6><\/h6>\r\nDifferent networks make different mistakes, and model ensembling takes advantage of this diversity. Although not as popular in the deep learning literature as in more traditional machine learning research, model ensembling for deep learning has led to impressive results, especially in highly popular competitions such as ImageNet and other Kaggle challenges. These competitions are commonly won by ensembles of deep learning architectures.\r\n<h6><\/h6>\r\nIn this post, we focus on three very simple ways of combining the predictions of different deep neural networks:\r\n<ol>\r\n \t<li><strong>Averaging<\/strong>: a simple average over the predictions (outputs of the softmax layer) from all the networks<\/li>\r\n \t<li><strong>Weighted average:<\/strong> each model's weight is proportional to its performance. 
For example, the predictions of the best model could be given a weight of 2, while the remaining models each get a weight of 1;<\/li>\r\n \t<li><strong>Majority voting:<\/strong> for each test observation, the prediction is the class predicted most frequently across the networks<\/li>\r\n<\/ol>\r\nWe will use two examples to illustrate how these techniques can increase accuracy in the following situations:\r\nExample 1: combining different architectures trained from scratch.\r\nExample 2: combining different pretrained models for transfer learning.\r\n\r\nEven though we picked two specific use cases, these techniques apply to most situations where you have trained multiple deep learning networks, including networks trained on different datasets.\r\n<h3>Example 1 \u2013 combining different architectures trained from scratch<\/h3>\r\nHere we use the CIFAR-10 dataset to train six different ResNet architectures from scratch. We follow this example [LINK: <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/examples\/train-residual-network-for-image-classification.html\">https:\/\/www.mathworks.com\/help\/deeplearning\/examples\/train-residual-network-for-image-classification.html<\/a>], but instead of training a single architecture we vary the number of units and the network width using the following six combinations: numUnits = [3 6 9 12 18 33]; and netWidth = [12 32 16 24 9 6]. 
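<h6><\/h6>\r\nBefore looking at the results, it is worth seeing how little code the three combination rules require. Below is a minimal MATLAB sketch; the variable names are hypothetical: <em>scores<\/em> is an N-by-C-by-M array holding the softmax outputs of M networks for N observations and C classes, and <em>classes<\/em> is the corresponding categorical class list.\r\n<pre>% scores: N-by-C-by-M softmax outputs from M networks (hypothetical variable)\r\n% classes: 1-by-C categorical array of class names (hypothetical variable)\r\n\r\n% 1) Averaging: mean softmax score across networks\r\navgScores = mean(scores, 3);\r\n[~, idx] = max(avgScores, [], 2);\r\navgPreds = classes(idx);\r\n\r\n% 2) Weighted average: for example, count the best network twice\r\nw = [1 1 1 2 1 1];                  % one weight per network\r\nw = reshape(w, 1, 1, []) \/ sum(w);\r\nwavgScores = sum(scores .* w, 3);   % implicit expansion (R2016b+)\r\n[~, idx] = max(wavgScores, [], 2);\r\nwavgPreds = classes(idx);\r\n\r\n% 3) Majority vote: most frequent predicted class per observation\r\n[~, perNet] = max(scores, [], 2);   % class index per network\r\nvotePreds = classes(mode(squeeze(perNet), 2));\r\n<\/pre>\r\n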
We train each network using the same training options as in the example and estimate their individual validation errors (<em>validation error = 100 - prediction accuracy<\/em>):\r\n<h6><\/h6>\r\n<span style=\"font-family:consolas;\"> <strong>Individual validation errors:<\/strong><\/span>\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td><span style=\"font-family: consolas\">Network 1: 16.36%<\/span><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><span style=\"font-family: consolas\">Network 2: 7.83%<\/span><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><span style=\"font-family: consolas\">Network 3: 9.52%<\/span><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><span style=\"font-family: consolas\">Network 4: 7.68%<\/span><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><span style=\"font-family: consolas\">Network 5: 10.36%<\/span><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><span style=\"font-family: consolas\">Network 6: 12.04%<\/span><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<h6><\/h6>\r\nWe then calculated the errors for the three different ensembling techniques:\r\n<h6><\/h6>\r\n<span style=\"font-family: consolas;\">\r\n<strong>Model ensembling errors:<\/strong><\/span>\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td><span style=\"font-family: consolas\">Average: 6.79%<\/span><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><span style=\"font-family: consolas\">Weighted average: 6.79% (Network 4 counted twice).<\/span><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><span style=\"font-family: consolas\">Majority vote: 7.16%<\/span><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n\r\nA quick chart of these numbers:\r\n<pre>figure; bar(example1Results); title('Example 1: prediction errors (%)');\r\nxticklabels({'Network 1','Network 2','Network 3', 'Network 4', 'Network 5', 'Network 6', ...\r\n'Average', 'Weighted average', 'Majority vote'}); xtickangle(40)\r\n<\/pre>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"625\" height=\"503\" class=\"alignnone size-full wp-image-2331\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/05\/2019-05-29_11-33-49.png\" alt=\"\" 
\/>\r\n\r\n\r\nThe ensemble prediction errors are smaller than those of any of the individual models. The difference is small, but over the 10,000 validation images it means that 89 more images are correctly classified than with the best individual model. We can see some examples of these images:\r\n\r\n<pre>% Plot some data (misclassified for best model)\r\nload Example1Results.mat\r\nfigure;\r\nfor i = 1:4\r\nsubplot(2,2,i);imshow(dataVal(:,:,:,i))\r\ntitle(sprintf('Best model: %s \/ Ensemble: %s',bestModelPreds(i),ensemblePreds(i)))\r\nend\r\n<\/pre>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"665\" height=\"486\" class=\"alignnone size-full wp-image-2333\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/05\/2019-05-29_11-34-56.png\" alt=\"\" \/>\r\n<h3>Example 2 \u2013 combining different pretrained models for transfer learning<\/h3>\r\nIn this example we again use the CIFAR-10 dataset, but this time with different pretrained models for transfer learning. The models were originally trained on ImageNet and can be downloaded as support packages [LINK: <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/8743315-mathworks-deep-learning-toolbox-team\">https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/8743315-mathworks-deep-learning-toolbox-team<\/a>]. 
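\r\n<h6><\/h6>\r\nAs a reminder, transfer learning keeps the pretrained weights and only replaces the final layers so that the network outputs the 10 CIFAR-10 classes. A rough sketch for googlenet is below; the layer names differ between models, so inspect each network with analyzeNetwork first (here, <em>imdsTrain<\/em> and <em>options<\/em> are placeholders for the training datastore and training options from the linked example).\r\n<pre>net = googlenet;                       % requires the GoogLeNet support package\r\nlgraph = layerGraph(net);\r\n\r\n% Replace the last learnable layer and the classification output layer\r\nnewFC = fullyConnectedLayer(10, 'Name', 'new_fc');\r\nlgraph = replaceLayer(lgraph, 'loss3-classifier', newFC);\r\nlgraph = replaceLayer(lgraph, 'output', classificationLayer('Name', 'new_out'));\r\n\r\n% Resize CIFAR-10 images to the network's input size and retrain\r\ninputSize = net.Layers(1).InputSize;\r\naugTrain = augmentedImageDatastore(inputSize(1:2), imdsTrain);\r\nnetTL = trainNetwork(augTrain, lgraph, options);\r\n<\/pre>\r\n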
We used googlenet, squeezenet, resnet18, xception, and mobilenetv2, and followed the transfer learning example [LINK: <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/examples\/train-deep-learning-network-to-classify-new-images.html\">https:\/\/www.mathworks.com\/help\/deeplearning\/examples\/train-deep-learning-network-to-classify-new-images.html<\/a>].\r\n<h6><\/h6>\r\n<strong>Individual validation errors:<\/strong>\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td><span style=\"font-family: consolas\">googlenet: 7.23%<\/span><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><span style=\"font-family: consolas\">squeezenet: 12.89%<\/span><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><span style=\"font-family: consolas\">resnet18: 7.75%<\/span><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><span style=\"font-family: consolas\">xception: 3.92%<\/span><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><span style=\"font-family: consolas\">mobilenetv2: 6.96%<\/span><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<h6><\/h6>\r\n<span style=\"font-family: consolas\"><strong>\r\nModel ensembling errors:<\/strong><\/span>\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td><span style=\"font-family: consolas\">Average: 3.56%<\/span><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><span style=\"font-family: consolas\">Weighted average: 3.28% (Xception counted twice).<\/span><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><span style=\"font-family: consolas\">Majority vote: 4.04%<\/span><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n\r\n<pre>% Plot errors\r\nfigure;bar(example2Results); title('Example 2: prediction errors (%)');\r\nxticklabels({'GoogLeNet','SqueezeNet','ResNet-18', 'Xception', 'MobileNet-v2', ...\r\n    'Average', 'Weighted average', 'Majority vote'}); xtickangle(40)<\/pre>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"610\" height=\"502\" class=\"alignnone size-full wp-image-2343\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/05\/2019-05-29_11-35-17.png\" alt=\"\" \/>\r\n\r\nAgain, the ensemble prediction errors are smaller than those of 
the individual models, and 64 more images were correctly classified. These included:\r\n\r\n<pre>% Plot some data (misclassified for best model)\r\nload Example2Results.mat\r\nfigure;\r\nfor i = 1:4\r\nsubplot(2,2,i);imshow(dataVal(:,:,:,i))\r\ntitle(sprintf('Best model: %s \/ Ensemble: %s',bestModelPreds(i),ensemblePreds(i)))\r\nend\r\n<\/pre>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"652\" height=\"489\" class=\"alignnone size-full wp-image-2345\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/05\/2019-05-29_11-35-34.png\" alt=\"\" \/>\r\n\r\n<h3>What else should I know?<\/h3>\r\n\r\nModel ensembling can significantly increase prediction time, which makes it impractical in applications where the cost of inference time outweighs the cost of making wrong predictions.\r\n<h6><\/h6>\r\nAnother thing to note is that performance does not increase monotonically with the number of networks. Typically, as this number grows, training time increases significantly while the return from combining additional models diminishes.\r\n<h6><\/h6>\r\nThere isn\u2019t a single magic number for how many networks one should combine; it depends heavily on the networks, the data, and the available computational resources. Having said this, performance tends to improve with the variety of models in the ensemble.\r\n<h6><\/h6>\r\nHope you found this useful. Have you tried ensemble learning, or are you thinking of trying it? Leave a comment below. 
\r\n\r\n<p><a href=\"https:\/\/twitter.com\/jo_pings?ref_src=twsrc%5Etfw\" class=\"twitter-follow-button\" data-size=\"large\" data-show-count=\"false\">Follow @jo_pings<\/a><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\r\n","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/05\/2019-05-29_11-33-49.png\" onError=\"this.style.display ='none';\" \/><\/div><p>Combining Deep Learning networks to increase prediction accuracy.\r\n\r\nThe following post is from Maria Duarte Rosa, who wrote a great post on neural network feature visualization, talking about ways... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2019\/06\/03\/ensemble-learning\/\">read more >><\/a><\/p>","protected":false},"author":156,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/2319"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/users\/156"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/comments?post=2319"}],"version-history":[{"count":28,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/2319\/revisions"}],"predecessor-version":[{"id":2383,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/2319\/revisions\/2383"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media?parent=2319"}],"wp:term":[{"taxono
my":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/categories?post=2319"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/tags?post=2319"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}