{"id":4801,"date":"2020-09-04T09:32:28","date_gmt":"2020-09-04T13:32:28","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=4801"},"modified":"2021-04-06T15:45:49","modified_gmt":"2021-04-06T19:45:49","slug":"multiple-order-modeling-for-accuracy-improvement","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2020\/09\/04\/multiple-order-modeling-for-accuracy-improvement\/","title":{"rendered":"Multiple-Order Modeling for Accuracy Improvement"},"content":{"rendered":"<em>The following is a guest post from Mohammad Muquit here to discuss implementing multi-order modeling to improve accuracy of deep learning models.<\/em>\r\n<h6><\/h6>\r\nIn typical classification problems, deep neural network (DNN) accuracy is measured as the percentage of correct class predictions against ground truths. However, a DNN's final layer contains more than just a class name: it also produces a probability density array holding the probability of each class the DNN can predict. As a result, typical approaches may ignore a significant amount of output data.\r\n<h6><\/h6>\r\nIn this blog we use the idea that this <strong>probability density array<\/strong> itself can serve as a set of predictors for additional modeling, and we look into cases where this otherwise-discarded information can be utilized to improve a DNN's accuracy.\r\n<h6><\/h6>\r\nFig1 shows the basic idea behind the proposed approach.\r\n\r\n<img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-4807 size-large\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/07\/img1-1024x505.png\" alt=\"\" width=\"1024\" height=\"505\" \/>\r\n<h6><\/h6>\r\nWe introduce a quality assurance (QA) application as a case example here.\r\n<h6><\/h6>\r\nFor each sample product to be inspected, multiple images of that sample are used as inputs. The images are captured by rotating the sample in front of a QA imaging system, yielding 60 images per sample. 
In total, 135 unique samples were used in this case study; 55 of them had defects and the remaining 80 were normal. We split this data into training and test data as follows:\r\n<h6><\/h6>\r\n<table style=\"height: 170px;\" width=\"367\">\r\n<tbody>\r\n<tr>\r\n<td style=\"border: 1px solid black;\" width=\"139\"><\/td>\r\n<td style=\"border: 1px solid black; text-align: center;\" width=\"115\">training data<\/td>\r\n<td style=\"border: 1px solid black; text-align: center;\" width=\"64\">test data<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"border: 1px solid black; text-align: center;\">normal samples:<\/td>\r\n<td style=\"border: 1px solid black; text-align: center;\">57<\/td>\r\n<td style=\"border: 1px solid black; text-align: center;\">23<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"border: 1px solid black; text-align: center;\">defected samples:<\/td>\r\n<td style=\"border: 1px solid black; text-align: center;\">43<\/td>\r\n<td style=\"border: 1px solid black; text-align: center;\">12<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"border: 1px solid black; text-align: center;\">total unique samples:<\/td>\r\n<td style=\"border: 1px solid black; text-align: center;\">100<\/td>\r\n<td style=\"border: 1px solid black; text-align: center;\">35<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"border: 1px solid black; text-align: center;\">total images:<\/td>\r\n<td style=\"border: 1px solid black; text-align: center;\">6000<\/td>\r\n<td style=\"border: 1px solid black; text-align: center;\">2100<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<h6><\/h6>\r\nA DNN is first created via transfer learning from GoogLeNet using the training images. The overall accuracy on the 2100 test images was 67.43%, and the accuracies for <strong>Normal<\/strong> and <strong>Defected<\/strong> images were 67.05% and 87.18%, respectively. 
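As a quick reference, class-wise and overall accuracies of this kind can be computed from image-level predictions and ground-truth labels as follows. This is an illustrative, pure-Python sketch with made-up labels; the study's own code is MATLAB.

```python
def class_accuracies(predicted, truth):
    """Return overall accuracy and a per-class accuracy dict
    from two parallel lists of class labels."""
    classes = sorted(set(truth))
    overall = sum(p == t for p, t in zip(predicted, truth)) / len(truth)
    per_class = {}
    for c in classes:
        idx = [i for i, t in enumerate(truth) if t == c]
        per_class[c] = sum(predicted[i] == c for i in idx) / len(idx)
    return overall, per_class

# Hypothetical image-level results for six images (not the study's data)
truth     = ["Normal", "Normal", "Normal", "Defected", "Defected", "Defected"]
predicted = ["Normal", "Defected", "Normal", "Defected", "Defected", "Normal"]
overall, per_class = class_accuracies(predicted, truth)
```

Note that the overall figure is an image-count-weighted average of the per-class figures, so the two can differ whenever the classes are imbalanced.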
Though it identifies individual defected images with relatively high accuracy, it fails to identify normal images in almost one-third of the cases.\r\n<h6><\/h6>\r\nNote that these results are only at the individual image-to-class level, yet we have 60 images per sample. Even when these 60 predicted results are of very low accuracy <em>individually<\/em>, collating them for an additional model (shown below in Fig2) might result in very high accuracy, which we'll explore in the next section.\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<div id=\"attachment_4809\" style=\"width: 1034px\" class=\"wp-caption alignnone\"><img aria-describedby=\"caption-attachment-4809\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-4809 size-large\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/07\/img2-1024x478.png\" alt=\"\" width=\"1024\" height=\"478\" \/><p id=\"caption-attachment-4809\" class=\"wp-caption-text\">Fig2: Images of the same sample are collated to form a very long predictor array as input for a second model. This figure shows the idea of <strong>2nd order modeling<\/strong>, but note that this can be extended to multiple-order modeling.<\/p><\/div>\r\n<h6><\/h6>\r\n<h2>Approaches for multiple-order modeling<\/h2>\r\nI will present four approaches to multiple-order modeling, from simplest to most complex, all of which improve on the original image-to-class accuracy.\r\n\r\nThe code to recreate these experiments and plots is available on <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/79092-multiple-order-modeling-for-deep-learning\"><u>File Exchange<\/u><\/a>.\r\n<h6><\/h6>\r\n<h3>Naive approach<\/h3>\r\nWe set a rule that if <strong>N<\/strong> out of the 60 images are predicted as Defected, then the sample is called Defected. Fig3 shows how the accuracy levels for Normal and Defected samples move as the value of N changes. 
Fig4 shows the ROC curve.\r\n<h6><\/h6>\r\n<pre><span class=\"comment\">%% Analysis of image level prediction probability data<\/span>\r\nload ReadyVariables.mat\r\nnumImg  = width(TrainDataTable)-1;\r\ndfctIndx= TestProdLabels=='Defected'; <span class=\"comment\">%index of rows corresponding to Defected class<\/span>\r\nnrIndx  = ~dfctIndx; <span class=\"comment\">%index of rows corresponding to Normal class<\/span>\r\nNnr     = sum(nrIndx); <span class=\"comment\">%Number of Normal samples<\/span>\r\nNdf     = sum(dfctIndx); <span class=\"comment\">%Number of Defected samples<\/span>\r\nSA      = sum(table2array(TestDataTable) &lt;= 0.5,2); <span class=\"comment\">%Number of images labeled as Defected<\/span>\r\nnAcr = zeros(numImg,1);\r\ndAcr = zeros(numImg,1);\r\n<span class=\"comment\">% For each value of N_d, the accuracy at sample-to-class level is calculated<\/span>\r\nfor k = 1:numImg\r\n    rslt = SA &gt;= k; <span class=\"comment\">%k represents N_d<\/span>\r\n    dAcr(k) = sum(rslt == 1 &amp; dfctIndx == 1)\/Ndf; <span class=\"comment\">%Accuracy for Defected samples<\/span>\r\n    nAcr(k) = sum(rslt == 0 &amp; nrIndx == 1)\/Nnr; <span class=\"comment\">%Accuracy for Normal samples<\/span>\r\nend\r\n<\/pre>\r\nFrom the point where the two accuracy curves in Fig3 coincide, we can see that if we set N to 17 or 18, then Normal and Defected samples are detected with accuracies as low as 4.35% and 8.33%, respectively. 
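Stripped of the table bookkeeping, the rule being swept here can be sketched as follows. This is an illustrative pure-Python version with made-up probabilities and a shortened image count; the blog's actual implementation is the MATLAB code above.

```python
def naive_vote(normal_probs, n_threshold):
    """Call a sample Defected if at least n_threshold of its images
    have a Normal-class probability <= 0.5 (i.e., lean Defected)."""
    defected_votes = sum(p <= 0.5 for p in normal_probs)
    return "Defected" if defected_votes >= n_threshold else "Normal"

# Hypothetical sample with 10 images instead of 60; 5 images lean Defected
probs = [0.9, 0.4, 0.8, 0.3, 0.2, 0.7, 0.6, 0.1, 0.45, 0.95]
label_strict = naive_vote(probs, n_threshold=6)  # needs 6 Defected-leaning images
label_loose  = naive_vote(probs, n_threshold=3)  # needs only 3
```

Sweeping `n_threshold` over its whole range is what produces the pair of accuracy curves in Fig3: raising it makes the Defected call stricter, trading one class's accuracy against the other's.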
We understand that such an approach is not effective for collating the outputs of a model with very low prediction accuracy against individual images.\r\n<h6><\/h6>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"560\" height=\"420\" class=\"alignnone size-large wp-image-4811\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/07\/img3.png\" alt=\"\" \/>\r\n<h3>Normal-Defected pattern as predictor for machine learning<\/h3>\r\nIn this second approach, we train a second model using only the category predictions of all 60 images. So, for each individual sample, we create a 1 x 60 array of binary values (<strong>Normal: 1<\/strong>, <strong>Defected: 0<\/strong>). We create arrays for the training samples, train a model, and then create arrays for the test samples to evaluate the model. We find that the accuracy improves to more than 90%. Instead of looking only at the number of 0's and 1's, looking at how the 0's and 1's are arranged in the array is much more effective in differentiating the samples.\r\n<pre><span class=\"comment\">% Test data table -&gt; Binary pattern table for test<\/span>\r\nDiscTestDataTable = double(table2array(TestDataTable(:,1:end)) &gt; 0.5); \r\nDiscTestDataTable = array2table(DiscTestDataTable); <span class=\"comment\">%Array to table conversion<\/span><\/pre>\r\nFor both test and training data, we set a value to 1 if Normal and 0 if Defected. 
This is indicated by data that is greater than .5 in the original probability density table.\r\n<h6><\/h6>\r\nSo using the first sample in the test data, the conversion would look like this:\r\n<h6><\/h6>\r\nTest Sample 1, images 1-20:\r\n<table style=\"height: 21px;\" width=\"1044\">\r\n<tbody>\r\n<tr>\r\n<td style=\"border: 1px solid black;\" width=\"10\">0.841<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.457<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.685<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.137<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.983<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.808<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.904<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.928<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.979<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.796<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.988<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.285<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.976<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.542<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.411<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.235<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.912<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.287<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.614<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0.484<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\nTest Sample 1 with threshold of 0.5:\r\n<table style=\"height: 21px;\" width=\"1044\">\r\n<tbody>\r\n<tr style=\"height: 14.5pt;\">\r\n<td style=\"border: 1px solid black;\" width=\"20\">1<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0<\/td>\r\n<td style=\"border: 1px solid black;\" 
width=\"20\">1<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">1<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">1<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">1<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">1<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">1<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">1<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">1<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">1<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">1<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">1<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">1<\/td>\r\n<td style=\"border: 1px solid black;\" width=\"20\">0<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\nWe train a machine learning classifier to identify the pattern of 0's and 1's to differentiate normal and defected samples.\r\n<h6><\/h6>\r\n<pre><span class=\"comment\">% Machine Learning Model Creation and Evaluation for Binary Pattern<\/span>\r\n\r\nbTM = trainClassifier(DiscTrainDataTable,numImg);\r\n<span class=\"comment\">% Prediction using the Test data (Sample-to-Class level prediction)<\/span>\r\nbP = bTM.predictFcn(DiscTestDataTable);\r\nbAcc = 100*sum(bP==TestProdLabels)\/numel(bP);\r\ndisp(['Thresholded Data Accuracy:', num2str(bAcc),'%'])<\/pre>\r\n<em>Thresholded Data Accuracy: 91.4286%<\/em>\r\n<h6><\/h6>\r\n<h3>Probability distribution value as predictor for machine learning<\/h3>\r\nIn the previous approach, there is no guarantee that 0.5 is the best value to divide the predictors into two different classes. 
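One way to see the sensitivity to that choice is to discretize the same probabilities under two different cuts. This is an illustrative pure-Python sketch (the first few probability values are taken from the table above; the 0.6 cut is hypothetical and not part of the blog's pipeline).

```python
def to_binary_pattern(normal_probs, threshold=0.5):
    """Discretize Normal-class probabilities into a 1/0 (Normal/Defected) pattern."""
    return [1 if p > threshold else 0 for p in normal_probs]

probs = [0.841, 0.457, 0.685, 0.137, 0.542]  # values from Test Sample 1 above
pattern_05 = to_binary_pattern(probs, 0.5)   # the 0.5 rule used in this approach
pattern_06 = to_binary_pattern(probs, 0.6)   # a slightly stricter, hypothetical cut
```

Because the borderline value 0.542 flips between the two cuts, a downstream model would see a different pattern for the same sample; keeping the raw probabilities avoids committing to any single threshold.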
Therefore, using the probability density value itself (a continuous value) as the predictor might be the next step for improvement. We train and evaluate models as shown in the code below. Because less information is lost, accuracy rises to 97.14% in this approach, compared to 91.43% for the previous one.\r\n<pre><span class=\"comment\">% Machine Learning Model Creation and Evaluation for Probability Density<\/span>\r\n<span class=\"comment\">% Training<\/span>\r\nTM = trainClassifier(TrainDataTable,numImg); <span class=\"comment\">%Modeling using the continuous pattern as predictor<\/span>\r\n<span class=\"comment\">% Evaluation<\/span>\r\nP = TM.predictFcn(TestDataTable); <span class=\"comment\">%Prediction on the Test data (Sample-to-Class level prediction)<\/span>\r\nacc = 100*sum(P==TestProdLabels)\/numel(P);\r\ndisp(['Normal Data Accuracy:',num2str(acc),'%'])<\/pre>\r\n<h3>Training an LSTM neural network<\/h3>\r\n<h6><\/h6>\r\nThere is still one more source of information left: the coherence among the 60 probability density values, which were obtained from images captured in a time-series manner. The idea is this: for a given Defected sample, the defects should be visible in only some of the 60 images, so probability density values indicating the Defected condition should appear in clusters. For a Normal sample, by contrast, any probability density values mistakenly indicating the Defected condition should appear randomly, caused by noise or other factors.\r\n<h6><\/h6>\r\nIn this section we use the probability density data to train an LSTM neural network and evaluate its accuracy. 
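The clustered-versus-scattered intuition can be made concrete with a toy statistic (an illustrative pure-Python sketch with made-up sequences, not part of the blog's pipeline): the longest run of consecutive Defected-leaning values is large for a clustered sequence and small for a scattered one, and this is exactly the kind of temporal structure an LSTM can learn from the ordered probabilities.

```python
def longest_defected_run(normal_probs, threshold=0.5):
    """Length of the longest streak of consecutive images leaning Defected."""
    longest = current = 0
    for p in normal_probs:
        current = current + 1 if p <= threshold else 0
        longest = max(longest, current)
    return longest

# Two hypothetical 12-image samples, each with 4 Defected-leaning images
clustered = [0.9, 0.8, 0.3, 0.2, 0.1, 0.4, 0.9, 0.8, 0.7, 0.9, 0.8, 0.7]  # defect stays in view for 4 frames
scattered = [0.3, 0.9, 0.8, 0.2, 0.9, 0.8, 0.4, 0.9, 0.8, 0.1, 0.9, 0.8]  # isolated misclassifications
```

Both sequences would look identical to the vote-counting and unordered-pattern approaches, yet their run structure differs sharply, which is why a sequence model can separate them.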
This time, the accuracy reaches 100% with a neural network trained for 100 epochs.\r\n\r\n<img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"613\" class=\"alignnone size-large wp-image-4813\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/07\/img4-1024x613.png\" alt=\"\" \/>\r\n<pre><span class=\"comment\">%% Evaluation of the LSTM network<\/span>\r\nPL = classify(net,tstLstm); <span class=\"comment\">%Prediction on test data<\/span>\r\nlAcc = 100*sum(PL==TestProdLabels)\/numel(PL); <span class=\"comment\">%Calculation of accuracy<\/span>\r\ndisp(['LSTM Accuracy:',num2str(lAcc),'%'])<\/pre>\r\n<em>LSTM Accuracy:100%<\/em>\r\n<h6><\/h6>\r\n<h2>Final comments<\/h2>\r\nIn this blog, we showed that the common practice of collapsing a neural network's output into a single class decision may not be the best way to get optimal results. The approaches introduced here show that multiple-order modeling using the outputs of a prior deep neural network (DNN) with apparently very low accuracy can still achieve high accuracy in practical applications.\r\n<h6><\/h6>\r\n<p class=\"Text\">We also compared different approaches to multiple-order modeling to show that using probability distribution values instead of predicted classes may yield better accuracy. In addition, for cases where input images are acquired in a time-series manner, LSTM-based approaches may improve accuracy further.<\/p>\r\n\r\n<h6><\/h6>\r\nThe full code is available here: <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/79092-multiple-order-modeling-for-deep-learning\">https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/79092-multiple-order-modeling-for-deep-learning<\/a>\r\n<h6><\/h6>\r\n<em>\r\nHave any comments or questions for Mohammad? 
Leave a comment below.<\/em>","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/07\/img1-1024x505.png\" onError=\"this.style.display ='none';\" \/><\/div><p>The following is a guest post from Mohammad Muquit here to discuss implementing multi-order modeling to improve accuracy of deep learning models.\r\n\r\nIn typical classification problems, deep neural... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2020\/09\/04\/multiple-order-modeling-for-accuracy-improvement\/\">read more >><\/a><\/p>","protected":false},"author":156,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/4801"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/users\/156"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/comments?post=4801"}],"version-history":[{"count":52,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/4801\/revisions"}],"predecessor-version":[{"id":5015,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/4801\/revisions\/5015"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media?parent=4801"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/categories?post=4801"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp
-json\/wp\/v2\/tags?post=4801"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}