{"id":2826,"date":"2019-09-03T15:01:10","date_gmt":"2019-09-03T15:01:10","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=2826"},"modified":"2021-04-06T15:49:39","modified_gmt":"2021-04-06T19:49:39","slug":"matlab-wins-hackathon","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2019\/09\/03\/matlab-wins-hackathon\/","title":{"rendered":"MATLAB wins Hackathon"},"content":{"rendered":"<span style=\"font-family: courier;\">This post is from Paola Jaramillo, Application Engineer from the Benelux office.<\/span>\r\n<h6><\/h6>\r\n<h6><\/h6>\r\nBack in February, I attended a hackathon hosted by Itility: meeting for 3 hours to solve an image classification problem while also enjoying pasta and networking with peers. I was there primarily to learn and see how other engineers and researchers were using machine learning in daily-life applications. As the title of this blog post indicates, my team ended up getting impressive results and winning the hackathon!\r\n<h6><\/h6>\r\n<!--more-->\r\n<h3>The Challenge<\/h3>\r\nThe goal of the hackathon was to solve an image classification problem with ties to real-life research:\r\n<h6><\/h6>\r\nGiven a simplified dataset of specific species of plants, can machine learning correctly identify the species in the images.\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6> <a href=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/09\/image_classification1.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-2830 size-full\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/09\/image_classification1.png\" alt=\"\" width=\"777\" height=\"233\" \/><\/a><\/h6>\r\nOriginal link to meetup is <span style=\"text-decoration: underline;\"><a href=\"https:\/\/www.meetup.com\/NL-Itility-Hackabrain\/events\/246830750\/\">here<\/a><\/span>.\r\n\r\nWe were not given any restrictions on language or method to use for this classification task. 
We broke off into teams, and each team began brainstorming. The teams decided to tackle this with various approaches:\r\n<h6><\/h6>\r\n<ul>\r\n \t<li>Traditional image processing: use pixel values to correctly identify the images<\/li>\r\n \t<li>Using R and Python, based on prior experience with those tools<\/li>\r\n \t<li>A machine learning approach, preprocessing the images to extract features<\/li>\r\n<\/ul>\r\n<h6><\/h6>\r\n<h3>My Approach<\/h3>\r\nMy group and I had no prior expertise in plant seedlings or image processing, so we could not engineer the right features by hand; instead, we decided to use deep learning techniques on the raw images. Given the size of the dataset and the limited time, we used a simple approach popular in the deep learning community known as <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/gs\/get-started-with-transfer-learning.html\">transfer learning<\/a> instead of training a model from scratch.\r\n<h6><\/h6>\r\nWhile people were inspecting the images and looking for the right libraries and packages to get started, I fired up MATLAB and searched the documentation for a transfer learning example. 
(<a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/examples\/transfer-learning-using-alexnet.html\">https:\/\/www.mathworks.com\/help\/deeplearning\/examples\/transfer-learning-using-alexnet.html<\/a>)\r\n<h6><\/h6>\r\nThe original example shows completely different objects in the images, so it wasn't clear this would work for our data, but the example shows that by applying transfer learning, the pretrained model AlexNet is able to learn features and classify new images.\r\n<h6><\/h6>\r\nFirst, I changed the input to point to the location of the new data:\r\n<h6><\/h6>\r\n<pre>imagepath = fullfile(pwd,'Subset_from_NonsegmentedV2');\r\nimds = imageDatastore(imagepath, 'IncludeSubfolders',true,...\r\n    'LabelSource','FolderNames')<\/pre>\r\n<h6><\/h6>\r\nThe images were not individually labeled, though they were separated into folders with the name of the specific species as the folder name. imageDatastore can automatically label images based on the folder name, so this saved us quite a lot of effort.\r\n<h6><\/h6>\r\nWe decided before spending time preprocessing the images, we would explore the results of retraining AlexNet from the raw image data. For this, we only needed to resize the images, which is automated by the read function of imageDatastore\r\n<h6><\/h6>\r\n<pre>imagesize = layers(1).InputSize\r\n outputSize = imagesize(1:2);\r\n imds.ReadFcn = @(img)imresize(imread(img),outputSize);<\/pre>\r\n<h6><\/h6>\r\n<em>*note, you can also resize images using a newer function called <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/ref\/augmentedimagedatastore.html\">augmentedImageDatastore<\/a> in 19a<\/em>\r\n<h6><\/h6>\r\nWe then split the dataset into training and validation. A separate folder of images was provided for testing.\r\n<h6><\/h6>\r\n<pre>[trainDS,valDS] = splitEachLabel(imds,0.7,'randomized')<\/pre>\r\nThen we ran the training on a simple AlexNet model. This took approx 7 minutes to train with my laptop w\/ GPU. 
MATLAB automatically detected the GPU and used it for training.\r\n<h6><\/h6>\r\n<h6><img decoding=\"async\" loading=\"lazy\" width=\"692\" height=\"315\" class=\"alignnone size-full wp-image-2832\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/09\/PIcture2_scene.png\" alt=\"\" \/><\/h6>\r\n<pre>opts = trainingOptions('sgdm','InitialLearnRate',0.0001,...\r\n    'ValidationData',valDS,...\r\n    'Plots','training-progress',...\r\n    'MiniBatchSize', 8,... % change according to available memory\r\n    'ValidationPatience', 3,...\r\n    'ExecutionEnvironment','auto') % 'multi-gpu' or 'parallel' for scaling up to HPC\r\n\r\n% layers_to_train: the AlexNet layer array with its final layers replaced\r\nhackathon_net = trainNetwork(trainDS, layers_to_train, opts);<\/pre>\r\n<h6><\/h6>\r\nTraining progress plot of the initial model:\r\n<h6><img decoding=\"async\" loading=\"lazy\" width=\"1688\" height=\"997\" class=\"alignnone size-full wp-image-2834\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/09\/training_progress_initial.png\" alt=\"\" \/><\/h6>\r\n&nbsp;\r\n\r\nThe first training run produced an accuracy of 92%. Not bad, but was this enough to win it all? I balanced the dataset to use only 100 images of each category.\r\n<h6><\/h6>\r\n<pre>imds = splitEachLabel(imds,100,'randomized');\r\n<\/pre>\r\n<h6><\/h6>\r\nWith the balanced dataset, the accuracy improved further, reaching <strong>97%<\/strong> at the end of the session on the test dataset. 
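<h6><\/h6>\r\nTo compute that test accuracy, the trained network can classify the separate test folder. Here is a minimal sketch, where the folder name 'TestImages' and the variable names are illustrative:\r\n<h6><\/h6>\r\n<pre>% Label the held-out test images from their folder names, resize on read,\r\n% and compare predictions against the true labels.\r\ntestDS = imageDatastore('TestImages','IncludeSubfolders',true,...\r\n    'LabelSource','FolderNames');\r\ntestDS.ReadFcn = @(img)imresize(imread(img),outputSize);\r\npredicted = classify(hackathon_net,testDS);\r\naccuracy = mean(predicted == testDS.Labels)<\/pre>\r\n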
We were able to try a variety of options and iterations and found that a simple AlexNet model would produce the best results.\r\n<h6><\/h6>\r\nHere is a table of the results by approach taken:\r\n<h6><\/h6>\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td style=\"padding: 10px; width: 10%; border: 1px solid black;\"><strong>Tools<\/strong><\/td>\r\n<td style=\"padding: 10px; width: 15%; border: 1px solid black;\"><strong>MATLAB<\/strong><\/td>\r\n<td style=\"padding: 10px; width: 15%; border: 1px solid black;\"><strong>PyTorch<\/strong><\/td>\r\n<td style=\"padding: 10px; width: 15%; border: 1px solid black;\"><strong>Python<\/strong><\/td>\r\n<td style=\"padding: 10px; width: 15%; border: 1px solid black;\"><strong>Python<\/strong><\/td>\r\n<td style=\"padding: 10px; width: 15%; border: 1px solid black;\"><strong>Python<\/strong><\/td>\r\n<td style=\"padding: 10px; width: 15%; border: 1px solid black;\"><strong>TensorFlow-Keras<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"padding: 3px; border: 1px solid black;\"><strong>Model<\/strong><\/td>\r\n<td style=\"padding: 3px; border: 1px solid black;\">AlexNet<\/td>\r\n<td style=\"padding: 3px; border: 1px solid black;\">Resnet-50<\/td>\r\n<td style=\"padding: 3px; border: 1px solid black;\">VGG-16<\/td>\r\n<td style=\"padding: 3px; border: 1px solid black;\">2-layer CNN<\/td>\r\n<td style=\"padding: 3px; border: 1px solid black;\">Random Forest<\/td>\r\n<td style=\"padding: 3px; border: 1px solid black;\">InceptionV3<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"padding: 3px; border: 1px solid black;\"><strong>Techniques<\/strong><\/td>\r\n<td style=\"padding: 3px; border: 1px solid black;\">Dataset balancing<\/td>\r\n<td style=\"padding: 3px; border: 1px solid black;\">Adam optimization<\/td>\r\n<td style=\"padding: 3px; border: 1px solid black;\">data augmentation (more data)<\/td>\r\n<td style=\"padding: 3px; border: 1px solid black;\"><\/td>\r\n<td style=\"padding: 3px; border: 1px solid black;\">By color channel<\/td>\r\n<td 
style=\"padding: 3px; border: 1px solid black;\"><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"padding: 3px; border: 1px solid black;\"><strong>Accuracy<\/strong><\/td>\r\n<td style=\"padding: 3px; border: 1px solid black;\">97%<\/td>\r\n<td style=\"padding: 3px; border: 1px solid black;\">88%<\/td>\r\n<td style=\"padding: 3px; border: 1px solid black;\">80%<\/td>\r\n<td style=\"padding: 3px; border: 1px solid black;\">77%<\/td>\r\n<td style=\"padding: 3px; border: 1px solid black;\">53%<\/td>\r\n<td style=\"padding: 3px; border: 1px solid black;\">22%<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<h6><\/h6>\r\nYou can read more about the hackathon challenge <a title=\"https:\/\/blogs.itility.nl\/en\/image-recognition-model-that-identifies-plant-species (link no longer works)\">here<\/a>. Here is a quote from the blog post:\r\n<blockquote>In the end the winning team used a rather simple 8-layer AlexNet model \u2013 but managed to reach an accuracy of 97% on the unlabeled dataset! And here is an interesting detail \u2013 not only did this team obtain the highest accuracy, they were also the only ones not using R or Python, but MATLAB<\/blockquote>\r\n<h6><\/h6>\r\nIt appeared that people were expecting open source to win this challenge, but MATLAB was the winner!\r\n<h6><\/h6>\r\n<h3>Summary<\/h3>\r\nThis was a great opportunity to work with the Machine Learning community in a real-life challenge and I felt great about my participation and the results.\r\n<h6><\/h6>\r\nIt\u2019s important to remember that I am an engineer with background on signal processing systems and a basic understanding of machine learning, that uses MATLAB to solve a wide variety of problems. 
I was able to apply deep learning techniques to image data without a prior background in it, in this case simply by searching the documentation for the right example and using a pretrained model.\r\n<h6><\/h6>\r\nOverall, it was a quick way to get started with deep learning and put together a working model to solve, and win, a real-life challenge!\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<span style=\"font-family: courier;\">Thanks again to Paola for her participation in this event and her impressive results with the team. You can download the code from <a href=\"https:\/\/nl.mathworks.com\/matlabcentral\/fileexchange\/68328-deep-learning-hackathon-with-transfer-learning\">FileExchange<\/a>. Leave a comment below for any questions you may have for Paola about this event.<\/span>\r\n<h6><\/h6>\r\n<p><a href=\"https:\/\/twitter.com\/jo_pings?ref_src=twsrc%5Etfw\" class=\"twitter-follow-button\" data-size=\"large\" data-show-count=\"false\">Follow @jo_pings<\/a><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\r\n","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/09\/image_classification1.png\" onError=\"this.style.display ='none';\" \/><\/div><p>This post is from Paola Jaramillo, Application Engineer from the Benelux office.\r\n\r\n\r\nBack in February, I attended a hackathon hosted by Itility: meeting for 3 hours to solve an image classification... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2019\/09\/03\/matlab-wins-hackathon\/\">read more >><\/a><\/p>","protected":false},"author":156,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/2826"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/users\/156"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/comments?post=2826"}],"version-history":[{"count":6,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/2826\/revisions"}],"predecessor-version":[{"id":2844,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/2826\/revisions\/2844"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media?parent=2826"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/categories?post=2826"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/tags?post=2826"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}