{"id":2570,"date":"2019-07-24T19:11:22","date_gmt":"2019-07-24T19:11:22","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=2570"},"modified":"2021-04-06T15:49:52","modified_gmt":"2021-04-06T19:49:52","slug":"deep-learning-for-medical-imaging","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2019\/07\/24\/deep-learning-for-medical-imaging\/","title":{"rendered":"Deep Learning for Medical Imaging"},"content":{"rendered":"\u200b\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td style=\"width: 60%; vertical-align: middle;\"><span style=\"font-family: courier;\">We have a very special post today from Jakob Kather from Heidelberg, Germany (Twitter: <a href=\"https:\/\/twitter.com\/jnkath\">jnkath<\/a>). He will be talking about deep learning for medical applications. Jakob is also one of the authors of a new paper recently published in Nature Medicine: <a href=\"https:\/\/www.nature.com\/articles\/s41591-019-0462-y\">https:\/\/www.nature.com\/articles\/s41591-019-0462-y<\/a> discussing deep learning predicting gastrointestinal cancer. <\/span><\/td>\r\n<td><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-2590 size-thumbnail\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/07\/Kather-Jakob_NCT_01-150x150.jpg\" alt=\"\" width=\"150\" height=\"150\" \/><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<h6><\/h6>\r\nDeep learning-based image analysis is well suited to classifying cats versus dogs, sad versus happy faces, and pizza versus hamburgers. However, many people struggle to apply deep learning to medical imaging data. In theory, it should be easy to classify tumor versus normal in medical images; in practice, this requires some tricks for data cleaning and model training and deployment.\r\n<h6><\/h6>\r\nHere, we will show how to use deep learning in MATLAB to preprocess and classify complex medical images. 
For this demo, we'll be primarily using Deep Learning Toolbox and Image Processing Toolbox. On the hardware side, it's best to have a compatible GPU installed and ready to use in MATLAB (see <a href=\"https:\/\/www.mathworks.com\/solutions\/gpu-computing.html\">https:\/\/www.mathworks.com\/solutions\/gpu-computing.html<\/a>).\r\n<h6><\/h6>\r\nOur aim is to find tumor tissue in histological images**.\r\n<h6><\/h6>\r\n<em>**Do you wonder what \"histological images\" are? In almost all cancer patients, the tumor is cut out by a surgeon, thinly sliced, put onto glass slides, stained and viewed under a microscope. Thus, we can see everything from cells on a micrometer scale to tissue structures on a millimeter scale. <\/em>\r\n<h6><\/h6>\r\nThousands of such images are freely available in public repositories. Some of these repositories are available at the National Institutes of Health (NIH) data portal. From <a href=\"https:\/\/portal.gdc.cancer.gov\">https:\/\/portal.gdc.cancer.gov<\/a> we can download tumor images such as this (in this case, a lung cancer):\r\n<h6><\/h6>\r\n<a href=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/07\/image_fullsize_fullscale.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-2578 size-full\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/07\/image_fullsize_fullscale.png\" alt=\"\" width=\"1831\" height=\"915\" \/><\/a>\r\n<h6><\/h6>\r\nThese images are in SVS format, which is essentially a multi-layer TIFF image.\r\n<h6><\/h6>\r\nThis may look like an ordinary image, but SVS images are huge: the files are often larger than 1 GB and the images have up to a billion pixels. 
A zoomed-in view of one section of this image shows just how large it is:\r\n\r\n<div id=\"attachment_2580\" style=\"width: 1034px\" class=\"wp-caption alignleft\"><a href=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/07\/image_zoomedin1.png\"><img aria-describedby=\"caption-attachment-2580\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-2580 size-large\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/07\/image_zoomedin1-1024x409.png\" alt=\"\" width=\"1024\" height=\"409\" \/><\/a><p id=\"caption-attachment-2580\" class=\"wp-caption-text\">This image shows how much detail is contained in a very small portion of the slide. We are zoomed in on the red dot marked in the full-image overview at the upper right.<\/p><\/div>\r\n<h6>Images courtesy of National Cancer Institute.<\/h6>\r\nMany people struggle even to load these images, but MATLAB has several functions for handling data at this scale. In particular, we will be using the functions imfinfo (to extract metadata), imread (to read the thumbnail), and blockproc (to read the actual image data without loading the full image into RAM).\r\n<h6><\/h6>\r\nSo, let's use MATLAB to look at these images. We start by downloading an example image from the TCGA database. 
The image in this post can be found here:\u00a0<a href=\"https:\/\/portal.gdc.cancer.gov\/files\/0afb5489-719c-4e4d-bb8a-e0e146f0adb2\">https:\/\/portal.gdc.cancer.gov\/files\/0afb5489-719c-4e4d-bb8a-e0e146f0adb2<\/a>\r\n<h6><\/h6>\r\n<pre>% define the image name\r\nimName = 'TCGA-NK-A5CR-01Z-00-DX1.A7C57B30-E2C6-4A23-AE71-7E4D7714F8EA.svs'; \r\nimInfo = imfinfo(imName); % get the metadata<\/pre>\r\n<h6><\/h6>\r\nSVS images are essentially multipage TIFFs, and we can use imfinfo() to look at the metadata of each layer.\r\n<h6><\/h6>\r\n<pre>for i = 1:numel(imInfo)\r\n\u00a0\u00a0\u00a0 X = ['Layer ', num2str(i), ': Width ',num2str(imInfo(i).Width), ...\r\n     ' and Height ', num2str(imInfo(i).Height)];\r\n\u00a0\u00a0\u00a0 disp(X)\r\nend<\/pre>\r\nThe base image (layer 1) is too big to display directly, so let's look at some of the smaller layers instead.\r\n<h6><\/h6>\r\n<pre>imshow(imread(imName,2))\r\nimshow(imread(imName,6))\r\nimshow(imread(imName,7))<\/pre>\r\nFor each layer, we can look at the metadata; here, for example, layer 5.\r\n<h6><\/h6>\r\n<pre>disp(['this image has ',num2str(imInfo(5).Width),'*',num2str(imInfo(5).Height),' pixels'])<\/pre>\r\n<h6><\/h6>\r\n&gt;&gt; this image has 3019*1421 pixels\r\n<h6><\/h6>\r\nWe can see that this image is mostly background and contains both tumor and non-tumor tissue.\u00a0Because we care about the tumor tissue and not so much about the surrounding normal tissue, we want to identify the tumor region.\r\n<h6><\/h6>\r\n<h6>If you are not from a medical background: here is the same image with\u00a0the tumor region annotated.<\/h6>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"1143\" height=\"983\" class=\"alignnone size-full wp-image-2574\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/07\/image_annotated.jpg\" alt=\"\" \/>\r\n<h6><\/h6>\r\nLet us use a transfer learning approach with AlexNet. 
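Before turning to the network: even though the full-resolution base layer cannot be loaded whole, it can still be processed piece by piece with blockproc, which applies a function to one tile at a time. The snippet below is a minimal sketch rather than part of the original workflow: the 1024-by-1024 block size and the mean-intensity statistic are illustrative assumptions, and it presumes blockproc can read the SVS file as a TIFF.\r\n<h6><\/h6>\r\n<pre>% hedged sketch: process the huge base layer block-by-block without\r\n% loading it into RAM; block size and statistic are illustrative\r\nfun = @(blockStruct) mean(blockStruct.data(:)); % one value per tile\r\ntileMeans = blockproc(imName, [1024 1024], fun);<\/pre>\r\nEach element of tileMeans then summarizes one tile; tiles with a near-white mean could, for example, be skipped as background.\r\n<h6><\/h6>\r\n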
We will load the default pretrained AlexNet model, which has already learned to recognize basic shapes such as circles and lines.\r\n<h6><\/h6>\r\n<pre>net = alexnet; % load an AlexNet which is pretrained on ImageNet<\/pre>\r\n<h6><\/h6>\r\nNow, we want to re-train the model as a tumor detector. We will use a public data set of 100,000 histological images of colon cancer, which is available at <a href=\"http:\/\/dx.doi.org\/10.5281\/zenodo.1214456\">http:\/\/dx.doi.org\/10.5281\/zenodo.1214456<\/a>. This set has been derived from colorectal cancer samples, but the workflow is identical for any type of solid tumor.\r\n<h6><\/h6>\r\nThis is how these smaller images (patches) look: they are labeled with one of nine classes, which are explained in more detail in the data repository. Our aim is to train a deep neural network to automatically detect these classes.\r\n<h6><\/h6>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"1446\" height=\"1382\" class=\"alignnone size-full wp-image-2582\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/07\/journal.pmed_.1002730.g001_smaller.png\" alt=\"\" \/>\r\n<span style=\"font-size: 10px;\">These images represent different classes of tissue that were manually defined by a pathologist. Each row is a tissue class and contains random images from the image set. 
The class labels are as follows: ADI = adipose tissue (fat), BACK = background (no tissue), DEB = debris, LYM = lymphocytes, MUC = mucus, MUS = muscle, NORM = normal mucosa, STR = stroma, TUM = tumor epithelium. The classes are described in more detail here: <a href=\"https:\/\/journals.plos.org\/plosmedicine\/article?id=10.1371\/journal.pmed.1002730\">https:\/\/journals.plos.org\/plosmedicine\/article?id=10.1371\/journal.pmed.1002730<\/a> and here: <a href=\"https:\/\/www.nature.com\/articles\/srep27988\">https:\/\/www.nature.com\/articles\/srep27988<\/a>.<\/span>\r\n<h6><\/h6>\r\nAfter downloading the ZIP files from the repository and extracting them to a folder called \"images\", we have one sub-folder per tissue class. We can now load the images, split them into training, validation, and test sets, and re-train our AlexNet model. (Optionally, we could use the <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/ref\/imagedataaugmenter.html\">imageDataAugmenter<\/a> to create additional training images, for example by random rotation.)\r\n<h6><\/h6>\r\nCollect all images from the \"images\" folder into a datastore. Subfolders are included, and each label is derived from its folder name.\r\n<h6><\/h6>\r\n<pre>allImages = imageDatastore('.\/images\/','IncludeSubfolders',true,'LabelSource','foldernames');<\/pre>\r\nSplit into three sets: 40% training, 20% validation, 40% test.\r\n<h6><\/h6>\r\n<pre>[training_set, validation_set, testing_set] = splitEachLabel(allImages,.4,.2,.4);<\/pre>\r\n<h2>Network modification<\/h2>\r\nModify the network by removing the last three layers. 
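To see exactly what we are removing, we can display the last three layers of the pretrained network first:\r\n<h6><\/h6>\r\n<pre>net.Layers(end-2:end) % the 1000-way fully connected, softmax, and classification output layers<\/pre>\r\nThese three layers are specific to the 1000 ImageNet classes.\r\n<h6><\/h6>\r\n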
We will replace these layers with new layers for our custom classification.\r\n<h6><\/h6>\r\n<pre>layersTransfer = net.Layers(1:end-3);<\/pre>\r\nDisplay the output categories.\r\n<h6><\/h6>\r\n<pre>categories(training_set.Labels)<\/pre>\r\nans = <em>9\u00d71 cell array<\/em>\r\n<h6><\/h6>\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td>{'ADI'}<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>{'BACK'}<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>{'DEB'}<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>{'LYM'}<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>{'MUC'}<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>{'MUS'}<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>{'NORM'}<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>{'STR'}<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>{'TUM'}<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<h6><\/h6>\r\n<pre>numClasses = numel(categories(training_set.Labels));<\/pre>\r\nWe merge the layers and set the weight and bias learning rate factors for the new fully connected layer 'fc'.\r\n<h6><\/h6>\r\n<pre>layers = [\r\n\u00a0\u00a0\u00a0 layersTransfer\r\n\u00a0\u00a0\u00a0 fullyConnectedLayer(numClasses,'Name', 'fc','WeightLearnRateFactor',1,'BiasLearnRateFactor',1)\r\n\u00a0\u00a0\u00a0 softmaxLayer('Name', 'softmax')\r\n\u00a0\u00a0\u00a0 classificationLayer('Name', 'classOutput')];<\/pre>\r\nSet up a layerGraph and plot it:\r\n<h6><\/h6>\r\n<pre>lgraph = layerGraph(layers);\r\nplot(lgraph)<\/pre>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"840\" height=\"630\" class=\"alignnone size-full wp-image-2584\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/07\/network-arch.png\" alt=\"\" \/>\r\n<h2>Modify Training Parameters<\/h2>\r\nWe now modify the training set and training options. 
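Rather than hard-coding the input dimensions from memory, we can read the expected input size directly from the network's image input layer:\r\n<h6><\/h6>\r\n<pre>net.Layers(1).InputSize % returns [227 227 3] for AlexNet<\/pre>\r\n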
The training set must be resized to fit the input size expected by the network.\r\n<h6><\/h6>\r\n<pre>imageInputSize = [227 227 3];\r\naugmented_training_set = augmentedImageDatastore(imageInputSize,training_set);<\/pre>\r\naugmented_training_set =\r\n<h6><\/h6>\r\n<em>augmentedImageDatastore with properties:<\/em>\r\n<h6><\/h6>\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td>NumObservations: 39999<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Files: {39999\u00d71 cell}<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>AlternateFileSystemRoots: {}<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>MiniBatchSize: 128<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>DataAugmentation: 'none'<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>ColorPreprocessing: 'none'<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>OutputSize: [227 227]<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>OutputSizeMode: 'resize'<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>DispatchInBackground: 0<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<h6><\/h6>\r\n<pre>resized_validation_set = augmentedImageDatastore(imageInputSize,validation_set);\r\nresized_testing_set = augmentedImageDatastore(imageInputSize,testing_set);<\/pre>\r\nSet the training options, including plotting the training progress as the network trains.\r\n<h6><\/h6>\r\n<pre>opts = trainingOptions('sgdm', ...\r\n\u00a0\u00a0\u00a0 'MiniBatchSize', 64,... % mini batch size, limited by GPU RAM, default 100 on Titan, 500 on P6000\r\n\u00a0\u00a0\u00a0 'InitialLearnRate', 1e-5,... % fixed learning rate\r\n\u00a0\u00a0\u00a0 'L2Regularization', 1e-4,... % optimization L2 constraint\r\n\u00a0\u00a0\u00a0 'MaxEpochs',15,... % max. 
epochs for training, default 3\r\n\u00a0\u00a0\u00a0 'ExecutionEnvironment', 'gpu',...% environment for training and classification, use a compatible GPU\r\n\u00a0\u00a0\u00a0 'ValidationData', resized_validation_set,...\r\n\u00a0\u00a0\u00a0 'Plots', 'training-progress')<\/pre>\r\n&nbsp;\r\n<h2>Training<\/h2>\r\nWe trained the network for 3.5 hours on a single GPU, but training for a few minutes would actually be enough to get a reasonable result as seen in the training plot below.\r\n<h6><\/h6>\r\n<pre>net = trainNetwork(augmented_training_set, lgraph, opts)<\/pre>\r\n<h6><a href=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/07\/plotprogress.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-2586 size-full\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/07\/plotprogress.png\" alt=\"\" width=\"1430\" height=\"769\" \/><\/a><\/h6>\r\n<h6><\/h6>\r\n<h2>Testing and Prediction<\/h2>\r\nLet's check how well our classifier works using the held-out subset.\r\n<h6><\/h6>\r\n<pre>[predLabels,predScores] = classify(net, resized_testing_set, 'ExecutionEnvironment','gpu');<\/pre>\r\nWe can look at the confusion matrix and at the overall classification accuracy:\r\n<h6><\/h6>\r\n<pre>plotconfusion(testing_set.Labels, predLabels)\r\nPerItemAccuracy = mean(predLabels == testing_set.Labels);\r\ntitle(['overall per image accuracy ',num2str(round(100*PerItemAccuracy)),'%'])<\/pre>\r\n<h6><\/h6>\r\n<h6><img decoding=\"async\" loading=\"lazy\" width=\"900\" height=\"900\" class=\"alignnone size-full wp-image-2588\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/07\/confusionMatrix.png\" alt=\"\" \/><\/h6>\r\n<h6><\/h6>\r\nVoila! We have achieved an excellent classification performance (as you will see in <a href=\"https:\/\/www.nature.com\/articles\/s41591-019-0462-y\">this paper<\/a>). 
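The trained network can also classify a single new patch. Here is a minimal sketch; 'myPatch.png' is a hypothetical file name, not part of the original post.\r\n<h6><\/h6>\r\n<pre>% hedged sketch: classify one new image patch with the trained network\r\n% 'myPatch.png' is a hypothetical file name\r\nimg = imresize(imread('myPatch.png'), [227 227]); % match the network input size\r\n[label, scores] = classify(net, img); % predicted class and per-class scores\r\ndisp(label) % one of the nine tissue classes<\/pre>\r\n<h6><\/h6>\r\n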
We are now ready to build more complicated workflows for digital pathology that include an automatic tumor detector!\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<span style=\"font-family: courier;\">I want to thank Jakob again for taking the time to give us insight into his research using MATLAB. A special thanks to Jakob Sommer for testing the source code in this post. Have any questions about this post? Leave a comment below.<\/span>\r\n<h6><\/h6>\r\n<p><a href=\"https:\/\/twitter.com\/jo_pings?ref_src=twsrc%5Etfw\" class=\"twitter-follow-button\" data-size=\"large\" data-show-count=\"false\">Follow @jo_pings<\/a><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2019\/07\/Kather-Jakob_NCT_01-150x150.jpg\" onError=\"this.style.display ='none';\" \/><\/div><p>\u200b\r\n\r\n\r\n\r\nWe have a very special post today from Jakob Kather from Heidelberg, Germany (Twitter: jnkath). He will be talking about deep learning for medical applications. Jakob is also one of the... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2019\/07\/24\/deep-learning-for-medical-imaging\/\">read more >><\/a><\/p>","protected":false},"author":156,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/2570"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/users\/156"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/comments?post=2570"}],"version-history":[{"count":31,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/2570\/revisions"}],"predecessor-version":[{"id":2650,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/2570\/revisions\/2650"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media?parent=2570"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/categories?post=2570"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/tags?post=2570"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}