{"id":8145,"date":"2021-11-02T09:00:27","date_gmt":"2021-11-02T13:00:27","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=8145"},"modified":"2021-10-20T16:03:57","modified_gmt":"2021-10-20T20:03:57","slug":"handling-very-large-images-in-medical-imaging-applications","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2021\/11\/02\/handling-very-large-images-in-medical-imaging-applications\/","title":{"rendered":"Handling very large images in medical imaging applications"},"content":{"rendered":"<em>This post is from\u00a0<a href=\"http:\/\/www.ogemarques.com\/\">Oge Marques, PhD<\/a>\u00a0and Professor of Engineering and Computer Science at FAU. Oge is a\u00a0<a href=\"https:\/\/protect-us.mimecast.com\/s\/sFvRCERy7AhoOq41INYhhZ?domain=sigmaxi.org\">Sigma Xi Distinguished Speaker<\/a>,\u00a0<a href=\"https:\/\/protect-us.mimecast.com\/s\/SYBFCG6A7DIZE83WI7UFu1?domain=ogemarques.com\/\">book author<\/a>, and\u00a0<a href=\"https:\/\/protect-us.mimecast.com\/s\/RPeGCJ61jJILln9pizR7yv?domain=aaas.org\">AAAS Leshner Fellow<\/a>.\u00a0He also happens to be a MATLAB aficionado and has been using MATLAB in his classroom for more than 20 years. You can follow him on Twitter (<a href=\"https:\/\/twitter.com\/ProfessorOge\">@ProfessorOge<\/a>).<\/em>\r\n<h6><\/h6>\r\nThe field of computational pathology (CPATH) consists of using algorithms to analyze digital images obtained through scanning slides of cells and tissues. 
In recent years, deep learning algorithms that show comparable performance to trained pathologists have been developed for several classification, regression, and segmentation tasks, such as tumor detection and grading.\r\n<h6><\/h6>\r\nApplying deep learning (DL) techniques to analyze histopathological tissue sections will start just like any other deep learning application (making sure to adopt a principled\u00a0<a href=\"https:\/\/www.kdnuggets.com\/2020\/09\/mathworks-deep-learning-workflow.html\" rel=\"nofollow\">workflow<\/a>\u00a0and to run a large number of\u00a0<a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/tracking-experiments-to-improve-ai-accuracy\" rel=\"nofollow\">experiments<\/a>\u00a0in a structured way) but will have a significant additional complicating factor: the need to acquire, label, store, display, and process gigapixel-sized images.\r\n<h6><\/h6>\r\nIn this blog post we\u2019ll go through an overview of the latest developments in deep learning for CPATH and show you how to handle very large images using MATLAB.\r\n<h6><\/h6>\r\n<h2>An overview of deep learning in computational pathology<\/h2>\r\n<img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-8208 size-full\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/10\/data-science-central-blog-wsi-patch-nn-diagram-1.jpg\" alt=\"\" width=\"710\" height=\"168\" \/>\r\n<h6><\/h6>\r\n<em>Figure 1: Deep learning workflow in computational pathology.<\/em>\r\n<h6><\/h6>\r\nFigure 1 provides an overview of the deep learning workflow in computational histopathology. Essentially, a deep neural network is trained using\u00a0<em>patches<\/em>\u00a0that must be extracted from a gigapixel-sized whole-slide image (WSI). 
The choices of\u00a0<strong>architecture<\/strong>\u00a0(convolutional neural networks (CNNs), fully convolutional networks (FCNs), recurrent neural networks (RNNs), autoencoders, or generative adversarial networks (GANs)) and\u00a0<strong>learning paradigm<\/strong>\u00a0(supervised, weakly supervised, fully unsupervised, transfer learning) depend on whether the images are\u00a0<strong>labeled<\/strong>\u00a0and on the histopathological image analysis\u00a0<strong>task<\/strong>\u00a0at hand: CNNs and FCNs are the most widely used architectures for (weakly) supervised learning tasks, whereas autoencoders and GANs are popular choices under the unsupervised learning paradigm.\r\n<h6><\/h6>\r\n<ul>\r\n \t<li><strong>Supervised learning<\/strong>\u00a0methods in CPATH have been used for:\u00a0<strong>classification<\/strong>\u00a0tasks, e.g., predicting whether a patch should be labeled as healthy or cancerous;\u00a0<strong>regression<\/strong>, e.g., detection or localization of cells in histopathology images; and\u00a0<strong>segmentation<\/strong>\u00a0of structures from histology images.<\/li>\r\n \t<li><strong>Weakly supervised learning<\/strong>\u00a0techniques exploit coarse-grained (image-level) annotations (e.g.,\u00a0<em>cancer<\/em>\u00a0or\u00a0<em>non-cancer<\/em>) to automatically infer fine-grained (pixel\/patch-level) information, thereby reducing the annotation burden on a pathologist. The most popular paradigm in this category is multiple-instance learning (MIL), in which a training set consists of\u00a0<em>bags<\/em>\u00a0(WSIs labeled as positive or negative), and each bag includes many\u00a0<em>instances<\/em>\u00a0(image patches whose labels are unknown and must be predicted). 
The main goal is to train a classifier to predict both bag-level and instance-level labels, while only bag-level labels are given in the training set.<\/li>\r\n \t<li><strong>Unsupervised<\/strong>\u00a0(lately rebranded as \u201cself-supervised\u201d)\u00a0<strong>learning<\/strong>\u00a0is still a young field within deep learning and the applications to CPATH are just starting to appear in the literature.<\/li>\r\n \t<li><strong>Transfer learning<\/strong>\u00a0approaches are the most widely adopted in histopathology, typically using models pretrained on ImageNet, such as\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1409.1556\" rel=\"nofollow\">VGGNet<\/a>,\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1409.4842\" rel=\"nofollow\">Inception<\/a>,\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1512.03385\" rel=\"nofollow\">ResNet<\/a>,\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1704.04861\" rel=\"nofollow\">MobileNet<\/a>, and\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1608.06993\" rel=\"nofollow\">DenseNet<\/a>. 
These pretrained models have been extensively used in several cancer grading and prognosis tasks, including public\u00a0<a href=\"https:\/\/grand-challenge.org\/challenges\/\" rel=\"nofollow\">challenges<\/a>\u00a0in the field, such as\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1808.04277\" rel=\"nofollow\">BACH<\/a>\u00a0or\u00a0<a href=\"https:\/\/camelyon17.grand-challenge.org\/\" rel=\"nofollow\">CAMELYON<\/a>.<\/li>\r\n<\/ul>\r\n<h3>Challenges in acquiring and processing whole-slide images<\/h3>\r\nRegardless of the architecture, learning scheme, or application, digital pathology solutions typically require acquiring and processing a significant number of very large (gigapixel-sized) whole slide images (WSIs), whose contents are often analyzed based on smaller patches (or blocks).\r\n<h6><\/h6>\r\nThe challenges with\u00a0<em>acquiring<\/em>\u00a0WSIs include:\r\n<h6><\/h6>\r\n<ul>\r\n \t<li><strong>Data availability<\/strong>: there are relatively few publicly available data sets in this field (e.g.,\u00a0<a href=\"https:\/\/camelyon17.grand-challenge.org\/\" rel=\"nofollow\">CAMELYON<\/a>,\u00a0<a href=\"https:\/\/warwick.ac.uk\/fac\/cross_fac\/tia\/data\/her2contest\/\" rel=\"nofollow\">HER2<\/a>,\u00a0<a href=\"https:\/\/iciar2018-challenge.grand-challenge.org\/Dataset\/\" rel=\"nofollow\">BACH<\/a>) and much of the published research employs proprietary WSI data sets. The Digital Pathology Association (DPA) maintains a\u00a0<a href=\"https:\/\/digitalpathologyassociation.org\/whole-slide-imaging-repository\" rel=\"nofollow\">website<\/a>\u00a0with a list of image repositories.<\/li>\r\n \t<li><strong>Image format<\/strong>: different scanners output the images using different proprietary file formats (PFFs), creating additional difficulties for data exchange, archiving, and online publication. The lack of a universal image format brings additional costs and potential delays to the curation of large data sets. 
There have been discussions toward wide\u00a0<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC6774793\/\" rel=\"nofollow\">adoption of a single open source file format<\/a>, including the possibility of\u00a0<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC6236926\/\" rel=\"nofollow\">adopting the DICOM standard<\/a>\u00a0for whole slide image encoding.<\/li>\r\n \t<li><strong>Image size<\/strong>: once you successfully acquire enough histopathology images and convert them to a useful format (e.g., TIFF), you must be prepared for the fact that\u00a0<em>each image file<\/em>\u00a0will typically be in the order of a few GB, and plan for the associated implications (such as storage space and network upload\/download speeds).<\/li>\r\n<\/ul>\r\nThe challenges with\u00a0<em>processing<\/em>\u00a0WSIs include:\r\n<h6><\/h6>\r\n<ul>\r\n \t<li><strong>Memory<\/strong>: even a single WSI might be too large to fit entirely into memory.<\/li>\r\n \t<li><strong>Display<\/strong>: ideally you should be able to display the contents of a WSI with zoom\/pan\/scroll capabilities in monitors whose resolutions are a fraction of the image pixel count.<\/li>\r\n \t<li><strong>Blocks \/ patches<\/strong>: there should be an elegant way to represent individual blocks (patches) within the image \u2013 and treat them as \u201csubimages\u201d whenever needed.<\/li>\r\n \t<li><strong>Artifacts<\/strong>: different artifacts might be present in the WSIs, due to the slide preparation workflow (e.g., color variability in the staining process) or the scanner setup (e.g., different illumination and resolution settings).<\/li>\r\n<\/ul>\r\n<h3>The deep learning workflow for CPATH<\/h3>\r\n<h6><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-8241 size-medium\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/10\/data-science-central-blog-data-acquisition-to-prediction-diagram-2-187x300.jpg\" alt=\"\" width=\"187\" height=\"300\" 
 \/><\/h6>\r\n<em>Figure 2: Simplified deep learning workflow for computational pathology.<\/em>\r\n<h6><\/h6>\r\nFigure 2 shows a simplified deep learning workflow for CPATH. It follows the basic steps of a classic machine learning (ML) \/ deep learning workflow, with a few notable exceptions and peculiarities:\r\n<h6><\/h6>\r\n<ul>\r\n \t<li>The\u00a0<strong>data acquisition<\/strong>\u00a0process involves collecting a tissue specimen, slicing it, extracting each tissue slide, and digitizing it, generating a whole-slide image (WSI). If the resulting image doesn\u2019t pass a quality check, (part of) the acquisition process might have to be repeated.<\/li>\r\n \t<li>The\u00a0<strong>preprocessing<\/strong>\u00a0step consists of extracting a small number of patches from the gigapixel-sized WSIs. This approach to reducing the high dimensionality of WSIs can be seen as \u201chuman-guided feature selection.\u201d Image patches are usually square regions whose size can vary from 32 \u00d7 32 pixels up to 10,000 \u00d7 10,000 pixels (256 \u00d7 256 pixels is a typical patch size). Additionally, this step might include provisions for handling tissue and artifact detection and color management (see example below).<\/li>\r\n \t<li>The\u00a0<strong>modeling<\/strong>\u00a0block consists of training a selected deep learning model under a selected learning paradigm (e.g., supervised, weakly supervised, self-supervised, transfer learning).<\/li>\r\n \t<li>The\u00a0<strong>postprocessing<\/strong>\u00a0block might include morphological operations for improving the quality of the predictions at pixel level, fixing small errors, e.g., by filling gaps.<\/li>\r\n \t<li>Finally, the\u00a0<strong>prediction<\/strong>\u00a0step checks whether the model worked well. 
If it didn\u2019t (as indicated by the backward arrows) you might want to either: (1) adjust the model\u2019s hyperparameters and perform other similar steps common to most machine learning (ML) and deep learning (DL) tasks; or (2) revisit the preprocessing steps and improve the quality of the input images used for training the model. After all, since ML\/DL models learn from data, we must keep this fine balance in mind when refining a solution to better meet a target metric of success.<\/li>\r\n<\/ul>\r\n<h3>An example<\/h3>\r\nHere is an\u00a0<a href=\"https:\/\/github.com\/ogemarques\/cpath-matlab\" target=\"_blank\" rel=\"nofollow noopener\">example<\/a>\u00a0of how to use MATLAB to: (1) handle very large images such as WSIs; and (2) pre- and post-process histology images.\r\n<h6><\/h6>\r\n<h4>Handling WSIs in MATLAB<\/h4>\r\nThe first part of this example shows how to read, display, explore, and organize WSIs (and their patches) in MATLAB. Thanks to the recently introduced\u00a0<a href=\"https:\/\/www.mathworks.com\/help\/images\/ref\/blockedimage.html\" rel=\"nofollow\">blockedImage\u00a0<\/a>object, it is now possible to handle very large images without running out of memory. 
A\u00a0blockedImage\u00a0is an image made from discrete blocks (patches), which can be organized and managed using a\u00a0<a href=\"https:\/\/www.mathworks.com\/help\/images\/ref\/blockedimagedatastore.html\" rel=\"nofollow\">blockedImageDatastore\u00a0<\/a>object and displayed using\u00a0<a href=\"https:\/\/www.mathworks.com\/help\/images\/ref\/bigimageshow.html\" rel=\"nofollow\">bigimageshow<\/a>.\r\n<h6><\/h6>\r\n<h4>Useful pre- and post-processing operations on WSIs in MATLAB<\/h4>\r\nSince the goal of using deep learning techniques in CPATH is to produce solutions that are clinically translatable, i.e., capable of working across large patient populations, it is advisable to deal with some of the most likely WSI artifacts upfront, thereby improving the ability of the resulting model to generalize over image artifacts found in other test sets.\r\n<h6><\/h6>\r\nThe second part of this example demonstrates preprocessing operations to handle commonly found artifacts in histopathology images as well as postprocessing morphological operations for improving the quality of the results at pixel level. 
Essentially, this example should help the medical image analysis community to create an image analysis pipeline for WSIs (and, as a bonus, the ability to reproduce the code and examples described in a\u00a0<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC8057393\/\" rel=\"nofollow\">recent paper<\/a>\u00a0on this topic) using MATLAB.\r\n<h6><\/h6>\r\nIt highlights the usefulness of MATLAB (and Image Processing Toolbox) functions such as:\r\n<h6><\/h6>\r\n<ul>\r\n \t<li>Image thresholding and filtering:\u00a0<strong>imbinarize<\/strong>,\u00a0<strong>bwareafilt<\/strong>, and\u00a0<strong>imlincomb<\/strong><\/li>\r\n \t<li>Morphological image processing operations:\u00a0<strong>imclose<\/strong>,\u00a0<strong>imopen<\/strong>,\u00a0<strong>imdilate<\/strong>,\u00a0<strong>imerode<\/strong>,\u00a0<strong>imfill<\/strong>, and\u00a0<strong>strel<\/strong><\/li>\r\n \t<li>Feature extraction:\u00a0<strong>bwlabel<\/strong>\u00a0and\u00a0<strong>regionprops<\/strong><\/li>\r\n \t<li>Visualization:\u00a0<strong>montage<\/strong>,\u00a0<strong>imoverlay<\/strong>,\u00a0<strong>plot<\/strong>\u00a0and\u00a0<strong>rectangle<\/strong><\/li>\r\n<\/ul>\r\n&nbsp;\r\n\r\nFigures 3 and 4 show examples of results.\r\n\r\n<img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-8217 size-full\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/10\/Fig3.png\" alt=\"\" width=\"710\" height=\"238\" \/>\r\n<h6><\/h6>\r\n<em>Figure 3: Preprocessing example: (left) initial image; (center) result of thresholding operation to separate tissue pixels from glass pixels; (right) result of applying hull filling to capture the full shape and size of the tissue and remove the slide background from further analysis down the pipeline.<\/em>\r\n<h6><\/h6>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"300\" height=\"261\" class=\"alignnone size-medium wp-image-8229\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/10\/Fig6a-300x261.png\" 
alt=\"\" \/>\r\n<h6><\/h6>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"300\" height=\"261\" class=\"alignnone size-medium wp-image-8226\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/10\/Fig6b-300x261.png\" alt=\"\" \/>\r\n<h6><\/h6>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"300\" height=\"261\" class=\"alignnone size-medium wp-image-8223\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/10\/Fig6c-300x261.png\" alt=\"\" \/>\r\n<h6><\/h6>\r\n<em>Figure 4: Postprocessing example: (top) overlay of hypothetical predictions of the presence of a region of interest (in green) on a renal transplant biopsy WSI; (center) result of using morphological algorithms to fill holes and remove spurious pixels; (bottom) result of postprocessing (by patch).<\/em>\r\n\r\n&nbsp;\r\n<h6><\/h6>\r\n<h3>Key takeaways<\/h3>\r\nDeep learning solutions for computational histopathology require the ability to handle whole slide images, which \u2013 in addition to being usually much larger than the images used in other areas of image analysis and computer vision \u2013 might suffer from artifacts that could impact the quality of the overall solution. In this blog post we have used MATLAB to show how to handle and process gigapixel-sized WSIs in the context of a CPATH deep learning workflow.\r\n<h6><\/h6>\r\nCPATH is an active research area, and new developments and applications of deep learning in this area will likely emerge in the near future. 
If you\u2019re interested in learning more about CPATH and related issues, I recommend this resource:\u00a0<a href=\"https:\/\/www.mathworks.com\/help\/images\/deep-learning-classification-of-large-multiresolution-images.html\" rel=\"nofollow\">Classify Large Multiresolution Images Using\u00a0blockedImage and Deep Learning<\/a>.","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2021\/10\/data-science-central-blog-wsi-patch-nn-diagram-1.jpg\" class=\"img-responsive attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" decoding=\"async\" loading=\"lazy\" \/><\/div><p>This post is from\u00a0Oge Marques, PhD\u00a0and Professor of Engineering and Computer Science at FAU. Oge is a\u00a0Sigma Xi Distinguished Speaker,\u00a0book author, and\u00a0AAAS Leshner Fellow.\u00a0He also happens to be a... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2021\/11\/02\/handling-very-large-images-in-medical-imaging-applications\/\">read more 
>><\/a><\/p>","protected":false},"author":156,"featured_media":8208,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/8145"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/users\/156"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/comments?post=8145"}],"version-history":[{"count":40,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/8145\/revisions"}],"predecessor-version":[{"id":8857,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/8145\/revisions\/8857"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media\/8208"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media?parent=8145"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/categories?post=8145"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/tags?post=8145"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}