Semantic Segmentation for Medical Imaging

The following post is by Dr. Barath Narayanan, University of Dayton Research Institute (UDRI), with co-authors Dr. Russell C. Hardie and Redha Ali.
In this blog, we apply deep learning-based segmentation to skin lesions in dermoscopic images to aid in melanoma detection.
*Sensors and Software Systems, University of Dayton Research Institute, 300 College Park, Dayton, OH, 45469
**Department of Electrical and Computer Engineering, University of Dayton, 300 College Park, Dayton, OH, 45469


Skin lesion segmentation is an important step in Computer-Aided Diagnosis (CAD) of melanoma. In this blog, we present a Convolutional Neural Network (CNN) based segmentation approach applied to skin lesions in dermoscopic images. Early detection and diagnosis of melanoma significantly increases a patient's chance of survival.
Please cite the following article if you use any part of the code for your research.
[1] Ali, R., Hardie, R. C., Narayanan, B. N., & De Silva, S. (2019, July). "Deep learning ensemble methods for skin lesion analysis towards melanoma detection". In 2019 IEEE National Aerospace and Electronics Conference (NAECON) (pp. 311-316). IEEE.
The dataset used for this blog is taken from ISIC 2018. Instructions for obtaining the dataset are provided at the end of this post.

Load the Dataset and Resize

Raw images are loaded using imageDatastore, a computationally efficient way to manage a collection of image files. The ground truth masks are loaded using pixelLabelDatastore. The white region in each ground truth mask indicates the "lesion" class, and the rest of the image belongs to the "background" class. The function pixelLabelImageDatastore pairs each raw image with its corresponding ground truth mask. Let's visualize a few random images from the dataset for reference, and then resize all images to 224 x 224 for the deep learning network.

Labeled image showing pixels as either background (black) or foreground "lesion" (white)
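As a minimal sketch of how label IDs tie mask pixel values to class names, the toy example below maps a small hand-made mask to categorical labels. The mask values, the class names, and the ID order (255 for the lesion, 0 for the background) are assumptions chosen to match the description above; they are not taken from the original code.

```matlab
% Toy 3x4 ground truth mask: 255 = white (lesion), 0 = black (background).
mask = [0 0 255 255; 0 255 255 0; 0 0 0 0];

% Assumed class names and label IDs, listed in matching order.
classNames = ["Lesion","Background"];
labelIDs   = [255 0];

% Map each pixel value to its class name.
labels = categorical(mask, labelIDs, classNames);

disp(labels)
fprintf('Lesion pixels: %d of %d\n', nnz(labels == 'Lesion'), numel(labels));
```

With this ordering, "Lesion" is the first category and "Background" the second, so converting the labels to integers yields 1 for lesion pixels and 2 for background pixels.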

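% Clear workspace
clear; close all; clc;

% All images (folder name as given in the dataset instructions at the end of this post)
imds=imageDatastore('ISIC2018_Task1-2_Training_Input');

% Define class names and their corresponding IDs
% (assumed values: white (255) marks the lesion, black (0) the background)
classNames=["Lesion","Background"];
labelIDs=[255 0];

% Create a pixelLabelDatastore holding the ground truth pixel labels
pxds=pixelLabelDatastore('ISIC2018_Task1_Training_GroundTruth',classNames, labelIDs);

% Create a pixel label image datastore of all images
pximds=pixelLabelImageDatastore(imds,pxds);

% Number of Images
total_num_images=length(pximds.Images);

% Visualize random images (the count of 4 is an assumption)
perm=randperm(total_num_images,4);
figure;

% Visualize the images with Mask
for idx=1:length(perm)
    % Show the image and outline its ground truth mask in red (assumed loop body)
    subplot(2,2,idx);
    imshow(imread(pximds.Images{perm(idx)}));
    hold on;
    visboundaries(imread(pximds.PixelLabelData{perm(idx)})>0,'Color','r');
end

% Desired Image Size
imageSize=[224 224 3];

% Create a pixel label image datastore of all resized images
pximdsResz=pixelLabelImageDatastore(imds,pxds,'OutputSize',imageSize(1:2));

% Clear all variables except the necessary variables
clearvars -except pximdsResz classNames total_num_images imageSize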

Split the Dataset - Training, Validation and Testing

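% Variable names other than pximdsResz and pximdsValid are assumed

% Randomly select 100 images for testing from the dataset
test_idx=randperm(total_num_images,100);

% Rest of the indices are utilized for training and validation
train_valid_idx=setdiff(1:total_num_images,test_idx);

% Randomly pick 100 images for validation from the training dataset
valid_idx=train_valid_idx(randperm(length(train_valid_idx),100));

% Rest of the indices are used for training
train_idx=setdiff(train_valid_idx,valid_idx);

% Train Dataset
pximdsTrain=partitionByIndex(pximdsResz,train_idx);

% Validation Dataset
pximdsValid=partitionByIndex(pximdsResz,valid_idx);

% Test Dataset
pximdsTest=partitionByIndex(pximdsResz,test_idx);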

Deep Learning Approach

Define the CNN and the parameters necessary for training it.
In this blog, we study the performance of the DeepLab v3+ network, a CNN for semantic image segmentation. It uses an encoder-decoder architecture with dilated convolutions and skip connections to segment images. In [1], we present an ensemble approach that combines U-Net with DeepLab v3+. In this blog, we focus solely on DeepLab v3+ with the ResNet-50 architecture. Feel free to change the hyperparameters and observe the performance.
  • Make sure to install the Deep Learning Toolbox Model for ResNet-50 Network support package through the Add-On Explorer.
  • The input normalization might take about 5-10 minutes due to the resolution of the original images. Training time per epoch is about 10 minutes on an NVIDIA GeForce GTX 1070.
  • You can also set the execution environment to 'multi-gpu' in the training options if you have access to more than one GPU.
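The last note can be sketched as follows. This is a minimal example, assuming Parallel Computing Toolbox is installed (it provides gpuDeviceCount); the 'sgdm' solver is a placeholder used only to make the call complete, and the same 'ExecutionEnvironment' name-value pair can be added to any other set of training options.

```matlab
% Pick an execution environment from the number of available GPUs.
% gpuDeviceCount requires Parallel Computing Toolbox.
numGPUs = gpuDeviceCount;
if numGPUs > 1
    env = 'multi-gpu';
elseif numGPUs == 1
    env = 'gpu';
else
    env = 'cpu';
end

% 'sgdm' is an assumed solver, used here only to make the call complete.
options = trainingOptions('sgdm', 'ExecutionEnvironment', env);
```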
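% Number of classes
numClasses=numel(classNames);

% Network
lgraph=deeplabv3plusLayers(imageSize, numClasses,'resnet50');

% Define the parameters for the network
% (the 'sgdm' solver is an assumption; any further options from the
% original post are not shown here, so defaults apply)
options=trainingOptions('sgdm', ...
    'InitialLearnRate', 0.03, ...
    'ValidationData',pximdsValid, ...
    'ValidationFrequency',50);

% Train the network (pximdsTrain is the assumed name of the training datastore)
net=trainNetwork(pximdsTrain,lgraph,options);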

Testing and Performance Analysis

Now, let's study the performance of the network on the test set in terms of the following metrics:
  • Pixel classification accuracy: global and mean
  • Intersection over Union (IoU): weighted and mean
  • Normalized confusion matrix
In [1], we also study the performance in terms of Jaccard index and Dice coefficient.
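To make these two metrics concrete, here is a small sketch that computes the Jaccard index (which is the IoU of the lesion class) and the Dice coefficient for a pair of toy binary masks. The masks are made-up values for illustration only and are not results from the network.

```matlab
% Toy ground truth and predicted masks (true = lesion); values are illustrative.
actual    = logical([1 1 0 0; 1 1 0 0; 0 0 0 0]);
predicted = logical([1 1 1 0; 1 0 0 0; 0 0 0 0]);

intersection = nnz(actual & predicted);   % pixels labeled lesion in both masks
unionCount   = nnz(actual | predicted);   % pixels labeled lesion in either mask

jaccardIndex = intersection / unionCount;                        % same as IoU
diceCoeff    = 2*intersection / (nnz(actual) + nnz(predicted));

fprintf('Jaccard (IoU): %.3f, Dice: %.3f\n', jaccardIndex, diceCoeff);
% prints: Jaccard (IoU): 0.600, Dice: 0.750
```

The two metrics are related by Dice = 2J / (1 + J), so they always rank a pair of segmentations the same way. Image Processing Toolbox also provides the jaccard and dice functions for the same computation.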
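% Semantic segmentation of test dataset based on the trained network
% (pximdsTest is the assumed name of the test datastore)
pxdspredicted=semanticseg(pximdsTest,net,'WriteLocation',tempdir);

% Evaluation
metrics=evaluateSemanticSegmentation(pxdspredicted,pximdsTest);

% Normalized Confusion Matrix
normConfMatData=metrics.NormalizedConfusionMatrix.Variables;
figure;
h=heatmap(classNames,classNames,100*normConfMatData);
h.XLabel='Predicted Class';
h.YLabel='True Class';
h.Title='Normalized Confusion Matrix (%)';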

Visual Inspection

In this section, we visually inspect the results by visualizing both the predicted and actual masks for a given image.
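% Number of Images (pximdsTest is the assumed name of the test datastore)
num_test_images=length(pximdsTest.Images);

% Pick any random 2 images
perm=randperm(num_test_images,2);

% Visualize the images with Mask
for idx=1:length(perm)
    % Extract filename for the title
    [~,filename]=fileparts(pximdsTest.Images{perm(idx)});
    % Read the original file and resize it for network purposes
    I=imread(pximdsTest.Images{perm(idx)});
    I=imresize(I,[imageSize(1) imageSize(2)],'bilinear');
    figure;
    imshow(I);
    hold on;
    % Read the actual mask and resize it for visualization
    actual_mask=imread(pximdsTest.PixelLabelData{perm(idx)});
    actual_mask=imresize(actual_mask,[imageSize(1) imageSize(2)],'bilinear');
    % Ground Truth (red boundary; the plotting calls are assumed)
    visboundaries(actual_mask>0,'Color','r');
    % Predicted by the Algorithm
    predicted_image=(uint8(readimage(pxdspredicted,perm(idx)))); % Values are 1 and 2
    predicted_results=uint8(~(predicted_image-1)); % Conversion to binary and reverse the polarity to match with the labelIds
    % Predicted result (green boundary)
    visboundaries(predicted_results>0,'Color','g');
    title(sprintf('%s Red- Actual, Green - Predicted',filename),'Interpreter',"none");
end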
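The polarity conversion applied to the predicted labels can be checked on a toy example. This sketch assumes, as in the code comments, that the predicted values are 1 and 2, with 1 standing for the lesion class; the 2x3 array is made up for illustration.

```matlab
% Toy predicted label indices: 1 = first class (lesion), 2 = second class (background).
predicted_image = uint8([1 1 2; 2 1 2]);

% Subtracting 1 gives 0 for lesion and 1 for background;
% the logical NOT then flips this so that lesion = 1 and background = 0.
predicted_results = uint8(~(predicted_image - 1));

disp(predicted_results)
% displays:
%    1   1   0
%    0   1   0
```

The result has the same polarity as the ground truth mask, where the lesion is the bright region.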


In this blog, we have presented a simple deep learning-based segmentation approach applied to skin lesions in dermoscopic images to aid in melanoma detection. The segmentation algorithm using DeepLab v3+ with the ResNet-50 architecture performed relatively well in terms of IoU and pixel classification accuracy. Combining these results with those of other existing architectures could provide a boost in performance; in our paper [1], we fused the results of DeepLab v3+ with a U-Net architecture. Feel free to study the performance under different hyperparameter settings and architectures. Segmentation of skin lesions can serve as a valuable preprocessing step for classification algorithms that detect melanoma.
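As a rough illustration of what fusing two segmentation results can look like, the sketch below averages per-pixel lesion probabilities from two models and thresholds the average. This is only one simple fusion rule, shown under assumed toy values; it is not necessarily the method used in [1], and the variable names are hypothetical.

```matlab
% Hypothetical per-pixel lesion probabilities from two models (toy 2x3 maps).
probDeepLab = [0.9 0.6 0.2; 0.8 0.4 0.1];
probUNet    = [0.7 0.3 0.4; 0.9 0.7 0.2];

% Average the two probability maps and threshold at 0.5 (assumed threshold).
probFused = (probDeepLab + probUNet) / 2;
fusedMask = probFused >= 0.5;

disp(fusedMask)
```

Here the two models disagree on the pixels at (1,2) and (2,2), and the averaged score decides each case.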

Dataset Instructions

Please cite the following articles if you're using the dataset.

[2] Noel Codella, Veronica Rotemberg, Philipp Tschandl, M. Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, Harald Kittler, Allan Halpern: “Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)”, 2018;

[3] Tschandl, P., Rosendahl, C. & Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5, 180161 doi:10.1038/sdata.2018.161 (2018).
Download the ISIC 2018 Task 1,2 Training Data (10.4 GB) and the Training Ground Truth (26 MB) for Task 1. After downloading the zip files, extract them into respective folders ("ISIC2018_Task1-2_Training_Input", "ISIC2018_Task1_Training_GroundTruth" as needed for the script). The dataset contains 2594 images in total. Note that we solely utilize "Task 1 - Training Data" to study the performance of the system, as the ground truth is publicly available.
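After extracting the files, a quick sanity check along the following lines can confirm that the folders are in place. The folder names come from the instructions above; the file extensions (.jpg for images, .png for masks) are assumptions about the downloaded data.

```matlab
% Folder names come from the instructions above; the file extensions are assumptions.
imgFiles  = dir(fullfile('ISIC2018_Task1-2_Training_Input', '*.jpg'));
maskFiles = dir(fullfile('ISIC2018_Task1_Training_GroundTruth', '*.png'));

fprintf('Images: %d, masks: %d\n', numel(imgFiles), numel(maskFiles));

% The post states that the dataset contains 2594 images.
assert(numel(imgFiles) == 2594, 'Unexpected number of input images.');
assert(numel(maskFiles) == numel(imgFiles), 'Image and mask counts differ.');
```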


Barath Narayanan received his M.S. and Ph.D. degrees in Electrical Engineering from the University of Dayton (UD) in 2013 and 2017, respectively. He currently holds a joint appointment as a Research Scientist at UDRI's Software Systems Group and as an Adjunct Faculty member in the ECE department at UD. He graduated with distinction from SRM University, Chennai, India in 2012 with a Bachelor’s degree in Electrical and Electronics Engineering. His research interests include deep learning, machine learning, computer vision, and pattern recognition.
Redha Ali received his B.S. in Computer Science and Information Technology from the College of Electronic Technology, Bani Walid, Libya, in 2012. He completed his M.S. in Electrical and Computer Engineering at the University of Dayton in 2016. His Master's thesis work and publication are in the field of image and video denoising. He is currently pursuing his Ph.D. research in medical imaging at the University of Dayton. His applied research interests include medical image processing, deep learning, machine learning, computer vision, video restoration, and enhancement.
Dr. Russell C. Hardie graduated Magna Cum Laude from Loyola University in Baltimore, Maryland in 1988 with a B.S. degree in Engineering Science. He obtained M.S. and Ph.D. degrees in Electrical Engineering from the University of Delaware in 1990 and 1992, respectively. Dr. Hardie served as a Senior Scientist at Earth Satellite Corporation (now MDA) in Maryland prior to his appointment at the University of Dayton in 1993. He is currently a Full Professor in the Department of Electrical and Computer Engineering and holds a joint appointment with the Department of Electro-Optics and Photonics.
Have a question or comment for the authors? Leave a comment below.