## Deep Learning

Understanding and using deep learning networks

# Jumping into the Deep End

Posted by Steve Eddins

Hello, and welcome to the new MATLAB Central blog on deep learning! In my 24th year of MATLAB and toolbox development and design, I am excited to be tackling this new project.

Deep learning refers to a collection of machine learning techniques that are based on neural networks with a large number of layers (hence "deep"). When trained on labeled data sets, these networks can achieve state-of-the-art accuracy on classification tasks using images, text, and sound as inputs.

Because of my background in image processing, I have followed the rapid progress in deep learning over the past several years with great interest. There is much that I would like to learn and share with you about the area, especially with respect to exploring deep learning ideas with MATLAB. To that end, several developers have volunteered to lend a hand with topics and code and technical guidance as we explore. They are building deep learning capabilities as fast as they can in products like:

• Neural Network Toolbox
• Parallel Computing Toolbox
• Image Processing Toolbox
• Computer Vision System Toolbox
• Automated Driving System Toolbox
• GPU Coder

I will be introducing them to you as we get into the details of deep learning with MATLAB.

If you have followed my image processing blog posts, you can expect a similar style here. Topics will be a mix of concept tutorials, examples and case studies, feature exploration, and tips. I imagine we'll discuss things like performance, GPU hardware, and online data sets. Maybe we'll do some things just for fun, like the LSTM network built last month by a MathWorks developer that spouts Shakespeare-like verse.

I'll leave you with a little teaser based on AlexNet. I just plugged in a webcam and connected to it in MATLAB.

c = webcam

c =

webcam with properties:

Name: 'Microsoft® LifeCam Cinema(TM)'
Resolution: '640x480'
AvailableResolutions: {1×11 cell}
ExposureMode: 'auto'
WhiteBalanceMode: 'auto'
Focus: 33
BacklightCompensation: 5
Sharpness: 25
Zoom: 0
FocusMode: 'auto'
Tilt: 0
Brightness: 143
Pan: 0
WhiteBalance: 4500
Saturation: 83
Exposure: -6
Contrast: 5
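If you want to try the same thing, here is a minimal sketch of the connect-and-capture step. It assumes the MATLAB Support Package for USB Webcams is installed and that at least one camera is attached; the variable names `camNames` and `cam` are mine, not part of the original session.

```matlab
% Assumption: MATLAB Support Package for USB Webcams is installed.
camNames = webcamlist;            % cell array of detected camera names
if isempty(camNames)
    error('No webcam detected. Plug one in and try again.');
end

cam = webcam(1);                  % connect to the first camera in the list
pic = snapshot(cam);              % grab one RGB frame (height-by-width-by-3)
fprintf('Captured a %d-by-%d frame from %s\n', ...
    size(pic, 1), size(pic, 2), cam.Name);

clear cam                         % release the camera for other applications
```

Clearing the webcam object when you are done matters because only one connection to a given camera can be open at a time.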



Next, I loaded an AlexNet network that has been pretrained with a million images. The network can classify images into 1,000 different object categories.

nnet = alexnet

nnet =

SeriesNetwork with properties:

Layers: [25×1 nnet.cnn.layer.Layer]



You could also try other networks. For example, after you have upgraded to R2017b, you could experiment with GoogLeNet by using net = googlenet.
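One thing to watch for when swapping networks is that the expected input size can change. The sketch below assumes R2017b or later with the GoogLeNet support package installed, and it uses the `peppers.png` sample image that ships with MATLAB as a stand-in for a webcam snapshot. Rather than hard-coding a size, it reads the size from the network's first layer.

```matlab
% Assumption: R2017b or later, with the GoogLeNet support package installed.
net = googlenet;

% Ask the network what image size it expects instead of hard-coding it.
inputSize = net.Layers(1).InputSize;     % [height width channels]

% Stand-in image; replace with your own snapshot.
img = imread('peppers.png');
img = imresize(img, inputSize(1:2));     % match the network's height and width

label = classify(net, img)
```

GoogLeNet expects 224x224 images rather than AlexNet's 227x227, which is why querying `InputSize` is safer than reusing a fixed number.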

What do these 25 network layers look like?

nnet.Layers

ans =

25x1 Layer array with layers:

1   'data'     Image Input                   227x227x3 images with 'zerocenter' normalization
2   'conv1'    Convolution                   96 11x11x3 convolutions with stride [4  4] and padding [0  0  0  0]
3   'relu1'    ReLU                          ReLU
4   'norm1'    Cross Channel Normalization   cross channel normalization with 5 channels per element
5   'pool1'    Max Pooling                   3x3 max pooling with stride [2  2] and padding [0  0  0  0]
6   'conv2'    Convolution                   256 5x5x48 convolutions with stride [1  1] and padding [2  2  2  2]
7   'relu2'    ReLU                          ReLU
8   'norm2'    Cross Channel Normalization   cross channel normalization with 5 channels per element
9   'pool2'    Max Pooling                   3x3 max pooling with stride [2  2] and padding [0  0  0  0]
10  'conv3'    Convolution                   384 3x3x256 convolutions with stride [1  1] and padding [1  1  1  1]
11  'relu3'    ReLU                          ReLU
12  'conv4'    Convolution                   384 3x3x192 convolutions with stride [1  1] and padding [1  1  1  1]
13  'relu4'    ReLU                          ReLU
14  'conv5'    Convolution                   256 3x3x192 convolutions with stride [1  1] and padding [1  1  1  1]
15  'relu5'    ReLU                          ReLU
16  'pool5'    Max Pooling                   3x3 max pooling with stride [2  2] and padding [0  0  0  0]
17  'fc6'      Fully Connected               4096 fully connected layer
18  'relu6'    ReLU                          ReLU
19  'drop6'    Dropout                       50% dropout
20  'fc7'      Fully Connected               4096 fully connected layer
21  'relu7'    ReLU                          ReLU
22  'drop7'    Dropout                       50% dropout
23  'fc8'      Fully Connected               1000 fully connected layer
24  'prob'     Softmax                       softmax
25  'output'   Classification Output         crossentropyex with 'tench', 'goldfish', and 998 other classes
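To get a feel for how the image shrinks on its way through these layers, here is a small sketch that applies the usual output-size formula, floor((in + total padding - filter size) / stride) + 1, to each convolution and pooling layer listed above. The filter sizes, strides, and padding values are copied from the listing; the helper name `layerOut` and the `spec` table are my own. ReLU, normalization, and dropout layers do not change the spatial size, so they are left out.

```matlab
% Output height/width of a convolution or pooling layer.
% padTotal is the padding summed over both sides of one dimension.
layerOut = @(in, k, stride, padTotal) floor((in + padTotal - k) / stride) + 1;

% Columns: filter size, stride, total padding (values from the layer listing)
spec = [11 4 0    % conv1
         3 2 0    % pool1
         5 1 4    % conv2 (padding of 2 on each side)
         3 2 0    % pool2
         3 1 2    % conv3 (padding of 1 on each side)
         3 1 2    % conv4
         3 1 2    % conv5
         3 2 0];  % pool5

sizes = zeros(1, size(spec, 1) + 1);
sizes(1) = 227;                           % input height/width from layer 1
for n = 1:size(spec, 1)
    sizes(n + 1) = layerOut(sizes(n), spec(n, 1), spec(n, 2), spec(n, 3));
end

disp(sizes)                               % spatial size after each layer

% pool5 produces 256 channels, so fc6 sees this many inputs:
fc6Inputs = sizes(end)^2 * 256;
fprintf('fc6 receives %d inputs\n', fc6Inputs);
```

The arithmetic also hints at why conv2 is listed with 5x5x48 filters even though conv1 produces 96 channels: the original AlexNet design splits some convolutions into two groups, so each filter in conv2 sees only half of the incoming channels. The same applies to the 3x3x192 filters in conv4 and conv5.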


I happen to know that 'coffee mug' is one of the categories. How will the network do with the 23-year-old MATLAB "Picture the Power" mug from my bookshelf?

Here's the snapshot I took with my webcam using pic = snapshot(c).

imshow(pic)


The first layer is the image input layer. Its InputSize property tells us the image size that the network accepts.

nnet.Layers(1)

ans =

ImageInputLayer with properties:

Name: 'data'
InputSize: [227 227 3]

Hyperparameters
DataAugmentation: 'none'
Normalization: 'zerocenter'



So I need to resize the snapshot to be 227x227 before I feed it to the network.

pic2 = imresize(pic,[227 227]);
imshow(pic2)
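Resizing a 640x480 frame straight to 227x227 squeezes it horizontally. That is usually fine for a quick experiment, but if the distortion bothers you, one alternative is to crop the central square first and then resize. The sketch below is only one of several reasonable choices; it uses a random stand-in image so that it runs without a camera, and the variable names are mine.

```matlab
% Stand-in for a 640x480 webcam frame; replace with pic = snapshot(c).
pic = uint8(randi([0 255], 480, 640, 3));

targetSize = [227 227];           % or nnet.Layers(1).InputSize(1:2)

% Crop the largest centered square so the aspect ratio is preserved.
[h, w, ~] = size(pic);
side = min(h, w);
top  = floor((h - side) / 2) + 1;
left = floor((w - side) / 2) + 1;
square = pic(top:top + side - 1, left:left + side - 1, :);

% Now the resize scales both dimensions by the same factor.
pic2 = imresize(square, targetSize);
size(pic2)
```

The trade-off is that anything near the left and right edges of the frame is thrown away, so keep the object of interest near the center.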


Now I can try to classify it.

label = classify(nnet,pic2)

label =

categorical

coffee mug



OK! But I wonder what else the network thought it might be? The predict function can return the scores for all the categories.

p = predict(nnet,pic2);
plot(p)


There are several notable prediction peaks. I'll use the maxk function (new in R2017b) to find where they are, and then I'll look up those locations in the list of category labels in the network's last layer.

[p3,i3] = maxk(p,3);

p3

p3 =

1×3 single row vector

0.2469    0.1446    0.1377


i3

i3 =

505   733   623


nnet.Layers(end)

ans =

ClassificationOutputLayer with properties:

Name: 'output'
ClassNames: {1000×1 cell}
OutputSize: 1000

Hyperparameters
LossFunction: 'crossentropyex'


nnet.Layers(end).ClassNames(i3)

ans =

3×1 cell array

{'coffee mug'     }
{'Polaroid camera'}
{'lens cap'       }
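If you are not yet on R2017b and do not have maxk, sort does the same job. Here is a minimal sketch that uses a made-up score vector and placeholder class names so that it runs on its own; with the real network you would use `p = predict(nnet,pic2)` and `classNames = nnet.Layers(end).ClassNames` instead.

```matlab
% Stand-in data (assumption): random scores normalized to sum to 1,
% and placeholder class names 'class 1' ... 'class 1000'.
rng(0);
p = rand(1, 1000, 'single');
p = p / sum(p);
classNames = arrayfun(@(n) sprintf('class %d', n), (1:1000)', ...
    'UniformOutput', false);

k = 3;

% Sort all scores in descending order and keep the first k.
[pSorted, order] = sort(p, 'descend');
topScores = pSorted(1:k);
topIdx    = order(1:k);
topLabels = classNames(topIdx);

% Print label and score side by side.
for n = 1:k
    fprintf('%-20s %.4f\n', topLabels{n}, topScores(n));
end
```

Sorting all 1,000 scores does more work than maxk needs to, but at this size the difference is negligible.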



Hmm. I'm glad coffee mug came out on top. I can't pour coffee into a camera or a lens cap!

Finally, a note for my image processing blog readers: Don't worry, I will continue to write for that blog, too.

### Note

btimofte replied on : 1 of 14
Hello. Is it possible to use GPU Coder in MATLAB Home edition? I couldn't find it anywhere in the list of toolboxes to buy...
btimofte replied on : 3 of 14
That's a pity, since MATLAB Coder is available for the Home edition! I don't understand why the Home edition is so crippled when it comes to most cool features -_-''
Eric replied on : 4 of 14
It's interesting the maximum probability is so low. There's a lot said about how accurate these algorithms are and how they perform better than people. If you were only 25% sure this was a coffee cup, would you pour coffee into it? What about if you were told the probability that this is a coffee mug is only about twice that of it being a camera? Would you risk being wrong and pouring coffee onto your (presumably vintage) Polaroid camera? What if you were looking for the laptop? I know AlexNet is a bit dated and I wonder how a newer algorithm would perform. I'm reasonably sure a human would have a very high confidence in the "coffee mug" solution in addition to "Apple laptop computer" and "notepad".
tsherida replied on : 5 of 14
MATLAB: Picture the Power! Jealous - I want one of those mugs! and I'm excited about hearing about new deep learning features!
Steve Eddins replied on : 6 of 14
Tish—I'm so sorry, but I'm not letting go of my mug! :-)
Dan Samber replied on : 7 of 14
Is there a way to do image segmentation using CNNs in Matlab? Thanks! Dan
Dan Samber replied on : 9 of 14
Thanks Steve! Exactly what I was looking for! (Except NOW I need a new computer... or at least a reasonable GPU) Dan
Steve Eddins replied on : 10 of 14
Dan—I started working this week on a blog post about GPU choices. Look for it to appear in about a month.
Grzegorz Knor replied on : 11 of 14
Could you share a link to Shakespeare LSTM network?
Steve Eddins replied on : 12 of 14
Grzegorz—I don't have anything I can share about that right now, but maybe I can do a blog post about it a little later.
hassmal7374@gmail.com replied on : 13 of 14
Hello, do you have an example of sound as an input?
Henk-Jan replied on : 14 of 14
Hi Steve, I really don't see why you are worried so much about the prediction peaks: