Jumping into the Deep End

Hello, and welcome to the new MATLAB Central blog on deep learning! In my 24th year of MATLAB and toolbox development and design, I am excited to be tackling this new project.

Deep learning refers to a collection of machine learning techniques based on neural networks with a large number of layers (hence "deep"). Trained on large sets of labeled data, these networks can achieve state-of-the-art accuracy on classification tasks that take images, text, and sound as inputs.
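
To make "layers" concrete, here is a minimal sketch of how a small network is written down in Neural Network Toolbox syntax. The layer sizes below are arbitrary and purely illustrative; a genuinely deep network repeats the convolutional blocks many more times.

% Illustrative sketch only -- layer sizes are arbitrary.
layers = [
    imageInputLayer([28 28 1])              % accept 28x28 grayscale images
    convolution2dLayer(3,16,'Padding',1)    % 16 filters of size 3x3
    reluLayer                               % nonlinearity
    maxPooling2dLayer(2,'Stride',2)         % downsample by a factor of 2
    fullyConnectedLayer(10)                 % one output per class
    softmaxLayer                            % convert scores to probabilities
    classificationLayer];                   % cross-entropy classification output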

Because of my background in image processing, I have followed the rapid progress in deep learning over the past several years with great interest. There is much that I would like to learn and share with you about the area, especially with respect to exploring deep learning ideas with MATLAB. To that end, several developers have volunteered to lend a hand with topics, code, and technical guidance as we explore. They are building deep learning capabilities as fast as they can in products like:

  • Neural Network Toolbox
  • Parallel Computing Toolbox
  • Image Processing Toolbox
  • Computer Vision System Toolbox
  • Automated Driving System Toolbox
  • GPU Coder

I will be introducing them to you as we get into the details of deep learning with MATLAB.

If you have followed my image processing blog posts, you can expect a similar style here. Topics will be a mix of concept tutorials, examples and case studies, feature exploration, and tips. I imagine we'll discuss things like performance, GPU hardware, and online data sets. Maybe we'll do some things just for fun, like the LSTM network built last month by a MathWorks developer that spouts Shakespeare-like verse.

To subscribe by email or RSS, click on the "Subscribe" link at the top of the page.

I'll leave you with a little teaser based on AlexNet. I just plugged in a webcam and connected to it in MATLAB.

c = webcam
c = 

  webcam with properties:

                     Name: 'Microsoft® LifeCam Cinema(TM)'
               Resolution: '640x480'
     AvailableResolutions: {1×11 cell}
             ExposureMode: 'auto'
         WhiteBalanceMode: 'auto'
                    Focus: 33
    BacklightCompensation: 5
                Sharpness: 25
                     Zoom: 0
                FocusMode: 'auto'
                     Tilt: 0
               Brightness: 143
                      Pan: 0
             WhiteBalance: 4500
               Saturation: 83
                 Exposure: -6
                 Contrast: 5

Next, I loaded an AlexNet network that has been pretrained on more than a million images. The network can classify images into 1,000 different object categories.

nnet = alexnet
nnet = 

  SeriesNetwork with properties:

    Layers: [25×1 nnet.cnn.layer.Layer]

You could also try other networks. For example, after you have upgraded to R2017b, you could experiment with GoogLeNet by using net = googlenet.
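
Here is a quick sketch of what that would look like, assuming the GoogLeNet support package is installed. Note that GoogLeNet expects a different input image size than AlexNet does.

% Sketch only; requires R2017b and the GoogLeNet support package.
net = googlenet;             % pretrained GoogLeNet, returned as a DAGNetwork
net.Layers(1).InputSize      % [224 224 3] -- not the 227x227x3 that AlexNet expects
% Any snapshot classified with this network needs to be resized to 224x224 first.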

What do these 25 network layers look like?

nnet.Layers
ans = 

  25x1 Layer array with layers:

     1   'data'     Image Input                   227x227x3 images with 'zerocenter' normalization
     2   'conv1'    Convolution                   96 11x11x3 convolutions with stride [4  4] and padding [0  0  0  0]
     3   'relu1'    ReLU                          ReLU
     4   'norm1'    Cross Channel Normalization   cross channel normalization with 5 channels per element
     5   'pool1'    Max Pooling                   3x3 max pooling with stride [2  2] and padding [0  0  0  0]
     6   'conv2'    Convolution                   256 5x5x48 convolutions with stride [1  1] and padding [2  2  2  2]
     7   'relu2'    ReLU                          ReLU
     8   'norm2'    Cross Channel Normalization   cross channel normalization with 5 channels per element
     9   'pool2'    Max Pooling                   3x3 max pooling with stride [2  2] and padding [0  0  0  0]
    10   'conv3'    Convolution                   384 3x3x256 convolutions with stride [1  1] and padding [1  1  1  1]
    11   'relu3'    ReLU                          ReLU
    12   'conv4'    Convolution                   384 3x3x192 convolutions with stride [1  1] and padding [1  1  1  1]
    13   'relu4'    ReLU                          ReLU
    14   'conv5'    Convolution                   256 3x3x192 convolutions with stride [1  1] and padding [1  1  1  1]
    15   'relu5'    ReLU                          ReLU
    16   'pool5'    Max Pooling                   3x3 max pooling with stride [2  2] and padding [0  0  0  0]
    17   'fc6'      Fully Connected               4096 fully connected layer
    18   'relu6'    ReLU                          ReLU
    19   'drop6'    Dropout                       50% dropout
    20   'fc7'      Fully Connected               4096 fully connected layer
    21   'relu7'    ReLU                          ReLU
    22   'drop7'    Dropout                       50% dropout
    23   'fc8'      Fully Connected               1000 fully connected layer
    24   'prob'     Softmax                       softmax
    25   'output'   Classification Output         crossentropyex with 'tench', 'goldfish', and 998 other classes
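
Those 25 layers carry a lot of learned weights. As a rough illustration, here is a small sketch that tallies the learnable parameters by walking the layer array; only the convolution and fully connected layers have Weights and Bias properties.

% Sketch: total up the learnable parameters in the pretrained network.
numParams = 0;
for k = 1:numel(nnet.Layers)
    layer = nnet.Layers(k);
    if isprop(layer,'Weights') && ~isempty(layer.Weights)
        numParams = numParams + numel(layer.Weights) + numel(layer.Bias);
    end
end
numParams    % on the order of 60 million parameters for AlexNet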

I happen to know that 'coffee mug' is one of the categories. How will the network do with the 23-year-old MATLAB "Picture the Power" mug from my bookshelf?
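
(If you want to verify the category list for yourself, it is stored in the network's output layer, which we'll look at more closely below. A quick sketch:)

classNames = nnet.Layers(end).ClassNames;    % 1000x1 cell array of category names
any(strcmp(classNames,'coffee mug'))         % returns true if the category exists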

Here's the snapshot I took with my webcam using pic = snapshot(c).

imshow(pic)

The first layer is the image input layer. It tells us the image size that the network expects.

nnet.Layers(1)
ans = 

  ImageInputLayer with properties:

                Name: 'data'
           InputSize: [227 227 3]

   Hyperparameters
    DataAugmentation: 'none'
       Normalization: 'zerocenter'

So I need to resize the snapshot to be 227x227 before I feed it to the network.

pic2 = imresize(pic,[227 227]);
imshow(pic2)
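
If you would rather not hardcode the 227, you can read the required size directly from the input layer. A small sketch:

inputSize = nnet.Layers(1).InputSize;    % [227 227 3] for AlexNet
pic2 = imresize(pic,inputSize(1:2));     % match the height and width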

Now I can try to classify it.

label = classify(nnet,pic2)
label = 

  categorical

     coffee mug 

OK! But I wonder what else the network thought it might be. The predict function returns the scores for all 1,000 categories.

p = predict(nnet,pic2);
plot(p)

There are several notable prediction peaks. I'll use the maxk function (new in R2017b) to find where they are, and then I'll look up those locations in the list of category labels in the network's last layer.

[p3,i3] = maxk(p,3);
p3
p3 =

  1×3 single row vector

    0.2469    0.1446    0.1377

i3
i3 =

   505   733   623
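
If you haven't upgraded to R2017b yet, sort gives you the same information. A quick sketch:

% Pre-R2017b alternative to maxk.
[sortedScores,sortedIdx] = sort(p,'descend');   % all 1,000 scores, highest first
p3 = sortedScores(1:3);                         % top three scores
i3 = sortedIdx(1:3);                            % their category indices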

nnet.Layers(end)
ans = 

  ClassificationOutputLayer with properties:

            Name: 'output'
      ClassNames: {1000×1 cell}
      OutputSize: 1000

   Hyperparameters
    LossFunction: 'crossentropyex'

nnet.Layers(end).ClassNames(i3)
ans =

  3×1 cell array

    {'coffee mug'     }
    {'Polaroid camera'}
    {'lens cap'       }
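
Putting the labels and their scores together in one view (a small sketch using the table function):

topLabels = nnet.Layers(end).ClassNames(i3);    % cell array of the top three labels
table(topLabels,p3','VariableNames',{'Category','Score'})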

Hmm. I'm glad coffee mug came out on top. I can't pour coffee into a camera or a lens cap!

Remember, to follow along with this new blog, click on the "Subscribe" link at the top of the page.

Finally, a note for my image processing blog readers: Don't worry, I will continue to write for that blog, too.




Published with MATLAB® R2017b
