Jumping into the Deep End
Hello, and welcome to the new MATLAB Central blog on deep learning! In my 24th year of MATLAB and toolbox development and design, I am excited to be tackling this new project.
Deep learning refers to a collection of machine learning techniques based on neural networks that have a large number of layers (hence "deep"). When trained on labeled data sets, these networks can achieve state-of-the-art accuracy on classification tasks using images, text, and sound as inputs.
Because of my background in image processing, I have followed the rapid progress in deep learning over the past several years with great interest. There is much that I would like to learn and share with you about the area, especially with respect to exploring deep learning ideas with MATLAB. To that end, several developers have volunteered to lend a hand with topics, code, and technical guidance as we explore. They are building deep learning capabilities as fast as they can in products like:
- Neural Network Toolbox
- Parallel Computing Toolbox
- Image Processing Toolbox
- Computer Vision System Toolbox
- Automated Driving System Toolbox
- GPU Coder
I will be introducing them to you as we get into the details of deep learning with MATLAB.
If you have followed my image processing blog posts, you can expect a similar style here. Topics will be a mix of concept tutorials, examples and case studies, feature exploration, and tips. I imagine we'll discuss things like performance, GPU hardware, and online data sets. Maybe we'll do some things just for fun, like the LSTM network built last month by a MathWorks developer that spouts Shakespeare-like verse.
To subscribe, either using email or RSS, click on the "Subscribe" link at the top of the page.
I'll leave you with a little teaser based on AlexNet. I just plugged in a webcam and connected to it in MATLAB.
c = webcam
c = 
  webcam with properties:

                    Name: 'Microsoft® LifeCam Cinema(TM)'
              Resolution: '640x480'
    AvailableResolutions: {1×11 cell}
            ExposureMode: 'auto'
        WhiteBalanceMode: 'auto'
                   Focus: 33
   BacklightCompensation: 5
               Sharpness: 25
                    Zoom: 0
               FocusMode: 'auto'
                    Tilt: 0
              Brightness: 143
                     Pan: 0
            WhiteBalance: 4500
              Saturation: 83
                Exposure: -6
                Contrast: 5
Next, I loaded an AlexNet network that has been pretrained with a million images. The network can classify images into 1,000 different object categories.
nnet = alexnet
nnet = 
  SeriesNetwork with properties:

    Layers: [25×1 nnet.cnn.layer.Layer]
You could also try other networks. For example, after you have upgraded to R2017b, you could experiment with GoogLeNet by using net = googlenet.
What do these 25 network layers look like?
nnet.Layers
ans = 
  25x1 Layer array with layers:

     1   'data'     Image Input                   227x227x3 images with 'zerocenter' normalization
     2   'conv1'    Convolution                   96 11x11x3 convolutions with stride [4 4] and padding [0 0 0 0]
     3   'relu1'    ReLU                          ReLU
     4   'norm1'    Cross Channel Normalization   cross channel normalization with 5 channels per element
     5   'pool1'    Max Pooling                   3x3 max pooling with stride [2 2] and padding [0 0 0 0]
     6   'conv2'    Convolution                   256 5x5x48 convolutions with stride [1 1] and padding [2 2 2 2]
     7   'relu2'    ReLU                          ReLU
     8   'norm2'    Cross Channel Normalization   cross channel normalization with 5 channels per element
     9   'pool2'    Max Pooling                   3x3 max pooling with stride [2 2] and padding [0 0 0 0]
    10   'conv3'    Convolution                   384 3x3x256 convolutions with stride [1 1] and padding [1 1 1 1]
    11   'relu3'    ReLU                          ReLU
    12   'conv4'    Convolution                   384 3x3x192 convolutions with stride [1 1] and padding [1 1 1 1]
    13   'relu4'    ReLU                          ReLU
    14   'conv5'    Convolution                   256 3x3x192 convolutions with stride [1 1] and padding [1 1 1 1]
    15   'relu5'    ReLU                          ReLU
    16   'pool5'    Max Pooling                   3x3 max pooling with stride [2 2] and padding [0 0 0 0]
    17   'fc6'      Fully Connected               4096 fully connected layer
    18   'relu6'    ReLU                          ReLU
    19   'drop6'    Dropout                       50% dropout
    20   'fc7'      Fully Connected               4096 fully connected layer
    21   'relu7'    ReLU                          ReLU
    22   'drop7'    Dropout                       50% dropout
    23   'fc8'      Fully Connected               1000 fully connected layer
    24   'prob'     Softmax                       softmax
    25   'output'   Classification Output         crossentropyex with 'tench', 'goldfish', and 998 other classes
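The spatial sizes implied by this listing can be checked with a little arithmetic. The following is only a sketch: the helper name outSize is made up for illustration (it is not a toolbox function), and it assumes the usual size formula floor((in + 2*pad - filter)/stride) + 1 for convolution and pooling layers. The filter sizes, strides, and padding values are copied from the listing.

```matlab
% Hypothetical helper (not a toolbox function): output width of a
% convolution or pooling layer, assuming the standard size formula.
outSize = @(in, k, s, p) floor((in + 2*p - k) / s) + 1;

sz = 227;                      % input images are 227x227x3
sz = outSize(sz, 11, 4, 0);    % conv1: 11x11, stride 4, padding 0
conv1Size = sz;
sz = outSize(sz, 3, 2, 0);     % pool1: 3x3, stride 2
sz = outSize(sz, 5, 1, 2);     % conv2: 5x5, stride 1, padding 2
sz = outSize(sz, 3, 2, 0);     % pool2: 3x3, stride 2
sz = outSize(sz, 3, 1, 1);     % conv3: 3x3, stride 1, padding 1
sz = outSize(sz, 3, 1, 1);     % conv4: same settings as conv3
sz = outSize(sz, 3, 1, 1);     % conv5: same settings as conv3
sz = outSize(sz, 3, 2, 0);     % pool5: 3x3, stride 2
fc6Inputs = sz * sz * 256;     % conv5 has 256 filters
fprintf('conv1 output: %dx%d, pool5 output: %dx%d, fc6 inputs: %d\n', ...
    conv1Size, conv1Size, sz, sz, fc6Inputs);
```

Under these assumptions, the feature map shrinks from 227 to 55 after conv1 and to 6 after pool5, so fc6 would see 6*6*256 = 9216 values per image.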
I happen to know that 'coffee mug' is one of the categories. How will the network do with the 23-year-old MATLAB "Picture the Power" mug from my bookshelf?
Here's the snapshot I took with my webcam using pic = snapshot(c).
imshow(pic)
The first layer is the image input layer. Its properties will tell us the image size that the network accepts.
nnet.Layers(1)
ans = 
  ImageInputLayer with properties:

                Name: 'data'
           InputSize: [227 227 3]

   Hyperparameters
    DataAugmentation: 'none'
       Normalization: 'zerocenter'
So I need to resize the snapshot to be 227x227 before I feed it to the network.
pic2 = imresize(pic,[227 227]);
imshow(pic2)
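Resizing a 640x480 frame directly to 227x227 squeezes it horizontally. One alternative, sketched here as an assumption rather than what I did above, is to crop the central square first and then resize. The variable names frame, side, and square are illustrative, and a synthetic array stands in for the webcam snapshot so the snippet runs without a camera.

```matlab
% Synthetic stand-in for a 640x480 RGB webcam snapshot (rows x cols x 3).
frame = zeros(480, 640, 3, 'uint8');

% Crop the largest centered square using plain indexing.
[rows, cols, ~] = size(frame);
side = min(rows, cols);
r0 = floor((rows - side) / 2) + 1;
c0 = floor((cols - side) / 2) + 1;
square = frame(r0:r0+side-1, c0:c0+side-1, :);

% Resize the square to the network's input size
% (imresize requires Image Processing Toolbox).
pic2 = imresize(square, [227 227]);
size(pic2)
```

Whether cropping helps the classification depends on the picture, so it may be worth trying both versions.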
Now I can try to classify it.
label = classify(nnet,pic2)
label = 
  categorical
     coffee mug
OK! But I wonder what else the network thought it might be? The predict function can return the scores for all the categories.
p = predict(nnet,pic2);
plot(p)
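These scores come out of the softmax layer ('prob' in the layer listing), so they are positive and sum to 1. Here is a small sketch of that computation on made-up numbers. The vector z is an illustrative assumption, not output from AlexNet.

```matlab
% Made-up raw scores for three classes (illustrative values only).
z = [2.0 1.0 0.1];

% Softmax: subtract the maximum first for numerical stability.
e = exp(z - max(z));
p = e / sum(e);

% The highest score marks the most likely class.
[pmax, idx] = max(p);
disp(p)
fprintf('scores sum to %.4f, highest score at index %d\n', sum(p), idx);
```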
There are several notable prediction peaks. I'll use the maxk function (new in R2017b) to find where they are, and then I'll look up those locations in the list of category labels in the network's last layer.
[p3,i3] = maxk(p,3);
p3
p3 =
  1×3 single row vector
    0.2469    0.1446    0.1377
i3
i3 =
   505   733   623
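If you are on a release before R2017b, where maxk is not available, sort can do the same job. A minimal sketch, using a made-up score vector in place of the predict output:

```matlab
% Made-up scores standing in for the 1x1000 output of predict.
p = single([0.05 0.30 0.10 0.40 0.15]);

% Sort in descending order and keep the first three values and positions.
[pSorted, idx] = sort(p, 'descend');
p3 = pSorted(1:3)
i3 = idx(1:3)
```

Sorting all 1,000 scores does more work than finding just the top three, but at this size the difference is unlikely to matter.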
nnet.Layers(end)
ans = 
  ClassificationOutputLayer with properties:

            Name: 'output'
      ClassNames: {1000×1 cell}
      OutputSize: 1000

   Hyperparameters
    LossFunction: 'crossentropyex'
nnet.Layers(end).ClassNames(i3)
ans =
  3×1 cell array
    {'coffee mug'     }
    {'Polaroid camera'}
    {'lens cap'       }
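To see the labels next to their scores, the two results can be printed side by side. This sketch retypes the values shown above so that it runs on its own. In the live session you would use nnet.Layers(end).ClassNames(i3) and p3 directly. The variable names names, scores, and top3Total are illustrative.

```matlab
% Values copied from the output above.
names  = {'coffee mug'; 'Polaroid camera'; 'lens cap'};
scores = [0.2469; 0.1446; 0.1377];

% Print each label with its score as a percentage.
for k = 1:numel(names)
    fprintf('%-16s %5.1f%%\n', names{k}, 100 * scores(k));
end

% Combined score of the top three categories.
top3Total = sum(scores);
fprintf('Top three combined: %.1f%%\n', 100 * top3Total);
```

The top three guesses add up to only a little over half of the total score. The rest is spread across the other 997 categories.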
Hmm. I'm glad coffee mug came out on top. I can't pour coffee into a camera or a lens cap!
Remember, for options to follow along with this new blog, click on the "Subscribe" link at the top of the page.
Finally, a note for my image processing blog readers: Don't worry, I will continue to write for that blog, too.