
Have you ever wondered what your favorite deep learning network is looking at? For example, if a network classifies this image as "French horn," what part of the image matters most for the classification?

Birju Patel, a developer on the Computer Vision System Toolbox team, helped me with the main idea and code for today's post. Birju has focused on deep learning for the last couple of years. Before that, he worked on feature extraction methods and on optimizing feature matching.

Let's use the pretrained ResNet-50 network for this experiment. (He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, Sun, Jian. "Deep Residual Learning for Image Recognition." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778. 2016.) An easy way to get the ResNet-50 network for MATLAB is to launch the Add-On Explorer (from the HOME tab in MATLAB) and search for resnet.

net = resnet50;

We need to be aware that ResNet-50 expects the input images to be a particular size. The network's initial layer has this information.

sz = net.Layers(1).InputSize(1:2)

sz = 224 224

The required image size can be passed directly to the `imresize` function.

```
url = 'http://blogs.mathworks.com/steve/files/steve-horn.jpg';
rgb = imread(url);
rgb = imresize(rgb,sz);
imshow(rgb)
```

Call `classify` with the network and the image to see what category the network thinks is most probable.

classify(net,rgb)

ans = categorical French horn

ResNet-50 thinks I am playing the French horn.

Birju was reading a paper by Zeiler and Fergus about visualization techniques for convolutional neural networks, and in it he came across the idea of *occlusion sensitivity*. If you block out, or occlude, a portion of the image, how does that affect the probability score of the network? And how does the result vary depending on which portion you occlude?

Let's try it.

```
rgb2 = rgb;
rgb2((1:71)+77,(1:71)+108,:) = 128;
imshow(rgb2)
```

classify(net,rgb2)

ans = categorical notebook

Hmm. I guess the network "thinks" that gray square looks like a notebook. That region must be important for classifying the image. Now let's try the occlusion in a different spot.

```
rgb3 = rgb;
rgb3((1:71)+15,(1:71)+80,:) = 128;
imshow(rgb3)
```

classify(net,rgb3)

ans = categorical French horn

Hmm. I guess my head is not as important.

Anyway, Birju wrote some MATLAB code to systematically quantify the relative importance of different image regions to the classification result. His code builds up a large batch of images. For each image in the batch, a different region is occluded. For each location of the occlusion mask, the prediction score of the expected class ("French horn," in this case) is recorded.

Let's make a batch of images with 71x71 regions masked out. Start by computing the corners of all the masks, represented as (X1,Y1) and (X2,Y2).

```
mask_size = [71 71];
[H,W,~] = size(rgb);
X = 1:W;
Y = 1:H;
[X1, Y1] = meshgrid(X, Y);
X1 = X1(:) - (mask_size(2)-1)/2;
Y1 = Y1(:) - (mask_size(1)-1)/2;
X2 = X1 + mask_size(2) - 1;
Y2 = Y1 + mask_size(1) - 1;
```

Don't let the mask corners stray outside the image boundaries.

```
X1 = max(1, X1);
Y1 = max(1, Y1);
X2 = min(W, X2);
Y2 = min(H, Y2);
```

Make the batch.

```
batch = repmat(rgb,[1 1 1 size(X1,1)]);
for i = 1:size(X1,1)
    c = X1(i):X2(i);
    r = Y1(i):Y2(i);
    batch(r,c,:,i) = 128;   % gray mask
end
```

[Note: This batch has more than 50,000 images in it. You'll need a lot of RAM to create and process such a large batch of images all at once.]
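If you don't have that much RAM, here is a sketch of a loop-based alternative that scores one occluded image at a time on a coarse grid of mask centers. It trades speed for memory, and it is my own variation, not Birju's code: the `stride` parameter is an addition of mine, and `horn_idx` is the class index computed a little later in this post.

```
% Memory-light alternative (sketch): score one occluded image at a time,
% sampling mask centers on a coarse grid instead of at every pixel.
stride = 8;                          % coarser grid -> fewer predictions
S = nan(H,W);
for y = 1:stride:H
    for x = 1:stride:W
        r = max(1,y-35):min(H,y+35); % 71x71 mask, clipped to the image
        c = max(1,x-35):min(W,x+35);
        occluded = rgb;
        occluded(r,c,:) = 128;       % gray mask
        p = predict(net,occluded);
        S(y,x) = p(horn_idx);        % score for the expected class
    end
end
```

The resulting map `S` is sparse (one value per grid point), so you would interpolate or upsample it before display.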

Here are a few of the masked images.

montage(batch(:,:,:,randperm(size(X1,1),9)))

Now I'll use `predict` (instead of `classify`) to get the prediction scores for each category and for each image in the batch. The `'MiniBatchSize'` parameter is used to keep the GPU memory use down. It means that the `predict` function will send 64 images at a time to the GPU for processing.

```
s = predict(net, batch, 'MiniBatchSize',64);
```

size(s)

ans = 50176 1000

That's a lot of prediction scores! There are 50,176 images in the batch (one for each pixel location in the 224x224 image), and there are 1,000 categories. The matrix `s` has a score for each category and for each image.

We are specifically interested in the prediction scores for the category predicted for the original image. Let's figure out the category index for that.

```
scores = predict(net,rgb);
[~,horn_idx] = max(scores);
```

So, here are the *French horn* scores for every image in the batch:

s_horn = s(:,horn_idx);

Reshape the set of horn scores to be an image and display it.

```
S_horn = reshape(s_horn,H,W);
imshow(-S_horn,[])
colormap(gca,'parula')
```

The brightest regions indicate the locations where the occlusion had the biggest effect on the probability score.
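If you'd like to see the sensitivity map on top of the photo itself, here's one way to overlay it. This is a sketch using the `S_horn` map computed above; the 0.5 transparency value is an arbitrary choice of mine.

```
% Overlay sketch: blend the occlusion-sensitivity map with the original
% image. Bright overlay = occlusion here hurt the score the most.
heat = mat2gray(-S_horn);     % rescale so high values mean big score drop
imshow(rgb)
hold on
hImg = imagesc(heat);         % scaled image on top of the photo
hImg.AlphaData = 0.5;         % semi-transparent overlay
colormap(gca,'parula')
hold off
```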

Let's find the location that minimizes the "French horn" probability score.

```
[min_score,min_idx] = min(s_horn);
rgb_min_score = batch(:,:,:,min_idx);
imshow(rgb_min_score)
```

There you go. To recognize a French horn, it's all about the valves and valve slides. It's not about the bell.

A final note on terminology: Some of my horn-playing friends might give me a hard time about calling my instrument a "French horn." According to the International Horn Society, the instrument should just be called "horn." However, the label stored in ResNet-50 is "French horn," and that is the most commonly used term in the United States, where I live.


Published with MATLAB® R2017b

This is the second post in the series on using deep learning for automated driving. In the first post, I covered object detection (specifically, vehicle detection). In this post, I will go over how deep learning is used to find lane boundaries.

*Labeled objects and lane boundaries.*

*Coefficients of parabolas representing lane boundaries.*

```
originalConvNet = alexnet
```

Once I have the network loaded into MATLAB, I need to modify its structure slightly to change it from a classification network into a regression network. Notice in the code below that I have 6 outputs, corresponding to the three coefficients of the parabola representing each lane boundary (left and right).

```
% Extract layers from the original network
layers = originalConvNet.Layers

% Net surgery
% Replace the last few fully connected layers with suitably sized layers
layers(20:25) = [];
outputLayers = [ ...
    fullyConnectedLayer(16, 'Name', 'fcLane1');
    reluLayer('Name','fcLane1Relu');
    fullyConnectedLayer(6, 'Name', 'fcLane2');
    regressionLayer('Name','output')];
layers = [layers; outputLayers]
```

I used an NVIDIA Titan X (Pascal) GPU to train this network. As you can see in the figure below, it took 245 seconds to train the network. This time is lower than I expected, mostly because only a limited number of weights from the new layers are being learned, and also because MATLAB automatically uses CUDA and cuDNN to accelerate the training process when a GPU is available.

*Training progress to train lane boundary detection regression network on an NVIDIA Titan X GPU.*

*Output of lane boundary detection network.*

This is a guest post from Avinash Nehemiah. Avi is a product manager for computer vision and automated driving. I often get questions from friends and colleagues about how automated driving systems perceive their environment and make "human-like" decisions, and how MATLAB is used in these systems. Over the next two blog posts, we'll cover two common perception tasks:

- Vehicle detection (this post)
- Lane detection (next post)

*Output of a vehicle detector that locates and classifies different types of vehicles.*

*Raw input image (left) and input image with labeled ground truth (right).*

*Screen shot of Ground Truth Labeler app designed to label video and image data.*

*Process of automating ground truth labeling using MATLAB.*

```
% Create image input layer.
inputLayer = imageInputLayer([32 32 3]);
```

The middle layers are the core building blocks of the network, with repeated sets of convolution, ReLU, and pooling layers. For our example, I'll use just a couple of layers. You can always create a deeper network by repeating these layers to improve accuracy, or if you want to incorporate more classes into the detector. You can learn more about the different types of layers available in the Neural Network Toolbox documentation.

```
% Define the convolutional layer parameters.
filterSize = [3 3];
numFilters = 32;

% Create the middle layers.
middleLayers = [
    convolution2dLayer(filterSize, numFilters, 'Padding', 1)
    reluLayer()
    convolution2dLayer(filterSize, numFilters, 'Padding', 1)
    reluLayer()
    maxPooling2dLayer(3, 'Stride', 2)
    ];
```

The final layers of a CNN are typically a set of fully connected layers and a softmax loss layer. In this case, I've added a ReLU nonlinearity between the fully connected layers to improve detector performance, since our training set for this detector wasn't as large as I would like.

```
finalLayers = [
    % Add a fully connected layer with 64 output neurons. The output size
    % of this layer will be an array with a length of 64.
    fullyConnectedLayer(64)

    % Add a ReLU non-linearity.
    reluLayer()

    % Add the last fully connected layer. At this point, the network must
    % produce outputs that can be used to measure whether the input image
    % belongs to one of the object classes or background. This measurement
    % is made using the subsequent loss layers.
    fullyConnectedLayer(width(vehicleDataset))

    % Add the softmax loss layer and classification layer.
    softmaxLayer()
    classificationLayer()
    ];

layers = [
    inputLayer
    middleLayers
    finalLayers
    ]
```

To train the object detector, I pass the `layers` network structure to the `trainFasterRCNNObjectDetector` function. If you have a GPU installed, the algorithm will default to using the GPU. If you want to train without a GPU or use multiple GPUs, you can adjust the `'ExecutionEnvironment'` parameter in `trainingOptions`.

```
detector = trainFasterRCNNObjectDetector(trainingData, layers, options, ...
    'NegativeOverlapRange', [0 0.3], ...
    'PositiveOverlapRange', [0.6 1], ...
    'BoxPyramidScale', 1.2);
```

Once training is done, try it out on a few test images to see if the detector is working properly. I used the following code to test the detector on a single image.

```
% Read a test image.
I = imread('highway.png');

% Run the detector.
[bboxes, scores] = detect(detector, I);

% Annotate detections in the image.
I = insertObjectAnnotation(I, 'rectangle', bboxes, scores);
figure
imshow(I)
```

*Detected bounding boxes and scores from Faster R-CNN vehicle detector.*

The first is an importer for TensorFlow-Keras models. This submission enables you to import a pretrained Keras model and weights and then use the model for prediction or transfer learning. Or, you can import the layer architecture as a Layer array or a LayerGraph object and then train the model.
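For reference, here is a minimal sketch of both import paths. The file name `myModel.h5` is hypothetical, and the importer ships as a separate support package, so these calls are illustrative rather than a complete recipe.

```
% Sketch of the two Keras import paths (file name is hypothetical).

% 1) Import the model and weights for prediction or transfer learning.
net = importKerasNetwork('myModel.h5');

% 2) Import only the layer architecture, then train the model in MATLAB.
layers = importKerasLayers('myModel.h5','ImportWeights',false);
```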

The second is the pretrained ResNet-50 model. This model, which has been trained on a subset of the ImageNet database, won the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) competition in 2015. The model is trained on more than a million images, has 177 layers in total, and can classify images into 1000 object categories (e.g. keyboard, mouse, pencil, and many animals).


MATLAB users ask us a lot of questions about GPUs, and today I want to answer some of them. I hope you'll come away with a basic sense of how to choose a GPU card to help you with deep learning in MATLAB.

I asked Ben Tordoff for help. I first met Ben about 12 years ago, when he was giving the Image Processing Toolbox development a LOT of feedback about our functions. Since then, he has moved into software development, and he now leads the team responsible for GPU, distributed, and tall array support in MATLAB and the Parallel Computing Toolbox.

The function `gpuDevice` tells you about your GPU hardware. I asked Ben to walk me through the output of `gpuDevice` on my computer.

gpuDevice

```
ans =

  CUDADevice with properties:

                      Name: 'TITAN Xp'
                     Index: 1
         ComputeCapability: '6.1'
            SupportsDouble: 1
             DriverVersion: 9
            ToolkitVersion: 8
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 1.2885e+10
           AvailableMemory: 1.0425e+10
       MultiprocessorCount: 30
              ClockRateKHz: 1582000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1
```

Ben: "That's a Titan Xp card. You've got a pretty good GPU there -- a lot better than the one I've got, at least for deep learning." (I'll explain this comment below.)

"An Index of 1 means that the NVIDIA driver thinks this GPU is the most powerful one installed on your computer. And `ComputeCapability` refers to the generation of computation capability supported by this card. The sixth generation is known as Pascal." As of the R2017b release, GPU computing with MATLAB and Parallel Computing Toolbox requires a `ComputeCapability` of at least 3.0.

The other information provided by `gpuDevice` is mostly useful to the developers writing low-level GPU computation routines, or for troubleshooting. There's one other number, though, that might be helpful to you when comparing GPUs. The `MultiprocessorCount` is effectively the number of chips on the GPU. "The difference between a high end card and a low end card within the same generation often comes down to the number of chips available."
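If you have more than one GPU installed and want to compare them on this basis, here is a quick sketch. Note that `gpuDevice(k)` also *selects* device k as the active GPU, so you may want to reselect your preferred device afterward.

```
% Sketch: list every GPU with the properties most useful for comparison.
% (Calling gpuDevice(k) selects device k as a side effect.)
for k = 1:gpuDeviceCount
    g = gpuDevice(k);
    fprintf('%d: %s (%d multiprocessors, compute capability %s)\n', ...
        k, g.Name, g.MultiprocessorCount, g.ComputeCapability);
end
```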

The next thing Ben and I discussed was the output of GPUBench, a GPU performance measurement tool maintained by Ben's team. You can get it from the MATLAB Central File Exchange. Here's a portion of the report:

*GFLOPS* is a unit of computational speed: 1 GFLOPS is roughly 1 billion floating-point operations per second. The report measures computational speed for both double-precision and single-precision floating point. Some cards excel at double precision, and some do better at single precision. The report shows the best double-precision cards at the top because double precision is most important for general MATLAB computing.

The report includes three different computational benchmarks: MTimes (matrix multiplication), backslash (linear system solving), and FFT. The matrix multiplication benchmark is best at measuring pure computation speed, and so it has the highest GFLOP numbers. The FFT and backslash benchmarks, on the other hand, involve more of a mixture of computation and I/O, so the reported GFLOP rates are lower.
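You can get a rough single-precision number for your own card with `gputimeit`, which times GPU code accurately by synchronizing and taking a median over several runs. This is just a sketch for intuition, not a substitute for GPUBench.

```
% Rough home-grown benchmark (sketch): time a large single-precision
% matrix multiply on the GPU and convert the result to GFLOPS.
N = 4096;
A = rand(N,'single','gpuArray');
B = rand(N,'single','gpuArray');
t = gputimeit(@() A*B);        % median execution time in seconds
gflops = 2*N^3 / t / 1e9       % an N-by-N multiply is ~2*N^3 flops
```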

My Titan Xp card is better than my CPU ("Host PC" in the table above) for double precision computing, but it's definitely slower than the top cards listed. So, why did Ben tell me that my card was so good for deep learning? It's because of the right-hand column of the report, which focuses on *single-precision computation*. For image processing and deep learning, single-precision speed is more important than double-precision speed.

And the Titan Xp is blazingly fast at single-precision computation, with a whopping 11,000 GFLOPS for matrix multiplication with large matrices. If you're interested, you can drill into the GPUBench report for more details, like this:

I asked Ben for a little help understanding the wide variety of GPU cards made by NVIDIA. "Well, for deep learning, you can probably focus just on three lines of cards: GeForce, Titan, and Tesla. The GeForce cards are the cheapest ones with decent compute performance, but you have to keep in mind that they don't work if you are using remote desktop software. The Titan is kind of a souped-up version of GeForce that does have remote desktop support. And the Tesla cards are intended as high-performance cards for compute servers in double-precision applications."

Many of the deep learning functions in Neural Network Toolbox and other products now support an option called `'ExecutionEnvironment'`. The choices are: `'auto'`, `'cpu'`, `'gpu'`, `'multi-gpu'`, and `'parallel'`. You can use this option to try some network training and prediction computations to measure the practical GPU impact on deep learning on your own computer.

I'm going to experiment with this option using the "Train a Convolutional Neural Network for Regression" example. I'm just going to do the training step here, not the full example. First, I'll use my GPU.

```
options = trainingOptions('sgdm','InitialLearnRate',0.001, ...
    'MaxEpochs',15);
net = trainNetwork(trainImages,trainAngles,layers,options);
```

```
Training on single GPU.
Initializing image normalization.
|=========================================================================================|
|     Epoch    |   Iteration  | Time Elapsed |  Mini-batch  |  Mini-batch  | Base Learning|
|              |              |  (seconds)   |     Loss     |     RMSE     |     Rate     |
|=========================================================================================|
|            1 |            1 |         0.01 |     352.3131 |        26.54 |       0.0010 |
|            2 |           50 |         0.75 |     114.6249 |        15.14 |       0.0010 |
|            3 |          100 |         1.40 |      69.1581 |        11.76 |       0.0010 |
|            4 |          150 |         2.04 |      52.7575 |        10.27 |       0.0010 |
|            6 |          200 |         2.69 |      54.4214 |        10.43 |       0.0010 |
|            7 |          250 |         3.33 |      40.6091 |         9.01 |       0.0010 |
|            8 |          300 |         3.97 |      29.9065 |         7.73 |       0.0010 |
|            9 |          350 |         4.63 |      28.4160 |         7.54 |       0.0010 |
|           11 |          400 |         5.28 |      28.4920 |         7.55 |       0.0010 |
|           12 |          450 |         5.92 |      21.7896 |         6.60 |       0.0010 |
|           13 |          500 |         6.56 |      22.7835 |         6.75 |       0.0010 |
|           15 |          550 |         7.20 |      24.8388 |         7.05 |       0.0010 |
|           15 |          585 |         7.66 |      17.7162 |         5.95 |       0.0010 |
|=========================================================================================|
```

You can see in the "Time Elapsed" column that the training for this simple example took about 8 seconds.

Now let's repeat the training using just the CPU.

```
options = trainingOptions('sgdm','InitialLearnRate',0.001, ...
    'MaxEpochs',15,'ExecutionEnvironment','cpu');
net = trainNetwork(trainImages,trainAngles,layers,options);
```

```
Initializing image normalization.
|=========================================================================================|
|     Epoch    |   Iteration  | Time Elapsed |  Mini-batch  |  Mini-batch  | Base Learning|
|              |              |  (seconds)   |     Loss     |     RMSE     |     Rate     |
|=========================================================================================|
|            1 |            1 |         0.17 |     354.9253 |        26.64 |       0.0010 |
|            2 |           50 |         6.74 |     117.6613 |        15.34 |       0.0010 |
|            3 |          100 |        13.31 |      92.0581 |        13.57 |       0.0010 |
|            4 |          150 |        20.10 |      57.7432 |        10.75 |       0.0010 |
|            6 |          200 |        26.66 |      50.4582 |        10.05 |       0.0010 |
|            7 |          250 |        33.35 |      35.4191 |         8.42 |       0.0010 |
|            8 |          300 |        40.06 |      30.0699 |         7.75 |       0.0010 |
|            9 |          350 |        46.70 |      24.5073 |         7.00 |       0.0010 |
|           11 |          400 |        53.35 |      28.2483 |         7.52 |       0.0010 |
|           12 |          450 |        59.95 |      23.1092 |         6.80 |       0.0010 |
|           13 |          500 |        66.54 |      18.9768 |         6.16 |       0.0010 |
|           15 |          550 |        73.10 |      15.1666 |         5.51 |       0.0010 |
|           15 |          585 |        77.78 |      20.5303 |         6.41 |       0.0010 |
|=========================================================================================|
```

That took about 10 times longer than training using the GPU. For realistic networks, we expect the difference to be even greater. With more powerful deep learning networks that take hours or days to train, you can see why we recommend using a good GPU for substantial deep learning work.

I hope you find this information helpful. Good luck setting up your own deep learning system with MATLAB!

PS. Thanks, Ben!



The R2017b release of MathWorks products shipped just two weeks ago, and it includes many new capabilities for deep learning. Developers on several product teams have been working hard on these capabilities, and everybody is excited to see them make it into your hands. Today, I'll give you a little tour of what you can expect when you get a chance to update to the new release.

The heart of deep learning for MATLAB is, of course, the Neural Network Toolbox, which introduces two new types of networks that you can build, train, and apply: directed acyclic graph (DAG) networks and long short-term memory (LSTM) networks.

In a DAG network, a layer can have inputs from multiple layers instead of just one, and it can also output to multiple layers. Here's a sample from the example Create and Train DAG Network for Deep Learning.

You can try out the pretrained GoogLeNet model, which is a DAG network that you can load using `googlenet`.

Experiment also with long short-term memory (LSTM) networks, which have the ability to learn long-term dependencies in time-series data.

There's a pile of new layer types, too: batch normalization, transposed convolution, max unpooling, leaky ReLU, clipped ReLU, addition, and depth concatenation.
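To give a flavor of how a couple of these fit together, here is a sketch of a tiny residual-style block that combines the new addition layer with a layer graph. The layer names are my own, and the convolution uses 3 filters only so that both inputs to the addition layer have matching sizes.

```
% Sketch: a minimal skip connection using the new addition layer.
layers = [
    imageInputLayer([32 32 3],'Name','in')
    convolution2dLayer(3,3,'Padding',1,'Name','conv1')  % 3 filters keep size 32x32x3
    batchNormalizationLayer('Name','bn1')
    reluLayer('Name','relu1')
    additionLayer(2,'Name','add')];                     % expects two inputs
lgraph = layerGraph(layers);                            % relu1 -> add/in1 automatically
lgraph = connectLayers(lgraph,'in','add/in2');          % the skip connection
plot(lgraph)                                            % visualize the DAG
```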

My colleague Joe used the Neural Network Toolbox to define his own type of network layer based on a paper he read a couple of months ago. I'll show you his work in detail a little later this fall.

When you train your networks, you can now plot the training progress. You can also validate network performance and automatically halt training based on the validation metrics. Plus, you can find optimal network parameters and training options using Bayesian optimization.

Automatic image preprocessing and augmentation is now available for network training. Image augmentation is the idea of increasing the training set by randomly applying transformations, such as resizing, rotation, reflection, and translation, to the available images.
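As a sketch of what that looks like in code: this assumes `imds` is a labeled `imageDatastore` of training images, and that `layers` and `options` are defined as usual; the specific augmentation ranges are arbitrary choices of mine.

```
% Sketch: random reflection, rotation, and translation applied on the fly
% during training, using the R2017b augmentation functions.
augmenter = imageDataAugmenter( ...
    'RandXReflection',true, ...
    'RandRotation',[-10 10], ...       % degrees
    'RandXTranslation',[-5 5], ...     % pixels
    'RandYTranslation',[-5 5]);
source = augmentedImageSource([227 227],imds, ...
    'DataAugmentation',augmenter);     % also resizes to the network input size
net = trainNetwork(source,layers,options);
```

Because the transformations are applied on the fly, the augmented images are never stored on disk, and each epoch sees a different randomized version of the training set.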

As an image processing algorithms person, I am especially intrigued by the new semantic segmentation capability, which lets you classify pixel regions and visualize the results.

See "Semantic Segmentation Using Deep Learning" for a detailed example using the CamVid dataset from the University of Cambridge.

If you are implementing deep learning methods in an embedded system, take a look at GPU Coder, a brand-new product in the R2017b release. GPU Coder generates CUDA code from MATLAB code for deep learning, embedded vision, and autonomous systems. The generated code is well optimized, as you can see from this performance benchmark plot.

*MathWorks benchmarks of inference performance of AlexNet using GPU acceleration, Titan XP GPU, Intel® Xeon® CPU E5-1650 v4 at 3.60GHz, cuDNN v5, and Windows 10. Software versions: MATLAB (R2017b), TensorFlow (1.2.0), MXNet (0.10), and Caffe2 (0.8.1).*

I have just scratched the surface of the deep learning capabilities in the ambitious R2017b release.

Here are some additional sources of information.

- R2017b Highlights
- Neural Network Toolbox (doc, release notes)
- Parallel Computing Toolbox (doc, release notes)
- Computer Vision System Toolbox (doc, release notes)
- Image Processing Toolbox (doc, release notes)
- GPU Coder (product info)



Hello, and welcome to the new MATLAB Central blog on deep learning! In my 24th year of MATLAB and toolbox development and design, I am excited to be tackling this new project.

*Deep learning* refers to a collection of machine learning techniques that are based on neural networks that have a large number of layers (hence "deep"). By training these networks on labeled data sets, they can achieve state-of-the-art accuracy on classification tasks using images, text, and sound as inputs.

Because of my background in image processing, I have followed the rapid progress in deep learning over the past several years with great interest. There is much that I would like to learn and share with you about the area, especially with respect to exploring deep learning ideas with MATLAB. To that end, several developers have volunteered to lend a hand with topics and code and technical guidance as we explore. They are building deep learning capabilities as fast as they can in products like:

- Neural Network Toolbox
- Parallel Computing Toolbox
- Image Processing Toolbox
- Computer Vision System Toolbox
- Automated Driving System Toolbox
- GPU Coder

I will be introducing them to you as we get into the details of deep learning with MATLAB.

If you have followed my image processing blog posts, you can expect a similar style here. Topics will be a mix of concept tutorials, examples and case studies, feature exploration, and tips. I imagine we'll discuss things like performance, GPU hardware, and online data sets. Maybe we'll do some things just for fun, like the LSTM network built last month by a MathWorks developer that spouts Shakespeare-like verse.

To subscribe, either using email or RSS, click on the "Subscribe" link at the top of the page.

I'll leave you with a little teaser based on AlexNet. I just plugged in a webcam and connected to it in MATLAB.

c = webcam

```
c =

  webcam with properties:

                     Name: 'Microsoft® LifeCam Cinema(TM)'
               Resolution: '640x480'
     AvailableResolutions: {1×11 cell}
             ExposureMode: 'auto'
         WhiteBalanceMode: 'auto'
                    Focus: 33
    BacklightCompensation: 5
                Sharpness: 25
                     Zoom: 0
                FocusMode: 'auto'
                     Tilt: 0
               Brightness: 143
                      Pan: 0
             WhiteBalance: 4500
               Saturation: 83
                 Exposure: -6
                 Contrast: 5
```

Next, I loaded an AlexNet network that has been pretrained with a million images. The network can classify images into 1,000 different object categories.

nnet = alexnet

nnet = SeriesNetwork with properties: Layers: [25×1 nnet.cnn.layer.Layer]

You could also try other networks. For example, after you have upgraded to R2017b, you could experiment with GoogLeNet by using `net = googlenet`.

What do these 25 network layers look like?

nnet.Layers

```
ans =

  25x1 Layer array with layers:

     1   'data'     Image Input                   227x227x3 images with 'zerocenter' normalization
     2   'conv1'    Convolution                   96 11x11x3 convolutions with stride [4 4] and padding [0 0 0 0]
     3   'relu1'    ReLU                          ReLU
     4   'norm1'    Cross Channel Normalization   cross channel normalization with 5 channels per element
     5   'pool1'    Max Pooling                   3x3 max pooling with stride [2 2] and padding [0 0 0 0]
     6   'conv2'    Convolution                   256 5x5x48 convolutions with stride [1 1] and padding [2 2 2 2]
     7   'relu2'    ReLU                          ReLU
     8   'norm2'    Cross Channel Normalization   cross channel normalization with 5 channels per element
     9   'pool2'    Max Pooling                   3x3 max pooling with stride [2 2] and padding [0 0 0 0]
    10   'conv3'    Convolution                   384 3x3x256 convolutions with stride [1 1] and padding [1 1 1 1]
    11   'relu3'    ReLU                          ReLU
    12   'conv4'    Convolution                   384 3x3x192 convolutions with stride [1 1] and padding [1 1 1 1]
    13   'relu4'    ReLU                          ReLU
    14   'conv5'    Convolution                   256 3x3x192 convolutions with stride [1 1] and padding [1 1 1 1]
    15   'relu5'    ReLU                          ReLU
    16   'pool5'    Max Pooling                   3x3 max pooling with stride [2 2] and padding [0 0 0 0]
    17   'fc6'      Fully Connected               4096 fully connected layer
    18   'relu6'    ReLU                          ReLU
    19   'drop6'    Dropout                       50% dropout
    20   'fc7'      Fully Connected               4096 fully connected layer
    21   'relu7'    ReLU                          ReLU
    22   'drop7'    Dropout                       50% dropout
    23   'fc8'      Fully Connected               1000 fully connected layer
    24   'prob'     Softmax                       softmax
    25   'output'   Classification Output         crossentropyex with 'tench', 'goldfish', and 998 other classes
```

I happen to know that 'coffee mug' is one of the categories. How will the network do with the 23-year-old MATLAB "Picture the Power" mug from my bookshelf?

Here's the snapshot I took with my webcam using `pic = snapshot(c)`.

imshow(pic)

The first layer accepts inputs. It will tell us the image size that the network accepts.

nnet.Layers(1)

```
ans =

  ImageInputLayer with properties:

                Name: 'data'
           InputSize: [227 227 3]

   Hyperparameters
    DataAugmentation: 'none'
       Normalization: 'zerocenter'
```

So I need to resize the snapshot to be 227x227 before I feed it to the network.

pic2 = imresize(pic,[227 227]); imshow(pic2)

Now I can try to classify it.

label = classify(nnet,pic2)

label = categorical coffee mug

OK! But I wonder what else the network thought it might be? The `predict` function can return the scores for all the categories.

p = predict(nnet,pic2); plot(p)

There are several notable prediction peaks. I'll use the `maxk` function (new in R2017b) to find where they are, and then I'll look up those locations in the list of category labels in the network's last layer.

[p3,i3] = maxk(p,3);

p3

```
p3 =

  1×3 single row vector

    0.2469    0.1446    0.1377
```

i3

i3 = 505 733 623

nnet.Layers(end)

```
ans =

  ClassificationOutputLayer with properties:

            Name: 'output'
      ClassNames: {1000×1 cell}
      OutputSize: 1000

   Hyperparameters
    LossFunction: 'crossentropyex'
```

nnet.Layers(end).ClassNames(i3)

```
ans =

  3×1 cell array

    {'coffee mug'     }
    {'Polaroid camera'}
    {'lens cap'       }
```

Hmm. I'm glad coffee mug came out on top. I can't pour coffee into a camera or a lens cap!

Remember, for options to follow along with this new blog, click on the "Subscribe" link at the top of the page.

Finally, a note for my image processing blog readers: Don't worry, I will continue to write for that blog, too.
