{"id":5249,"date":"2020-10-30T09:03:18","date_gmt":"2020-10-30T13:03:18","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=5249"},"modified":"2021-04-06T15:45:41","modified_gmt":"2021-04-06T19:45:41","slug":"deep-wine-designer","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2020\/10\/30\/deep-wine-designer\/","title":{"rendered":"Deep Wine Designer"},"content":{"rendered":"<span style=\"font-family: calibri; font-size: 15px;\">This post is another from Ieuan Evans, who brought us <a href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2018\/10\/18\/deep-beer-designer\/\"><em>Deep Beer Designer<\/em><\/a>, back today to talk about wine! It's a longer post than usual, but packed with useful information for Deep Learning for Text. If you're looking for a unique text example, this is for you!<\/span>\r\n<h6><\/h6>\r\n<h2><img decoding=\"async\" loading=\"lazy\" width=\"3955\" height=\"1282\" class=\"alignnone size-full wp-image-5295\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/10\/Wine_blog_vineyard.jpg\" alt=\"\" \/><\/h2>\r\n<h6><\/h6>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Following my last blog post about how to use MATLAB to choose the <a href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2018\/10\/18\/deep-beer-designer\/\"><strong><u>perfect beer<\/u><\/strong><\/a>, I decided to explore what I could do with deep learning and wine.<\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Similar to beer, the number of different choices of wine is overwhelming. Before I can select a wine, I need to work out what grape varieties of wine I actually like.<\/span>\r\n\r\n<span style=\"font-size: 14px;\">Could MATLAB help me with this? If I describe my\r\nperfect wine to MATLAB, will it be able to select one for me? 
Which characteristics of different wine varieties stand out?<\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">There are two common approaches to a text classification problem like this:<\/span>\r\n<h6><\/h6>\r\n<ol>\r\n \t<li><span style=\"font-size: 14px;\">Train a long short-term memory (LSTM) network that treats text data as a time series and learns long-term dependencies between time steps.<\/span><\/li>\r\n \t<li><span style=\"font-size: 14px;\">Train a convolutional neural network (CNN) that treats text data as hyperspectral images and learns localized features by applying sliding convolutional filters.<\/span><\/li>\r\n<\/ol>\r\n<span style=\"font-size: 14px;\">In this blog post, I will focus on the second approach: classifying wine grape varieties from their descriptions by converting the text to images and using a CNN.<\/span>\r\n<h6><\/h6>\r\n<h2>Background: Word Embeddings<\/h2>\r\n<span style=\"font-size: 14px;\">To convert text to hyperspectral images, we can use a word embedding, which maps individual words to high-dimensional vectors; a sequence of words then becomes a 2-D array that we can treat as an image. These word vectors sometimes have interesting properties. For example, given the word vectors corresponding to Italy, Rome, Paris, and France, you might discover the relationship:<\/span>\r\n<h6><\/h6>\r\n<p style=\"text-align: center;\"><em>Italy \u2013 Rome + Paris \u2248 France<\/em><\/p>\r\n\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">That is, the vector corresponding to the word <em>Italy<\/em> without the components of the word <em>Rome<\/em> but with the added components of the word <em>Paris<\/em> is approximately equal to the vector corresponding to <em>France<\/em>.<\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">To do this in MATLAB, we can use the Text Analytics Toolbox\u2122 Model for <em>fastText English 16 Billion Token Word Embedding<\/em> support package. 
This word embedding maps approximately 1,000,000 English words to 1-by-300 vectors. Let's load the word embedding using the <span style=\"font-family: courier;\">fastTextWordEmbedding<\/span> function.<\/span>\r\n<pre>emb = fastTextWordEmbedding;<\/pre>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Let's visualize the word vectors <em>Italy, Rome, Paris<\/em>, and <em>France<\/em>. By calculating the word vectors using the <span style=\"font-family: courier;\">word2vec<\/span> function, we can see the relationship:<\/span>\r\n<h6><\/h6>\r\n<div class=\"inlineWrapper\">\r\n<pre class=\"S9\">italy = word2vec(emb,\"Italy\");\r\nrome = word2vec(emb,\"Rome\");\r\nparis = word2vec(emb,\"Paris\");\r\n\r\nword = vec2word(emb,italy - rome + paris)<\/pre>\r\n<\/div>\r\n<span style=\"font-family: consolas; font-size: 13px;\">\u00a0 \u00a0word = \"France\" <\/span>\r\n<h6><\/h6>\r\n<h2>Words to Images<\/h2>\r\n<span style=\"font-size: 14px;\">To use a word embedding to map a sequence of words to an image, let's split the text into words using <span style=\"font-family: courier;\">tokenizedDocument<\/span> and convert the words to a sequence of word vectors using <span style=\"font-family: courier;\">doc2sequence<\/span>.<\/span>\r\n<div class=\"inlineWrapper\">\r\n<pre class=\"S9\">str = \"The rain in Spain falls mainly on the plain.\";\r\ndocument = tokenizedDocument(str);\r\nsequence = doc2sequence(emb,document);<\/pre>\r\n<span style=\"font-size: 14px;\">Let's view the hyperspectral image corresponding to this sequence of words.<\/span>\r\n\r\n<\/div>\r\n<pre>figure\r\nI = sequence{1};\r\nimagesc(I,[-1 1])\r\ncolorbar\r\nxlabel(\"Word Index\")\r\nylabel(\"Embedding Feature\")\r\ntitle(\"Word Vectors\")<\/pre>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"560\" height=\"420\" class=\"alignnone size-full wp-image-6081\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/10\/word_embedding-image.png\" alt=\"\" \/>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">The resulting image is not particularly 
exciting. It is a C-by-S array, where C is the number of features of the word embedding (the embedding dimension) and S is the number of words in the text (the sequence length). When formatted as a 1-by-S hyperspectral image with C channels, you can input this data to a CNN and apply sliding filters of height 1. These are known as 1-D convolutions.<\/span>\r\n<h6><\/h6>\r\n<h2>Load Wine Reviews Data<\/h2>\r\n<span style=\"font-size: 14px;\">Let's download the <a href=\"https:\/\/www.kaggle.com\/zynicide\/wine-reviews\/\"><span style=\"text-decoration: underline;\">Wine Reviews<\/span><\/a> data from Kaggle and extract the data into a folder named wine-reviews. After downloading the data, we can read the data from winemag-data-130k-v2.csv into a table using the <span style=\"font-family: courier;\">readtable<\/span> function. The data contains special characters such as the \u00e9 in Ros\u00e9, so we must specify the text encoding option too.<\/span>\r\n<h6><\/h6>\r\n<pre>filename = fullfile(\"wine-reviews\",\"winemag-data-130k-v2.csv\");\r\ndata = readtable(filename,\"Encoding\",\"UTF-8\");\r\ndata.variety = categorical(data.variety);<\/pre>\r\n<h6><\/h6>\r\n<h2>Explore Wine Reviews Data<\/h2>\r\n<span style=\"font-size: 14px;\">To get a feel for the data, let's visualize the text data using word clouds. First create a word cloud of the different grape varieties.<\/span>\r\n<h6><\/h6>\r\n<pre>figure; \r\nwordcloud(data.variety);\r\ntitle(\"Grape Varieties\")\r\n<\/pre>\r\n<h6><\/h6>\r\n\r\n<img decoding=\"async\" loading=\"lazy\" width=\"560\" height=\"420\" class=\"alignnone size-full wp-image-6083\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/10\/word_cloud1.png\" alt=\"\" \/>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">To quickly verify whether text classification might be possible, let's create word clouds for a selection of classes and inspect the differences between them. 
If you have Text Analytics Toolbox installed, then the <span style=\"font-family: courier;\">wordcloud<\/span> function automatically preprocesses string input. For better visualizations, let's also remove a list of common words and the grape varieties from the text.<\/span>\r\n<h6><\/h6>\r\n<pre>labels = [\"Gew\u00fcrztraminer\" \"Chardonnay\" \"Nebbiolo\" \"Malbec\"];\r\ncommonWords = [\"wine\" \"Drink\" \"drink\" \"flavors\" \"finish\" \"palate\" \"notes\" \"aromas\"]; \r\n\r\nfigure\r\nfor i = 1:4\r\n  subplot(2,2,i)\r\n  label = labels(i);\r\n  idx = data.variety == label;\r\n\r\n  str = data.description(idx);\r\n  documents = tokenizedDocument(str);\r\n  documents = removeWords(documents,commonWords);\r\n  documents = removeWords(documents,labels);\r\n  str = joinWords(documents);\r\n\r\n  wordcloud(str);\r\n  title(label)\r\nend<\/pre>\r\n<h6><\/h6>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"560\" height=\"420\" class=\"alignnone size-full wp-image-6087\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/10\/word_cloud2-1.png\" alt=\"\" \/>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Even though words like \"fruit\" and \"berry\" appear to commonly describe some of these varieties, the word clouds show that the distributions of words among the grape varieties are different.<\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">This shows that there are grounds to train a classifier on the text data. Excellent!<\/span>\r\n<h6><\/h6>\r\n<h6><img decoding=\"async\" loading=\"lazy\" width=\"1920\" height=\"500\" class=\"alignnone size-full wp-image-5279\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/10\/beer-and-wine-banner.jpg\" alt=\"\" \/><\/h6>\r\n<h2>Prepare Text Data for Deep Learning<\/h2>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">To classify text data using convolutions, we need to convert the text data into images. 
To do this, let's pad or truncate the observations to have a constant length S and convert the documents into sequences of word vectors of length C using the pretrained word embedding. We can then represent a document as a 1-by-S-by-C image (an image with height 1, width S, and C channels).<\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">To convert text data from a CSV file to images, I have a helper function at the end of this post called <span style=\"font-family: courier;\">transformTextData<\/span>. The workflow creates a tabularTextDatastore object and uses the transform function with this custom transformation function, which converts the data read from the tabularTextDatastore object to images for deep learning.<\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">In this example, we'll train a network with 1-D convolutional filters of varying widths. The width of each filter corresponds to the number of words the filter can see (the n-gram length). The network has multiple branches of convolutional layers, so it can use different n-gram lengths.<\/span>\r\n<h6><\/h6>\r\n&nbsp;\r\n<h3>Clean up Data<\/h3>\r\n<span style=\"font-size: 14px;\">Remove reviews without a label.<\/span>\r\n<h6><\/h6>\r\n<pre>idxMissing = ismissing(data.variety);\r\ndata(idxMissing,:) = [];<\/pre>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Remove any reviews where the grape variety is not one of the top 200 varieties in the data. 
(If you can't find a wine you like in the top 200 choices available, MATLAB probably can't help you.)<\/span>\r\n<h6><\/h6>\r\n<pre>numClasses = 200; \r\n[classCounts,classNames] = histcounts(data.variety);\r\n\r\n[~,idx] = maxk(classCounts,numClasses);\r\nclassNames = classNames(idx);\r\n\r\nidx = ismember(data.variety,classNames);\r\ndata = data(idx,:);<\/pre>\r\n<span style=\"font-size: 14px;\">Remove the unused categories from the data.<\/span>\r\n<pre>data.variety = removecats(data.variety);\r\nclassNames = categories(data.variety);<\/pre>\r\n<h6><\/h6>\r\n<h3>Partition Data<\/h3>\r\n<span style=\"font-size: 14px;\">To help evaluate the performance of the network, let's partition the data into training, testing, and validation sets. Let's set aside 30% of the data for validation and testing (two partitions of 15%).<\/span>\r\n<h6><\/h6>\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td><code>cvp = cvpartition(data.variety,'HoldOut',0.3);\r\n\r\nfilenameTrain = fullfile(\"wine-reviews\",\"wineReviews_\" + numClasses + \"_classes_Train.csv\");\r\ndataTrain = data(training(cvp),:);\r\nwritetable(dataTrain,filenameTrain,\"Encoding\",\"UTF-8\");\r\n\r\ndataHeldOut = data(test(cvp),:);\r\ncvp = cvpartition(dataHeldOut.variety,'HoldOut',0.5);\r\n\r\nfilenameValidation = fullfile(\"wine-reviews\",\"wineReviews_\" + numClasses + \"_classes_Validation.csv\");\r\ndataValidation = dataHeldOut(training(cvp),:);\r\nwritetable(dataValidation,filenameValidation,\"Encoding\",\"UTF-8\");\r\n\r\nfilenameTest = fullfile(\"wine-reviews\",\"wineReviews_\" + numClasses + \"_classes_Test.csv\");\r\ndataTest = dataHeldOut(test(cvp),:);\r\nwritetable(dataTest,filenameTest,\"Encoding\",\"UTF-8\");<\/code><\/td>\r\n<td style=\"padding: 10px;\"><a href=\"https:\/\/www.mathworks.com\/company\/mathworks-stories\/making-better-beer-and-wine-using-machine-learning.html\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-5281 size-medium\" style=\"font-size: 16px; font-weight: 400;\" 
src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/10\/vineyard-of-the-future-300x205.jpg\" alt=\"\" width=\"300\" height=\"205\" \/><\/a>\r\n\r\n<em>Need a break from reading code? Read how researchers are using MATLAB for making better beer and wine in <a href=\"https:\/\/www.mathworks.com\/company\/mathworks-stories\/making-better-beer-and-wine-using-machine-learning.html\"><strong>this article<\/strong><\/a>.<\/em><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Bring in the description and variety fields from the table.<\/span>\r\n<h6><\/h6>\r\n<pre>miniBatchSize = 128;\r\nttdsTrain = tabularTextDatastore(filenameTrain, ...\r\n'SelectedVariableNames',[\"description\" \"variety\"], ...\r\n'ReadSize',miniBatchSize);<\/pre>\r\n<h6><\/h6>\r\n<h3>Specify Input Size<\/h3>\r\n<span style=\"font-size: 14px;\">To input the text data into the network, we need to convert the text to images with a fixed size by padding or truncating the sequences. Ideally, we need to choose a value that minimizes both the amount of padding added to the sequences and the amount of data discarded due to truncation. Let's try to approximate the number of words in each review by counting the number of spaces and plotting the sequence lengths in a histogram.<\/span>\r\n<h6><\/h6>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"560\" height=\"420\" class=\"alignnone size-full wp-image-6089\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/10\/description_lengths.png\" alt=\"\" \/>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Most of the reviews contain 80 or fewer words. Let's use this as our sequence length by specifying 80 in our custom transform function. 
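The histogram above is described but its code isn't shown in the post; here is a minimal sketch of how it might be produced (illustrative only — it assumes the data table from earlier is still in the workspace):

```matlab
% Approximate the number of words in each review by counting spaces.
numWords = count(data.description,\" \") + 1;

figure
histogram(numWords)
xlabel(\"Number of Words\")
ylabel(\"Number of Reviews\")
title(\"Review Word Counts\")
```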
The transformTextData function takes the data read from a tabularTextDatastore object and returns a table of predictors and responses.<\/span>\r\n<h6><\/h6>\r\n<pre>sequenceLength = 80;\r\ntdsTrain = transform(ttdsTrain, @(data) transformTextData(data,sequenceLength,emb,classNames));<\/pre>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">The predictors are 1-by-S-by-C arrays, where S is the sequence length and C is the number of features. The responses are the categorical labels.<\/span>\r\n<pre>preview(tdsTrain)<\/pre>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"292\" height=\"239\" class=\"alignnone size-full wp-image-5329\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/10\/previewTrain.jpg\" alt=\"\" \/>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">For validation, let's also create a transformed datastore containing the validation data using the same steps.<\/span>\r\n<pre>ttdsValidation = tabularTextDatastore(filenameValidation, ...\r\n    'SelectedVariableNames',[\"description\" \"variety\"], ...\r\n    'ReadSize',miniBatchSize);\r\ntdsValidation = transform(ttdsValidation, @(data) transformTextData(data,sequenceLength,emb,classNames));<\/pre>\r\n<h2>Define Network Architecture<\/h2>\r\n<span style=\"font-size: 14px;\">Let's now define the network architecture for the classification task. We can use <span style=\"font-family: courier;\">deepNetworkDesigner<\/span> to create the network.<\/span>\r\n<h6><\/h6>\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td><img decoding=\"async\" loading=\"lazy\" width=\"560\" height=\"420\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/10\/network_description-1.png\" alt=\"\" class=\"alignnone size-full wp-image-6093\" \/><\/td>\r\n<td>The following describes the network architecture:\r\n<ul>\r\n \t<li>An input size of 1-by-S-by-C, where S is the sequence length and C is the number of features (the embedding dimension).<\/li>\r\n \t<li>For the n-gram lengths 1 through 5, let's create blocks of layers containing a convolutional 
layer, a batch normalization layer, a ReLU layer, a dropout layer, and a max pooling layer.<\/li>\r\n \t<li>For each block, let's specify 256 convolutional filters of size 1-by-N and pooling regions of size 1-by-S, where N is the n-gram length.<\/li>\r\n \t<li>Let's connect the input layer to each block and concatenate the outputs of the blocks using a depth concatenation layer.<\/li>\r\n \t<li>Finally, to classify the outputs, let's include a fully connected layer with output size K, a softmax layer, and a classification layer, where K is the number of classes.<\/li>\r\n<\/ul>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<h6><\/h6>\r\n<h3>Specify Training Options<\/h3>\r\n<pre>numObservationsTrain = height(dataTrain); % number of training observations\r\nnumIterationsPerEpoch = floor(numObservationsTrain\/miniBatchSize);\r\noptions = trainingOptions('adam', ...\r\n    'MaxEpochs',50, ...\r\n    'Shuffle','never', ...\r\n    'MiniBatchSize',miniBatchSize, ...\r\n    'ValidationData',tdsValidation, ...\r\n    'ValidationFrequency',numIterationsPerEpoch, ...\r\n    'Plots','training-progress', ...\r\n    'Verbose',false);<\/pre>\r\n<h6><\/h6>\r\n<h6><\/h6>\r\n<h3>Train Network<\/h3>\r\n<span style=\"font-size: 14px;\">Finally, we can train the network! Let's train the network using the <span style=\"font-family: courier;\">trainNetwork<\/span> function. Depending on your hardware, this can take a long time. If you are having trouble with hardware or training, you can email Johanna for a copy of the trained network. 
<\/span>\r\n<pre>caberNet = trainNetwork(tdsTrain,lgraph,options);\r\nsave(\"caberNet.mat\",\"caberNet\")\r\n<\/pre>\r\n<em>*Note from Johanna to Ieuan: I have a no-pun policy on this blog, and \"caberNet\" is borderline, so consider this a warning.<\/em>\r\n\r\n<img decoding=\"async\" loading=\"lazy\" width=\"1200\" height=\"719\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/10\/trainingPlot.jpg\" alt=\"\" class=\"alignnone size-full wp-image-6095\" \/>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Here we can see that the training accuracy converges to about 93% and the validation accuracy converges to about 63%. This suggests that the network might be overfitting to the training data. In particular, it might be learning characteristics of the training data that do not generalize well to the validation data. More investigation is needed here!<\/span>\r\n<h6><\/h6>\r\n<h2>Test Network<\/h2>\r\n<span style=\"font-size: 14px;\">Now that the network is trained, we can test it using the held-out test data. First, let's create a transformed datastore containing the held-out test data.<\/span>\r\n<pre>ttdsTest = tabularTextDatastore(filenameTest, ...\r\n    'SelectedVariableNames',[\"description\" \"variety\"], ...\r\n    'ReadSize',miniBatchSize);\r\ntdsTest = transform(ttdsTest, @(data) transformTextData(data,sequenceLength,emb,classNames));\r\n\r\ntbl = readall(ttdsTest);\r\nlabelsTest = tbl.variety;\r\nYTest = categorical(labelsTest,classNames);\r\n\r\nYPred = classify(caberNet,tdsTest,'MiniBatchSize',miniBatchSize);\r\n\r\naccuracy = mean(YPred == YTest)\r\n<\/pre>\r\n<em>accuracy = 0.6397<\/em>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Here, we can see that the network is about 64% accurate on the held-out test data. Given the varied and subjective nature of wine tasting notes, I think this is a good score!<\/span>\r\n<h6><\/h6>\r\n<h2>Make Predictions on New Data<\/h2>\r\n<span style=\"font-size: 14px;\">The next step is to try out the classifier in the real world! 
Here are some notes from a recent wine tasting I attended.<\/span>\r\n<h6><\/h6>\r\n<ul>\r\n \t<li>\u201cA crisp, golden coloured, bubbly wine. On the nose, there are aromas of citrus fruits alongside ripe stone fruits. On the palate, vibrant notes of apple and creamy textures.\u201d<\/li>\r\n \t<li>\u201cStraw coloured with a slight hint of green. Notes of peaches and nectarines. Rich and slightly sweet, intense notes of lychee. Strong minerality with some sweetness.\u201d<\/li>\r\n \t<li>\u201cPale straw in colour with zesty citrus fruit on the nose. On the palate, intense gooseberry and crisp lime flavours with slight hints of oak.\u201d<\/li>\r\n \t<li>\u201cDeep golden colour. Strong aromas of toast and butter with strong hints of oak. On the palate, intense flavours of ripe banana and cooked apples.\u201d<\/li>\r\n \t<li>\u201cVery light bodied wine and pale in colour. Aromas of strawberries and forest fruits. Slightly oaked with slight tannins. Vibrant taste of red cherries.\u201d<\/li>\r\n \t<li>\u201cMedium bodied and brick-red in colour. On the nose, black cherry, and violet. Complex flavours including strong tannins coupled with flavours of black fruits and pepper.\u201d<\/li>\r\n \t<li>\u201cDeep ruby red in colour. Aromas of dark cherries, oak, and clove. Slightly smoky in taste with strong hints of blackberries and licorice.\u201d<\/li>\r\n \t<li>\u201cStrong aromas of blackcurrant and blueberries. A very big wine with high alcohol content. Intense flavour on the palate with a long finish. Vibrant flavors of black fruits and spices.\u201d<\/li>\r\n<\/ul>\r\n<span style=\"font-size: 14px;\">To make predictions on these notes using the network, we need to convert the text into sequences using the same steps as the training process. 
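The prediction code uses variables str and YNewTest that aren't assembled explicitly in the post; here is a minimal sketch of how they might be set up (abbreviated — only the first two notes are shown, and only variety labels the post itself confirms are used):

```matlab
% Collect the tasting notes above into a string array.
str = [
    \"A crisp, golden coloured, bubbly wine. On the nose, there are aromas of citrus fruits alongside ripe stone fruits. On the palate, vibrant notes of apple and creamy textures.\"
    \"Straw coloured with a slight hint of green. Notes of peaches and nectarines. Rich and slightly sweet, intense notes of lychee. Strong minerality with some sweetness.\"
    % ... the remaining six notes follow the same pattern
    ];

% Record the true varieties from the tasting as categorical labels over
% the training class names, for example:
% YNewTest = categorical([\"Cava\";\"Gew\u00fcrztraminer\"],classNames);
```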
By using the <span style=\"font-family: courier;\">text2sequence<\/span> function, which I've included at the end of the blog post, we can convert a string array to a table of word vector sequences of a specified length.<\/span>\r\n<h6><\/h6>\r\n<pre>sequencesNew = text2sequence(emb,str,sequenceLength);\r\n[YNewPred,scoresNew] = classify(caberNet,sequencesNew);\r\n\r\ntbl = table; \r\ntbl.PredictedVariety = YNewPred;\r\ntbl.TrueVariety = YNewTest<\/pre>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"342\" height=\"233\" class=\"alignnone size-full wp-image-5345\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/10\/finalResults.jpg\" alt=\"\" \/>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Here, the network has classified four out of eight correctly. Though I'm tempted to let it get away with saying Cava is a sparkling blend (technically, the network is correct). Similarly, saying Syrah instead of Shiraz is forgivable since they are the same variety under different names.<\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">So let's say 6 out of 8... Great!<\/span>\r\n<h6><\/h6>\r\n<h2>Visualize Network Predictions<\/h2>\r\n<span style=\"font-size: 14px;\">For image classification problems, you can visualize the predictions of a network by taking an image, deleting a patch of the image, measuring whether the classification gets better or worse, and then overlaying the results on the image. In other words, if you delete a patch of the image and the classification gets worse, then that patch must contain features pertaining to the true class. Similarly, if you delete a patch of the image and the classification gets better, then that patch must contain features pertaining to a different class and thus confuses the classifier.<\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">We can do this using the <u><a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/ref\/occlusionsensitivity.html\">occlusionSensitivity<\/a><\/u> function. 
Let's select one of the observations of the text data where the network has predicted the correct label.<\/span>\r\n<pre>idxObservation = 2;\r\nstrNew = str(idxObservation)\r\n\r\nlabelTest = YNewTest(idxObservation)\r\n<\/pre>\r\n<em>strNew = \"Slightly straw colored with a hint of greenness. Notes of peaches and nectarines. Rich and slightly sweet, intense notes of lychee. A soft finish with some sweetness.\"<\/em>\r\n<h6><\/h6>\r\n<em>labelTest = \"Gew\u00fcrztraminer\"<\/em>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Let's view the occlusion sensitivity scores using the function <span style=\"font-family: courier;\">plotOcclusion<\/span>, which I have listed at the end of the blog post. This shows which patches of words contribute most to the prediction.<\/span>\r\n<pre>h = figure;\r\nh.Position(3) = 1.5 * h.Position(3);\r\nplotOcclusion(caberNet,emb,strNew,sequenceLength,labelTest)\r\n<\/pre>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"778\" height=\"424\" class=\"alignnone size-full wp-image-5351\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/10\/plotOcclusion01.jpg\" alt=\"\" \/>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Here, we can see that the network has learned that the phrases \"Rich and slightly sweet\" and \"notes of lychee\" are strong indications of the Gew\u00fcrztraminer variety, while the phrases \"straw colored\" and \"Notes of peaches\" are less characteristic of this variety.<\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Now, let's visualize one of the misclassified varieties using the same technique.<\/span>\r\n<h6><\/h6>\r\n<pre>idxObservation = 8;\r\nstrNew = str(idxObservation)\r\n\r\nlabelTest = YNewTest(idxObservation)<\/pre>\r\n<em>strNew = \"Strong aromas of black cherry. Powerful taste with a high alcohol content. Rich flavor with strong tannins and a long finish. 
Vibrant flavors of cherries and a hint of pepper.\"<\/em>\r\n<h6><\/h6>\r\n<em>labelTest = \"Zinfandel\"<\/em>\r\n<pre>h = figure;\r\nh.Position(3) = 1.5 * h.Position(3);\r\nh.Position(4) = 1.5 * h.Position(4);\r\nplotOcclusion(caberNet,emb,strNew,sequenceLength,labelTest)<\/pre>\r\n<img decoding=\"async\" loading=\"lazy\" width=\"757\" height=\"610\" class=\"alignnone size-full wp-image-5353\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/10\/plotOcclusion02.jpg\" alt=\"\" \/>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Here, we can see that the network understands many of the phrases as strong indications of the Merlot variety, with the exception of \"high alcohol content\". Similarly, the second plot shows that the network understands only some of the phrases in the text as being characteristic of the Zinfandel variety; however, the phrase \"strong tannins\" and phrases containing \"cherry\" or \"cherries\" are particularly uncharacteristic in comparison.<\/span>\r\n<h6><\/h6>\r\n<span style=\"font-size: 14px;\">Perfect! Now I can get MATLAB to help me identify the wines I like. Furthermore, I can visualize the predictions made by the network and perhaps learn a few more things myself. I think I'd better test this network at a few more wine tastings...<\/span>\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<em><strong>All helper functions can be found using the Get the MATLAB Code link below.<\/strong><\/em>\r\n<h6><\/h6>\r\n<span style=\"font-family: calibri; font-size: 15px;\">Thanks to Ieuan for this very informative and wine-filled post. He originally wanted to title this post \"Grapes of Math\", but I've implemented a no-pun policy on the blog. I especially like that he field tests his code by going to wine tastings; now that's dedication! Have a question for Ieuan? 
Leave a comment below.<\/span>\r\n\r\n&nbsp;\r\n\r\n<script language=\"JavaScript\"> <!-- \r\n    function grabCode_a710f144b80042c592b9fe35aab1fc59() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='a710f144b80042c592b9fe35aab1fc59 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' a710f144b80042c592b9fe35aab1fc59';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2020 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<\/p>\r\n<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\r\n<p>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script>\r\n<p style=\"text-align: right; font-size: xx-small; font-weight: lighter; font-style: italic; color: gray;\">Copyright 2018 The MathWorks, Inc.\r\n<a><span class=\"get_ml_code\">Get the MATLAB code<noscript>(requires JavaScript)<\/noscript><\/span><\/a>\r\n\r\n\r\n<\/p>\r\n<!-- a710f144b80042c592b9fe35aab1fc59 ##### SOURCE BEGIN #####\r\n%% Functions\r\n% The transformTextData function takes the data read from a tabularTextDatastore object\r\n% and returns a table of predictors and responses. The predictors are 1-by-sequenceLength-by-C arrays of word vectors given by the word embedding emb,\r\n% where C is the embedding dimension. The responses are categorical labels over the classes in classNames.\r\nfunction dataTransformed = transformTextData(data,sequenceLength,emb,classNames)\r\ntextData = data{:,1};\r\nlabels = data{:,2};\r\npredictors = text2sequence(emb,textData,sequenceLength);\r\nresponses = categorical(labels,classNames);\r\ndataTransformed = predictors;\r\ndataTransformed.responses = responses;\r\nend\r\n\r\n% The text2sequence function converts a string array to a table of word vector sequences of the specified length and removes any specified words.\r\n% Removing words can help prevent the network from learning directly from the labels if they appear in the text. 
function tbl = text2sequence(emb,textData,sequenceLength)\r\ndocuments = tokenizedDocument(textData);\r\npredictors = doc2sequence(emb,documents,'Length',sequenceLength,'PaddingDirection','right');\r\npredictors = cellfun(@(X) permute(X,[3 2 1]),predictors,'UniformOutput',false);\r\ntbl = table(predictors);\r\nend\r\n\r\n% The plotOcclusion function takes a network and word embedding, and plots the occlusion maps for the given string str with the specified sequence length and target class.\r\nfunction plotOcclusion(net,emb,str,sequenceLength,labelTest)\r\n\r\n% Determine predicted class and score.\r\nX = text2sequence(emb,str,sequenceLength);\r\n[YPred,scores] = classify(net,X);\r\nlabelPred = string(YPred);\r\nidxLabelPred = net.Layers(end).ClassNames == labelPred;\r\nidxLabelTest = net.Layers(end).ClassNames == labelTest;\r\nscorePred = scores(idxLabelPred);\r\nscoreTest = scores(idxLabelTest);\r\n\r\n% If the predicted label is different to the test label, then create two\r\n% occlusion maps.\r\nif labelPred == labelTest\r\n    labels = labelPred;\r\nelse\r\n    labels = [labelPred labelTest];\r\n    subplot(2,1,1)\r\nend\r\n\r\n% Calculate occlusion sensitivity.\r\npatchWidth = 5;\r\nscoreMap = occlusionSensitivity(net,X.predictors{1},labels, ...\r\n    'MaskSize',[1 patchWidth], ...\r\n    'Stride',1, ...\r\n    'MaskValue',0, ...\r\n    'OutputUpsampling','none', ...\r\n    'MaskClipping','off');\r\n\r\n% Determine input number of words.\r\ndocumentsNew = tokenizedDocument(str);\r\nnumWords = doclength(documentsNew);\r\nwords = string(documentsNew);\r\n\r\n% Remove values that overlap edges.\r\nclippingSize = floor(patchWidth\/2);\r\nscoreMap = scoreMap(:,1:numWords-clippingSize,:);\r\nscoreMap = cat(2, nan(1,clippingSize,size(scoreMap,3)),scoreMap);\r\nscoreMap(:,end-(clippingSize-1):end,:) = nan;\r\n\r\n% Plot occlusion sensitivity for predicted label.\r\nbar(categorical(1:size(scoreMap,2)),scoreMap(:,:,1))\r\nxticklabels(words)\r\nyline(0,'r--');\r\ntitle(\"Word Occlusion - \"+ patchWidth + \"-grams\" + newline + ...\r\n    \"Predicted Class: \" + labelPred + newline + ... 
\"Score: \" + scorePred)\r\n\r\n% If the predicted label is different to the test label, then plot\r\n% occlusion sensitivity for test label.\r\nif labelPred ~= labelTest\r\n    subplot(2,1,2)\r\n    scoreMapTest = scoreMap(:,:,2);\r\n    bar(categorical(1:size(scoreMapTest,2)),scoreMapTest)\r\n    xticklabels(words)\r\n    yline(0,'r--');\r\n    title(\"Word Occlusion - \"+ patchWidth + \"-grams\" + newline + ...\r\n        \"Test Class: \" + labelTest + newline + ...\r\n        \"Score: \" + scoreTest)\r\nend\r\nend\r\n##### SOURCE END ##### a710f144b80042c592b9fe35aab1fc59 --><!-- AddThis Sharing Buttons below -->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2020\/10\/Wine_blog_vineyard.jpg\" onError=\"this.style.display ='none';\" \/><\/div><p>This post is another from Ieuan Evans, who brought us Deep Beer Designer, back today to talk about wine! It's a longer post than usual, but packed with useful information for Deep Learning for Text.... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2020\/10\/30\/deep-wine-designer\/\">read more >><\/a><\/p>","protected":false},"author":156,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/5249"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/users\/156"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/comments?post=5249"}],"version-history":[{"count":72,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/5249\/revisions"}],"predecessor-version":[{"id":6097,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/5249\/revisions\/6097"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media?parent=5249"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/categories?post=5249"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/tags?post=5249"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}