{"id":1117,"date":"2015-03-25T13:41:31","date_gmt":"2015-03-25T18:41:31","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=1117"},"modified":"2015-02-26T13:20:20","modified_gmt":"2015-02-26T18:20:20","slug":"bayesian-brain-teaser","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2015\/03\/25\/bayesian-brain-teaser\/","title":{"rendered":"Bayesian Brain Teaser"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p>Winter in Boston can get quite cold. When we get a lot of snow, we need to take a break after shoveling, and solving puzzles is a nice way to spend time indoors. Today's guest blogger, Toshi Takeuchi, gives you an interesting brain teaser, written during one of the many 2015 snowstorms in Boston.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2015\/doors.jpg\" alt=\"\"> <\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#269a2f4e-81a3-4a59-862c-929af106b629\">Nate Silver and Bayesian Reasoning<\/a><\/li><li><a href=\"#1068bcf5-8351-4907-901b-b60e2a153357\">The Monty Hall Problem - Simulation<\/a><\/li><li><a href=\"#35706bed-d557-41bc-805c-29627f0ec4d4\">Why Switching Is Better - Bayesian Analysis<\/a><\/li><li><a href=\"#a096cd16-5082-4540-8279-56d5dad4a1a9\">Nate Silver's Method<\/a><\/li><li><a href=\"#ff35de1d-f71d-4e21-a3a3-86b21a349fee\">Bayesian in Machine Learning<\/a><\/li><li><a href=\"#82717f47-8e31-4ac1-8960-f50d9216c16d\">It's Your Turn<\/a><\/li><\/ul><\/div><h4>Nate Silver and Bayesian Reasoning<a name=\"269a2f4e-81a3-4a59-862c-929af106b629\"><\/a><\/h4><p>In <a href=\"http:\/\/en.wikipedia.org\/wiki\/The_Signal_and_the_Noise\">The Signal and the Noise<\/a>, the famed data scientist Nate Silver discusses the precarious business of making predictions. 
The key message of the book is that we should adopt the Bayesian way of thinking - deal with uncertainty probabilistically and be willing to revise our prior beliefs as we gather more evidence.<\/p><p>How does it actually work? Let us use a fun brain teaser to work through an example.<\/p><h4>The Monty Hall Problem - Simulation<a name=\"1068bcf5-8351-4907-901b-b60e2a153357\"><\/a><\/h4><p><a href=\"http:\/\/en.wikipedia.org\/wiki\/Monty_Hall_problem\">The Monty Hall Problem<\/a> is loosely based on a game show, \"Let's Make a Deal,\" where the host Monty Hall gives you a choice of three doors. He tells you that there is a car behind one of them, and you get to keep it if you pick the right door. If you pick a wrong door, you will see a goat, and you get nothing. After you make your choice, Monty opens one of the two doors you didn't pick to reveal a goat. Then he asks whether you would like to stick with your original choice or switch to the other unopened door. At that point the options have been reduced to two doors, and only one of them has the car, so it appears to be a 50\/50 proposition.<\/p><p>Let's test this intuition by running simulations. We will play this game over many trials.<\/p><pre class=\"codeinput\">trials = 1000;\r\n<\/pre><p>First, Monty randomly picks a door from 1, 2, or 3 to hide the car.<\/p><pre class=\"codeinput\">car_door = randi(3,trials,1);\r\n<\/pre><p>You make a guess and pick a door you think hides the car.<\/p><pre class=\"codeinput\">choice = randi(3,trials,1);\r\n<\/pre><p>Let's preview the first five trials.<\/p><pre class=\"codeinput\">T = table(car_door,choice);\r\ndisp(T(1:5,:))\r\n<\/pre><pre class=\"codeoutput\">    car_door    choice\r\n    ________    ______\r\n    3           2     \r\n    3           2     \r\n    1           3     \r\n    3           1     \r\n    2           2     \r\n<\/pre><p>At this point, Monty opens one of the unchosen doors and shows a goat. 
If the car door and the chosen door are different, Monty has only one choice - the sum of the two door numbers tells us which door he must open.<\/p><pre>1 + 2 = 3 -&gt; 3\r\n1 + 3 = 4 -&gt; 2\r\n2 + 3 = 5 -&gt; 1<\/pre><p>Otherwise he can randomly choose either of the unopened doors to open.<\/p><pre>1 + 1 = 2 --&gt; 2 or 3\r\n2 + 2 = 4 --&gt; 1 or 3\r\n3 + 3 = 6 --&gt; 1 or 2<\/pre><p>This solution is not so elegant, but here goes.<\/p><pre class=\"codeinput\">goat_door = zeros(trials,1);\r\ngoat_door(car_door + choice == 3) = 3;\r\ngoat_door(car_door + choice == 4 &amp; car_door ~= choice) = 2;\r\ngoat_door(car_door + choice == 5) = 1;\r\n<span class=\"comment\">% You picked the car door - Monty opens one of the other two at random,<\/span>\r\n<span class=\"comment\">% with an independent draw for each such trial.<\/span>\r\n<span class=\"keyword\">for<\/span> k = find(goat_door == 0)'\r\n    others = setdiff(1:3,choice(k));\r\n    goat_door(k) = others(randi(2));\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><p>Let's update the preview with the goat door.<\/p><pre class=\"codeinput\">T = table(car_door,choice,goat_door);\r\ndisp(T(1:5,:))\r\n<\/pre><pre class=\"codeoutput\">    car_door    choice    goat_door\r\n    ________    ______    _________\r\n    3           2         1        \r\n    3           2         1        \r\n    1           3         2        \r\n    3           1         2        \r\n    2           2         3        \r\n<\/pre><p>At this point, you can stick with your original choice or switch. My intuition was that it would be a 50\/50 proposition. Here is the simulated win rate if you choose to stay.<\/p><pre class=\"codeinput\">simulation = zeros(3,1);\r\nsimulation(1) = sum((car_door - choice) == 0)\/trials*100;\r\nfprintf(<span class=\"string\">'Win rate if you stay with your original choice: %.2f%%\\n'<\/span>,simulation(1))\r\n<\/pre><pre class=\"codeoutput\">Win rate if you stay with your original choice: 33.70%\r\n<\/pre><p>As you can see, staying with your original choice gives you a much lower win rate than I expected. 
Let's see what happens if you switch.<\/p><pre class=\"codeinput\">switch_choice = zeros(trials,1);\r\nswitch_choice(goat_door + choice == 3) = 3;\r\nswitch_choice(goat_door + choice == 4) = 2;\r\nswitch_choice(goat_door + choice == 5) = 1;\r\n<\/pre><p>Let's update the preview with the switch and compute the win rate.<\/p><pre class=\"codeinput\">T = table(car_door,choice,goat_door,switch_choice);\r\ndisp(T(1:5,:))\r\n\r\nsimulation(3) = sum((car_door - switch_choice) == 0)\/trials*100;\r\nfprintf(<span class=\"string\">'Win rate if you switch your choice: %.2f%%\\n'<\/span>,simulation(3))\r\n<\/pre><pre class=\"codeoutput\">    car_door    choice    goat_door    switch_choice\r\n    ________    ______    _________    _____________\r\n    3           2         1            3            \r\n    3           2         1            3            \r\n    1           3         2            1            \r\n    3           1         2            3            \r\n    2           2         3            1            \r\nWin rate if you switch your choice: 66.30%\r\n<\/pre><h4>Why Switching Is Better - Bayesian Analysis<a name=\"35706bed-d557-41bc-805c-29627f0ec4d4\"><\/a><\/h4><p>Counterintuitively, the simulation shows that it is actually better to switch your choice. Now let's turn to <a href=\"http:\/\/en.wikipedia.org\/wiki\/Bayesian_inference\">Bayesian inference<\/a> to figure out why, using Bayes' Rule.<\/p><p>$$P(H|E) = \\frac{P(E|H)\\cdot P(H)}{P(E)}$$<\/p><p>In this analysis let's assume that:<\/p><div><ul><li>You initially picked Door 1. This is your hypothesis, <tt><b>H<\/b><\/tt><\/li><li>Monty opened Door 2 to reveal a goat. This is your evidence, <tt><b>E<\/b><\/tt><\/li><\/ul><\/div><p>$$P(H)$$<\/p><p>The car is hidden randomly behind any of the three doors and you picked Door 1. The probability of Door 1 being the car door is 1\/3. 
This is the <b>prior<\/b>, or <tt>P(H)<\/tt>.<\/p><pre class=\"codeinput\">prior = ones(3,1)*1\/3\r\n<\/pre><pre class=\"codeoutput\">prior =\r\n      0.33333\r\n      0.33333\r\n      0.33333\r\n<\/pre><p>$$P(E|H)$$<\/p><p>Now Monty has to choose Door 2 or 3 as the goat door, and he picked Door 2. It's time to think about the conditional probability of the evidence given the hypothesis, the <b>likelihood<\/b>, or <tt>P(E|H)<\/tt>.<\/p><p>You observed that Monty opened Door 2. His choice is not random - he actually knows the car door, and he needs to avoid it. What does his choice imply?<\/p><div><ol><li>If the car is behind Door 1, then Monty can choose either Door 2 or 3. So the probability that he chose Door 2 was 1\/2.<\/li><li>If the car were behind Door 2, then Monty couldn't have chosen it. So the probability was 0. Getting confused? I will revisit this case below.<\/li><li>If the car is behind Door 3, then Monty had to choose Door 2. So the probability that he chose Door 2 was 1.<\/li><\/ol><\/div><p>Let's go back to case #2. We already know Monty opened Door 2, so why are we talking about the probability of him opening it? Yeah, I know, the probability should be 1 because he actually opened it, right? Well, in this case, the probability is conditioned on the hypothesis under consideration being true. So in each case you need to figure out what Monty could do based on this assumption. So we know that Door 2 is not the car door, but we still would like to know what Monty might have done if it had been the car door.<\/p><pre class=\"codeinput\">likelihood = [1\/2;0;1]\r\n<\/pre><pre class=\"codeoutput\">likelihood =\r\n          0.5\r\n            0\r\n            1\r\n<\/pre><p>$$P(E|H)\\cdot P(H)$$<\/p><p>Now we are ready to update the probabilities of hypotheses based on the new evidence using the equation. 
Let's calculate the joint probabilities, the likelihood times the prior - the numerator of the equation.<\/p><pre class=\"codeinput\">joint_prob = likelihood .* prior\r\n<\/pre><pre class=\"codeoutput\">joint_prob =\r\n      0.16667\r\n            0\r\n      0.33333\r\n<\/pre><p>$$P(E)$$<\/p><p>The joint probabilities are not true probabilities because they don't sum up to 1. To get the updated probabilities of the hypotheses after the evidence, the <b>posterior<\/b>, we just need to normalize them: first add up all the joint probabilities to get the total, and then divide each joint probability by that total. This total is called the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Normalizing_constant\">normalizing constant<\/a>, the probability of the evidence under all hypotheses, <tt>P(E)<\/tt>.<\/p><p>$$P(H|E)$$<\/p><p>The posterior, <tt>P(H|E)<\/tt>, should be pretty close to the simulation result.<\/p><pre class=\"codeinput\">posterior = joint_prob\/sum(joint_prob);\r\ncompare = table(posterior,simulation,<span class=\"string\">'RowNames'<\/span>,{<span class=\"string\">'Stay'<\/span>,<span class=\"string\">'N\/A'<\/span>,<span class=\"string\">'Switch'<\/span>});\r\ndisp(compare([1 3],:))\r\n<\/pre><pre class=\"codeoutput\">              posterior    simulation\r\n              _________    __________\r\n    Stay      0.33333      33.7      \r\n    Switch    0.66667      66.3      \r\n<\/pre><h4>Nate Silver's Method<a name=\"a096cd16-5082-4540-8279-56d5dad4a1a9\"><\/a><\/h4><p>In his book, Nate Silver uses a different formulation to calculate the posterior, but it is actually the same thing. Let's use the example from the book...<\/p><p>\"Suppose you are living with a partner and come home from a business trip to discover a strange pair of underwear in your dresser drawer. 
You will probably ask yourself: what is the probability that your partner is cheating on you?\"<\/p><div><ul><li><tt>x<\/tt> is the prior, the probability of your spouse cheating before you found the underwear, which he puts at 4%<\/li><li><tt>y<\/tt> is the conditional probability, or likelihood, of seeing the underwear given that your spouse is cheating, 50%<\/li><li><tt>z<\/tt> is the conditional probability, or likelihood, of seeing the underwear given that your spouse is not cheating, 5%.<\/li><li>The resulting posterior probability of your spouse cheating is only 29%, so don't jump to a premature conclusion!<\/li><\/ul><\/div><pre class=\"codeinput\">x = 0.04; y = 0.5; z = 0.05;\r\nposterior = x*y\/(x*y + z*(1-x))\r\n<\/pre><pre class=\"codeoutput\">posterior =\r\n      0.29412\r\n<\/pre><p>The method we use, however, computes the posteriors for both hypotheses - cheating and not cheating.<\/p><pre class=\"codeinput\">prior = [x; 1-x]; likelihood = [y; z];\r\njoint_prob = likelihood .* prior;\r\nposterior = joint_prob .\/ sum(joint_prob)\r\n<\/pre><pre class=\"codeoutput\">posterior =\r\n      0.29412\r\n      0.70588\r\n<\/pre><h4>Bayesian in Machine Learning<a name=\"ff35de1d-f71d-4e21-a3a3-86b21a349fee\"><\/a><\/h4><p>Bayes' Theorem is used in powerful machine learning algorithms. One fairly straightforward and useful example is <a href=\"https:\/\/www.mathworks.com\/help\/stats\/naive-bayes-classification.html\">Naive Bayes Classification<\/a>. It is often used for text classification tasks such as spam filtering or sentiment analysis.<\/p><p>Let's do a quick walk-through using a toy example of sentiment analysis. We want to infer the sentiment, positive or negative, of a statement, based on the words contained. 
A Naive Bayes classifier is available as the <a href=\"https:\/\/www.mathworks.com\/help\/stats\/fitcnb.html\"><tt>fitcnb<\/tt><\/a> function in Statistics Toolbox as of MATLAB R2014b.<\/p><p>Let's load the training examples.<\/p><pre class=\"codeinput\">training_exmples = {{<span class=\"string\">'i'<\/span>,<span class=\"string\">'love'<\/span>,<span class=\"string\">'this'<\/span>,<span class=\"string\">'food'<\/span>,<span class=\"string\">'very'<\/span>,<span class=\"string\">'much'<\/span>}; <span class=\"keyword\">...<\/span>\r\n    {<span class=\"string\">'that'<\/span>,<span class=\"string\">'left'<\/span>,<span class=\"string\">'a'<\/span>,<span class=\"string\">'bad'<\/span>,<span class=\"string\">'taste'<\/span>}; <span class=\"keyword\">...<\/span>\r\n    {<span class=\"string\">'the'<\/span>,<span class=\"string\">'movie'<\/span>,<span class=\"string\">'was'<\/span>,<span class=\"string\">'boring'<\/span>}; <span class=\"keyword\">...<\/span>\r\n    {<span class=\"string\">'i'<\/span>,<span class=\"string\">'had'<\/span>,<span class=\"string\">'a'<\/span>,<span class=\"string\">'very'<\/span>,<span class=\"string\">'good'<\/span>,<span class=\"string\">'time'<\/span>};\r\n    {<span class=\"string\">'she'<\/span>,<span class=\"string\">'has'<\/span>,<span class=\"string\">'such'<\/span>,<span class=\"string\">'a'<\/span>,<span class=\"string\">'nice'<\/span>,<span class=\"string\">'sweet'<\/span>,<span class=\"string\">'dog'<\/span>}; <span class=\"keyword\">...<\/span>\r\n    {<span class=\"string\">'the'<\/span>,<span class=\"string\">'ending'<\/span>,<span class=\"string\">'was'<\/span>,<span class=\"string\">'very'<\/span>,<span class=\"string\">'sad'<\/span>}};\r\n<\/pre><p>Those examples are pre-labeled as positive or negative.<\/p><pre class=\"codeinput\">sentiment_class = {<span class=\"string\">'positive'<\/span>;<span class=\"string\">'negative'<\/span>;<span class=\"string\">'negative'<\/span>; <span class=\"keyword\">...<\/span>\r\n    <span 
class=\"string\">'positive'<\/span>;<span class=\"string\">'positive'<\/span>;<span class=\"string\">'negative'<\/span>};\r\n<\/pre><p>Here is the unlabeled example we will use to test the classifier.<\/p><pre class=\"codeinput\">test_example = {<span class=\"string\">'she'<\/span>,<span class=\"string\">'was'<\/span>,<span class=\"string\">'a'<\/span>,<span class=\"string\">'very'<\/span>,<span class=\"string\">'sweet'<\/span>,<span class=\"string\">'girl'<\/span>};\r\n<\/pre><p>Turn the training data into a numeric matrix. This is a bag-of-words model that uses word occurrences as predictors.<\/p><pre class=\"codeinput\">tokens = unique([training_exmples{:}]);\r\nX = zeros(length(training_exmples),length(tokens));\r\n<span class=\"keyword\">for<\/span> i = 1:length(training_exmples)\r\n    X(i,:) = ismember(tokens,training_exmples{i});\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><p>Train a Naive Bayes model with a multinomial distribution, which is the best choice for the bag-of-words example we have. Even though we know that words have grammatical relationships and they are not completely independent, we make a simplifying assumption that our features are independent so we can apply Bayes' Theorem - thus <i>naive<\/i>. 
It works just fine nonetheless.<\/p><pre class=\"codeinput\">Mdl = fitcnb(X,sentiment_class,<span class=\"string\">'Distribution'<\/span>,<span class=\"string\">'mn'<\/span>,<span class=\"string\">'PredictorNames'<\/span>,tokens);\r\n<\/pre><p>Now check the trained model with the test example.<\/p><pre class=\"codeinput\">[label,post,~]= predict(Mdl,double(ismember(Mdl.PredictorNames,test_example)));\r\nfprintf(<span class=\"string\">'Test example: \"%s\"\\nSentiment:    %s\\nPosterior:    %.2f\\n'<\/span>,<span class=\"keyword\">...<\/span>\r\n    strjoin(test_example,<span class=\"string\">' '<\/span>),label{1},post(strcmp(Mdl.ClassNames,label)))\r\n<\/pre><pre class=\"codeoutput\">Test example: \"she was a very sweet girl\"\r\nSentiment:    positive\r\nPosterior:    0.62\r\n<\/pre><p>In this toy example, the trained model was able to classify the test example correctly.<\/p><p>Let's examine the details in the trained model <tt>Mdl<\/tt> to see how Bayes' Theorem is used internally.<\/p><p>Here is the prior, and it is 1\/2 for each class since we have 3 positive and 3 negative examples. Note that the numbers are listed in the order negative, positive.<\/p><pre class=\"codeinput\">disp(table(Mdl.Prior(1),Mdl.Prior(2),<span class=\"string\">'VariableNames'<\/span>,Mdl.ClassNames))\r\n<\/pre><pre class=\"codeoutput\">    negative    positive\r\n    ________    ________\r\n    0.5         0.5     \r\n<\/pre><p>What's the actual frequency of positive and negative statements? I have no idea. Often, we don't know the prior, and in such cases we just apply equal probabilities for each hypothesis. This is called a uniform prior.<\/p><p>Here is the likelihood for the words 'good' and 'bad'. 
You see that 'bad' is twice as likely as 'good' under the negative class, and 'good' is twice as likely as 'bad' under the positive class.<\/p><pre class=\"codeinput\">disp(table(cell2mat(Mdl.DistributionParameters(:,ismember(Mdl.PredictorNames,<span class=\"string\">'good'<\/span>))),<span class=\"keyword\">...<\/span>\r\n    cell2mat(Mdl.DistributionParameters(:,ismember(Mdl.PredictorNames,<span class=\"string\">'bad'<\/span>))),<span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'VariableNames'<\/span>,{<span class=\"string\">'good'<\/span>,<span class=\"string\">'bad'<\/span>},<span class=\"string\">'RowNames'<\/span>,Mdl.ClassNames))\r\n<\/pre><pre class=\"codeoutput\">                  good        bad   \r\n                ________    ________\r\n    negative       0.025        0.05\r\n    positive    0.044444    0.022222\r\n<\/pre><p>Since we are simply interested in the relative scores between the negative and positive classes, we can ignore the normalizing constant, which is the denominator of the equation and the same for either class. Therefore all you need is the prior and the conditional probabilities of the tokens to calculate relative posterior scores between the two hypotheses - positive or negative.<\/p><p>\"So what happens if you have 'not' in the statement?\" you may ask. A very good point. This simplistic approach doesn't work so well with negation, or irony, for that matter. However, if you train your model on a large number of training examples, it actually works surprisingly well on a majority of real-world cases.<\/p><p>To deal with negation like 'not good' or 'not bad' in a more sophisticated way, you need to use word pairs as features instead of single words. For the work you need to put in, however, the performance gain would be incremental.<\/p><p>Likewise, it is easy to deal with irony - just come up with an appropriate way to capture the ironic features! 
(This is irony).<\/p><h4>It's Your Turn<a name=\"82717f47-8e31-4ac1-8960-f50d9216c16d\"><\/a><\/h4><p>I hope you now see how powerful and useful Bayesian reasoning is and ready to apply to your problems. Do you have an interesting example of Bayesian reasoning? Please share <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=1117#respond\">here<\/a>.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_6db174c34e5a4e19b4971377057aebfd() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='6db174c34e5a4e19b4971377057aebfd ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 6db174c34e5a4e19b4971377057aebfd';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2015 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_6db174c34e5a4e19b4971377057aebfd()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2015a<br><\/p><\/div><!--\r\n6db174c34e5a4e19b4971377057aebfd ##### SOURCE BEGIN #####\r\n%% Bayesian Brain Teaser\r\n% Winter in Boston can get quite cold. When we get a lot of snow, we need\r\n% to take a break after shoveling, and solving puzzles is nice way to spend\r\n% time indoors. Today's guest blogger, Toshi Takeuchi, gives you an\r\n% interesting brain teaser.\r\n%\r\n% <<doors.jpg>>\r\n\r\n%% Nate Silver and Bayesian Reasoning\r\n% In <http:\/\/en.wikipedia.org\/wiki\/The_Signal_and_the_Noise The Signal and\r\n% the Noise>, the famed data scientist Nate Silver discusses the precarious\r\n% business of making predictions. The key message of the book is that we\r\n% should adopt the Bayesian way of thinking - deal with uncertainty\r\n% probablistically and be flexible to change our prior beliefs as we gather\r\n% more evidence.\r\n% \r\n% How does it actually work? 
Let us use a fun brain teaser to work through\r\n% an example.\r\n%\r\n%% The Monty Hall Problem - Simulation\r\n% <http:\/\/en.wikipedia.org\/wiki\/Monty_Hall_problem The Monty Hall Problem>\r\n% is loosely based on a game show, \"Let's Make a Deal,\" where the host\r\n% Monty Hall gives you a choice of three doors, and he tells you that there\r\n% is a car behind one of them, and you get to keep it if you pick the right\r\n% door. If you pick a wrong door, you will see a goat, and you get nothing.\r\n% After you made your choice, Monty opens one of the two doors you didn't\r\n% pick that reveals a goat. Then he asks, if you would like to stick with\r\n% your original choice or switch your choice to the other unopened door. At\r\n% that point the options have been reduced to two doors, and only one of\r\n% them has the car, so it appears that it is a 50\/50 proposition.\r\n%\r\n% Let's test this intuition by running simulations. We will play this game\r\n% over many trials.\r\ntrials = 1000;\r\n\r\n%%\r\n% First, Monty randomly picks a door from 1, 2, or 3 to hide the car.\r\ncar_door = randi(3,trials,1);\r\n\r\n%%\r\n% You make a guess and a pick a door you think hides the car. 
\r\nchoice = randi(3,trials,1);\r\n\r\n%%\r\n% Let's preview the first five trials.\r\nT = table(car_door,choice);\r\ndisp(T(1:5,:))\r\n\r\n%%\r\n% At this point, Monty opens one of the unchosen doors and shows a goat.\r\n% If the car door and chosen door are different, Monty has only one choice.\r\n%\r\n%  1 + 2 = 3 -> 3\r\n%  1 + 3 = 4 -> 2\r\n%  2 + 3 = 5 -> 1\r\n%\r\n% Otherwise he can randomly choose either of the unopened doors to open.\r\n%  \r\n%  1 + 1 = 2 REPLACE_WITH_DASH_DASH> 2 or 3\r\n%  2 + 2 = 4 REPLACE_WITH_DASH_DASH> 1 or 3\r\n%  3 + 3 = 6 REPLACE_WITH_DASH_DASH> 1 or 2\r\n%\r\n% This solution is not so elegant, but here it goes.\r\n\r\ngoat_door = zeros(trials,1);\r\ngoat_door(car_door + choice == 3) = 3;\r\ngoat_door(car_door + choice == 4 & car_door ~= choice) = 2;\r\ngoat_door(car_door + choice == 5) = 1;\r\ngoat_door(goat_door == 0 & choice == 1) = randsample([2,3],1);\r\ngoat_door(goat_door == 0 & choice == 2) = randsample([1,3],1);\r\ngoat_door(goat_door == 0 & choice == 3) = randsample([1,2],1);\r\n\r\n%%\r\n% Let's update the preview with the goat door. \r\nT = table(car_door,choice,goat_door);\r\ndisp(T(1:5,:))\r\n\r\n%%\r\n% At this point, you can stick with your original choice or switch. My\r\n% intuition was that it would be a 50\/50 proposition. Here is the simulated\r\n% win rate if you choose to stay.\r\n\r\nsimulation = zeros(3,1);\r\nsimulation(1) = sum((car_door - choice) == 0)\/trials*100;\r\nfprintf('Win Rate if stick to your original choice: %.2f%%\\n',simulation(1))\r\n\r\n%% \r\n% As you can see, staying with your original choice gives you a much lower\r\n% win rate than I expected. 
Let's see what happens if you switch.\r\n\r\nswitch_choice = zeros(trials,1);\r\nswitch_choice(goat_door + choice == 3) = 3;\r\nswitch_choice(goat_door + choice == 4) = 2;\r\nswitch_choice(goat_door + choice == 5) = 1;\r\n\r\n%% \r\n% Let's update the preview with the switch and compute the win rate.\r\nT = table(car_door,choice,goat_door,switch_choice);\r\ndisp(T(1:5,:))\r\n\r\nsimulation(3) = sum((car_door - switch_choice) == 0)\/trials*100;\r\nfprintf('Win Rate if switch your choice: %.2f%%\\n',simulation(3))\r\n\r\n%% Why Switching Is Better - Bayesian Analysis\r\n% Counterintuitively, the simulation shows that it is actually better to\r\n% switch your choice. Now let's turn to\r\n% <http:\/\/en.wikipedia.org\/wiki\/Bayesian_inference Bayesian inference> to\r\n% figure out why, using Bayes' Rule.\r\n%\r\n% $$P(H|E) = \\frac{P(E|H)\\cdot P(H)}{P(E)}$$\r\n% \r\n% In this analysis let's assume that: \r\n% \r\n% * You initially picked Door 1. This is your hypothesis, |*H*|\r\n% * Monty opened Door 2 to reveal a goat. This is your evidence, |*E*| \r\n%\r\n%%\r\n% $$P(H)$$\r\n% \r\n% The car is hidden randomly behind any of the three doors and you picked\r\n% Door 1. The probability of Door 1 being the car door is 1\/3. This is the\r\n% *prior*, or |P(H)|. \r\n\r\nprior = ones(3,1)*1\/3\r\n\r\n%%\r\n% $$P(E|H)$$\r\n%\r\n% Now Monty has to choose Door 2 or 3 as the goat door, and he picked Door\r\n% 2. It's time to think about the conditional probability of the evidence\r\n% given hypothesis, the *likelihood*, or |P(E|H)|.\r\n%\r\n% You observed that Monty opened Door 2. His choice is not\r\n% random - he actually knows the car door, and he needs to avoid it. What\r\n% does his choice imply?\r\n% \r\n% # If the car is behind Door 1, then Monty can choose either Door 2 or 3. \r\n% So the probability that he chose Door 2 was 1\/2.    \r\n% # If the car were behond Door 2, then Monty couldn't have chosen it. So\r\n% the probability was 0. Getting confused? 
I will revisit this case below.\r\n% # If the car is behind Door 3, then Monty had to choose Door 2. So the\r\n% probability that he chose Door 2 was 1. \r\n% \r\n% Let's go back to case #2. We already know Monty opened Door 2, so why are\r\n% we talking about the probability of him opening it? Yeah, I know, the\r\n% probability should be 1 because he actually opened it, right? Well, in\r\n% this case, the probability is conditioned on the hypothesis under\r\n% consideration to be true. So in each case you need to figure out what\r\n% Monty could do based on this assumption. So we know that Door 2 is not\r\n% the car door, but we still would like to know what Monty might have done\r\n% if that was the car door.\r\n\r\nlikelihood = [1\/2;0;1]\r\n\r\n%%\r\n% $$P(E|H)\\cdot P(H)$$\r\n%\r\n% Now we are ready to update the probabilities of hypotheses based on the\r\n% new evidence using the equation. Let's calculate the joint probabilities\r\n% of the prior and likelihood, the numerator of the equation.\r\n%\r\n\r\njoint_prob = likelihood .* prior\r\n\r\n%%\r\n% $$P(E)$$\r\n%\r\n% The joint probabilities are not true probabilities because they don't sum\r\n% up to 1. To get the updated probabilities of hypotheses after the\r\n% evidence, the *posterior*, we just need to normalize them by first adding\r\n% up all the joint probabilities to get the total, and then use that to\r\n% divide each joint probability. 
This total is called the\r\n% <http:\/\/en.wikipedia.org\/wiki\/Normalizing_constant normalizing constant>,\r\n% the probability of the evidence under all hypotheses, |P(E)|.\r\n%\r\n% $$P(H|E)$$\r\n%\r\n% The posterior, |P(H|E)|, should be pretty close to the simulation result.\r\n\r\nposterior = joint_prob\/sum(joint_prob);\r\ncompare = table(posterior,simulation,'RowNames',{'Stay','N\/A','Switch'});\r\ndisp(compare([1 3],:))\r\n\r\n%% Nate Silver's Method\r\n% In his book, Nate Silver uses a different formulation to calculate the\r\n% posterior, but it is actually the same thing. Let's use the example from\r\n% the book...\r\n%\r\n% \"Suppose you are living with a partner and come home from a\r\n% business trip to discover a strange pair of underwear in your dresser\r\n% drawer. You will probably ask yourself: what is the probability that your\r\n% partner is cheating on you?\"\r\n%\r\n% * |x| is the prior, the probability of your spouse cheating before you\r\n% found the underwear, which he puts to 4% \r\n% * |y| is the conditional probability or likelihood that you see the\r\n% underwear because your spouse is cheating, 50%\r\n% * |z| is the the conditional probability or likelihood that you see the\r\n% underwear even though your spouse is not cheating, 5%. \r\n% * The resulting posterior of your spouse cheating is only 29%, so don't\r\n% jump to the premature conclusion!\r\n\r\nx = 0.04; y = 0.5; z = 0.05; \r\nposterior = x*y\/(x*y + z*(1-x))\r\n\r\n%%\r\n% The method we use, however, computes the posteriors for both hypotheses -\r\n% cheating and not cheating.\r\nprior = [x; 1-x]; likelihood = [y; z]; \r\njoint_prob = likelihood .* prior;\r\nposterior = joint_prob .\/ sum(joint_prob)\r\n\r\n%% Bayesian in Machine Learning\r\n% Bayes' Theorem is used in powerful machine learning algorithms. One\r\n% fairely straight-forward and useful example is\r\n% <https:\/\/www.mathworks.com\/help\/stats\/naive-bayes-classification.html\r\n% Naive Bayes Classification>. 
It is often used for text classification\r\n% tasks such as spam filter or sentiment analysis.\r\n%\r\n% Let's do a quick walk-through using a toy example of sentiment analysis.\r\n% We want to infer the sentiment, positive or negative, of a statement,\r\n% based on the words contained. Naive Bayes classifier is available as\r\n% <https:\/\/www.mathworks.com\/help\/stats\/fitcnb.html |fitcnb|> function from\r\n% Statistics Toolbox in MATLAB R2014b.\r\n%\r\n% Let's load the training examples.\r\ntraining_exmples = {{'i','love','this','food','very','much'}; ...\r\n    {'that','left','a','bad','taste'}; ...\r\n    {'the','movie','was','boring'}; ...\r\n    {'i','had','a','very','good','time'};\r\n    {'she','has','such','a','nice','sweet','dog'}; ...\r\n    {'the','ending','was','very','sad'}};\r\n%%\r\n% Those examples are pre-labeled as positive or negative.\r\nsentiment_class = {'positive';'negative';'negative'; ...\r\n    'positive';'positive';'negative'};\r\n%%\r\n% Here is the unlabeled example we will use to test the classifier. \r\ntest_example = {'she','was','a','very','sweet','girl'};\r\n\r\n%%\r\n% Turn the training data into a numeric matrix. This is a bag-of-words\r\n% model that uses frequency of words as predictors.\r\ntokens = unique([training_exmples{:}]); \r\nX = zeros(length(training_exmples),length(tokens));\r\nfor i = 1:length(training_exmples)\r\n    X(i,:) = ismember(tokens,training_exmples{i});\r\nend\r\n\r\n%%\r\n% Train a Naive Bayes model with a multinomial distribution, which is the\r\n% best choice for the bag-of-words example we have. Even though we know\r\n% that words have grammatical relationships and they are not completely\r\n% independent, we make a simplifying assumption that our features are\r\n% independent so we can apply Bayes' Theorem - thus _naive_. It works\r\n% just fine none the less. 
\r\n\r\nMdl = fitcnb(X,sentiment_class,'Distribution','mn','PredictorNames',tokens);\r\n\r\n%%\r\n% Now check the trained model with the test example.\r\n[label,post,~] = predict(Mdl,double(ismember(Mdl.PredictorNames,test_example)));\r\nfprintf('Test example: \"%s\"\\nSentiment:    %s\\nPosterior:    %.2f\\n',...\r\n    strjoin(test_example,' '),label{1},post(strcmp(Mdl.ClassNames,label)))\r\n\r\n%%\r\n% In this toy example, the trained model was able to classify it correctly.\r\n% \r\n% Let's examine the details in the trained model |Mdl| to see how Bayes'\r\n% Theorem is used internally. \r\n% \r\n% Here is the\r\n% prior, and it is 1\/2 for each class since we have 3 positive and 3\r\n% negative examples. Note that the numbers are listed in the order\r\n% negative, then positive.\r\n\r\ndisp(table(Mdl.Prior(1),Mdl.Prior(2),'VariableNames',Mdl.ClassNames))\r\n\r\n%%\r\n% What's the actual frequency of positive and negative statements? I have\r\n% no idea. Often, we don't know the prior, and in such cases we just apply\r\n% equal probabilities for each hypothesis. This is called a uniform prior.\r\n%\r\n% Here is the likelihood for words 'good' and 'bad'. 
You see that 'bad' is\r\n% twice as negative as 'good', and 'good' is twice as positive as 'bad'.\r\n\r\ndisp(table(cell2mat(Mdl.DistributionParameters(:,ismember(Mdl.PredictorNames,'good'))),...\r\n    cell2mat(Mdl.DistributionParameters(:,ismember(Mdl.PredictorNames,'bad'))),...\r\n    'VariableNames',{'good','bad'},'RowNames',Mdl.ClassNames))\r\n\r\n%%\r\n% Since we are simply interested in relative scores between the negative\r\n% and positive classifications, we can ignore the normalizing constant,\r\n% which is the denominator of the equation and the same for either class.\r\n% Therefore all you need is the prior and the conditional probabilities of\r\n% the tokens to calculate relative posterior scores between the two\r\n% hypotheses - positive or negative.\r\n% \r\n% \"So what happens if you have 'not' in the statement?\" you may ask. A very\r\n% good point. This simplistic approach doesn't work so well with negation,\r\n% or irony, for that matter. However, if you train your model on a large\r\n% number of training examples, it actually works surprisingly well on a\r\n% majority of real-world cases.\r\n% \r\n% To deal with negation like 'not good' or 'not bad' in a more\r\n% sophisticated way, you need to use word pairs as features instead of\r\n% single words. For the work you need to put in, however, the performance\r\n% gain would be incremental.\r\n% \r\n% Likewise, it is easy to deal with irony - just come up with an\r\n% appropriate way to capture the ironic features! (This is irony.)\r\n\r\n%% It's Your Turn\r\n% I hope you now see how powerful and useful Bayesian reasoning is, and\r\n% that you are ready to apply it to your own problems. Do you have an\r\n% interesting example of Bayesian reasoning? 
Please share\r\n% <https:\/\/blogs.mathworks.com\/loren\/?p=1117#respond here>.\r\n##### SOURCE END ##### 6db174c34e5a4e19b4971377057aebfd\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2015\/doors.jpg\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction--><p>Winter in Boston can get quite cold. When we get a lot of snow, we need to take a break after shoveling, and solving puzzles is nice way to spend time indoors. Today's guest blogger, Toshi Takeuchi, gives you an interesting brain teaser, written during one of the many 2015 snowstorms in Boston.... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2015\/03\/25\/bayesian-brain-teaser\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[43,48],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1117"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=1117"}],"version-history":[{"count":5,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1117\/revisions"}],"predecessor-version":[{"id":1124,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1117\/revisions\/1124"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=1117"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?pos
t=1117"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=1117"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}