{"id":2196,"date":"2017-02-14T15:50:28","date_gmt":"2017-02-14T20:50:28","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=2196"},"modified":"2017-02-24T21:33:42","modified_gmt":"2017-02-25T02:33:42","slug":"the-perverse-port-passing-problem","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2017\/02\/14\/the-perverse-port-passing-problem\/","title":{"rendered":"The Perverse Port Passing Problem"},"content":{"rendered":"<div class=\"content\">\r\n\r\n<!--introduction-->\r\n\r\nToday's guest blogger is <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/1141841-Jos-Martin\">Jos Martin<\/a>, from the Parallel Computing group at MathWorks. Whilst normally focussing on making parallel computing both easy and fast in MATLAB, occasionally he likes to use MATLAB to explore other problem spaces. Here, he writes about an interesting random walk process he encountered that has allowed him to explore both simulation and analytic methods in MATLAB, and using each of those methods to help guide the other.\r\n\r\n<!--\/introduction-->\r\n<h3>Contents<\/h3>\r\n<div>\r\n<ul>\r\n \t<li><a href=\"#132613db-78af-435c-a605-66a3c1b37d47\">Introduction<\/a><\/li>\r\n \t<li><a href=\"#4f122c18-173e-457f-ab3d-ae0d45e0cad5\">A simulation approach<\/a><\/li>\r\n \t<li><a href=\"#3f68217f-5940-40a4-a753-18f925fe2218\">Investigating the final position of the port<\/a><\/li>\r\n \t<li><a href=\"#4f5e474a-aa14-4142-913d-16e761bb6146\">Analytic probability for last position when passing is unbiased<\/a><\/li>\r\n \t<li><a href=\"#3ebc20bd-261c-436f-aef8-9290e9b7f525\">What happens if there is a biased probability of passing left or right?<\/a><\/li>\r\n \t<li><a href=\"#892f094b-0612-4993-ab57-be816f305617\">Does this biased simulation agree with theory?<\/a><\/li>\r\n \t<li><a href=\"#fd42656f-4d1e-4e4d-83c9-0bd76f28d2b4\">Who has the most to drink?<\/a><\/li>\r\n \t<li><a href=\"#651d6c60-53d5-426e-a29e-cdfba29ba1a5\">Does the number of guests affect how much each drinks?<\/a><\/li>\r\n \t<li><a href=\"#845c2ec3-0c93-4930-8aea-b07ed038c16a\">Will the pipe of port run out?<\/a><\/li>\r\n \t<li><a href=\"#0ec66b5d-cd38-49c4-a4a2-d77604961ef5\">Predicting the amount of port consumed<\/a><\/li>\r\n \t<li><a href=\"#a2e730e0-a6b0-4fe5-8f70-3a0ddf407ee9\">An analytic conjecture<\/a><\/li>\r\n \t<li><a href=\"#cf6ce7a9-5eec-452e-94c8-b98f5d14369c\">How does the amount of port consumed depend on the final position?<\/a><\/li>\r\n \t<li><a href=\"#a814ebcc-8a1f-406f-85b4-792ff05b2d3a\">And finally<\/a><\/li>\r\n<\/ul>\r\n<\/div>\r\n<h4>Introduction<a name=\"132613db-78af-435c-a605-66a3c1b37d47\"><\/a><\/h4>\r\n<a href=\"http:\/\/www.piday.org\/\">Pi day<\/a> is coming and you have been invited to dine with an eccentric but mathematically minded host. She has provided many <a href=\"http:\/\/www.georgehart.com\/bagel\/bagel.html\">interesting delicacies<\/a> and you have now <a href=\"http:\/\/www.bbc.co.uk\/news\/blogs-magazine-monitor-26526076\">finally arrived<\/a> at the <a href=\"https:\/\/geometricdelights.wordpress.com\/2011\/04\/17\/cheese-usamts\/\">cheese course<\/a>; of course port is served alongside a wonderful round of Stilton. However, in defiance of normal protocol (<a href=\"http:\/\/www.taylor.pt\/en\/enjoy-port-wine\/traditions\/passing-the-decanter\/\">passing to the left<\/a>), she instructs her guests to randomly choose a direction in which to pass the port decanter. You are some way around the circular table and feeling a degree of consternation at this strange turn of events. Should you be concerned? How does your position at the table affect the likelihood of being the last person to sample this exceptional <a title=\"http:\/\/www.decanter.com\/reviews\/port\/fonseca-port-douro-portugal-1963\/ (link no longer works)\">Fonseca '63<\/a>? For those pedantic few who immediately worry that the port might run out, I can assert she is a very accommodating host and has assured her guests that the decanter will continue to be filled from the <a href=\"https:\/\/en.wikipedia.org\/wiki\/English_wine_cask_units\">pipe of port<\/a> she <a href=\"http:\/\/www.decanter.com\/wine-news\/bonhams-to-hold-rare-fonseca-1963-auction-286004\/\">recently acquired<\/a>. Another question one might pose is: on average how much port will be consumed (assuming that at each pass each guest takes a fixed amount of port)?\r\n\r\nThis problem can be modelled with a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Random_walk\">random walk<\/a>, similar in nature to a well-studied case called <a href=\"http:\/\/web.mit.edu\/neboat\/Public\/6.042\/randomwalks.pdf\">Gambler's Ruin<\/a>. The major difference here will be the need to visit all locations around the table, rather than simply reaching one end or other of the walk. To explore this apparently simple problem I will start by using a simulation approach that will help guide analytic methods (in a future <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=2198\">post<\/a>). To do this I'll be using our statistics and symbolic maths tools.\r\n<h4>A simulation approach<a name=\"4f122c18-173e-457f-ab3d-ae0d45e0cad5\"><\/a><\/h4>\r\nThere are analytical methods to attack this problem (I will quote some of these during this article, for which all <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/61285-port-passing-problem\">files and proofs<\/a> are available on File Exchange) but here I want to focus on a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Monte_Carlo_method\">monte-carlo simulation<\/a> approach. To exemplify the approach, we will run <tt>nSims<\/tt> independent simulations passing the decanter between <tt>N<\/tt> people (always starting at person 0), with a probability <tt>p<\/tt> of passing left (and <tt>1-p<\/tt> of passing right).\r\n<pre class=\"codeinput\">nSims = 100;\r\nN = 11;\r\np = 0.5;\r\n<\/pre>\r\nTo simulate <tt>M<\/tt> passes of the port across the <tt>nSims<\/tt> simulations we make a matrix <tt>singleMove<\/tt> containing only values 1 and -1 to indicate a pass to the port to left or right using uniform random numbers.\r\n<pre class=\"codeinput\">M = 25;\r\nsingleMove = 2*(rand(nSims, M) &gt;= p) - 1;\r\n<span class=\"comment\">% The first 6 pass directions of 6 simulations<\/span>\r\ndisp(singleMove(1:6, 1:6));\r\n<\/pre>\r\n<pre class=\"codeoutput\">    -1    -1     1     1     1    -1\r\n    -1    -1     1     1    -1     1\r\n    -1     1    -1     1    -1    -1\r\n    -1     1     1     1    -1    -1\r\n    -1     1    -1     1     1    -1\r\n    -1    -1    -1    -1     1     1\r\n<\/pre>\r\nBy knowing the direction of each pass we can compute the location of the port compared to its starting point by taking the cumulative sum of the directions for each pass. To make this an absolute position we add to it the starting position (in this case position 0). Because we need to account for the decanter moving all the way around the table, we are going to use modulo <tt>N<\/tt> arithmetic to fold the course back to the available positions.\r\n<pre class=\"codeinput\">start = 0;\r\nlocations = mod(cumsum(singleMove, 2) + start, N);\r\n<span class=\"comment\">% The first 6 locations of 6 simulations<\/span>\r\ndisp(locations(1:6, 1:6))\r\n<\/pre>\r\n<pre class=\"codeoutput\">    10     9    10     0     1     0\r\n    10     9    10     0    10     0\r\n    10     0    10     0    10     9\r\n    10     0     1     2     1     0\r\n    10     0    10     0     1     0\r\n    10     9     8     7     8     9\r\n<\/pre>\r\nWe now have a matrix <tt>locations<\/tt> that contains 100 simulations of 20 passes. Have any of the simulations reached all places around the table? In the function <tt><a href=\"matlab:edit('portPassingSimulation')\">portPassingSimulation<\/a><\/tt> we are going to iterate over each pass seeing which simulations complete in that pass, but here we will simply work out if any of the simulations are complete. To be complete, a simulation needs to have visited all <tt>0:N-1<\/tt> possible locations. A simple way to ask this question is to see if that simulation contains <tt>N<\/tt> unique elements. If it does, it must have visited all the possible locations.\r\n<pre class=\"codeinput\">finished = false(nSims, 1);\r\n<span class=\"keyword\">for<\/span> ii = 1:nSims\r\n    finished(ii) = numel(unique(locations(ii, :))) == N;\r\n<span class=\"keyword\">end<\/span>\r\nnFinished = sum(finished)\r\n<\/pre>\r\n<pre class=\"codeoutput\">nFinished =\r\n    10\r\n<\/pre>\r\nFor those simulations that have not finished, we know their current location (given by <tt>locations(:,end)<\/tt>) and so can simulate another block of <tt>nSims x M<\/tt> passes, keeping on going until all simulations have reached their goal. The full simulation code (<tt><a href=\"matlab:edit('portPassingSimulation')\">portPassingSimulation<\/a><\/tt> available <a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/portPassingSimulation.m\">here<\/a>) returns the last position, the number of passes to reach that position, the direction the last position was reached and the number of visits to each position around the table during each simulation. The code is based closely on this strategy, with a slightly different stopping detection strategy.\r\n<pre class=\"codeinput\">[last, nPasses] = portPassingSimulation(nSims, N, p);\r\nresults = table(last, nPasses);\r\ndisp(results(1:10,:))\r\n<\/pre>\r\n<pre class=\"codeoutput\">    last    nPasses\r\n    ____    _______\r\n     7      215    \r\n     7      129    \r\n     8      127    \r\n     8      127    \r\n     3      124    \r\n    10      108    \r\n     4      102    \r\n     8      102    \r\n     5      101    \r\n     3       98    \r\n<\/pre>\r\n<h4>Investigating the final position of the port<a name=\"3f68217f-5940-40a4-a753-18f925fe2218\"><\/a><\/h4>\r\nThe first problem we set ourselves was deciding if our position around the table affected our probability of being the last person to receive the port. To look at this, let's run a fairly large number ($10^5$) of simulations and see how many end up at each of the <tt>N-1<\/tt> possible finishing positions.\r\n<pre class=\"codeinput\">last = portPassingSimulation(1e5, N, p);\r\nhist(last, 1:N-1);\r\ngrid <span class=\"string\">on<\/span>\r\ntitle <span class=\"string\">'10^5 simulations of 11 people around the table'<\/span>\r\nxlabel <span class=\"string\">'Last person to receive the port'<\/span>\r\nylabel <span class=\"string\">'Number of times that person is last'<\/span>\r\n<\/pre>\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogPortPassingProblemM_01.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n\r\nFrom this histogram we feel greatly relieved! It shows that no matter where we sit at the table, the number of times we are the last person to receive the port is the same as everyone else. It does not matter where we sit at the table!\r\n\r\nThis result seems somewhat counter intuitive; surely how far you are from the starting position should affect the likelihood you are last? First let's see if this result is the same for a different table size. We can check by running the simulator with a larger number of people.\r\n<pre class=\"codeinput\">last = portPassingSimulation(1e5, 41, p);\r\nhist(last, 1:max(last));\r\ngrid <span class=\"string\">on<\/span>\r\ntitle <span class=\"string\">'10^5 simulations of 41 people around the table'<\/span>\r\nxlabel <span class=\"string\">'Last person to receive the port'<\/span>\r\nylabel <span class=\"string\">'Number of times that person is last'<\/span>\r\n<\/pre>\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogPortPassingProblemM_02.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n\r\nThere is an intuitive way to describe that the probability of being last is independent of where you sit at the table. To be last, the decanter must have arrived at your left or right, then traversed the whole table without passing via you. No matter where you sit at the table, the probability of that set of actions is the same - and thus the probability of you being last is the same.\r\n<h4>Analytic probability for last position when passing is unbiased<a name=\"4f5e474a-aa14-4142-913d-16e761bb6146\"><\/a><\/h4>\r\nWe can show analytically that the likelihood of being last is independent of your position around the table. We will need to quote a result from the <a href=\"http:\/\/web.mit.edu\/neboat\/Public\/6.042\/randomwalks.pdf\">Gamblers Ruin analysis<\/a>. Let $pSAB$ be the probability that the decanter starts at position S and reaches position A before reaching position B. This probability is identical to starting with an initial stake of (<tt>S-B<\/tt>) in Gambler's Ruin and stopping when either the stake is lost or the winnings are (<tt>A-B<\/tt>).\r\n\r\n$$pSAB=\\frac{S-B}{A-B}\\mbox{ for }B&lt;S&lt;A\\mbox{ or }A&lt;S&lt;B$$\r\n\r\nWe can use the <a href=\"https:\/\/www.mathworks.com\/help\/symbolic\/\">Symbolic Math Toolbox<\/a> to manipulate equations without having to do the algebra ourselves (for me, making a transcription mistake much less likely). In this case the probability that some position k is the last is a combination of\r\n<div>\r\n<ol>\r\n \t<li>Starting at position 0 and going to position L (numbered k-1) whilst <b>not<\/b> going to position R (numbered k+1), followed by going from position L to position R whilst <b>not<\/b> going to position k<\/li>\r\n \t<li>Starting at position 0 and going to position R whilst <b>not<\/b> going to position L, followed by going from position R to position L whilst <b>not<\/b> going to position k<\/li>\r\n<\/ol>\r\n<\/div>\r\nIn terms of $pSAB$ this can be written as:\r\n<pre class=\"codeinput\">syms <span class=\"string\">pSAB(S,A,B)<\/span> <span class=\"string\">k<\/span> <span class=\"string\">N<\/span>\r\npSAB(S,A,B) = (S-B)\/(A-B);\r\nL = k-1;\r\nR = k+1-N;\r\npLastUnbiased = simplify(pSAB(0,L,R)*pSAB(L,R,k) + pSAB(0,R,L)*pSAB(R,L,k-N))\r\n<\/pre>\r\n<pre class=\"codeoutput\">pLastUnbiased =\r\n1\/(N - 1)\r\n<\/pre>\r\nNote that to ensure that either $A&lt;S&lt;B \\mbox{ or } B&lt;S&lt;A$ we sometimes need to undertake a modulo approach to A or B. That is why R is written <tt>k+1-N<\/tt> and why in the last term we use <tt>k-N<\/tt> rather than just <tt>k<\/tt>. This ensures that $k&gt;L \\ge 0 \\ge R &gt; k-N$.\r\n\r\nThe result <tt>pLast<\/tt> is independent of <tt>k<\/tt> and shows equal likelihood of being last, agreeing with the simulation.\r\n<h4>What happens if there is a biased probability of passing left or right?<a name=\"3ebc20bd-261c-436f-aef8-9290e9b7f525\"><\/a><\/h4>\r\nThe above simulations have assumed that the probability that any dinner guest passes left or right are equal. What if there is a slight bias to one direction? Will this have a discernible impact on your enjoyment of the port? Let's simulate a situation where the probability of passing left is 0.54 (rather than 0.5).\r\n<pre class=\"codeinput\">N = 17;\r\np = 0.54;\r\nk = 1:N-1;\r\nnSims = 1e5;\r\nlast = portPassingSimulation(nSims, N, p);\r\nhist(last, k);\r\ngrid <span class=\"string\">on<\/span>\r\ntitle <span class=\"string\">'Biased passing simulation of 17 people around the table'<\/span>\r\nxlabel <span class=\"string\">'Last person to receive the port'<\/span>\r\nylabel <span class=\"string\">'Number of times that person is last'<\/span>\r\n<\/pre>\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogPortPassingProblemM_03.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n\r\nIt is clear from this histogram that even a small bias in the probability hugely affects the likelihood we will be last. We could undertake studies to decide how many dinner parties we needed to attend before we could reasonably deduce that the decanter was biased, but that is another story!\r\n<h4>Does this biased simulation agree with theory?<a name=\"892f094b-0612-4993-ab57-be816f305617\"><\/a><\/h4>\r\nUsing a very similar approach to the analytic method above (see the <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=2198\">next post (not live right away)<\/a> for a derivation) we can show that the biased probability of being last is\r\n\r\n$$P_{last}=\\frac{r^N(1-r)}{r^k(r-r^N)} \\mbox{ where }r=\\frac{1-p}{p} \\mbox{\r\nand } p\\ne \\frac{1}{2}$$\r\n\r\nPlotting that on top of the simulation shows very good agreement.\r\n<pre class=\"codeinput\">r = (1-p)\/p;\r\npLastBiased = r^N*(1-r).\/(r.^k*(r-r^N));\r\nhold <span class=\"string\">on<\/span>\r\nplot(k, pLastBiased*nSims, <span class=\"string\">'ro'<\/span>, <span class=\"string\">'MarkerSize'<\/span>, 8, <span class=\"string\">'LineWidth'<\/span>, 2);\r\nhold <span class=\"string\">off<\/span>\r\n<\/pre>\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogPortPassingProblemM_04.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n<h4>Who has the most to drink?<a name=\"fd42656f-4d1e-4e4d-83c9-0bd76f28d2b4\"><\/a><\/h4>\r\nNow that we know each guest is equally likely to be the last to receive the port, we could look at the total number of times that the decanter passes each guest on its travels to the last person. Is this the same for each person around the table or not? And does having a biased passing probability change the number of times it passes each person?\r\n<pre class=\"codeinput\">nSims = 3e4;\r\nN = 25;\r\n[~, ~, ~, visitsUnbiased] = portPassingSimulation(nSims, N, 0.5);\r\n[~, ~, ~, visitsBiasedL]  = portPassingSimulation(nSims, N, 0.53);\r\n[~, ~, ~, visitsBiasedR]  = portPassingSimulation(nSims, N, 0.47);\r\nfigure\r\nhold <span class=\"string\">on<\/span>\r\nplot(1:N-1, visitsUnbiased\/nSims, <span class=\"string\">'o-'<\/span>)\r\nplot(1:N-1, visitsBiasedL\/nSims, <span class=\"string\">'o-'<\/span>)\r\nplot(1:N-1, visitsBiasedR\/nSims, <span class=\"string\">'o-'<\/span>)\r\nhold <span class=\"string\">off<\/span>\r\ngrid <span class=\"string\">on<\/span>\r\nlegend(<span class=\"string\">'Unbiased pass'<\/span>, <span class=\"string\">'Biased left-pass'<\/span>, <span class=\"string\">'Biased right-pass'<\/span>, <span class=\"string\">'Location'<\/span>, <span class=\"string\">'N'<\/span>)\r\ntitle <span class=\"string\">'Decanter visits to each guest'<\/span>\r\nxlabel <span class=\"string\">'Location at table'<\/span>\r\nylabel <span class=\"string\">'Number of times decanter passes location'<\/span>\r\n<\/pre>\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogPortPassingProblemM_05.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n\r\nFrom the graph we can see that guests who are close to the initial position of the decanter (location 1) receive and pass the port a little less than twice as often (for this size table) as the guest opposite them, and that as the passing becomes biased so one side of the table is more likely to receive more wine than the other side. We can also see that the unbiased case has the largest number of passes on average, which we can guess by considering that in the case p==1 (or p==0) the number of times each location receives the port is just once!\r\n<h4>Does the number of guests affect how much each drinks?<a name=\"651d6c60-53d5-426e-a29e-cdfba29ba1a5\"><\/a><\/h4>\r\nThe above graph was for a single table size (<tt>N == 25<\/tt>). What happens if we have different table sizes? Is there a fundamental relationship between how many visits the port makes to each location? Clearly we are going to need a way to compare the visits at different sized tables and with different numbers of simulations. We will plot all results for a normalized location <tt>[-1 1]<\/tt> which maps to our integer locations <tt>1:N-1<\/tt>. In addition, we need to normalize the visits in some way. To do this we will scale the number of visits to each location as a proportion of the number expected for that table if each location were passed equally. To do this we divide the number of visits to each location by the total number of visits and multiply by the number of people at the table. For a table that had each location equally attended, this would produce the value 1 at each location, indicating that the expected number of visits to that location is 1 times the equally distributed number.\r\n<pre class=\"codeinput\">figure\r\nhold <span class=\"string\">on<\/span>\r\ngrid <span class=\"string\">on<\/span>\r\ntableSizes = [5 10 14 15 20 30 35];\r\n<span class=\"keyword\">for<\/span> kk = tableSizes\r\n    [~, ~, ~, visits] = portPassingSimulation(1e4, kk+1, 0.5);\r\n    plot( linspace(-1, 1, kk), kk*visits\/sum(visits), <span class=\"string\">'.-'<\/span>)\r\n<span class=\"keyword\">end<\/span>\r\nhold <span class=\"string\">off<\/span>\r\nlegend(string(tableSizes), <span class=\"string\">'Location'<\/span>, <span class=\"string\">'N'<\/span>)\r\ntitle <span class=\"string\">'Relative likelihood of holding the port for a given table size'<\/span>\r\nxlabel <span class=\"string\">'Normalized table size'<\/span>\r\nylabel <span class=\"string\">'Relative likelihood of holding decanter'<\/span>\r\n<\/pre>\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogPortPassingProblemM_06.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n\r\nThere looks to be a quadratic relationship here; the interested reader is encouraged to explore this and produce a functional form for the relationship as table size changes and to look at how the average number of visits scales as the table size increases.\r\n<h4>Will the pipe of port run out?<a name=\"845c2ec3-0c93-4930-8aea-b07ed038c16a\"><\/a><\/h4>\r\nLet's assume that each of the guests takes 2 fl.oz. (a shot) of port every time the decanter passes them. We can then calculate how much port is consumed in a given simulation by tracking the number of iterations each simulation took to finish. The second output of <tt>portPassingSimulation<\/tt> returns the number of passes needed to reach the final location. Across all simulations this will give us a distribution function for passes needed to finish.\r\n<pre class=\"codeinput\">nSims = 1e5;\r\nN = 31;\r\np = 0.5;\r\nbinW = 50;\r\n[~, nPasses] = portPassingSimulation(nSims, N, p);\r\nhist(nPasses, 30:binW:max(nPasses));\r\nhold <span class=\"string\">on<\/span>\r\ngrid <span class=\"string\">on<\/span>\r\n<\/pre>\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogPortPassingProblemM_07.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n\r\nIf you are interested, it is worth looking at this distribution using <a href=\"https:\/\/www.mathworks.com\/help\/stats\/dfittool.html\">Distribution Fitter<\/a> from the <a href=\"https:\/\/www.mathworks.com\/help\/stats\/index.html\">Statistics and Machine Learning Toolbox<\/a>. It seems to be well fitted using either Lognormal or Birnbaum-Saunders distributions. We can fit either of these using the <tt><a href=\"https:\/\/www.mathworks.com\/help\/stats\/fitdist.html\">fitdist<\/a><\/tt> function and plot the expected distribution over the histogram\r\n<pre class=\"codeinput\">pd = fitdist(nPasses, <span class=\"string\">'LogNormal'<\/span>)\r\nx = linspace(1, max(nPasses), 200);\r\nplot(x, nSims*binW*pdf(pd, x), <span class=\"string\">'r'<\/span>);\r\nhold <span class=\"string\">off<\/span>\r\ntitle <span class=\"string\">'Distribution of total passes'<\/span>\r\nxlabel <span class=\"string\">'Number of passes to reach all dinner guests'<\/span>\r\nlegend(<span class=\"string\">'Simulated PDF'<\/span>, <span class=\"string\">'Expected PDF for LogNormal Distribution'<\/span>)\r\n<\/pre>\r\n<pre class=\"codeoutput\">pd = \r\n  LognormalDistribution\r\n\r\n  Lognormal distribution\r\n       mu =  5.98543   [5.98189, 5.98897]\r\n    sigma = 0.570894   [0.568403, 0.573407]\r\n\r\n<\/pre>\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogPortPassingProblemM_08.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n\r\nWe can use a probability plot to see how well the LogNormal distribution fits the data. You can see that in the tails the distribution is incorrect.\r\n<pre class=\"codeinput\">probplot(<span class=\"string\">'LogNormal'<\/span>, nPasses)\r\ngrid <span class=\"string\">on<\/span>\r\n<\/pre>\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogPortPassingProblemM_09.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n\r\nBoth LogNormal and Birnbaum-Sanders over-estimate the number of passes around the mean and underestimate around 1.5 times the mean, so whilst fitting fairly well they don't actually model the distribution correctly. The mean of the fitted and actual distributions is shown below.\r\n<pre class=\"codeinput\">fittedMean = pd.mean\r\nactualMean = mean(nPasses)\r\n<\/pre>\r\n<pre class=\"codeoutput\">fittedMean =\r\n       467.96\r\nactualMean =\r\n       465.29\r\n<\/pre>\r\nTo account for this inability to fit the simulation data we are going to use the empirical mean for the rest of this analysis. If we were more sure that we understood the expected distribution, we would use a fitted distribution.\r\n<h4>Predicting the amount of port consumed<a name=\"0ec66b5d-cd38-49c4-a4a2-d77604961ef5\"><\/a><\/h4>\r\nNext let's investigate if there is a simple relationship between the number of people at the table and the expected number of passes for the port to go around the table. We can do this by running a number of different simulations, looking at the relationship between the mean and number of people. Let's do this for dinner tables of size <tt>2:30<\/tt> with unbiased passing probability. To speed up the computation we use a <a href=\"https:\/\/www.mathworks.com\/help\/distcomp\/parallel-for-loops-parfor.html\">parallel for-loop<\/a> from the <a href=\"https:\/\/www.mathworks.com\/help\/distcomp\/index.html\">Parallel Computing Toolbox<\/a>, since each of the simulations is independent of the others. By default, this will automatically open a parallel pool if one isn't already open.\r\n<pre class=\"codeinput\">p = 0.5;\r\ntableSize = 2:30;\r\ntToFinish = zeros(size(tableSize));\r\n<span class=\"keyword\">parfor<\/span> ii = 1:numel(tableSize)\r\n    [~, nPasses] = portPassingSimulation(1e5, tableSize(ii), p);\r\n    tToFinish(ii) = mean(nPasses);\r\n<span class=\"keyword\">end<\/span>\r\nplot(tableSize, tToFinish, <span class=\"string\">'o'<\/span>)\r\ntitle  <span class=\"string\">'Mean Passes to Finish vs. Number of Guests'<\/span>\r\nxlabel <span class=\"string\">'Number of people at the table'<\/span>\r\nylabel <span class=\"string\">'Mean number of passes to reach all people'<\/span>\r\ngrid <span class=\"string\">on<\/span>\r\n<\/pre>\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogPortPassingProblemM_10.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n\r\nIt looks like there is a very strong polynomial relationship between these quantities. What does a polynomial fit of degree 4 and degree 2 look like?\r\n<pre class=\"codeinput\">f4 = polyfit(tableSize, tToFinish, 4)\r\nf2 = polyfit(tableSize, tToFinish, 2)\r\n<\/pre>\r\n<pre class=\"codeoutput\">f4 =\r\n   1.8679e-05   -0.0011103      0.52081     -0.63573      0.24387\r\nf2 =\r\n      0.49955     -0.49157    -0.015421\r\n<\/pre>\r\nNotice how the 3rd and 4th order parameters of <tt>f4<\/tt> are basically zero? This shows that the data is well fitted using just a second-order polynomial.\r\n<pre class=\"codeinput\">hold <span class=\"string\">on<\/span>\r\nx = linspace(min(tableSize), max(tableSize), 100);\r\nplot(x, polyval(f2, x), <span class=\"string\">'r'<\/span>);\r\nhold <span class=\"string\">off<\/span>\r\nlegend(<span class=\"string\">'Measured mean of distribution'<\/span>, <span class=\"string\">'Fitted quadratic'<\/span>, <span class=\"string\">'Location'<\/span>, <span class=\"string\">'NW'<\/span>)\r\n<\/pre>\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogPortPassingProblemM_11.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n<h4>An analytic conjecture<a name=\"a2e730e0-a6b0-4fe5-8f70-3a0ddf407ee9\"><\/a><\/h4>\r\nAlso I noticed how close the values of <tt>f2<\/tt> are to <tt>[0.5 -0.5 0]<\/tt>. This seems too good to be true! If you run more simulations (for example with tables sizes from <tt>2:100<\/tt>) with a larger number of simulations per table size, the polynomial fit to the mean gets closer to this set of values. My conjecture is that for a table of size <tt>N<\/tt>\r\n\r\n$$E(\\mbox{passes to finish}) = \\frac{N(N-1)}{2}$$\r\n\r\nWe can try out this conjecture for a few table sizes by running a number of different simulations, taking away the expected number of passes for that table size and looking at the residuals to see if their mean is zero.\r\n<pre class=\"codeinput\">residuals = zeros(60,1);\r\n<span class=\"keyword\">parfor<\/span> ii = 1:60\r\n    <span class=\"comment\">% Use table sizes 9:4:29<\/span>\r\n    N = 9 + 4*mod(ii, 6);\r\n    [~, nPasses] = portPassingSimulation(1e5, N, 0.5);\r\n    eMeanPasses = polyval([0.5 -0.5 0], N);\r\n    residuals(ii) = mean(nPasses) - eMeanPasses;\r\n<span class=\"keyword\">end<\/span>\r\n<span class=\"comment\">% Are the residuals drawn from a normal distribution with zero mean? We can<\/span>\r\n<span class=\"comment\">% use a t-test to look at this<\/span>\r\n[H1, pVal1, CI, stats] = ttest(residuals)\r\n<\/pre>\r\n<pre class=\"codeoutput\">H1 =\r\n     0\r\npVal1 =\r\n      0.32436\r\nCI =\r\n     -0.18322\r\n     0.061617\r\nstats = \r\n  struct with fields:\r\n\r\n    tstat: -0.99384\r\n       df: 59\r\n       sd: 0.47389\r\n<\/pre>\r\nThe output (<tt>H==0<\/tt>) from t-test shows that we cannot reject the null hypothesis - the data could be drawn from a normal distribution with zero mean, and with only with probability <tt>pVal<\/tt> would we expect a distribution less like a normal one be drawn at random.\r\n\r\nLooking at a number of these results we can see that the tails of the distribution are not actually normal, so we can use a \"t-location scale\" probability plot to see how well the residuals are fitted by the distribution. If it fits well then all the points should be on the straight line.\r\n<pre class=\"codeinput\">pd = fitdist(residuals, <span class=\"string\">'tLocationScale'<\/span>);\r\nprobplot(pd, residuals)\r\ngrid <span class=\"string\">on<\/span>\r\ntitle <span class=\"string\">'Probability plot with t Location Scale distribution'<\/span>\r\n<\/pre>\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogPortPassingProblemM_12.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n\r\nFinally, we could test the likelihood that the data was drawn from that fitted distribution, but with zero mean, using a chi-squared goodness of fit test\r\n<pre class=\"codeinput\">pd.mu = 0;\r\n[H2, pVal2] = chi2gof(residuals, <span class=\"string\">'cdf'<\/span>, pd)\r\n<\/pre>\r\n<pre class=\"codeoutput\">H2 =\r\n     0\r\npVal2 =\r\n      0.57262\r\n<\/pre>\r\nThat seems to fit reasonably well, indicating that our conjecture seems to fit the data. See my next blog post for an analytic proof of this conjecture. So we can now finally answer the question about the pipe of port running out, since we can predict the mean number of passes for a given table size.\r\n<pre class=\"codeinput\"><span class=\"comment\">% 1008 Imperial Pints in a Pipe and 2 fl.oz. glass<\/span>\r\n<span class=\"comment\">% and 20 fl.oz. in Imperial Pint<\/span>\r\npintsInPipeOfPort = 1008;\r\nmaxPasses = pintsInPipeOfPort*20\/2;\r\nN = sym(<span class=\"string\">'N'<\/span>);\r\nassume(N &gt; 0)\r\n<span class=\"comment\">% Use the Symbolic Math Toolbox to solve the equation for k<\/span>\r\ndouble(solve( N*(N-1)\/2 == maxPasses, N ))\r\n<\/pre>\r\n<pre class=\"codeoutput\">ans =\r\n       142.49\r\n<\/pre>\r\nThis tells us that with a table size of 142 less than half of her dinners will run out of port!\r\n<h4>How does the amount of port consumed depend on the final position?<a name=\"cf6ce7a9-5eec-452e-94c8-b98f5d14369c\"><\/a><\/h4>\r\nOne final aspect of the problem that we can investigate is how the overall path length (and hence amount of port consumed) varies depending on the position that comes last. Thinking about the paths that need to be taken for positions near the starting point to be last as opposed to ones further away we might intuitively think that the nearer ones might be shorter. Let's see if this is the case, and how the paths change as the passing becomes biased.\r\n\r\nFirstly, we need to run the simulation (unbiased for a table of size 17).\r\n<pre class=\"codeinput\">N = 17;\r\np = 0.5;\r\n[last, nPasses] = portPassingSimulation(5e5, N, p);\r\n<\/pre>\r\nNext, look at the mean of the number of passes, but group those means by the possible last position. There is a really easy way to do this using the <tt><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/varfun.html\">varfun<\/a><\/tt> method of a table, and grouping by the variable <tt>last<\/tt>.\r\n<pre class=\"codeinput\">G = varfun(@mean, table(last, nPasses), <span class=\"string\">'GroupingVariable'<\/span>, <span class=\"string\">'last'<\/span>)\r\n<\/pre>\r\n<pre class=\"codeoutput\">G = \r\n    last    GroupCount    mean_nPasses\r\n    ____    __________    ____________\r\n     1      31392         100.71      \r\n     2      31392         114.98      \r\n     3      31256         127.18      \r\n     4      31205          137.1      \r\n     5      31209         144.15      \r\n     6      31254         151.21      \r\n     7      31255         155.31      \r\n     8      31429         156.65      \r\n     9      30950         156.79      \r\n    10      31405         155.29      \r\n    11      31182         150.78      \r\n    12      31333         144.77      \r\n    13      31276         136.87      \r\n    14      31015          127.2      \r\n    15      31313         115.35      \r\n    16      31134         100.88      \r\n<\/pre>\r\nPlotting the mean passes we can see that positions closest to zero have the shortest paths and those furthest away the longest.\r\n<pre class=\"codeinput\">plot(G.last, G.mean_nPasses, <span class=\"string\">'o'<\/span>)\r\ntitle  <span class=\"string\">'Expected passes given position that is last'<\/span>\r\nxlabel <span class=\"string\">'Position that receives port last'<\/span>\r\nylabel <span class=\"string\">'Expected number of passes'<\/span>\r\ngrid <span class=\"string\">on<\/span>\r\n<\/pre>\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogPortPassingProblemM_13.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n\r\nFinally, looking at what happens if the situation becomes biased, we probably expect that the position with the longest expected path will shift to one side (i.e. closer to the start position). In fact, thinking about the limit where p approaches 1 we should expect that the path length to position N-1 simply becomes N (since the decanter will pass from k to k+1 with probability close to 1). The path length for N-2 will probably be very close to that for N-1, with a deviation to the right somewhere around the table (with close to zero probability). Thus for p &gt; 0.5 we expect the peak in the path length to move towards position zero and for the expected path length for k = N-1 to be shorter than for k = 1.\r\n<pre class=\"codeinput\">N = 17;\r\np = 0.57;\r\n[last, nPasses] = portPassingSimulation(5e5, N, p);\r\nG = varfun(@mean, table(last, nPasses), <span class=\"string\">'GroupingVariable'<\/span>, <span class=\"string\">'last'<\/span>);\r\nplot(G.last, G.mean_nPasses, <span class=\"string\">'o'<\/span>)\r\ntitle  <span class=\"string\">'Expected passes given position that is last (biased passing)'<\/span>\r\nxlabel <span class=\"string\">'Position that receives port last'<\/span>\r\nylabel <span class=\"string\">'Expected number of passes'<\/span>\r\ngrid <span class=\"string\">on<\/span>\r\n<\/pre>\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogPortPassingProblemM_14.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n<h4>And finally<a name=\"a814ebcc-8a1f-406f-85b4-792ff05b2d3a\"><\/a><\/h4>\r\nThe thing I've enjoyed most about exploring this problem is how simple it is to state and yet how much complexity I've found as I progressed further. It started out by simply asking how likely someone was to be last, yet I've ended up learning how to solve fairly complicated recurrence relationships, and exploring aspects of random walks that I didn't know about. In addition, I've found that the experimental nature of the simulation, combined with the analytic rigor of solving the mathematics of the random walk processes enlightening. It is not common (for me at least) to be able to bring these two different sides of MATLAB together and have simulation results help guide analytic discovery. In a few weeks' time I'll post another article on how I used the <a href=\"https:\/\/www.mathworks.com\/help\/symbolic\/\">Symbolic Math Toolbox<\/a> to deduce these expected path lengths analytically and compare the results with the simulations. In the meantime, maybe you can take this analysis further? One direction that I've so far ignored is to deduce the expected form of the path length distribution. Or perhaps you have a related random walk process that might succumb to a similar analysis? Let me know in <a href=\"https:\/\/blogs.mathworks.com\/loren\/2017\/02\/14\/the-perverse-port-passing-problem\/#respond\">here<\/a>.\r\n\r\n<script language=\"JavaScript\"> <!-- \r\n    function grabCode_22db8d1fd51d4ca49684521b462c0bd8() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='22db8d1fd51d4ca49684521b462c0bd8 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 22db8d1fd51d4ca49684521b462c0bd8';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2017 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<\/p>\r\n<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\r\n<p>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script>\r\n<p style=\"text-align: right; font-size: xx-small; font-weight: lighter; font-style: italic; color: gray;\">\r\n<a><span style=\"font-size: x-small; font-style: italic;\">Get\r\nthe MATLAB code<noscript>(requires JavaScript)<\/noscript><\/span><\/a>\r\n\r\nPublished with MATLAB\u00ae R2016b<\/p>\r\n\r\n<\/div>\r\n<!--\r\n22db8d1fd51d4ca49684521b462c0bd8 ##### SOURCE BEGIN #####\r\n%% The Perverse Port Passing Problem\r\n% Today's guest blogger is <https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/1141841-Jos-Martin % Jos Martin>, from the Parallel Computing group at MathWorks. Whilst normally\r\n% focussing on making parallel computing both easy and fast in MATLAB, occasionally\r\n% he likes to use MATLAB to explore other problem spaces. Here, he writes about\r\n% an interesting random walk process he encountered that has allowed him to explore\r\n% both simulation and analytic methods in MATLAB, and using each of those methods\r\n% to help guide the other.\r\n%% Introduction\r\n% <http:\/\/www.piday.org\/ Pi day> is coming and you have been invited to dine\r\n% with an eccentric but mathematically minded host. She has provided many <http:\/\/www.georgehart.com\/bagel\/bagel.html % interesting delicacies> and you have now <http:\/\/www.bbc.co.uk\/news\/blogs-magazine-monitor-26526076 % finally arrived> at the <https:\/\/geometricdelights.wordpress.com\/2011\/04\/17\/cheese-usamts\/ % cheese course>; of course port is served alongside a wonderful round of Stilton.\r\n% However, in defiance of normal protocol (<http:\/\/www.taylor.pt\/en\/enjoy-port-wine\/traditions\/passing-the-decanter\/ % passing to the left>), she instructs her guests to randomly choose a direction\r\n% in which to pass the port decanter. You are some way around the circular table\r\n% and feeling a degree of consternation at this strange turn of events. Should\r\n% you be concerned? How does your position at the table affect the likelihood\r\n% of being the last person to sample this exceptional <http:\/\/www.decanter.com\/reviews\/port\/fonseca-port-douro-portugal-1963\/ % Fonseca '63>? For those pedantic few who immediately worry that the port might\r\n% run out, I can assert she is a very accommodating host and has assured her guests\r\n% that the decanter will continue to be filled from the <https:\/\/en.wikipedia.org\/wiki\/English_wine_cask_units % pipe of port> she <http:\/\/www.decanter.com\/wine-news\/bonhams-to-hold-rare-fonseca-1963-auction-286004\/ % recently acquired>. Another question one might pose is: on average how much\r\n% port will be consumed (assuming that at each pass each guest takes a fixed amount\r\n% of port)?\r\n%\r\n% This problem can be modelled with a\r\n% <https:\/\/en.wikipedia.org\/wiki\/Random_walk random walk>, similar in\r\n% nature to a well-studied case called\r\n% <http:\/\/web.mit.edu\/neboat\/Public\/6.042\/randomwalks.pdf Gambler's Ruin>.\r\n% The major difference here will be the need to visit all locations around\r\n% the table, rather than simply reaching one end or other of the walk. To\r\n% explore this apparently simple problem I will start by using a simulation\r\n% approach that will help guide analytic methods (in a future\r\n% <https:\/\/blogs.mathworks.com\/loren\/?p=2198 post>). To do this I'll be\r\n% using our statistics and symbolic maths tools.\r\n%% A simulation approach\r\n% There are analytical methods to attack this problem (I will quote some of\r\n% these during this article, for which all <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/61285-port-passing-problem % files and proofs> are available on File Exchange) but here I want to focus on\r\n% a <https:\/\/en.wikipedia.org\/wiki\/Monte_Carlo_method monte-carlo simulation>\r\n% approach. To exemplify the approach, we will run |nSims| independent simulations\r\n% passing the decanter between |N| people (always starting at person 0), with\r\n% a probability |p| of passing left (and |1-p| of passing right).\r\n\r\nnSims = 100;\r\nN = 11;\r\np = 0.5;\r\n%%\r\n% To simulate |M| passes of the port across the |nSims| simulations we make\r\n% a matrix |singleMove| containing only values 1 and -1 to indicate a pass to\r\n% the port to left or right using uniform random numbers.\r\n\r\nM = 25;\r\nsingleMove = 2*(rand(nSims, M) >= p) - 1;\r\n% The first 6 pass directions of 6 simulations\r\ndisp(singleMove(1:6, 1:6));\r\n%%\r\n% By knowing the direction of each pass we can compute the location of the\r\n% port compared to its starting point by taking the cumulative sum of the directions\r\n% for each pass. To make this an absolute position we add to it the starting position\r\n% (in this case position 0). Because we need to account for the decanter moving\r\n% all the way around the table, we are going to use modulo |N| arithmetic to fold\r\n% the course back to the available positions.\r\n\r\nstart = 0;\r\nlocations = mod(cumsum(singleMove, 2) + start, N);\r\n% The first 6 locations of 6 simulations\r\ndisp(locations(1:6, 1:6))\r\n%%\r\n% We now have a matrix |locations| that contains 100 simulations of 20 passes.\r\n% Have any of the simulations reached all places around the table? In the function\r\n% |<matlab:edit('portPassingSimulation') portPassingSimulation>| we are going\r\n% to iterate over each pass seeing which simulations complete in that pass, but\r\n% here we will simply work out if any of the simulations are complete. To be complete,\r\n% a simulation needs to have visited all |0:N-1| possible locations. A simple\r\n% way to ask this question is to see if that simulation contains |N| unique elements.\r\n% If it does, it must have visited all the possible locations.\r\n\r\nfinished = false(nSims, 1);\r\nfor ii = 1:nSims\r\nfinished(ii) = numel(unique(locations(ii, :))) == N;\r\nend\r\nnFinished = sum(finished)\r\n%%\r\n% For those simulations that have not finished, we know their current location\r\n% (given by |locations(:,end)|) and so can simulate another block of |nSims x\r\n% M| passes, keeping on going until all simulations have reached their goal. The\r\n% full simulation code (|<matlab:edit('portPassingSimulation') portPassingSimulation>|\r\n% available <https:\/\/blogs.mathworks.com\/images\/loren\/2017\/portPassingSimulation.m % here>) returns the last position, the number of passes to reach that position,\r\n% the direction the last position was reached and the number of visits to each\r\n% position around the table during each simulation. The code is based closely\r\n% on this strategy, with a slightly different stopping detection strategy.\r\n\r\n[last, nPasses] = portPassingSimulation(nSims, N, p);\r\nresults = table(last, nPasses);\r\ndisp(results(1:10,:))\r\n%% Investigating the final position of the port\r\n% The first problem we set ourselves was deciding if our position around the\r\n% table affected our probability of being the last person to receive the port.\r\n% To look at this, let's run a fairly large number ($10^5$) of simulations and\r\n% see how many end up at each of the |N-1| possible finishing positions.\r\n\r\nlast = portPassingSimulation(1e5, N, p);\r\nhist(last, 1:N-1);\r\ngrid on\r\ntitle '10^5 simulations of 11 people around the table'\r\nxlabel 'Last person to receive the port'\r\nylabel 'Number of times that person is last'\r\n%%\r\n% From this histogram we feel greatly relieved! It shows that no matter\r\n% where we sit at the table, the number of times we are the last person to receive\r\n% the port is the same as everyone else. It does not matter where we sit at the\r\n% table!\r\n%\r\n% This result seems somewhat counter intuitive; surely how far you are from\r\n% the starting position should affect the likelihood you are last? First let's\r\n% see if this result is the same for a different table size. We can check by running\r\n% the simulator with a larger number of people.\r\n\r\nlast = portPassingSimulation(1e5, 41, p);\r\nhist(last, 1:max(last));\r\ngrid on\r\ntitle '10^5 simulations of 41 people around the table'\r\nxlabel 'Last person to receive the port'\r\nylabel 'Number of times that person is last'\r\n%%\r\n% There is an intuitive way to describe that the probability of being last\r\n% is independent of where you sit at the table. To be last, the decanter must\r\n% have arrived at your left or right, then traversed the whole table without passing\r\n% via you. No matter where you sit at the table, the probability of that set of\r\n% actions is the same - and thus the probability of you being last is the same.\r\n%% Analytic probability for last position when passing is unbiased\r\n% We can show analytically that the likelihood of being last is independent\r\n% of your position around the table. We will need to quote a result from the <http:\/\/web.mit.edu\/neboat\/Public\/6.042\/randomwalks.pdf % Gamblers Ruin analysis>. Let $pSAB$ be the probability that the decanter starts\r\n% at position S and reaches position A before reaching position B. This probability\r\n% is identical to starting with an initial stake of (|S-B|) in Gambler's Ruin\r\n% and stopping when either the stake is lost or the winnings are (|A-B|).\r\n%\r\n% $$pSAB=\\frac{S-B}{A-B}\\mbox{ for }B<S<A\\mbox{ or }A<S<B$$\r\n%\r\n% We can use the <https:\/\/www.mathworks.com\/help\/symbolic\/ Symbolic Math % Toolbox> to manipulate equations without having to do the algebra ourselves\r\n% (for me, making a transcription mistake much less likely). In this case the\r\n% probability that some position k is the last is a combination of\r\n%\r\n% # Starting at position 0 and going to position L (numbered k-1) whilst *not*\r\n% going to position R (numbered k+1), followed by going from position L to position\r\n% R whilst *not* going to position k\r\n% # Starting at position 0 and going to position R whilst *not* going to position\r\n% L, followed by going from position R to position L whilst *not* going to position\r\n% k\r\n%\r\n% In terms of $pSAB$ this can be written as:\r\n\r\nsyms pSAB(S,A,B) k N\r\npSAB(S,A,B) = (S-B)\/(A-B);\r\nL = k-1;\r\nR = k+1-N;\r\npLastUnbiased = simplify(pSAB(0,L,R)*pSAB(L,R,k) + pSAB(0,R,L)*pSAB(R,L,k-N))\r\n%%\r\n% Note that to ensure that either $A<S<B \\mbox{ or } B<S<A$ we sometimes % need to undertake a modulo approach to A or B. That is why R is written |k+1-N| % and why in the last term we use |k-N| rather than just |k|. This ensures that % $k>L \\ge 0 \\ge R > k-N$.\r\n%\r\n% The result |pLast| is independent of |k| and shows equal likelihood of\r\n% being last, agreeing with the simulation.\r\n%% What happens if there is a biased probability of passing left or right?\r\n% The above simulations have assumed that the probability that any dinner guest\r\n% passes left or right are equal. What if there is a slight bias to one direction?\r\n% Will this have a discernible impact on your enjoyment of the port? Let's simulate\r\n% a situation where the probability of passing left is 0.54 (rather than 0.5).\r\n\r\nN = 17;\r\np = 0.54;\r\nk = 1:N-1;\r\nnSims = 1e5;\r\nlast = portPassingSimulation(nSims, N, p);\r\nhist(last, k);\r\ngrid on\r\ntitle 'Biased passing simulation of 17 people around the table'\r\nxlabel 'Last person to receive the port'\r\nylabel 'Number of times that person is last'\r\n%%\r\n% It is clear from this histogram that even a small bias in the probability\r\n% hugely affects the likelihood we will be last. We could undertake studies to\r\n% decide how many dinner parties we needed to attend before we could reasonably\r\n% deduce that the decanter was biased, but that is another story!\r\n%% Does this biased simulation agree with theory?\r\n% Using a very similar approach to the analytic method above (see the\r\n% <https:\/\/blogs.mathworks.com\/loren\/?p=2198 next post (not live right % away)> for a derivation) we can show that the biased probability of being\r\n% last is\r\n%\r\n% $$P_{last}=\\frac{r^N(1-r)}{r^k(r-r^N)} \\mbox{ where }r=\\frac{1-p}{p} \\mbox{\r\n% and } p\\ne \\frac{1}{2}$$\r\n%\r\n% Plotting that on top of the simulation shows very good agreement.\r\n\r\nr = (1-p)\/p;\r\npLastBiased = r^N*(1-r).\/(r.^k*(r-r^N));\r\nhold on\r\nplot(k, pLastBiased*nSims, 'ro', 'MarkerSize', 8, 'LineWidth', 2);\r\nhold off\r\n%% Who has the most to drink?\r\n% Now that we know each guest is equally likely to be the last to receive the\r\n% port, we could look at the total number of times that the decanter passes each\r\n% guest on its travels to the last person. Is this the same for each person around\r\n% the table or not? And does having a biased passing probability change the number\r\n% of times it passes each person?\r\n\r\nnSims = 3e4;\r\nN = 25;\r\n[~, ~, ~, visitsUnbiased] = portPassingSimulation(nSims, N, 0.5);\r\n[~, ~, ~, visitsBiasedL]  = portPassingSimulation(nSims, N, 0.53);\r\n[~, ~, ~, visitsBiasedR]  = portPassingSimulation(nSims, N, 0.47);\r\nfigure\r\nhold on\r\nplot(1:N-1, visitsUnbiased\/nSims, 'o-')\r\nplot(1:N-1, visitsBiasedL\/nSims, 'o-')\r\nplot(1:N-1, visitsBiasedR\/nSims, 'o-')\r\nhold off\r\ngrid on\r\nlegend('Unbiased pass', 'Biased left-pass', 'Biased right-pass', 'Location', 'N')\r\ntitle 'Decanter visits to each guest'\r\nxlabel 'Location at table'\r\nylabel 'Number of times decanter passes location'\r\n%%\r\n% From the graph we can see that guests who are close to the initial position\r\n% of the decanter (location 1) receive and pass the port a little less than twice\r\n% as often (for this size table) as the guest opposite them, and that as the passing\r\n% becomes biased so one side of the table is more likely to receive more wine\r\n% than the other side. We can also see that the unbiased case has the largest\r\n% number of passes on average, which we can guess by considering that in the case\r\n% p==1 (or p==0) the number of times each location receives the port is just once!\r\n%% Does the number of guests affect how much each drinks?\r\n% The above graph was for a single table size (|N == 25|). What happens if we\r\n% have different table sizes? Is there a fundamental relationship between how\r\n% many visits the port makes to each location? Clearly we are going to need a\r\n% way to compare the visits at different sized tables and with different numbers\r\n% of simulations. We will plot all results for a normalized location |[-1 1]|\r\n% which maps to our integer locations |1:N-1|. In addition, we need to normalize\r\n% the visits in some way. To do this we will scale the number of visits to each\r\n% location as a proportion of the number expected for that table if each location\r\n% were passed equally. To do this we divide the number of visits to each location\r\n% by the total number of visits and multiply by the number of people at the table.\r\n% For a table that had each location equally attended, this would produce the\r\n% value 1 at each location, indicating that the expected number of visits to that\r\n% location is 1 times the equally distributed number.\r\n\r\nfigure\r\nhold on\r\ngrid on\r\ntableSizes = [5 10 14 15 20 30 35];\r\nfor kk = tableSizes\r\n[~, ~, ~, visits] = portPassingSimulation(1e4, kk+1, 0.5);\r\nplot( linspace(-1, 1, kk), kk*visits\/sum(visits), '.-')\r\nend\r\nhold off\r\nlegend(string(tableSizes), 'Location', 'N')\r\ntitle 'Relative likelihood of holding the port for a given table size'\r\nxlabel 'Normalized table size'\r\nylabel 'Relative likelihood of holding decanter'\r\n%%\r\n% There looks to be a quadratic relationship here; the interested reader\r\n% is encouraged to explore this and produce a functional form for the relationship\r\n% as table size changes and to look at how the average number of visits scales\r\n% as the table size increases.\r\n%% Will the pipe of port run out?\r\n% Let's assume that each of the guests takes 2 fl.oz. (a shot) of port every\r\n% time the decanter passes them. We can then calculate how much port is consumed\r\n% in a given simulation by tracking the number of iterations each simulation took\r\n% to finish. The second output of |portPassingSimulation| returns the number of\r\n% passes needed to reach the final location. Across all simulations this will\r\n% give us a distribution function for passes needed to finish.\r\n\r\nnSims = 1e5;\r\nN = 31;\r\np = 0.5;\r\nbinW = 50;\r\n[~, nPasses] = portPassingSimulation(nSims, N, p);\r\nhist(nPasses, 30:binW:max(nPasses));\r\nhold on\r\ngrid on\r\n%%\r\n% If you are interested, it is worth looking at this distribution using\r\n% <https:\/\/www.mathworks.com\/help\/stats\/dfittool.html Distribution Fitter> from\r\n% the <https:\/\/www.mathworks.com\/help\/stats\/index.html Statistics and Machine % Learning Toolbox>. It seems to be well fitted using either Lognormal or Birnbaum-Saunders\r\n% distributions. We can fit either of these using the |<https:\/\/www.mathworks.com\/help\/stats\/fitdist.html % fitdist>| function and plot the expected distribution over the histogram\r\n\r\npd = fitdist(nPasses, 'LogNormal')\r\nx = linspace(1, max(nPasses), 200);\r\nplot(x, nSims*binW*pdf(pd, x), 'r');\r\nhold off\r\ntitle 'Distribution of total passes'\r\nxlabel 'Number of passes to reach all dinner guests'\r\nlegend('Simulated PDF', 'Expected PDF for LogNormal Distribution')\r\n%%\r\n% We can use a probability plot to see how well the LogNormal distribution\r\n% fits the data. You can see that in the tails the distribution is incorrect.\r\n\r\nprobplot('LogNormal', nPasses)\r\ngrid on\r\n%%\r\n% Both LogNormal and Birnbaum-Sanders over-estimate the number of passes\r\n% around the mean and underestimate around 1.5 times the mean, so whilst fitting\r\n% fairly well they don't actually model the distribution correctly. The mean of\r\n% the fitted and actual distributions is shown below.\r\n\r\nfittedMean = pd.mean\r\nactualMean = mean(nPasses)\r\n%%\r\n% To account for this inability to fit the simulation data we are going\r\n% to use the empirical mean for the rest of this analysis. If we were more sure\r\n% that we understood the expected distribution, we would use a fitted distribution.\r\n%% Predicting the amount of port consumed\r\n% Next let's investigate if there is a simple relationship between the number\r\n% of people at the table and the expected number of passes for the port to go\r\n% around the table. We can do this by running a number of different simulations,\r\n% looking at the relationship between the mean and number of people. Let's do\r\n% this for dinner tables of size |2:30| with unbiased passing probability. To\r\n% speed up the computation we use a <https:\/\/www.mathworks.com\/help\/distcomp\/parallel-for-loops-parfor.html % parallel for-loop> from the <https:\/\/www.mathworks.com\/help\/distcomp\/index.html % Parallel Computing Toolbox>, since each of the simulations is independent of\r\n% the others. By default, this will automatically open a parallel pool if one\r\n% isn't already open.\r\n\r\np = 0.5;\r\ntableSize = 2:30;\r\ntToFinish = zeros(size(tableSize));\r\nparfor ii = 1:numel(tableSize)\r\n[~, nPasses] = portPassingSimulation(1e5, tableSize(ii), p);\r\ntToFinish(ii) = mean(nPasses);\r\nend\r\nplot(tableSize, tToFinish, 'o')\r\ntitle  'Mean Passes to Finish vs. Number of Guests'\r\nxlabel 'Number of people at the table'\r\nylabel 'Mean number of passes to reach all people'\r\ngrid on\r\n%%\r\n% It looks like there is a very strong polynomial relationship between these\r\n% quantities. What does a polynomial fit of degree 4 and degree 2 look like?\r\n\r\nf4 = polyfit(tableSize, tToFinish, 4)\r\nf2 = polyfit(tableSize, tToFinish, 2)\r\n%%\r\n% Notice how the 3rd and 4th order parameters of |f4| are basically zero?\r\n% This shows that the data is well fitted using just a second-order polynomial.\r\n\r\nhold on\r\nx = linspace(min(tableSize), max(tableSize), 100);\r\nplot(x, polyval(f2, x), 'r');\r\nhold off\r\nlegend('Measured mean of distribution', 'Fitted quadratic', 'Location', 'NW')\r\n%% An analytic conjecture\r\n% Also I noticed how close the values of |f2| are to |[0.5 -0.5 0]|. This seems\r\n% too good to be true! If you run more simulations (for example with tables sizes\r\n% from |2:100|) with a larger number of simulations per table size, the polynomial\r\n% fit to the mean gets closer to this set of values. My conjecture is that for\r\n% a table of size |N|\r\n%\r\n% $$E(\\mbox{passes to finish}) = \\frac{N(N-1)}{2}$$\r\n%\r\n% We can try out this conjecture for a few table sizes by running a number\r\n% of different simulations, taking away the expected number of passes for that\r\n% table size and looking at the residuals to see if their mean is zero.\r\n\r\nresiduals = zeros(60,1);\r\nparfor ii = 1:60\r\n% Use table sizes 9:4:29\r\nN = 9 + 4*mod(ii, 6);\r\n[~, nPasses] = portPassingSimulation(1e5, N, 0.5);\r\neMeanPasses = polyval([0.5 -0.5 0], N);\r\nresiduals(ii) = mean(nPasses) - eMeanPasses;\r\nend\r\n% Are the residuals drawn from a normal distribution with zero mean? We can\r\n% use a t-test to look at this\r\n[H1, pVal1, CI, stats] = ttest(residuals)\r\n%%\r\n% The output (|H==0|) from t-test shows that we cannot reject the null hypothesis\r\n% - the data could be drawn from a normal distribution with zero mean, and with\r\n% only with probability |pVal| would we expect a distribution less like a normal\r\n% one be drawn at random.\r\n%\r\n% Looking at a number of these results we can see that the tails of the distribution\r\n% are not actually normal, so we can use a \"t-location scale\" probability plot\r\n% to see how well the residuals are fitted by the distribution. If it fits well\r\n% then all the points should be on the straight line.\r\n\r\npd = fitdist(residuals, 'tLocationScale');\r\nprobplot(pd, residuals)\r\ngrid on\r\ntitle 'Probability plot with t Location Scale distribution'\r\n%%\r\n% Finally, we could test the likelihood that the data was drawn from that\r\n% fitted distribution, but with zero mean, using a chi-squared goodness of fit\r\n% test\r\n\r\npd.mu = 0;\r\n[H2, pVal2] = chi2gof(residuals, 'cdf', pd)\r\n%%\r\n% That seems to fit reasonably well, indicating that our conjecture seems\r\n% to fit the data. See my next blog post for an analytic proof of this conjecture.\r\n% So we can now finally answer the question about the pipe of port running out,\r\n% since we can predict the mean number of passes for a given table size.\r\n\r\n% 1008 Imperial Pints in a Pipe and 2 fl.oz. glass\r\n% and 20 fl.oz. in Imperial Pint\r\npintsInPipeOfPort = 1008;\r\nmaxPasses = pintsInPipeOfPort*20\/2;\r\nN = sym('N');\r\nassume(N > 0)\r\n% Use the Symbolic Math Toolbox to solve the equation for k\r\ndouble(solve( N*(N-1)\/2 == maxPasses, N ))\r\n%%\r\n% This tells us that with a table size of 142 less than half of her dinners\r\n% will run out of port!\r\n%% How does the amount of port consumed depend on the final position?\r\n% One final aspect of the problem that we can investigate is how the overall\r\n% path length (and hence amount of port consumed) varies depending on the position\r\n% that comes last. Thinking about the paths that need to be taken for positions\r\n% near the starting point to be last as opposed to ones further away we might\r\n% intuitively think that the nearer ones might be shorter. Let's see if this is\r\n% the case, and how the paths change as the passing becomes biased.\r\n%\r\n% Firstly, we need to run the simulation (unbiased for a table of size 17).\r\n\r\nN = 17;\r\np = 0.5;\r\n[last, nPasses] = portPassingSimulation(5e5, N, p);\r\n%%\r\n% Next, look at the mean of the number of passes, but group those means\r\n% by the possible last position. There is a really easy way to do this using the\r\n% |<https:\/\/www.mathworks.com\/help\/matlab\/ref\/varfun.html varfun>| method of a\r\n% table, and grouping by the variable |last|.\r\n\r\nG = varfun(@mean, table(last, nPasses), 'GroupingVariable', 'last')\r\n%%\r\n% Plotting the mean passes we can see that positions closest to zero have\r\n% the shortest paths and those furthest away the longest.\r\n\r\nplot(G.last, G.mean_nPasses, 'o')\r\ntitle  'Expected passes given position that is last'\r\nxlabel 'Position that receives port last'\r\nylabel 'Expected number of passes'\r\ngrid on\r\n%%\r\n% Finally, looking at what happens if the situation becomes biased, we probably\r\n% expect that the position with the longest expected path will shift to one side\r\n% (i.e. closer to the start position). In fact, thinking about the limit where\r\n% p approaches 1 we should expect that the path length to position N-1 simply\r\n% becomes N (since the decanter will pass from k to k+1 with probability close\r\n% to 1). The path length for N-2 will probably be very close to that for N-1,\r\n% with a deviation to the right somewhere around the table (with close to zero\r\n% probability). Thus for p > 0.5 we expect the peak in the path length to move\r\n% towards position zero and for the expected path length for k = N-1 to be shorter\r\n% than for k = 1.\r\n\r\nN = 17;\r\np = 0.57;\r\n[last, nPasses] = portPassingSimulation(5e5, N, p);\r\nG = varfun(@mean, table(last, nPasses), 'GroupingVariable', 'last');\r\nplot(G.last, G.mean_nPasses, 'o')\r\ntitle  'Expected passes given position that is last (biased passing)'\r\nxlabel 'Position that receives port last'\r\nylabel 'Expected number of passes'\r\ngrid on\r\n%% And finally\r\n% The thing I've enjoyed most about exploring this problem is how simple it\r\n% is to state and yet how much complexity I've found as I progressed further.\r\n% It started out by simply asking how likely someone was to be last, yet I've\r\n% ended up learning how to solve fairly complicated recurrence relationships,\r\n% and exploring aspects of random walks that I didn't know about. In addition,\r\n% I've found that the experimental nature of the simulation, combined with the\r\n% analytic rigor of solving the mathematics of the random walk processes enlightening.\r\n% It is not common (for me at least) to be able to bring these two different sides\r\n% of MATLAB together and have simulation results help guide analytic discovery.\r\n% In a few weeks' time I'll post another article on how I used the <https:\/\/www.mathworks.com\/help\/symbolic\/ % Symbolic Math Toolbox> to deduce these expected path lengths analytically and\r\n% compare the results with the simulations. In the meantime, maybe you can take\r\n% this analysis further? One direction that I've so far ignored is to deduce the\r\n% expected form of the path length distribution. Or perhaps you have a related\r\n% random walk process that might succumb to a similar analysis? Let me know in\r\n% <https:\/\/blogs.mathworks.com\/?p=2196#respond here>.\r\n##### SOURCE END ##### 22db8d1fd51d4ca49684521b462c0bd8\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogPortPassingProblemM_14.png\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction-->\r\n\r\nToday's guest blogger is <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/1141841-Jos-Martin\">Jos Martin<\/a>, from the Parallel Computing group at MathWorks. Whilst normally focussing on making parallel computing both easy and fast in MATLAB, occasionally he likes to use MATLAB to explore other problem spaces. Here, he writes about an interesting random walk process he encountered that has allowed him to explore both simulation and analytic methods in MATLAB, and using each of those methods to help guide the other.\r\n\r\n<!--\/introduction-->... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2017\/02\/14\/the-perverse-port-passing-problem\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[33,34,38],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/2196"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=2196"}],"version-history":[{"count":4,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/2196\/revisions"}],"predecessor-version":[{"id":2239,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/2196\/revisions\/2239"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=2196"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=2196"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=2196"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}