{"id":2198,"date":"2017-02-28T15:56:41","date_gmt":"2017-02-28T20:56:41","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=2198"},"modified":"2017-03-20T02:43:44","modified_gmt":"2017-03-20T07:43:44","slug":"symbolic-analysis-of-the-port-passing-problem","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2017\/02\/28\/symbolic-analysis-of-the-port-passing-problem\/","title":{"rendered":"Symbolic Analysis of the Port Passing Problem"},"content":{"rendered":"<div class=\"content\"><!--introduction--><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#3e4a7627-ab47-49fd-8a03-df0f168ce6db\">Introduction<\/a><\/li><li><a href=\"#0324bab7-e5f3-4f99-a9fe-9e0d948f4e24\">Gambler's Ruin<\/a><\/li><li><a href=\"#5d44e4b0-d1bb-473b-8ae3-53d6617aef78\">Calculating the probability that position <tt>k<\/tt> is last<\/a><\/li><li><a href=\"#ebaa73f7-0f59-4234-8092-26cdf28adb7f\">What routes do we need to consider to compute the expected path length?<\/a><\/li><li><a href=\"#7ee2914b-5f6d-40c6-8825-30601be42c21\">Probability that the port takes route L or R<\/a><\/li><li><a href=\"#bd877ff1-eae1-43e3-a6be-3c00ec898714\">Calculating the expected path length when <tt>k<\/tt> is last<\/a><\/li><li><a href=\"#65c5321b-25b6-451a-8d2e-cae0562924c4\">What is the mean number of passes needed for a table of size N?<\/a><\/li><li><a href=\"#1a6da36d-f440-4957-8957-ec65016dd2bc\">And finally<\/a><\/li><\/ul><\/div><h4>Introduction<a name=\"3e4a7627-ab47-49fd-8a03-df0f168ce6db\"><\/a><\/h4><p>Hopefully you recall the <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=2196\">recent post<\/a> where we used a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Monte_Carlo_method\">monte-carlo technique<\/a> to simulate an infinitely-stocked port decanter passing around a table of <tt>N<\/tt> people (always starting at position 0), moving with probability <i>p<\/i> to the left and <i>q<\/i> to the right, only stopping when everyone at the table had received some port. Those simulations seemed to show that simple analytic solutions to expected path lengths might be possible and today, I'll show you what I managed to prove.<\/p><p>In the last post we decided that the functions we wanted to find are<\/p><div><ol><li>$p_k(N)$ the probability that position k is the last to receive the port<\/li><li>$E_k(N)$ the expected number of passes needed when k is the last<\/li><li>$E_N$ the expected number of passes needed for a table of size N<\/li><\/ol><\/div><h4>Gambler's Ruin<a name=\"0324bab7-e5f3-4f99-a9fe-9e0d948f4e24\"><\/a><\/h4><p>Thanks to my colleague Rick Amos, we have a starting point for thinking about an analytical solution. He pointed out that our port passing problem was very similar to the random walk process <a href=\"http:\/\/web.mit.edu\/neboat\/Public\/6.042\/randomwalks.pdf\">Gambler's Ruin<\/a>, in which the gambler starts out with some stake (<tt>k<\/tt>) and wins another unit of that stake with probability <tt>p<\/tt> (or loses with probability <tt>1-p<\/tt>). The game continues until the gambler has either lost all the money, or has a total of <tt>N<\/tt>. The major difference is that in Gambler's Ruin there are two different end conditions (winning with <tt>N<\/tt> or losing with <tt>0<\/tt>).<\/p><p>To enable us to attack the port passing problem, I'll need to quote two results from Gambler's Ruin. You can find derivations of these results in the <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/61285-port-passing-problem\">associated files<\/a> on File Exchange (see the files <tt>BiasedGamblersRuin<\/tt> and <tt>UnbiasedGamblersRuin<\/tt>). The first result allows us to answer the question: \"what is the probability, given that we are currently at position S, of arriving at position A before arriving at position B?\". In terms of Gambler's Ruin this is the same as starting with stake $k=S-B$ and finishing with $N=A-B$. To make some of the equations simpler I'll use $q=1-p$, and note we assume $A&lt;S&lt;B \\mbox{ or } B&lt;S&lt;A$.<\/p><p>$$pSAB=\\left\\lbrace \\begin{array}{cc}\\frac{k}{N} &amp; p=q=\\frac{1}{2}\\\\\\frac{1-r^k}{1-r^N}\r\n&amp; p\\ne q\\mbox{ and } r=\\frac{q}{p}\\end{array}\\right.$$<\/p><p>Similarly, we also need to answer: \"what is the expected number of passes (starting at S) to arrive at position A before arriving at position B?\".<\/p><p>$$eSAB=\\left\\lbrace \\begin{array}{cc}\\frac{N^2-k^2}{3} &amp; p=q=\\frac{1}{2}\\\\\\frac{1+r}{1-r}(N(\\frac{2}{1-r^N\r\n}-1)-k(\\frac{2}{{1-r}^k }-1)) &amp; p\\ne q\\end{array}\\right.$$<\/p><p>Note how each result has a different expression for the biased and unbiased case? Below we will run through the biased case (which has more complexity). Exactly the same code could be run with the unbiased case and the correct result obtained - if you are interested simply change <tt>pToUse<\/tt> below back to <tt>0.5<\/tt> and execute.<\/p><pre class=\"codeinput\">syms <span class=\"string\">pSAB(S,A,B)<\/span> <span class=\"string\">eSAB(S,A,B)<\/span> <span class=\"string\">p<\/span> <span class=\"string\">q<\/span> <span class=\"string\">k<\/span> <span class=\"string\">N<\/span>\r\n\r\npToUse = 0.55;\r\nassume(p == pToUse);\r\n\r\n<span class=\"comment\">% Define the functions for k and N in terms of S, A, and B.<\/span>\r\nk(S,A,B) = S-B;\r\nN(S,A,B) = A-B;\r\n\r\n<span class=\"keyword\">if<\/span> pToUse == 0.5\r\n    pSAB(S,A,B) = k\/N;\r\n    eSAB(S,A,B) = (N^2 - k^2)\/3;\r\n<span class=\"keyword\">else<\/span>\r\n    <span class=\"comment\">% In the biased case we will use r rather than p in the equations<\/span>\r\n    syms <span class=\"string\">r<\/span>\r\n    assume(r ~= 1);\r\n\r\n    pSAB(S,A,B) = (1-r^k)\/(1-r^N);\r\n    eSAB(S,A,B) = (1+r)\/(1-r) * (N*(2\/(1-r^N)-1) - k*(2\/(1-r^k)-1));\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><p>To allow us to talk about the general case where position <tt>k<\/tt> is the last to receive the port around a table of size <tt>N<\/tt> we will use position <tt>L<\/tt> to refer to a position one to the right of <tt>k<\/tt> and <tt>R<\/tt> to one to the left of <tt>k<\/tt> (note the use of modulo arithmetic to ensure $k-N&lt;R \\le 0 \\le L &lt; k$ and that the direction left is defined as an increase in position number, and right is a decrease in position number).<\/p><pre class=\"codeinput\">syms <span class=\"string\">L<\/span> <span class=\"string\">R<\/span> <span class=\"string\">N<\/span> <span class=\"string\">k<\/span>\r\nL = k-1;\r\nR = k+1-N;\r\n<\/pre><h4>Calculating the probability that position <tt>k<\/tt> is last<a name=\"5d44e4b0-d1bb-473b-8ae3-53d6617aef78\"><\/a><\/h4><p>For position <tt>k<\/tt> to be last we either need the port to move from the starting position 0 to L and then to R, or move from 0 to R and then L. This probability is given by<\/p><p>$$p_k = pSAB(0,L,R) * pSAB(L,R,k) + pSAB(0,R,L) * pSAB(R,L,k)$$<\/p><pre class=\"codeinput\">p0L = pSAB(0, L, R);\r\np0R = pSAB(0, R, L);\r\npLR = pSAB(L, R, k);\r\npRL = pSAB(R, L, k-N);\r\np0LR = p0L*pLR;\r\np0RL = p0R*pRL;\r\n\r\npk = simplify(p0LR + p0RL)\r\n<\/pre><pre class=\"codeoutput\">pk =\r\n-(r^N*(r - 1))\/(r^k*(r - r^N))\r\n<\/pre><p>In the previous blog post we showed that for the unbiased case this result was independent of position <tt>k<\/tt>, but in the biased case the position around the table affects the last probability. Let's look at the residuals from some simulations to check if this is correct (the <tt>portPassingSimulation<\/tt> code is available <a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/portPassingSimulation.m\">here<\/a>).<\/p><pre class=\"codeinput\">p = pToUse;\r\nN = 17;\r\nk = 1:N-1;\r\nnSims = 1e5;\r\nresiduals = NaN(10,N-1);\r\nexpected = double(subs(pk, {<span class=\"string\">'r'<\/span> <span class=\"string\">'N'<\/span> <span class=\"string\">'k'<\/span>}, {(1-p)\/p, N, k}));\r\n<span class=\"comment\">% Simulate 10 times passing around a table of 17<\/span>\r\n<span class=\"keyword\">parfor<\/span> ii = 1:10\r\n    last = portPassingSimulation(nSims, N, p);\r\n    <span class=\"comment\">% Group simulations into k (1:N-1) bins and normalize by number of simulations<\/span>\r\n    lastProb = hist(last, k)\/nSims;\r\n    residuals(ii, :) = lastProb - expected;\r\n<span class=\"keyword\">end<\/span>\r\n<span class=\"comment\">% Are the residuals normally distributed?<\/span>\r\nprobplot(<span class=\"string\">'normal'<\/span>, residuals(:))\r\ngrid <span class=\"string\">on<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogSymbolicAnalysisM_01.png\" alt=\"\"> <p>We can see that the residuals from the simulation are roughly normally distributed (some tail points are not quite normal) but there isn't any bias from the expected results (because the mean of the residuals is close to zero).<\/p><pre class=\"codeinput\">meanR = mean(residuals(:))\r\n<\/pre><pre class=\"codeoutput\">meanR =\r\n   7.5894e-20\r\n<\/pre><h4>What routes do we need to consider to compute the expected path length?<a name=\"ebaa73f7-0f59-4234-8092-26cdf28adb7f\"><\/a><\/h4><p>To compute the expected path length, we are going to have to consider all possible paths that finally end with position <tt>k<\/tt>, work out the probability with which each path occurs and multiply by the expected length of that path. Fortunately, whilst there are infinitely many possible paths we can decompose the problem into just 2 that relate directly to Gambler's Ruin. These are the L route and R route as specified by<\/p><div><ol><li>Start at 0 and move to L (but <b>not R<\/b>) <tt>E0L<\/tt><\/li><li>Move from L to R (but <b>not k<\/b>) <tt>ELR<\/tt><\/li><li>Then either <b>(a)<\/b> move from R to k-N (but <b>not k<\/b>) <tt>ERk<\/tt><\/li><li>Or <b>(b)<\/b> move from R to L to k (but <b>not k-N<\/b>) <tt>ERLk<\/tt><\/li><\/ol><\/div><p>The alternative route R (which is basically the inverse of route L) is<\/p><div><ol><li>Start at 0 and move to R (but <b>not L<\/b>) <tt>E0R<\/tt><\/li><li>Move from R to L (but <b>not k-N<\/b>) <tt>ERL<\/tt><\/li><li>Then either <b>(a)<\/b> move from L to k (but <b>not k-N<\/b>) <tt>ELk<\/tt><\/li><li>Or <b>(b)<\/b> move from L to R to k-N (but <b>not k<\/b>) <tt>ELRk<\/tt><\/li><\/ol><\/div><p>The expected lengths of each of these parts of the path are<\/p><pre class=\"codeinput\">syms <span class=\"string\">k<\/span> <span class=\"string\">N<\/span>\r\nE0L  = eSAB(0, L, R);\r\nE0R  = eSAB(0, R, L);\r\n\r\nELR  = eSAB(L, R, k);\r\nERL  = eSAB(R, L, k-N);\r\n\r\nERk  = eSAB(R, k-N, k);\r\nELk  = eSAB(L, k, k-N);\r\n\r\nERLk = eSAB(R, k, k-N);\r\nELRk = eSAB(L, k-N, k);\r\n<\/pre><p>It is interesting to note that the expected path length <tt>eSAB<\/tt> is symmetric with respect to swapping the variables <tt>p<\/tt> and <tt>q<\/tt>.  We can see this in the above equations as <tt>ELR==ERL<\/tt>, <tt>ERk==ELk<\/tt> and <tt>ERLk==ELRk<\/tt><\/p><pre class=\"codeinput\">simplify(ELR  == ERL)\r\nsimplify(ERk  == ELk)\r\nsimplify(ELRk == ERLk)\r\n<\/pre><pre class=\"codeoutput\">ans =\r\nTRUE\r\nans =\r\nTRUE\r\nans =\r\nTRUE\r\n<\/pre><p>To combine the parts of route L and route R where there is a choice we need to include the probabilities of undertaking part 3 and 4 of L and part 3 and 4 of R. These are simply the probabilities of those paths in Gambler's Ruin.<\/p><pre class=\"codeinput\">pRk  = pSAB(R, k-N, k);\r\npLk  = pSAB(L, k, k-N);\r\n\r\npRLk = pSAB(R, k, k-N);\r\npLRk = pSAB(L, k-N, k);\r\n<\/pre><p>So finally we can combine these parts for our two routes. <tt>EL<\/tt> is the expected path for route L and <tt>ER<\/tt> the expected path length for route R.<\/p><pre class=\"codeinput\">EL = E0L + ELR + pRk.*ERk + pRLk.*ERLk;\r\nER = E0R + ERL + pLk.*ELk + pLRk.*ELRk\r\n<\/pre><pre class=\"codeoutput\">ER =\r\n((1\/r - 1)*(r + 1)*(2\/(1\/r - 1) - N*(2\/(1\/r^N - 1) + 1) + 1))\/((r - 1)*(1\/r^N - 1)) - (((2\/(r^(2 - N) - 1) + 1)*(N - 2) - (2\/(r^(1 - k) - 1) + 1)*(k - 1))*(r + 1))\/(r - 1) - ((r + 1)*(2\/(r - 1) - (N - 1)*(2\/(r^(N - 1) - 1) + 1) + 1))\/(r - 1) - (((N - 1)*(2\/(r^(N - 1) - 1) + 1) - N*(2\/(r^N - 1) + 1))*(r + 1)*(r^(N - 1) - 1))\/((r^N - 1)*(r - 1))\r\n<\/pre><p>I've only printed one result out simply to show that at the moment this looks pretty complicated! Hopefully this is going to get simpler some way down the line!<\/p><h4>Probability that the port takes route L or R<a name=\"7ee2914b-5f6d-40c6-8825-30601be42c21\"><\/a><\/h4><p>To be able to compute the overall expected path we also need to know the probability that the port takes route L or R, since<\/p><p>$$E_k = p_L E_L + p_R E_R$$<\/p><p>Fortunately, we already have the components to compute these probabilities from the analysis of finish position probabilities. The probability the we take route L is just the probability of from 0 to L to R divided by the probability that k is last. Similarly, for the route R.<\/p><pre class=\"codeinput\">pL = simplify(p0L*pLR\/pk)\r\npR = simplify(p0R*pRL\/pk)\r\n<\/pre><pre class=\"codeoutput\">pL =\r\n(r^N - r^(k + 1))\/(r^N - r^2)\r\npR =\r\n-(r*(r - r^k))\/(r^N - r^2)\r\n<\/pre><h4>Calculating the expected path length when <tt>k<\/tt> is last<a name=\"bd877ff1-eae1-43e3-a6be-3c00ec898714\"><\/a><\/h4><p>With all these results in place we can compute the expected length by adding the two possible path lengths times their probabilities.<\/p><pre class=\"codeinput\">Ek = simplify(pL.*EL + pR.*ER, 400)\r\n<\/pre><pre class=\"codeoutput\">Ek =\r\n(2*(2*N - k - 3*r^N + N*r^N - N*r^k + k*r^N + 1))\/((r^N - 1)*(r - 1)) - (k - 2*N + N*r^N + N*r^k - k*r^N)\/(r^N - 1) - 4\/(r - 1)^2 - (2*r^N*(r^N + 1)*(N - 1))\/((r - r^N)*(r^N - 1))\r\n<\/pre><p><b>Does this agree with simulation?<\/b><\/p><p>We can look to see how well <tt>Ek<\/tt> agrees with the simulated data by running the simulation and computing the mean number of passes for each finishing position. As we saw in the previous blog post, a simple way to look at the mean number of passes for each last position is by using <tt><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/varfun.html\">varfun<\/a><\/tt> for a table, with a grouping variable.<\/p><pre class=\"codeinput\">p = pToUse;\r\nN = 17;\r\nk = 1:N-1;\r\n[last,nPass] = portPassingSimulation(1e6, N, p);\r\nt = varfun(@mean, table(last, nPass), <span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'GroupingVariable'<\/span>, <span class=\"string\">'last'<\/span>)\r\n<\/pre><pre class=\"codeoutput\">t = \r\n    last    GroupCount    mean_nPass\r\n    ____    __________    __________\r\n     1            9291    95.262    \r\n     2           11379    111.75    \r\n     3           13888    123.74    \r\n     4           17016    130.06    \r\n     5           20845    135.69    \r\n     6           25541    136.47    \r\n     7           31030    136.05    \r\n     8           38271     133.8    \r\n     9           46659    130.33    \r\n    10           56701    125.85    \r\n    11           69383    119.94    \r\n    12           84602    112.84    \r\n    13        1.04e+05    106.39    \r\n    14      1.2629e+05    98.765    \r\n    15      1.5536e+05    90.384    \r\n    16      1.8975e+05    82.086    \r\n<\/pre><p>Plotting these against the analytical results gives<\/p><pre class=\"codeinput\">cMean = double(subs(Ek, {<span class=\"string\">'r'<\/span> <span class=\"string\">'N'<\/span> <span class=\"string\">'k'<\/span>}, {(1-p)\/p N k}));\r\nplot(t.last, t.mean_nPass, <span class=\"string\">'ro'<\/span>);\r\nhold <span class=\"string\">on<\/span>\r\nplot(k, cMean, <span class=\"string\">'.-'<\/span>)\r\ngrid <span class=\"string\">on<\/span>\r\nhold <span class=\"string\">off<\/span>\r\ntitle <span class=\"string\">'Simulated vs expected mean number of passes'<\/span>\r\nxlabel <span class=\"string\">'Position port received last at the table'<\/span>\r\nylabel <span class=\"string\">'Mean number of passes to reach last position at table'<\/span>\r\nlegend(<span class=\"string\">'Simulated results'<\/span>, <span class=\"string\">'Expected result'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogSymbolicAnalysisM_02.png\" alt=\"\"> <p>And lastly the difference in the mean and expected mean is almost zero<\/p><pre class=\"codeinput\">mean(cMean(:) - t.mean_nPass(:))\r\n<\/pre><pre class=\"codeoutput\">ans =\r\n    -0.041108\r\n<\/pre><p>The average of the residuals is close to zero showing the simulated data are in agreement with the expression.<\/p><h4>What is the mean number of passes needed for a table of size N?<a name=\"65c5321b-25b6-451a-8d2e-cae0562924c4\"><\/a><\/h4><p>Having deduced the expected number of passes needed when the last position is <tt>k<\/tt>, we can now, finally, compute the expected path length for the table of size <tt>N<\/tt> because we can simply sum the expected path for each of the possible k positions times the probability that position k is the last.<\/p><p>$$E_N =\\sum_{k=1}^{N-1} p_k E_k$$<\/p><pre class=\"codeinput\">syms <span class=\"string\">k<\/span> <span class=\"string\">N<\/span> <span class=\"string\">r<\/span>\r\nEN = simplify(symsum(pk*Ek, k, 1, N-1), 1000)\r\n<\/pre><pre class=\"codeoutput\">EN =\r\n(N^2*(r + 1))\/((r^N - 1)*(r - 1)) - ((r + 1)*(N + r - N*r))\/(r - 1)^2 + (r*(N - 1)^2*(r + 1))\/((r - r^N)*(r - 1))\r\n<\/pre><p>Thankfully this looks a lot simpler than some of the other results earlier.<\/p><p><b>Check against a known analytical solution<\/b><\/p><p>Before continuing we can check this result in a couple of ways. Firstly, let's check specifically against the <tt>N==3<\/tt> case where we already know the analytic solution (see <tt>SymbolicAnalysisFor3People.mlx<\/tt>)<\/p><p>$$E_3 =\\frac{3}{q^2 -q+1}-1$$<\/p><p>Substituting in a value for N and equation for r<\/p><pre class=\"codeinput\">E3 = simplify(subs(EN, {N, r}, {3 q\/(1-q)}), 400)\r\n<\/pre><pre class=\"codeoutput\">E3 =\r\n3\/(q^2 - q + 1) - 1\r\n<\/pre><p>Alternatively, we can check the results against some simulations for different values of N<\/p><pre class=\"codeinput\">p = pToUse;\r\ntableSize = 5:3:37;\r\nsim = NaN(size(tableSize));\r\n<span class=\"keyword\">parfor<\/span> ii = 1:numel(tableSize)\r\n    N = tableSize(ii);\r\n    [~,nPass] = portPassingSimulation(1e5, N, p);\r\n    sim(ii) = mean(nPass);\r\n<span class=\"keyword\">end<\/span>\r\nplot(tableSize, sim, <span class=\"string\">'ro'<\/span>)\r\ngrid <span class=\"string\">on<\/span>\r\nhold <span class=\"string\">on<\/span>\r\nN = min(tableSize):max(tableSize);\r\nexpected = double(subs(EN, {<span class=\"string\">'r'<\/span> <span class=\"string\">'N'<\/span>}, {(1-p)\/p N}));\r\nplot(N, expected);\r\nhold <span class=\"string\">off<\/span>\r\ntitle <span class=\"string\">'Expected passes to finish random walk'<\/span>\r\nxlabel <span class=\"string\">'Number of people at the table'<\/span>\r\nylabel <span class=\"string\">'Number of passes'<\/span>\r\nlegend(<span class=\"string\">'Simulated results'<\/span>, <span class=\"string\">'Expected result'<\/span>, <span class=\"string\">'Location'<\/span>, <span class=\"string\">'N'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogSymbolicAnalysisM_03.png\" alt=\"\"> <p>Writing the biased and unbiased results out in full, we get<\/p><p>$$E_N =\\left\\lbrace \\begin{array}{cc}\\frac{N(N-1)}{2} &amp; p=q=\\frac{1}{2}\\\\\\frac{r+1}{r-1}(\\frac{N^2}{r^N\r\n-1}-\\frac{{(N-1)}^2 }{r^{N-1} -1}+\\frac{Nr-N-r}{r-1})&amp; p\\ne q\\end{array}\\right.$$<\/p><p>Written this way, it is quite complicated, but there is a symmetry to it. A term in N, the same term in N-1 and another term. All-in-all distinctly simpler than <tt>ER<\/tt> that I printed out further up the page. And with that we have produced analytic solutions for all the values we wanted to investigate - perhaps it is now time to pass the port!<\/p><h4>And finally<a name=\"1a6da36d-f440-4957-8957-ec65016dd2bc\"><\/a><\/h4><p>Manipulation of the fairly complicated algebraic expression that resulted from joining all the expected paths with probabilities together was something I thought would be really hard when done by hand. Yet I think you will agree using the <a href=\"https:\/\/www.mathworks.com\/help\/symbolic\/index.html\">Symbolic Math Toolbox<\/a> has made it tractable. I'm also impressed with its ability to solve recurrence relationships (see the associated files that prove the initial definitions of <tt>pSAB<\/tt> and <tt>eSAB<\/tt>).<\/p><p>From the article it might look like I've done all the work here myself, so I'd like to thank both <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/2826662-rick-amos\">Rick Amos<\/a> (MathWorks) and <a title=\"https:\/\/www.dur.ac.uk\/ecs\/profiles\/?id=15258 (link no longer works)\">Barnaby Martin<\/a> (Durham University) for helping out with many of the details.<\/p><p>Having seen these related blog posts, what problems might you now attack using simulation to guide your mathematical approach? Can you see opportunities to bring statistics and symbolic manipulation together? Let me know <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=2198#respond\">here<\/a>.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_38de0bc0164b4f07aa9f5c1974d43859() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='38de0bc0164b4f07aa9f5c1974d43859 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 38de0bc0164b4f07aa9f5c1974d43859';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2017 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_38de0bc0164b4f07aa9f5c1974d43859()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2016b<br><\/p><\/div><!--\r\n38de0bc0164b4f07aa9f5c1974d43859 ##### SOURCE BEGIN #####\r\n%% Symbolic Analysis of the Port Passing Problem\r\n%% Introduction\r\n% Hopefully you recall the <https:\/\/blogs.mathworks.com\/loren\/?p=2196 recent\r\n% post> where we used a <https:\/\/en.wikipedia.org\/wiki\/Monte_Carlo_method\r\n% monte-carlo technique> to simulate an infinitely-stocked port decanter\r\n% passing around a table of |N| people (always starting at position 0),\r\n% moving with probability _p_ to the left and _q_ to the right, only\r\n% stopping when everyone at the table had received some port. Those\r\n% simulations seemed to show that simple analytic solutions to expected\r\n% path lengths might be possible and today, I'll show you what I managed to\r\n% prove.\r\n% \r\n% In the last post we decided that the functions we wanted to find are\r\n% \r\n% # $p_k(N)$ the probability that position k is the last to receive the port\r\n% # $E_k(N)$ the expected number of passes needed when k is the last \r\n% # $E_N$ the expected number of passes needed for a table of size N\r\n%% Gambler's Ruin\r\n% Thanks to my colleague Rick Amos, we have a starting point for thinking about \r\n% an analytical solution. He pointed out that our port passing problem was very \r\n% similar to the random walk process <http:\/\/web.mit.edu\/neboat\/Public\/6.042\/randomwalks.pdf \r\n% Gambler's Ruin>, in which the gambler starts out with some stake (|k|) and wins \r\n% another unit of that stake with probability |p| (or loses with probability |1-p|). \r\n% The game continues until the gambler has either lost all the money, or has a \r\n% total of |N|. The major difference is that in Gambler's Ruin there are two different \r\n% end conditions (winning with |N| or losing with |0|).\r\n% \r\n% To enable us to attack the port passing problem, I'll need to quote two \r\n% results from Gambler's Ruin. You can find derivations of these results in the \r\n% <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/61285-port-passing-problem \r\n% associated files> on File Exchange (see the files |BiasedGamblersRuin| and |UnbiasedGamblersRuin|). \r\n% The first result allows us to answer the question: \"what is the probability, \r\n% given that we are currently at position S, of arriving at position A before \r\n% arriving at position B?\". In terms of Gambler's Ruin this is the same as starting \r\n% with stake $k=S-B$ and finishing with $N=A-B$. To make some of the equations \r\n% simpler I'll use $q=1-p$, and note we assume $A<S<B \\mbox{ or } B<S<A$.\r\n% \r\n% $$pSAB=\\left\\lbrace \\begin{array}{cc}\\frac{k}{N} & p=q=\\frac{1}{2}\\\\\\frac{1-r^k}{1-r^N} \r\n% & p\\ne q\\mbox{ and } r=\\frac{q}{p}\\end{array}\\right.$$\r\n% \r\n% Similarly, we also need to answer: \"what is the expected number of passes \r\n% (starting at S) to arrive at position A before arriving at position B?\".\r\n% \r\n% \r\n% \r\n% $$eSAB=\\left\\lbrace \\begin{array}{cc}\\frac{N^2-k^2}{3} & p=q=\\frac{1}{2}\\\\\\frac{1+r}{1-r}(N(\\frac{2}{1-r^N \r\n% }-1)-k(\\frac{2}{{1-r}^k }-1)) & p\\ne q\\end{array}\\right.$$\r\n% \r\n% Note how each result has a different expression for the biased and unbiased \r\n% case? Below we will run through the biased case (which has more complexity). \r\n% Exactly the same code could be run with the unbiased case and the correct result \r\n% obtained - if you are interested simply change |pToUse| below back to |0.5| \r\n% and execute.\r\n\r\nsyms pSAB(S,A,B) eSAB(S,A,B) p q k N\r\n\r\npToUse = 0.55;\r\nassume(p == pToUse);\r\n\r\n% Define the functions for k and N in terms of S, A, and B.\r\nk(S,A,B) = S-B;\r\nN(S,A,B) = A-B;\r\n\r\nif pToUse == 0.5\r\n    pSAB(S,A,B) = k\/N;\r\n    eSAB(S,A,B) = (N^2 - k^2)\/3;\r\nelse\r\n    % In the biased case we will use r rather than p in the equations\r\n    syms r\r\n    assume(r ~= 1);\r\n\r\n    pSAB(S,A,B) = (1-r^k)\/(1-r^N);\r\n    eSAB(S,A,B) = (1+r)\/(1-r) * (N*(2\/(1-r^N)-1) - k*(2\/(1-r^k)-1));\r\nend\r\n%% \r\n% To allow us to talk about the general case where position |k| is the last \r\n% to receive the port around a table of size |N| we will use position |L| to refer \r\n% to a position one to the right of |k| and |R| to one to the left of |k| (note \r\n% the use of modulo arithmetic to ensure $k-N<R \\le 0 \\le L < k$ and that the \r\n% direction left is defined as an increase in position number, and right is a \r\n% decrease in position number).\r\n\r\nsyms L R N k\r\nL = k-1;\r\nR = k+1-N;\r\n%% Calculating the probability that position |k| is last\r\n% For position |k| to be last we either need the port to move from the starting \r\n% position 0 to L and then to R, or move from 0 to R and then L. This probability \r\n% is given by\r\n% \r\n% $$p_k = pSAB(0,L,R) * pSAB(L,R,k) + pSAB(0,R,L) * pSAB(R,L,k)$$\r\n\r\np0L = pSAB(0, L, R);\r\np0R = pSAB(0, R, L);\r\npLR = pSAB(L, R, k);\r\npRL = pSAB(R, L, k-N);\r\np0LR = p0L*pLR;\r\np0RL = p0R*pRL;\r\n\r\npk = simplify(p0LR + p0RL)\r\n%% \r\n% In the previous blog post we showed that for the unbiased case this result \r\n% was independent of position |k|, but in the biased case the position around \r\n% the table affects the last probability. Let's look at the residuals from some \r\n% simulations to check if this is correct (the |portPassingSimulation| code is \r\n% available <https:\/\/blogs.mathworks.com\/images\/loren\/2017\/portPassingSimulation.m \r\n% here>).\r\n\r\np = pToUse;\r\nN = 17;\r\nk = 1:N-1;\r\nnSims = 1e5;\r\nresiduals = NaN(10,N-1);\r\nexpected = double(subs(pk, {'r' 'N' 'k'}, {(1-p)\/p, N, k}));\r\n% Simulate 10 times passing around a table of 17\r\nparfor ii = 1:10\r\n    last = portPassingSimulation(nSims, N, p);\r\n    % Group simulations into k (1:N-1) bins and normalize by number of simulations\r\n    lastProb = hist(last, k)\/nSims;\r\n    residuals(ii, :) = lastProb - expected;\r\nend\r\n% Are the residuals normally distributed?\r\nprobplot('normal', residuals(:))\r\ngrid on\r\n%% \r\n% We can see that the residuals from the simulation are roughly normally \r\n% distributed (some tail points are not quite normal) but there isn't any bias \r\n% from the expected results (because the mean of the residuals is close to zero).\r\n\r\nmeanR = mean(residuals(:))\r\n%% What routes do we need to consider to compute the expected path length?\r\n% To compute the expected path length, we are going to have to consider all \r\n% possible paths that finally end with position |k|, work out the probability \r\n% with which each path occurs and multiply by the expected length of that path. \r\n% Fortunately, whilst there are infinitely many possible paths we can decompose \r\n% the problem into just 2 that relate directly to Gambler's Ruin. These are the \r\n% L route and R route as specified by\r\n% \r\n% # Start at 0 and move to L (but *not R*) |E0L|\r\n% # Move from L to R (but *not k*) |ELR|\r\n% # Then either *(a)* move from R to k-N (but *not k*) |ERk|\r\n% # Or *(b)* move from R to L to k (but *not k-N*) |ERLk|\r\n% \r\n% The alternative route R (which is basically the inverse of route L) is\r\n% \r\n% # Start at 0 and move to R (but *not L*) |E0R|\r\n% # Move from R to L (but *not k-N*) |ERL|\r\n% # Then either *(a)* move from L to k (but *not k-N*) |ELk|\r\n% # Or *(b)* move from L to R to k-N (but *not k*) |ELRk|\r\n% \r\n% The expected lengths of each of these parts of the path are\r\n\r\nsyms k N\r\nE0L  = eSAB(0, L, R);\r\nE0R  = eSAB(0, R, L);\r\n\r\nELR  = eSAB(L, R, k);\r\nERL  = eSAB(R, L, k-N);\r\n\r\nERk  = eSAB(R, k-N, k);\r\nELk  = eSAB(L, k, k-N);\r\n\r\nERLk = eSAB(R, k, k-N);\r\nELRk = eSAB(L, k-N, k);\r\n%% \r\n% It is interesting to note that the expected path length |eSAB| is symmetric \r\n% with respect to swapping the variables |p| and |q|.  We can see this in the \r\n% above equations as |ELR==ERL|, |ERk==ELk| and |ERLk==ELRk|\r\n\r\nsimplify(ELR  == ERL)\r\nsimplify(ERk  == ELk)\r\nsimplify(ELRk == ERLk)\r\n%% \r\n% To combine the parts of route L and route R where there is a choice we \r\n% need to include the probabilities of undertaking part 3 and 4 of L and part \r\n% 3 and 4 of R. These are simply the probabilities of those paths in Gambler's \r\n% Ruin.\r\n\r\npRk  = pSAB(R, k-N, k);\r\npLk  = pSAB(L, k, k-N);\r\n\r\npRLk = pSAB(R, k, k-N);\r\npLRk = pSAB(L, k-N, k);\r\n%% \r\n% So finally we can combine these parts for our two routes. |EL| is the \r\n% expected path for route L and |ER| the expected path length for route R.\r\n\r\nEL = E0L + ELR + pRk.*ERk + pRLk.*ERLk;\r\nER = E0R + ERL + pLk.*ELk + pLRk.*ELRk\r\n%% \r\n% I've only printed one result out simply to show that at the moment this \r\n% looks pretty complicated! Hopefully this is going to get simpler some way down \r\n% the line!\r\n%% Probability that the port takes route L or R\r\n% To be able to compute the overall expected path we also need to know the probability \r\n% that the port takes route L or R, since \r\n% \r\n% $$E_k = p_L E_L + p_R E_R$$\r\n% \r\n% Fortunately, we already have the components to compute these probabilities \r\n% from the analysis of finish position probabilities. The probability the we take \r\n% route L is just the probability of from 0 to L to R divided by the probability \r\n% that k is last. Similarly, for the route R.\r\n\r\npL = simplify(p0L*pLR\/pk)\r\npR = simplify(p0R*pRL\/pk)\r\n\r\n%% Calculating the expected path length when |k| is last\r\n% With all these results in place we can compute the expected length by adding \r\n% the two possible path lengths times their probabilities.\r\n\r\nEk = simplify(pL.*EL + pR.*ER, 400)\r\n%% \r\n% *Does this agree with simulation?*\r\n% \r\n% We can look to see how well |Ek| agrees with the simulated data by running \r\n% the simulation and computing the mean number of passes for each finishing position. \r\n% As we saw in the previous blog post, a simple way to look at the mean number \r\n% of passes for each last position is by using |<https:\/\/www.mathworks.com\/help\/matlab\/ref\/varfun.html \r\n% varfun>| for a table, with a grouping variable.\r\n\r\np = pToUse;\r\nN = 17;\r\nk = 1:N-1;\r\n[last,nPass] = portPassingSimulation(1e6, N, p);\r\nt = varfun(@mean, table(last, nPass), ...\r\n    'GroupingVariable', 'last')\r\n%% \r\n% Plotting these against the analytical results gives\r\n\r\ncMean = double(subs(Ek, {'r' 'N' 'k'}, {(1-p)\/p N k}));\r\nplot(t.last, t.mean_nPass, 'ro');\r\nhold on\r\nplot(k, cMean, '.-')\r\ngrid on\r\nhold off\r\ntitle 'Simulated vs expected mean number of passes'\r\nxlabel 'Position port received last at the table'\r\nylabel 'Mean number of passes to reach last position at table' \r\nlegend('Simulated results', 'Expected result')\r\n%% \r\n% And lastly the difference in the mean and expected mean is almost zero\r\n\r\nmean(cMean(:) - t.mean_nPass(:))\r\n%% \r\n% The average of the residuals is close to zero showing the simulated data \r\n% are in agreement with the expression.\r\n%% What is the mean number of passes needed for a table of size N?\r\n% Having deduced the expected number of passes needed when the last position \r\n% is |k|, we can now, finally, compute the expected path length for the table \r\n% of size |N| because we can simply sum the expected path for each of the possible \r\n% k positions times the probability that position k is the last.\r\n% \r\n% $$E_N =\\sum_{k=1}^{N-1} p_k E_k$$\r\n\r\nsyms k N r\r\nEN = simplify(symsum(pk*Ek, k, 1, N-1), 1000)\r\n%% \r\n% Thankfully this looks a lot simpler than some of the other results earlier. \r\n% \r\n% *Check against a known analytical solution*\r\n% \r\n% Before continuing we can check this result in a couple of ways. Firstly, \r\n% let's check specifically against the |N==3| case where we already know the analytic \r\n% solution (see |SymbolicAnalysisFor3People.mlx|)\r\n% \r\n% $$E_3 =\\frac{3}{q^2 -q+1}-1$$\r\n% \r\n% Substituting in a value for N and equation for r\r\n\r\nE3 = simplify(subs(EN, {N, r}, {3 q\/(1-q)}), 400)\r\n%% \r\n% Alternatively, we can check the results against some simulations for different \r\n% values of N\r\n\r\np = pToUse;\r\ntableSize = 5:3:37;\r\nsim = NaN(size(tableSize));\r\nparfor ii = 1:numel(tableSize)\r\n    N = tableSize(ii);\r\n    [~,nPass] = portPassingSimulation(1e5, N, p);\r\n    sim(ii) = mean(nPass);\r\nend\r\nplot(tableSize, sim, 'ro')\r\ngrid on\r\nhold on\r\nN = min(tableSize):max(tableSize);\r\nexpected = double(subs(EN, {'r' 'N'}, {(1-p)\/p N}));\r\nplot(N, expected); \r\nhold off\r\ntitle 'Expected passes to finish random walk'\r\nxlabel 'Number of people at the table'\r\nylabel 'Number of passes'\r\nlegend('Simulated results', 'Expected result', 'Location', 'N')\r\n%% \r\n% Writing the biased and unbiased results out in full, we get\r\n% \r\n% \r\n% \r\n% $$E_N =\\left\\lbrace \\begin{array}{cc}\\frac{N(N-1)}{2} & p=q=\\frac{1}{2}\\\\\\frac{r+1}{r-1}(\\frac{N^2}{r^N \r\n% -1}-\\frac{{(N-1)}^2 }{r^{N-1} -1}+\\frac{Nr-N-r}{r-1})& p\\ne q\\end{array}\\right.$$\r\n% \r\n% Written this way, it is quite complicated, but there is a symmetry to it. \r\n% A term in N, the same term in N-1 and another term. All-in-all distinctly simpler \r\n% than |ER| that I printed out further up the page. And with that we have produced \r\n% analytic solutions for all the values we wanted to investigate - perhaps it \r\n% is now time to pass the port!\r\n%% And finally\r\n% Manipulation of the fairly complicated algebraic expression that resulted \r\n% from joining all the expected paths with probabilities together was something \r\n% I thought would be really hard when done by hand. Yet I think you will agree \r\n% using the <https:\/\/www.mathworks.com\/help\/symbolic\/index.html Symbolic Math \r\n% Toolbox> has made it tractable. I'm also impressed with its ability to solve \r\n% recurrence relationships (see the associated files that prove the initial definitions \r\n% of |pSAB| and |eSAB|). \r\n% \r\n% From the article it might look like I've done all the work here myself, \r\n% so I'd like to thank both <https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/2826662-rick-amos \r\n% Rick Amos> (MathWorks) and <https:\/\/www.dur.ac.uk\/ecs\/profiles\/?id=15258 Barnaby \r\n% Martin> (Durham University) for helping out with many of the details.\r\n% \r\n% Having seen these related blog posts, what problems might you now attack\r\n% using simulation to guide your mathematical approach? Can you see\r\n% opportunities to bring statistics and symbolic manipulation together? Let\r\n% me know <https:\/\/blogs.mathworks.com\/loren\/?p=2198#respond here>.\r\n% \r\n%\r\n##### SOURCE END ##### 38de0bc0164b4f07aa9f5c1974d43859\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/blogSymbolicAnalysisM_03.png\" onError=\"this.style.display ='none';\" \/><\/div><p>ContentsIntroductionGambler's RuinCalculating the probability that position k is lastWhat routes do we need to consider to compute the expected path length?Probability that the port takes route L or... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2017\/02\/28\/symbolic-analysis-of-the-port-passing-problem\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[38],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/2198"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=2198"}],"version-history":[{"count":4,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/2198\/revisions"}],"predecessor-version":[{"id":2247,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/2198\/revisions\/2247"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=2198"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=2198"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=2198"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}