How Many Times Should You Shuffle the Cards?

We say that a deck of playing cards is completely shuffled if it is impossible to predict which card is coming next when they are dealt one at a time. So a completely shuffled deck is like a good random number generator. We saw in my previous post that a perfect faro shuffle fails to completely shuffle a deck. But a riffle shuffle, with some randomness in the process, can produce complete shuffling. How many repeated riffle shuffles does that take?

"Stop Shuffling and Deal."

Contents

Background

I learned about this topic from my friend Nick Trefethen, a Professor at Oxford and a MATLAB enthusiast. In 2000 he and his father wrote a paper, "How Many Shuffles to Randomize a Deck of Cards?". They refer to work by Persi Diaconis and colleagues. Diaconis is a Professor at Stanford and a world-class magician who has written extensively about the probability involved in activities like card shuffling and coin flipping.

I want to describe some simulations that reproduce the cutoff phenomenon investigated by the Trefethens and Diaconis.

Rising Sequences

The analysis is based on the number of rising sequences in the shuffled deck. A rising sequence is a maximal subset of cards in increasing numerical order. A deck is the union of its rising seguences. Here is an example, a small deck with just eight cards.

   v = [3 1 4 5 7 2 8 6];

There are three rising sequences is this example.

   [1 2]; [3 4 5 6]; [7 8];

Signature

We define the signature of a deck to be the number of rising sequences. Here is a tricky, but elegant, way to compute the signature of a deck v. Sort the deck to find the permutation vbar that reverses v, then count the number of places where vbar has a negative first difference.

   type signature
   function sig = signature(v);
      [~,vbar] = sort(v);
      sig = sum(diff(vbar)<0)+1;
   end
   sigv = signature(v)
sigv =
     3

Let's see how this works on the example.

   [~,vbar] = sort(v)
   d = diff(vbar)<0;
   disp('d = ')
   disp(['   ', sprintf('%6.0f', d)]),
   sig = sum(d)+1
vbar =
     2     6     1     3     4     8     5     7
d = 
        0     1     0     0     0     1     0
sig =
     3

Unshuffled Decks

An unshuffled deck of n cards, fresh out of the package, is represented by the vector v = 1:n. This has one long rising sequence, so the signature is 1. If you reverse the order, each card by itself is a singleton rising sequence, so the signature of the deck is n. These are the minimum and maximum possible values for the signature.

   n = 52;
   v = 1:n;
   sig_min = signature(v)
   sig_max = signature(fliplr(v))
sig_min =
     1
sig_max =
    52

Perfect Shuffles

As we saw in my previous post, the "out" perfect shuffle keeps the first and last cards in first and last place, while the "in" perfect shuffle inserts these cards in the second and next-to-last places. Here are their permutation vectors.

   m = n/2;
   pout = reshape([1:m; m+(1:m)],[1,n]);
   pin = reshape([m+(1:m); 1:m],[1,n]);

Both shuffles produce decks with two rising sequences obtained from the two halves of the deck. So their signatures are both equal to 2.

   sig_pout = signature(v(pout))
   sig_pin = signature(v(pin))
sig_pout =
     2
sig_pin =
     2

These signatures for pout and pin are way too small; the presence of just two rising sequences indicates perfect shuffles do not produce completely shuffled results.

Random Permutations

A completely shuffled deck could be obtained by spreading the cards out on the table and then picking them up one at a time in a random order. This would be a random permutation of the integers 1:52. Let's examine the signature of random permutations. We'll generate a million samples, then normalize the histogram of the resulting signatures so that it becomes a probability distribution, $\sigma_\infty$. This is the signature distribution of completely shuffled decks. Our goal is to get close to this distribution with riffle shuffling. (In order to reduce publication time, distributions are computed separately.)

    type compute_sigma_inf
    ksamples = 1000000;
    s = zeros(ksamples,1);
    for k = 1:ksamples
       v = randperm(n);
       s(k) = signature(v);
    end
    sigma_inf = hist(s,0:n)/length(s);
    save sigma_inf sigma_inf
    load sigmas_52
    plot(0:n,sigma_inf)
    plot_details_inf

Support

The distribution $\sigma_\infty$ is centered at 26 and 27 and falls off quickly from these two points. Here are the indices and values of all probabilities greater than .005. This tells us that completely shuffled decks have 26 or 27 increasing sequences 18.4% of the time each and that fewer than 21 or more than 32 increasing sequences are very unlikely.

     support(sigma_inf,.005)
     21     22     23     24     25     26     27     28     29     30     31     32
 0.0061 0.0193 0.0480 0.0942 0.1471 0.1840 0.1841 0.1467 0.0936 0.0475 0.0196 0.0062

Riffle Shuffle

There are two sources of randomness in this model. The deck is cut at a depth determined by a random number chosen from a binomial distribution centered at its midpoint. Cards are then riffled back together. At each step the probability that the next card drops from either cut is proportional to the size of that cut.

     type riffle
    function vout = riffle(v)
    % Riffle Shuffle.
    % vout = riffle(v)

       % Cut deck roughly in half.
       n = length(v);
       m = binornd(n,0.5); % Binomial distribution

       % Interleave cuts.
       vout = zeros(1,n);
       i = 1;
       j = m+1; 
       for k = 1:n
          % Choose which cut releases next card.
          % Probability is proportional to size of cut.
          if rand*(n-k+1) < m-i+1
             vout(k) = v(i);
             i = i+1;
          else
             vout(k) = v(j);
             j = j+1;
          end
       end
    end

Here is one riffle shuffle of a fresh deck. The result almost certainly has a signature equal to 2, although the signature could be only 1 in the extremely unlikely event that the entire first cut falls in one clump before any of the second cut.

     v = riffle(1:n);
     deck_view(v)
     title(['one riffle, signature = ' int2str(signature(v))])

Here are three riffles of a fresh deck. The result is likely to have a signature equal to 8, although any smaller value is possible.

     v = riffle(riffle(riffle(1:52)));
     deck_view(v)
     title(['3 riffles, signature = ' int2str(signature(v))])
     snapnow
     close

Repeated Riffle Shuffles

Find the probability distribution generated by repeating a riffle shuffle r times. Denote this by $\sigma_r$.

     type repeated_riffles
    function sigma_r = repeated_riffles(n,r,ksamples)
       % sigma_r = repeated_riffles(n,r,ksamples)
       % Signature distribution of r repeated riffle shuffles.
       s = zeros(ksamples,1);
       for k = 1:ksamples
          v = 1:n;
          for j = 1:r
             v = riffle(v);
          end
          s(k) = signature(v);
       end
       sigma_r = hist(s,0:n)/length(s);
    end

Compute Distributions

For each value r between 0 and rmax, sample the signature $\sigma_r$ observed when a deck is subjected to r repeated riffle shuffles.

     rmax = 14;
     type compute_sigmas
     sigma = zeros(rmax+1,n+1);
     ksamples = 1000000;
     for r = 0:rmax
        sigma(r+1,:) = repeated_riffles(n,r,ksamples)
     end
     save sigmas
     load sigmas_52

For example, here is $\sigma_3$. Almost 95% of the distribution is concentrated at the maximum signature, 8. Slightly over 5% of the support has a signature in the range of 5 through 7.

     r = 3;
     plot(0:52,sigma(r+1,:),'.-')
     plot_details_r

Here is the support of $\sigma_3$.

     support(sigma(r+1,:),0)
         5         6         7         8
  0.000001  0.000424  0.050308  0.949267

Here are all of the $\sigma_r$. As expected, they appear to be heading towards $\sigma_{\infty}$.

     plot(sigma','.-')
     plot_details_sigma

Total Variation

The total variation norm is a very general concept that measures the distance between two probability distributions. In our situation it is simply the 1-norm of their difference, scaled by 1/2 so that it lies between 0 and 1.

     norm_tv = @(s1,s2)(1/2)*norm(s1-s2,1)
norm_tv = 
    @(s1,s2)(1/2)*norm(s1-s2,1)

Total Variation of Repeated Riffles

For each value r between 0 and rmax, sample the signature $\sigma_r$ observed when a deck is subjected to r repeated riffle shuffles. Compare this with the completely shuffled distribution $\sigma_{\infty}$ and compute the total variation.

     tv = zeros(rmax+1,1);
     for r = 0:rmax
        tv(r+1) = norm_tv(sigma(r+1,:), sigma_inf);
     end
     disp('tv =')
     fprintf('%12.6f\n',tv)
tv =
    1.000000
    1.000000
    1.000000
    1.000000
    1.000000
    0.923751
    0.613194
    0.333949
    0.166832
    0.084924
    0.042375
    0.021786
    0.010285
    0.004999
    0.001935

Make an animated gif.

     type make_tv_movie
     gif_frame('html/tv_movie.gif');
     for r = 0:rmax
        plot(0:n,[sigma(r+1,:); sigma_inf]','.-')
        plot_details_movie
        gif_frame(2)
     end
     clf
     gif_frame(2)

How many?

How many riffle shuffles are required to produce a completely shuffled deck? Plot the total variation as a function of the number of repetitions. The value of tv hovers very close to 1.0 for the first four shuffles, but then between five and nine shuffles it drops quickly to below 0.1. This is the cutoff phenomenon.

     plot(0:rmax,tv,'o-')
     plot_details_cutoff

It takes seven shuffles for the total variation to fall below 0.5. The conclusion is that about seven riffle shuffles are usually necessary and sufficient to produce a completely shuffled deck. Diaconis reaches this conclusion with a rigorous probabilistic analysis. Trefethen sees the cutoff with an analysis involving matrix powers and pseudospectra.

Four decks

Here are the animation and the cutoff plot for four decks, so n = 208. The distribution of the random permutations, $\sigma_{\infty}$, is more spread out. But the shape of the cutoff is the same, it just occurs three shuffles later.

The conclusion here is that about ten riffle shuffles are usually necessary and sufficient to completely shuffle four decks.

     n = 208;
     load sigmas_208
     disp('tv =')
     fprintf('%12.6f\n',tv)
tv =
    1.000000
    1.000000
    1.000000
    1.000000
    1.000000
    1.000000
    1.000000
    0.999555
    0.913595
    0.601832
    0.328962
    0.168368
    0.083952
    0.042310
    0.021535

     plot(0:rmax,tv,'o-')
     plot_details_cutoff

Acknowledgements

Thanks very much to Nick Trefethen for help with this post.

References

L. N. Trefethen and L. M. Trefethen, "How Many Shuffles to Randomize a Deck of Cards?", Proceedings: Mathematical, Physical and Engineering Sciences 456, 2000. <http://www.jstor.org/stable/2665604>. <http://people.maths.ox.ac.uk/trefethen/publication/PDF/2000_87.pdf>.

Gudbjorn F. Jonsson and Lloyd N. Trefethen, "A Numerical Analyst Looks at the Cutoff Phenomenon in Card Shuffling and Other Markov Chains", Numerical Analysis (ed. D. F. Griffiths, D. J.Higmam and G. A. Watson), Addison-Wesley, 1998. <http://people.maths.ox.ac.uk/trefethen/publication/PDF/1997_74.pdf>.

Dave Bayer and Persi Diaconis, "Trailing the Dovetail Shuffle to its Lair", Annals of Probability 2, 1992. <http://projecteuclid.org/download/pdf_1/euclid.aoap/1177005705>.

A terrific series of videos from an interview with Persi Diaconis on Numberphile.




Published with MATLAB® R2016a

|
  • print

댓글

댓글을 남기려면 링크 를 클릭하여 MathWorks 계정에 로그인하거나 계정을 새로 만드십시오.