Comments on: New Ways With Random Numbers, Part II

By: Peter Perkins

Peter Perkins — Sun, 07 Aug 2011 16:36:30 +0000

Jan, this requires some details, but you asked.

The Ziggurat algorithm, most of the time, requires a random integer to determine the level, and a random uniform to determine the position within that level. The original Ziggurat algorithm as published by Marsaglia used and reused 32 bits of randomness for both purposes, and that’s what SHR3CONG still does (largely to be backwards compatible). Very fast, but if you look _very_ closely, you can see a “griddiness” that’s a consequence of reusing bits. Almost certainly not a practical problem, though. The MT algorithm as implemented in MATLAB’s RandStream uses 64 bits of randomness for those two values, and doesn’t reuse any bits. MRG32K3A is handled differently: its original author, L’Ecuyer, recommended using two full U(0,1)’s, so that’s what MATLAB does, and it uses up 128 bits of randomness.

Of course, something like 1.5% of the time, the RandStream Ziggurat algorithm requires more randomness, so that’s why you don’t see _exactly_ 1 (MT) or 2 (MRG32K3A) double-precision uniforms per normal.
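As a small worked check of those averages, the indices Jan reported can be turned into uniforms consumed per normal. The index returned by find points at the first uniform drawn after the normals, so the number consumed is one less than the index. The variable names here are only illustrative.

```matlab
% Indices reported in Jan's test (position of the next uniform after 1e6 normals)
nNormals = 1e6;
idxMT    = 1021808;   % mt19937ar
idxMRG   = 2028211;   % mrg32k3a

% Uniforms consumed by randn is the index minus one
perNormalMT  = (idxMT  - 1) / nNormals;   % 1.021807
perNormalMRG = (idxMRG - 1) / nNormals;   % 2.028210

fprintf('MT: %.6f  MRG32K3A: %.6f uniforms per normal\n', perNormalMT, perNormalMRG);
```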

If you switch to ‘Inversion’, you’ll find that _exactly_ one uniform gets used up for each normal in all cases, but it’s slower to do that computation.
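A minimal sketch of how one might check the ‘Inversion’ claim, reusing the approach from Jan's test. It assumes the 'NormalTransform' constructor option (older releases may call it 'RandnAlg') and passes the stream explicitly instead of changing the default stream.

```matlab
% Sketch only: 'NormalTransform' is assumed to be the option name in your release
s = RandStream('mt19937ar', 'Seed', 0, 'NormalTransform', 'Inversion');
a = rand(s, 3e6, 1);        % reference sequence of uniforms

reset(s);                   % rewind the stream to its initial state
b = randn(s, 1e6, 1);       % consume one million normals
c = rand(s);                % next uniform left in the stream

idx = find(a == c, 1);      % where that uniform sits in the reference sequence
uniformsPerNormal = (idx - 1) / numel(b);
```

If exactly one uniform is consumed per normal, idx comes out as 1e6 + 1 and uniformsPerNormal is 1.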

By: Jan

Jan — Fri, 05 Aug 2011 22:25:37 +0000

Hi, I have a question about the difference in generating normal (pseudo) random numbers with the mt19937ar algorithm and the mrg32k3a algorithm. For some reason the mt19937ar algorithm uses 1.0218 uniform (pseudo) random numbers per normal and the mrg32k3a algorithm uses 2.0282. Both algorithms use the Ziggurat algorithm to transform the uniform random numbers into normal random numbers. Can you explain the difference? Test code:

stream0 = RandStream('mt19937ar','Seed',0);
RandStream.setDefaultStream(stream0);
a=rand(3e6,1);

stream0 = RandStream('mt19937ar','Seed',0);
RandStream.setDefaultStream(stream0);
b=randn(1e6,1);
c=rand;
find(a==c)

stream1 = RandStream('mrg32k3a','Seed',0);
RandStream.setDefaultStream(stream1);
a=rand(3e6,1);

stream1 = RandStream('mrg32k3a','Seed',0);
RandStream.setDefaultStream(stream1);
b=randn(1e6,1);
c=rand;
find(a==c)

ans = 1021808
ans = 2028211

By: Jan

Jan — Fri, 05 Aug 2011 20:25:43 +0000

Consider the following Matlab code.

stream0 = RandStream('mt19937ar','Seed',0);
RandStream.setDefaultStream(stream0);
a=rand(4e6,1);

stream0 = RandStream('mt19937ar','Seed',0);
RandStream.setDefaultStream(stream0);
b=randn(1e6,1);
c=rand(1);
find(a==c)


stream1 = RandStream('mrg32k3a','Seed',0);
RandStream.setDefaultStream(stream1);
a=rand(4e6,1);

stream1 = RandStream('mrg32k3a','Seed',0);
RandStream.setDefaultStream(stream1);
b=randn(1e6,1);
c=rand(1);
find(a==c)

The output is:

ans = 1021808
ans = 2028211

Why does the mt19937ar algorithm use only 1.0218 rand values per normal on average? The mrg32k3a uses 2.028 on average, as expected.

By: Michael

Michael — Tue, 09 Mar 2010 01:22:33 +0000

Boh, just forget what I wrote above. That was my bad. Just lost some 5 hours on this. A couple beers too many and I hadn’t noticed MATLAB had been writing my rand data to a mat file in a different directory. Meanwhile another part of the programme kept reading an old file of the same name, placed where I was expecting new rand data to be written at each iteration, but of course never being updated with the new data at all. Anyways.

Thanks for the info on rand!

I wish everyone all the best, including only the occasional “matrix dimensions must agree” error.

Keep wearing them F5 buttons off!

Kind Regards.

By: Michael

Michael — Tue, 09 Mar 2010 00:23:46 +0000

Hello.
I haven’t read all the pieces of the article, so please excuse me if I’m asking something obvious that has been brought up already…
My Matlab 7.1.02xxx does generate uncorrelated randn results after using

randn('state',sum(100*clock));

however, when I use this inside a function, only subsequent randn() calls inside that function generate different random numbers. As soon as another iteration of the main file enters the same function again, these sequences repeat, as if the generator state was reset and not really “listening” to the changing sum(100*clock) values… Any idea if that’s a known bug or some fault on my side?

This is the second time I’ve struggled with this, losing many, many hours…

Please help.

Kind Regards
Thanks for the article

Michael

By: Nugroho

Nugroho — Mon, 01 Mar 2010 08:52:56 +0000

Dear Peter,

I am sorry for the late reply.

Thank you very much for making me think about the things at the higher level so that I do not overuse the substreams feature.

Thank you too for giving the idea on the number of available substreams.

Have a great month.

By: Peter Perkins

Peter Perkins — Fri, 19 Feb 2010 18:00:03 +0000

Mike, parfor can be tricky, and there are a couple things going on here:

1) parfor and for do not iterate in the same order, and because your loop bodies have side effects (meaning that you’re calling rand, which maintains an internal state), you won’t get the same results. The loop body in the for case gets run with i going sequentially from 1 to 10. The loop body in the parfor case gets executed in “some” order. There’s no way to guarantee, for example, that the 4th and 5th iterations are run on the same worker in that order. If you run the same parfor twice, or run it with a different number of workers, things change. That’s the nature of parfor. It’s also a good reason to use substreams if you want reproducibility.

2) You’re using _copies_ of the same stream on your workers, and they work separately from each other, so the first time rand is called on each worker, you’ll get the same value. In fact, that’s the whole reason for using multiple parallel streams when you want calculations on different workers to be statistically independent.

To answer your specific question: you can get the exact same calculations in a parfor that you’d get in a for by using substreams. Something like

stream = RandStream('mrg32k3a');
parfor ii = 1:10
    set(stream,'Substream',ii);
    par(ii) = rand(stream);
end

This uses (copies of) the same stream on all workers, but ties the values returned by rand to the iteration number, so regardless of whether you run this loop on one worker, or many, or change it to a for loop in serial mode, it will return the same vector. Ordinarily, you’d probably set the substream index by assigning to the property

   stream.Substream = ii;

but parfor won’t let you do that.

If you want to run this same loop again but get different (but still repeatable) results, you might:

* use a different seed to create the stream, or
* add some offset to the substream index.
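The second option above might look like the following sketch. It is written as a serial for loop so the substream can be set by property assignment; the offset value and the variable names are hypothetical.

```matlab
% Sketch only: 'offset' is an arbitrary, hypothetical shift of the substream index
offset = 1000;
stream = RandStream('mrg32k3a', 'Seed', 1234);

vals = zeros(1, 10);
for ii = 1:10
    stream.Substream = ii + offset;   % jump to the start of substream ii+offset
    vals(ii) = rand(stream);          % first value of that substream
end

% Running the loop again with the same seed and offset repeats the values
vals2 = zeros(1, 10);
for ii = 1:10
    stream.Substream = ii + offset;
    vals2(ii) = rand(stream);
end
```

Changing offset (or the seed) gives a different but equally repeatable set of values.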

Using substreams gets you reproducibility. If you don’t care about that, you most likely want to create a set of independent parallel streams using RandStream.create, passing in labindex as the StreamIndices parameter.
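A rough sketch of the independent-streams approach, written serially so it runs without a pool. Inside an spmd block, the loop variable k would be replaced by labindex and numWorkers by numlabs; the worker count and seed here are hypothetical.

```matlab
% Sketch only: numWorkers and the seed are hypothetical values
numWorkers = 4;
x = zeros(1, numWorkers);

for k = 1:numWorkers                  % on a real worker, k would be labindex
    % Create stream number k out of numWorkers independent streams
    s = RandStream.create('mrg32k3a', ...
                          'NumStreams', numWorkers, ...
                          'StreamIndices', k, ...
                          'Seed', 1234);
    x(k) = rand(s);                   % first value from worker k's own stream
end
```

Because each index selects a different stream, the first values differ from worker to worker, unlike copies of a single stream.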

By: Mike McCoy

Mike McCoy — Fri, 19 Feb 2010 02:04:13 +0000

How do we ensure stream independence when using parfor? For instance, if I do this:

% Assume a matlabpool open
stream = RandStream.create('mrg32k3a','seed',1234);
parfor ii = 1:10
    par(ii) = rand(stream);
end

stream = RandStream.create('mrg32k3a','seed',1234);

for ii = 1:10
    nopar(ii) = rand(stream);
end

Not only do I get different answers in "par" and "nopar", but the values in "par" are repeated! How do I avoid this issue while still maintaining reproducibility of my results?

By: Peter Perkins

Peter Perkins — Thu, 11 Feb 2010 14:30:57 +0000

Nugroho, I’m not sure I see the need for you to use substreams at all. You say you “need to use the same random number seed for each design when testing their performance in each simulation trial”. Forget about seeds for the moment, and think about things at a higher level. I think what you mean by the above is that you want to use the same random numbers for each of the 20 simulations, so that you know that each of the 20 procedures was given the exact same data to work on. If that’s the case, then I see no reason why you can’t simply reset the RNG before each of 20 runs.

If for some reason the different procedures use up different numbers of random numbers, then it might make sense to use substreams, one for each iteration, but even then it would seem like you’d still want to reset the RNG to the same point before each of 20 runs.
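Both ideas can be sketched together as below: reset the stream before each procedure, and use one substream per trial so every procedure starts each trial at the same point no matter how many values it consumed earlier. The counts and variable names are hypothetical, and the procedure under test is left as a placeholder comment.

```matlab
% Sketch only: sizes are hypothetical and kept small for illustration
nProcedures   = 3;
nTrials       = 5;
nAlternatives = 20;

stream = RandStream('mrg32k3a', 'Seed', 1234);
firstDraw = zeros(nProcedures, nTrials);

for p = 1:nProcedures
    reset(stream);                          % same starting point for every procedure
    for t = 1:nTrials
        stream.Substream = t;               % trial t always starts at substream t
        data = randn(stream, nAlternatives, 1);
        % ... run procedure p on data here ...
        firstDraw(p, t) = data(1);          % recorded only to show the data match
    end
end
```

Every row of firstDraw should be identical, which shows that each procedure was given the same data in each trial.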

I can’t tell for sure what you need. But I can say that it’s possible to “overuse” some of these parallel RNG features.

In any case, the number of substreams for ‘mrg32k3a’ or ‘mlfg6331_64’, as described in the documentation in the User Guide, is much larger than 20,000. It’s something like 2^51.

Hope this helps.

By: Nugroho

Nugroho — Thu, 11 Feb 2010 13:28:36 +0000

Dear Loren and Peter,

Thank you very much for your insightful posts. They are very helpful.

I have one question: is there a maximum number of substreams we can use?

I am currently writing a program to compare the performance of several procedures. The purpose of each procedure is to select the best among 20 independent alternatives. In order to make a fair comparison between the procedures, I need to use the same random number seed for each design when testing their performance in each simulation trial.

I need to independently repeat the simulation 10,000 times. As far as I can think of, the best way to have the statistical independence is to have 20 x 10,000 = 200,000 substreams. However, I am not sure whether there could be 200,000 independent substreams (or more).

Thank you for your kind attention.