New Ways With Random Numbers, Part I
Today I'd like to introduce a guest blogger, Peter Perkins, who is a statistical software developer here at The MathWorks. In his life prior to The MathWorks, he worked as a statistician doing marine mammal research for the U.S. National Marine Fisheries Service, and one of his interests in MATLAB is random number generation and Monte-Carlo simulation.
Contents
Almost everyone who's used MATLAB has used the rand or randn functions at one time or another. The way people use them varies widely, though. Sometimes, you might just want to fill in a vector or matrix with values other than 0 or 1, in order to try out some new lines of code that you've written. At the other end of the spectrum, you might be running a Monte-Carlo simulation where the statistical properties of the pseudo-random values matter.
But regardless of the context, one of the questions I consistently hear is, "How and when should I initialize or reinitialize or reset MATLAB's random number generator?" The answer I usually give is, annoyingly, another question: "Why do you want to do that?" That's because messing with the state of a random number generator can affect the statistical properties of the values it generates, and the right answer to the original question depends on the context - quite often, the answer is "Don't." There's a lot that could be said about the when and the why, and I'll touch on that. But mostly, I want to show some of the how that's new in MATLAB R2008b.
If you want to run the code that I used to create this blog entry, you'll need R2008b or later, and you'll want to start with a fresh MATLAB session, or else execute these two commands (don't worry, I'll explain what they do).
mtstream = RandStream('mt19937ar');
RandStream.setDefaultStream(mtstream);
Random Streams in MATLAB
New in MATLAB R2008b is the addition of random number streams. A random number stream is just what it sounds like: a source for a sequence of random values. Usually, people talk about streams of values from a standard uniform distribution, and that's the sense that MATLAB uses too. Actually, MATLAB has always had random number streams in the abstract, underneath rand and randn. But in R2008b, streams are something you can interact with directly, and you can even create your own streams. There's lots to talk about there, but this article is about how to initialize or reset a stream, so what follows is just a brief summary. There's lots more in MATLAB's documentation.
Quite often, you don't need to worry about streams at all; MATLAB creates one for you when it starts up. rand, randn, and the new randi function draw values from that stream, and you can generate random numbers without ever knowing or caring what stream they come from. And there's nothing wrong with that - if the random number generators in MATLAB are doing their job correctly (and in R2008b, you have a choice among three generators based on state-of-the-art algorithms), you should be able to just generate random numbers using rand, randn or randi, and treat everything they return as independent random values.
But how do you go back and reproduce the random values you generate or otherwise control how they're generated? There's a new way to do that: you interact with a new kind of object in MATLAB, one that represents a random stream. Let's get the stream that MATLAB creates at startup.
defaultStream = RandStream.getDefaultStream()
defaultStream = mt19937ar random stream (current default) Seed: 0 RandnAlg: Ziggurat
The "mt19937ar" indicates that this stream generates values using the most common of the Mersenne Twister algorithms, the same algorithm that's been underneath rand for several releases. There are other generator choices; more about that in a moment. MATLAB created the stream using a seed of 0; more about that in a moment as well.
What's RandnAlg? One of the things that's new in R2008b is that randn draws values from the same stream as rand, although of course the uniform values have to be transformed to a standard normal distribution. The stream shown here makes that transformation using the Ziggurat algorithm, although you can change it (sorry, no more about that; you'll have to read the doc).
Finally, the display for this stream shows that it is the default stream. That just means that it's the stream that rand, randn, and randi draw their values from. Below, I'll show how you can create your own stream, and make it the default. There are a couple of reasons why you might want to do that. But first, let's see how to use the default stream to reproduce results.
Resetting a Random Stream
Sometimes it's useful to be able to reproduce the output from MATLAB's random number generators. For example, you might want to rerun a Monte-Carlo simulation repeatedly, using exactly the same random values because you are debugging the code. In fact, MATLAB initializes the default random stream to the same internal state each time it starts up, so the same calls to rand, randn, and randi will return the same "random" values every you start MATLAB (unless you take steps - more about that in a minute). But it would be tedious to have to restart MATLAB each time you wanted to rerun a simulation.
The simplest way to reproduce random number generator output is to reset the default stream, which puts that stream back to the state it had when initially created.
reset(defaultStream) z1 = randn(1,5)
z1 = 0.53767 1.8339 -2.2588 0.86217 0.31877
If I do it again, I'll get the same values again.
reset(defaultStream) z2 = randn(1,5)
z2 = 0.53767 1.8339 -2.2588 0.86217 0.31877
While simple, reset is kind of a blunt instrument, and provides reproducibility only on a very gross scale - it's as if you restarted MATLAB. It's not something you should do unless you really want to reuse the same random sequence over again.
Seeding a Stream
All pseudo-random number generators have the notion of a seed. Think of a stream as a very long sequence of values, wrapped into a circle, because it repeats when exhausted. Think of a seed as a starting position on the circle that you specify when you create a stream. There are (usually) many more random values than there are seeds, so it's not possible to start just anywhere by specifying a seed. And there's usually no simple description of where, say, the seed 0 lies in relation to, say, the seeds 1 or 2 other than knowing that they represent different starting positions. But if you create a stream with the same seed, you'll get the same sequence of values every time.
When MATLAB starts, it creates the default stream with commands equivalent to
stream0 = RandStream('mt19937ar','Seed',0) RandStream.setDefaultStream(stream0);
stream0 = mt19937ar random stream Seed: 0 RandnAlg: Ziggurat
We saw that earlier. If I create a new stream with a different seed, such as
stream1 = RandStream('mt19937ar','Seed',1)
stream1 = mt19937ar random stream Seed: 1 RandnAlg: Ziggurat
and make it the default stream,
RandStream.setDefaultStream(stream1);
I'll get a different sequence of random values.
randn(1,5)
ans = -0.64901 1.1812 -0.75845 -1.1096 -0.84555
But if I always use the same seed, I'll always get the same random number stream. This is a lot like resetting the current default stream - in fact, reset is nothing more than reseeding. The main difference between how I used reset above and here is that here, I've created my own stream, so I know exactly what I'm getting. Before, I'd get repeatability, but the specific random values depended on what generator algorithm was underneath the current default stream.
Reseeding is kind of a big hammer - you would not want to create new streams every time you needed a set of random values. It's most useful if you use it as an initialization step, perhaps at MATLAB startup, perhaps before running a simulation.
Using Substreams for Reproducibility
We've seen that a drawback of using reset or 'Seed' to reproduce results is that you essentially have to start over from the beginning. A new feature in R2008b, substreams, provides a new way around that drawback.
Remember the circle? Substreams are nothing more than fixed "checkpoints" in a random number stream, defined by their position around the circle, spaced evenly at very large intervals. The substreams themselves are indexed beginning at 1, and you can jump to the beginning of a substream by setting a stream's Substream property. Values generated from different substreams are statistically independent in the same sense as consecutive values in a stream are.
Two new generator algorithms in R2008b support substreams: the Combined Multiple Recursive ('mrg32k3a') and the Multiplicative Lagged Fibonacci ('mlfg6331_64') generators (the doc has references). I'll create one of the former, and make it the default stream.
stream = RandStream('mrg32k3a');
RandStream.setDefaultStream(stream);
To reposition a stream to the beginning of a particular substream, I just set its Substream property.
stream.Substream = 2;
Much like seeds, substreams are nothing more than aliases for particular points along a sequence of random values. So what's different about them? There are some differences that I'll get into next time, but for now, the most obvious difference is that you can jump between substreams without having to create a new stream. It's a fairly lightweight operation.
Let's see what happens if I position a random number stream to the beginning of a substream before each iteration of a loop.
for i = 1:5 stream.Substream = i; z = rand(1,i) end
z = 0.72701 z = 0.5582 0.85273 z = 0.16662 0.29236 0.77278 z = 0.34773 0.38864 0.80161 0.95025 z = 0.56709 0.68377 0.29879 0.59318 0.69247
Now I can reproduce results from the 3rd iteration simply by repositioning the stream to the corresponding substream. I didn't have to rerun the whole loop, and I didn't even have to know beforehand that I wanted to look at this iteration.
i = 3; stream.Substream = i; z = rand(1,i)
z = 0.16662 0.29236 0.77278
If that loop were a time-consuming Monte-Carlo simulation, it would be a real boon to be able to go recreate and investigate any specific iteration without rerunning the whole thing.
Keep in mind that you don't need to overuse substreams. They're there to use, you don't have to worry about "using up" all the values in each substream before moving to the next one, but it would be pointless to take the other extreme and jump to a different substream every time you generate a new value. Substreams don't add randomness, they just make it easier to reproduce values.
Saving and Restoring State
rand and randn are pseudorandom number generators, and all pseudorandom generator algorithms have an internal state that determines what the next value will be. It's a deterministic algorithm. Remember the circle? Each point along the circle corresponds to a different state. But for almost all modern algorithms, the state is more complicated than just the last number generated or a simple index of the current position in the stream.
Let's get the state of the default generator. To do that, I'll read the State property.
defaultStream = RandStream.getDefaultStream()
savedState = defaultStream.State;
whos savedState
defaultStream = mrg32k3a random stream (current default) Seed: 0 RandnAlg: Ziggurat Name Size Bytes Class Attributes savedState 12x1 48 uint32
If I call rand to generate a few random values,
u1 = rand(1,5)
u1 = 0.83911 0.51069 0.84469 0.68938 0.23777
and then I restore the state that I saved,
defaultStream.State = savedState;
rand will generate those exact same values the next time I call it.
u2 = rand(1,5)
u2 = 0.83911 0.51069 0.84469 0.68938 0.23777
The same is also true for randn and for randi. And because all three functions draw values from the same default stream, there's only one state to worry about - the default stream's state. I can use the state I saved earlier to reproduce results from randn, for example.
defaultStream.State = savedState; z1 = randn(1,5)
z1 = 0.048564 0.87031 0.52365 0.095609 -0.79233
defaultStream.State = savedState; z2 = randn(1,5)
z2 = 0.048564 0.87031 0.52365 0.095609 -0.79233
It's important to remember that because there's a single default stream, calling one function affects what specific values the others return.
savedState = defaultStream.State; u1 = rand(1,5) u2 = rand(1,5)
u1 = 0.045212 0.48758 0.75138 0.36639 0.5676 u2 = 0.10969 0.27046 0.10359 0.87516 0.28295
defaultStream.State = savedState; % This returns the same vector as before u1 = rand(1,5) % Call randn in between the two calls to rand z = randn(1,5); % This returns different values than before u2 = rand(1,5)
u1 = 0.045212 0.48758 0.75138 0.36639 0.5676 u2 = 0.20425 0.043083 0.54273 0.10648 0.20139
The functions do not act independently with respect to the values they return, however, the values generated by the three functions can still be treated as statistically independent regardless of what order you generate them in.
One practical application of saving and restoring the state might be to reproduce an entire Monte-Carlo simulation run, by saving the state before running it. Another might be to recreate one particularly interesting iteration in a Monte-Carlo simulation, having saved the state before that iteration. Another might be to recreate just one call to rand that happened deep in your code, so that you can recreate that call elsewhere, perhaps for debugging. As long as you have saved the state at the appropriate point, you can "jump to" anywhere in the sequence of random values. That's the advantage over resetting or seeding a stream, but it's the catch too: you have to have saved the right state beforehand.
As an aside, the internal states of the various generators available in MATLAB have different sizes and formats, and you can see what they are by looking at the State property of different streams. But I recommend not getting too familiar with, or modifying, the inner workings of these state vectors. Read 'n' write 'em, yes, perhaps even rope 'n' brand 'em, but don't try and understand 'em.
Where Not To Do Any Of This
Sometimes, discretion is the better part of valor. Be careful to understand why and when you do any of this. Remember, resetting or replacing the default stream, or writing to its state, or jumping to a substream affects all subsequent calls to rand, randn, and randi, and is probably not something you want to do deep in your code - too easy to forget that it is in there. Replacing the default stream, especially, is probably not something you need to do a lot of.
Next Time
We've seen four ways to reproduce results from MATLAB's random number generators.
- Resetting a stream
- Creating a new stream with a fixed seed
- Substreams
- Explicitly saving and restoring the generator state
But sometimes what's needed is the "opposite of reproducibility" - you want to make sure that different runs of a simulation use different random inputs, because you want to be able to assume that the different runs are statistically independent. Until next time ...
Other Situations?
Can you think of any cases where you've needed to reproduce random numbers that can't be handled by the examples I've shown? Let me know here.