# New Ways With Random Numbers, Part I

Today I'd like to introduce a guest blogger, Peter Perkins, who is a statistical software developer here at The MathWorks. In his life prior to The MathWorks, he worked as a statistician doing marine mammal research for the U.S. National Marine Fisheries Service, and one of his interests in MATLAB is random number generation and Monte-Carlo simulation.

### Contents

Almost everyone who's used MATLAB has used the `rand` or `randn` functions at one time or another. The way people use them varies widely, though. Sometimes, you might just want to fill
in a vector or matrix with values other than 0 or 1, in order to try out some new lines of code that you've written. At the
other end of the spectrum, you might be running a Monte-Carlo simulation where the statistical properties of the pseudo-random
values matter.

But regardless of the context, one of the questions I consistently hear is, "How and when should I initialize or reinitialize
or reset MATLAB's random number generator?" The answer I usually give is, annoyingly, another question: "Why do you want to
do that?" That's because messing with the state of a random number generator can affect the statistical properties of the
values it generates, and the right answer to the original question depends on the context - quite often, the answer is "Don't."
There's a lot that could be said about the *when* and the *why*, and I'll touch on that. But mostly, I want to show some of the *how* that's new in MATLAB R2008b.

If you want to run the code that I used to create this blog entry, you'll need R2008b or later, and you'll want to start with a fresh MATLAB session, or else execute these two commands (don't worry, I'll explain what they do).

```
mtstream = RandStream('mt19937ar');
RandStream.setDefaultStream(mtstream);
```

### Random Streams in MATLAB

New in MATLAB R2008b is the addition of random number streams. A random number stream is just what it sounds like: a source
for a sequence of random values. Usually, people talk about streams of values from a standard uniform distribution, and that's
the sense that MATLAB uses too. Actually, MATLAB has always had random number streams in the abstract, underneath `rand` and `randn`. But in R2008b, streams are something you can interact with directly, and you can even create your own streams. There's
lots to talk about there, but this article is about how to initialize or reset a stream, so what follows is just a brief summary.
There's lots more in MATLAB's documentation.

Quite often, you don't need to worry about streams at all; MATLAB creates one for you when it starts up. `rand`, `randn`, and the new `randi` function draw values from that stream, and you can generate random numbers without ever knowing or caring what stream they
come from. And there's nothing wrong with that - if the random number generators in MATLAB are doing their job correctly
(and in R2008b, you have a choice among three generators based on state-of-the-art algorithms), you should be able to just
generate random numbers using `rand`, `randn` or `randi`, and treat everything they return as independent random values.

But how do you go back and reproduce the random values you generate or otherwise control how they're generated? There's a new way to do that: you interact with a new kind of object in MATLAB, one that represents a random stream. Let's get the stream that MATLAB creates at startup.

defaultStream = RandStream.getDefaultStream()

defaultStream = mt19937ar random stream (current default) Seed: 0 RandnAlg: Ziggurat

The "mt19937ar" indicates that this stream generates values using the most common of the Mersenne Twister algorithms, the same algorithm that's been underneath `rand` for several releases. There are other generator choices; more about that in a moment. MATLAB created the stream using a
seed of 0; more about *that* in a moment as well.

What's `RandnAlg`? One of the things that's new in R2008b is that `randn` draws values from the same stream as `rand`, although of course the uniform values have to be transformed to a standard normal distribution. The stream shown here makes
that transformation using the Ziggurat algorithm, although you can change it (sorry, no more about that; you'll have to read the doc).

Finally, the display for this stream shows that it is the *default stream*. That just means that it's the stream that `rand`, `randn`, and `randi` draw their values from. Below, I'll show how you can create your own stream, and make *it* the default. There are a couple of reasons why you might want to do that. But first, let's see how to use the default stream
to reproduce results.

### Resetting a Random Stream

Sometimes it's useful to be able to reproduce the output from MATLAB's random number generators. For example, you might want
to rerun a Monte-Carlo simulation repeatedly, using exactly the same random values because you are debugging the code. In
fact, MATLAB initializes the default random stream to the same internal state each time it starts up, so the same calls to
`rand`, `randn`, and `randi` will return the same "random" values every you start MATLAB (unless you take steps - more about that in a minute). But it
would be tedious to have to restart MATLAB each time you wanted to rerun a simulation.

The simplest way to reproduce random number generator output is to reset the default stream, which puts that stream back to the state it had when initially created.

reset(defaultStream) z1 = randn(1,5)

z1 = 0.53767 1.8339 -2.2588 0.86217 0.31877

If I do it again, I'll get the same values again.

reset(defaultStream) z2 = randn(1,5)

z2 = 0.53767 1.8339 -2.2588 0.86217 0.31877

While simple, `reset` is kind of a blunt instrument, and provides reproducibility only on a very gross scale - it's as if you restarted MATLAB.
It's not something you should do unless you really want to reuse the same random sequence over again.

### Seeding a Stream

All pseudo-random number generators have the notion of a seed. Think of a stream as a very long sequence of values, wrapped into a circle, because it repeats when exhausted. Think of a seed as a starting position on the circle that you specify when you create a stream. There are (usually) many more random values than there are seeds, so it's not possible to start just anywhere by specifying a seed. And there's usually no simple description of where, say, the seed 0 lies in relation to, say, the seeds 1 or 2 other than knowing that they represent different starting positions. But if you create a stream with the same seed, you'll get the same sequence of values every time.

When MATLAB starts, it creates the default stream with commands equivalent to

stream0 = RandStream('mt19937ar','Seed',0) RandStream.setDefaultStream(stream0);

stream0 = mt19937ar random stream Seed: 0 RandnAlg: Ziggurat

We saw that earlier. If I create a new stream with a different seed, such as

stream1 = RandStream('mt19937ar','Seed',1)

stream1 = mt19937ar random stream Seed: 1 RandnAlg: Ziggurat

and make *it* the default stream,

RandStream.setDefaultStream(stream1);

I'll get a *different* sequence of random values.

randn(1,5)

ans = -0.64901 1.1812 -0.75845 -1.1096 -0.84555

But if I always use the same seed, I'll always get the same random number stream. This is a lot like resetting the current
default stream - in fact, `reset` is nothing more than reseeding. The main difference between how I used `reset` above and here is that here, I've created my own stream, so I know *exactly* what I'm getting. Before, I'd get repeatability, but the specific random values depended on what generator algorithm was
underneath the current default stream.

Reseeding is kind of a big hammer - you would not want to create new streams every time you needed a set of random values. It's most useful if you use it as an initialization step, perhaps at MATLAB startup, perhaps before running a simulation.

### Using Substreams for Reproducibility

We've seen that a drawback of using `reset` or `'Seed'` to reproduce results is that you essentially have to start over from the beginning. A new feature in R2008b, substreams,
provides a new way around that drawback.

Remember the circle? Substreams are nothing more than fixed "checkpoints" in a random number stream, defined by their position
around the circle, spaced evenly at very large intervals. The substreams themselves are indexed beginning at 1, and you can
jump to the beginning of a substream by setting a stream's `Substream` property. Values generated from different substreams are statistically independent in the same sense as consecutive values
in a stream are.

Two new generator algorithms in R2008b support substreams: the Combined Multiple Recursive (`'mrg32k3a'`) and the Multiplicative Lagged Fibonacci (`'mlfg6331_64'`) generators (the doc has references). I'll create one of the former, and make it the default stream.

```
stream = RandStream('mrg32k3a');
RandStream.setDefaultStream(stream);
```

To reposition a stream to the beginning of a particular substream, I just set its `Substream` property.

stream.Substream = 2;

Much like seeds, substreams are nothing more than aliases for particular points along a sequence of random values. So what's different about them? There are some differences that I'll get into next time, but for now, the most obvious difference is that you can jump between substreams without having to create a new stream. It's a fairly lightweight operation.

Let's see what happens if I position a random number stream to the beginning of a substream before each iteration of a loop.

for i = 1:5 stream.Substream = i; z = rand(1,i) end

z = 0.72701 z = 0.5582 0.85273 z = 0.16662 0.29236 0.77278 z = 0.34773 0.38864 0.80161 0.95025 z = 0.56709 0.68377 0.29879 0.59318 0.69247

Now I can reproduce results from the 3rd iteration simply by repositioning the stream to the corresponding substream. I didn't have to rerun the whole loop, and I didn't even have to know beforehand that I wanted to look at this iteration.

i = 3; stream.Substream = i; z = rand(1,i)

z = 0.16662 0.29236 0.77278

If that loop were a time-consuming Monte-Carlo simulation, it would be a real boon to be able to go recreate and investigate any specific iteration without rerunning the whole thing.

Keep in mind that you don't need to overuse substreams. They're there to use, you don't have to worry about "using up" all the values in each substream before moving to the next one, but it would be pointless to take the other extreme and jump to a different substream every time you generate a new value. Substreams don't add randomness, they just make it easier to reproduce values.

### Saving and Restoring State

`rand` and `randn` are pseudorandom number generators, and all pseudorandom generator algorithms have an internal state that determines what the next value will be. It's a deterministic
algorithm. Remember the circle? Each point along the circle corresponds to a different state. But for almost all modern
algorithms, the state is more complicated than just the last number generated or a simple index of the current position in
the stream.

Let's get the state of the default generator. To do that, I'll read the `State` property.

```
defaultStream = RandStream.getDefaultStream()
savedState = defaultStream.State;
whos savedState
```

defaultStream = mrg32k3a random stream (current default) Seed: 0 RandnAlg: Ziggurat Name Size Bytes Class Attributes savedState 12x1 48 uint32

If I call `rand` to generate a few random values,

u1 = rand(1,5)

u1 = 0.83911 0.51069 0.84469 0.68938 0.23777

and then I restore the state that I saved,

defaultStream.State = savedState;

`rand` will generate those exact same values the next time I call it.

u2 = rand(1,5)

u2 = 0.83911 0.51069 0.84469 0.68938 0.23777

The same is also true for `randn` and for `randi`. And because all three functions draw values from the same default stream, there's only one state to worry about - the default
stream's state. I can use the state I saved earlier to reproduce results from `randn`, for example.

defaultStream.State = savedState; z1 = randn(1,5)

z1 = 0.048564 0.87031 0.52365 0.095609 -0.79233

defaultStream.State = savedState; z2 = randn(1,5)

z2 = 0.048564 0.87031 0.52365 0.095609 -0.79233

It's important to remember that because there's a single default stream, calling one function affects what specific values the others return.

savedState = defaultStream.State; u1 = rand(1,5) u2 = rand(1,5)

u1 = 0.045212 0.48758 0.75138 0.36639 0.5676 u2 = 0.10969 0.27046 0.10359 0.87516 0.28295

defaultStream.State = savedState; % This returns the same vector as before u1 = rand(1,5) % Call randn in between the two calls to rand z = randn(1,5); % This returns different values than before u2 = rand(1,5)

u1 = 0.045212 0.48758 0.75138 0.36639 0.5676 u2 = 0.20425 0.043083 0.54273 0.10648 0.20139

The functions do not act independently with respect to the values they return, however, the values generated by the three
functions can still be treated as *statistically* independent regardless of what order you generate them in.

One practical application of saving and restoring the state might be to reproduce an entire Monte-Carlo simulation run, by
saving the state before running it. Another might be to recreate one particularly interesting iteration in a Monte-Carlo simulation,
having saved the state before that iteration. Another might be to recreate just one call to `rand` that happened deep in your code, so that you can recreate that call elsewhere, perhaps for debugging. As long as you have
saved the state at the appropriate point, you can "jump to" *anywhere* in the sequence of random values. That's the advantage over resetting or seeding a stream, but it's the catch too: you have
to have saved the right state beforehand.

As an aside, the internal states of the various generators available in MATLAB have different sizes and formats, and you can
see what they are by looking at the `State` property of different streams. But I recommend not getting too familiar with, or modifying, the inner workings of these
state vectors. Read 'n' write 'em, yes, perhaps even rope 'n' brand 'em, but don't try and understand 'em.

### Where Not To Do Any Of This

Sometimes, discretion is the better part of valor. Be careful to understand why and when you do any of this. Remember, resetting
or replacing the default stream, or writing to its state, or jumping to a substream affects all subsequent calls to `rand`, `randn`, and `randi`, and is probably not something you want to do deep in your code - too easy to forget that it is in there. Replacing the
default stream, especially, is probably not something you need to do a lot of.

### Next Time

We've seen four ways to reproduce results from MATLAB's random number generators.

- Resetting a stream
- Creating a new stream with a fixed seed
- Substreams
- Explicitly saving and restoring the generator state

But sometimes what's needed is the "opposite of reproducibility" - you want to make sure that different runs of a simulation
use *different* random inputs, because you want to be able to assume that the different runs are statistically independent. Until next time ...

### Other Situations?

Can you think of any cases where you've needed to reproduce random numbers that can't be handled by the examples I've shown? Let me know here.