Loren on the Art of MATLAB

November 10th, 2008

New Features in R2008b and How to Ask for More

By now, you many know that the second 2008 release of MATLAB and related products is out.

Contents

Some Quick Highlights

Here are just a few quick highlights, some of which you will see in future posts on this blog, and some of which have already shown up on other MathWorks blogs.

Highlights:

  • Enhanced random number generation
  • IDE elements such as a function browser, function hints, and a more flexible current directory browser
  • Additional supported file formats: NetCDF (read/write) and JPEG2000 (read)
  • New data structure containers.Map
  • doc command enhanced to work with user created classes

User Survey

In the past you have added plenty of requests on this blog. Today I ask you to consider filling out this survey. We are looking for input about how you work so we can create new features and functionality to meet your needs.


Get the MATLAB code

Published with MATLAB® 7.7

November 5th, 2008

New Ways With Random Numbers, Part I

Today I'd like to introduce a guest blogger, Peter Perkins, who is a statistical software developer here at The MathWorks. In his life prior to The MathWorks, he worked as a statistician doing marine mammal research for the U.S. National Marine Fisheries Service, and one of his interests in MATLAB is random number generation and Monte-Carlo simulation.

Contents

Almost everyone who's used MATLAB has used the rand or randn functions at one time or another. The way people use them varies widely, though. Sometimes, you might just want to fill in a vector or matrix with values other than 0 or 1, in order to try out some new lines of code that you've written. At the other end of the spectrum, you might be running a Monte-Carlo simulation where the statistical properties of the pseudo-random values matter.

But regardless of the context, one of the questions I consistently hear is, "How and when should I initialize or reinitialize or reset MATLAB's random number generator?" The answer I usually give is, annoyingly, another question: "Why do you want to do that?" That's because messing with the state of a random number generator can affect the statistical properties of the values it generates, and the right answer to the original question depends on the context - quite often, the answer is "Don't." There's a lot that could be said about the when and the why, and I'll touch on that. But mostly, I want to show some of the how that's new in MATLAB R2008b.

If you want to run the code that I used to create this blog entry, you'll need R2008b or later, and you'll want to start with a fresh MATLAB session, or else execute these two commands (don't worry, I'll explain what they do).

    mtstream = RandStream('mt19937ar');
    RandStream.setDefaultStream(mtstream);

Random Streams in MATLAB

New in MATLAB R2008b is the addition of random number streams. A random number stream is just what it sounds like: a source for a sequence of random values. Usually, people talk about streams of values from a standard uniform distribution, and that's the sense that MATLAB uses too. Actually, MATLAB has always had random number streams in the abstract, underneath rand and randn. But in R2008b, streams are something you can interact with directly, and you can even create your own streams. There's lots to talk about there, but this article is about how to initialize or reset a stream, so what follows is just a brief summary. There's lots more in MATLAB's documentation.

Quite often, you don't need to worry about streams at all; MATLAB creates one for you when it starts up. rand, randn, and the new randi function draw values from that stream, and you can generate random numbers without ever knowing or caring what stream they come from. And there's nothing wrong with that - if the random number generators in MATLAB are doing their job correctly (and in R2008b, you have a choice among three generators based on state-of-the-art algorithms), you should be able to just generate random numbers using rand, randn or randi, and treat everything they return as independent random values.

But how do you go back and reproduce the random values you generate or otherwise control how they're generated? There's a new way to do that: you interact with a new kind of object in MATLAB, one that represents a random stream. Let's get the stream that MATLAB creates at startup.

defaultStream = RandStream.getDefaultStream()
defaultStream = 
mt19937ar random stream (current default)
             Seed: 0
         RandnAlg: Ziggurat

The "mt19937ar" indicates that this stream generates values using the most common of the Mersenne Twister algorithms, the same algorithm that's been underneath rand for several releases. There are other generator choices; more about that in a moment. MATLAB created the stream using a seed of 0; more about that in a moment as well.

What's RandnAlg? One of the things that's new in R2008b is that randn draws values from the same stream as rand, although of course the uniform values have to be transformed to a standard normal distribution. The stream shown here makes that transformation using the Ziggurat algorithm, although you can change it (sorry, no more about that; you'll have to read the doc).

Finally, the display for this stream shows that it is the default stream. That just means that it's the stream that rand, randn, and randi draw their values from. Below, I'll show how you can create your own stream, and make it the default. There are a couple of reasons why you might want to do that. But first, let's see how to use the default stream to reproduce results.

Resetting a Random Stream

Sometimes it's useful to be able to reproduce the output from MATLAB's random number generators. For example, you might want to rerun a Monte-Carlo simulation repeatedly, using exactly the same random values because you are debugging the code. In fact, MATLAB initializes the default random stream to the same internal state each time it starts up, so the same calls to rand, randn, and randi will return the same "random" values every you start MATLAB (unless you take steps - more about that in a minute). But it would be tedious to have to restart MATLAB each time you wanted to rerun a simulation.

The simplest way to reproduce random number generator output is to reset the default stream, which puts that stream back to the state it had when initially created.

reset(defaultStream)
z1 = randn(1,5)
z1 =
      0.53767       1.8339      -2.2588      0.86217      0.31877

If I do it again, I'll get the same values again.

reset(defaultStream)
z2 = randn(1,5)
z2 =
      0.53767       1.8339      -2.2588      0.86217      0.31877

While simple, reset is kind of a blunt instrument, and provides reproducibility only on a very gross scale - it's as if you restarted MATLAB. It's not something you should do unless you really want to reuse the same random sequence over again.

Seeding a Stream

All pseudo-random number generators have the notion of a seed. Think of a stream as a very long sequence of values, wrapped into a circle, because it repeats when exhausted. Think of a seed as a starting position on the circle that you specify when you create a stream. There are (usually) many more random values than there are seeds, so it's not possible to start just anywhere by specifying a seed. And there's usually no simple description of where, say, the seed 0 lies in relation to, say, the seeds 1 or 2 other than knowing that they represent different starting positions. But if you create a stream with the same seed, you'll get the same sequence of values every time.

When MATLAB starts, it creates the default stream with commands equivalent to

stream0 = RandStream('mt19937ar','Seed',0)
RandStream.setDefaultStream(stream0);
stream0 = 
mt19937ar random stream
             Seed: 0
         RandnAlg: Ziggurat

We saw that earlier. If I create a new stream with a different seed, such as

stream1 = RandStream('mt19937ar','Seed',1)
stream1 = 
mt19937ar random stream
             Seed: 1
         RandnAlg: Ziggurat

and make it the default stream,

RandStream.setDefaultStream(stream1);

I'll get a different sequence of random values.

randn(1,5)
ans =
     -0.64901       1.1812     -0.75845      -1.1096     -0.84555

But if I always use the same seed, I'll always get the same random number stream. This is a lot like resetting the current default stream - in fact, reset is nothing more than reseeding. The main difference between how I used reset above and here is that here, I've created my own stream, so I know exactly what I'm getting. Before, I'd get repeatability, but the specific random values depended on what generator algorithm was underneath the current default stream.

Reseeding is kind of a big hammer - you would not want to create new streams every time you needed a set of random values. It's most useful if you use it as an initialization step, perhaps at MATLAB startup, perhaps before running a simulation.

Using Substreams for Reproducibility

We've seen that a drawback of using reset or 'Seed' to reproduce results is that you essentially have to start over from the beginning. A new feature in R2008b, substreams, provides a new way around that drawback.

Remember the circle? Substreams are nothing more than fixed "checkpoints" in a random number stream, defined by their position around the circle, spaced evenly at very large intervals. The substreams themselves are indexed beginning at 1, and you can jump to the beginning of a substream by setting a stream's Substream property. Values generated from different substreams are statistically independent in the same sense as consecutive values in a stream are.

Two new generator algorithms in R2008b support substreams: the Combined Multiple Recursive ('mrg32k3a') and the Multiplicative Lagged Fibonacci ('mlfg6331_64') generators (the doc has references). I'll create one of the former, and make it the default stream.

stream = RandStream('mrg32k3a');
RandStream.setDefaultStream(stream);

To reposition a stream to the beginning of a particular substream, I just set its Substream property.

stream.Substream = 2;

Much like seeds, substreams are nothing more than aliases for particular points along a sequence of random values. So what's different about them? There are some differences that I'll get into next time, but for now, the most obvious difference is that you can jump between substreams without having to create a new stream. It's a fairly lightweight operation.

Let's see what happens if I position a random number stream to the beginning of a substream before each iteration of a loop.

for i = 1:5
    stream.Substream = i;
    z = rand(1,i)
end
z =
      0.72701
z =
       0.5582      0.85273
z =
      0.16662      0.29236      0.77278
z =
      0.34773      0.38864      0.80161      0.95025
z =
      0.56709      0.68377      0.29879      0.59318      0.69247

Now I can reproduce results from the 3rd iteration simply by repositioning the stream to the corresponding substream. I didn't have to rerun the whole loop, and I didn't even have to know beforehand that I wanted to look at this iteration.

i = 3;
stream.Substream = i;
z = rand(1,i)
z =
      0.16662      0.29236      0.77278

If that loop were a time-consuming Monte-Carlo simulation, it would be a real boon to be able to go recreate and investigate any specific iteration without rerunning the whole thing.

Keep in mind that you don't need to overuse substreams. They're there to use, you don't have to worry about "using up" all the values in each substream before moving to the next one, but it would be pointless to take the other extreme and jump to a different substream every time you generate a new value. Substreams don't add randomness, they just make it easier to reproduce values.

Saving and Restoring State

rand and randn are pseudorandom number generators, and all pseudorandom generator algorithms have an internal state that determines what the next value will be. It's a deterministic algorithm. Remember the circle? Each point along the circle corresponds to a different state. But for almost all modern algorithms, the state is more complicated than just the last number generated or a simple index of the current position in the stream.

Let's get the state of the default generator. To do that, I'll read the State property.

defaultStream = RandStream.getDefaultStream()
savedState = defaultStream.State;
whos savedState
defaultStream = 
mrg32k3a random stream (current default)
             Seed: 0
         RandnAlg: Ziggurat
  Name             Size            Bytes  Class     Attributes

  savedState      12x1                48  uint32              

If I call rand to generate a few random values,

u1 = rand(1,5)
u1 =
      0.83911      0.51069      0.84469      0.68938      0.23777

and then I restore the state that I saved,

defaultStream.State = savedState;

rand will generate those exact same values the next time I call it.

u2 = rand(1,5)
u2 =
      0.83911      0.51069      0.84469      0.68938      0.23777

The same is also true for randn and for randi. And because all three functions draw values from the same default stream, there's only one state to worry about - the default stream's state. I can use the state I saved earlier to reproduce results from randn, for example.

defaultStream.State = savedState;
z1 = randn(1,5)
z1 =
     0.048564      0.87031      0.52365     0.095609     -0.79233
defaultStream.State = savedState;
z2 = randn(1,5)
z2 =
     0.048564      0.87031      0.52365     0.095609     -0.79233

It's important to remember that because there's a single default stream, calling one function affects what specific values the others return.

savedState = defaultStream.State;
u1 = rand(1,5)
u2 = rand(1,5)
u1 =
     0.045212      0.48758      0.75138      0.36639       0.5676
u2 =
      0.10969      0.27046      0.10359      0.87516      0.28295
defaultStream.State = savedState;
% This returns the same vector as before
u1 = rand(1,5)
% Call randn in between the two calls to rand
z = randn(1,5);
% This returns different values than before
u2 = rand(1,5)
u1 =
     0.045212      0.48758      0.75138      0.36639       0.5676
u2 =
      0.20425     0.043083      0.54273      0.10648      0.20139

The functions do not act independently with respect to the values they return, however, the values generated by the three functions can still be treated as statistically independent regardless of what order you generate them in.

One practical application of saving and restoring the state might be to reproduce an entire Monte-Carlo simulation run, by saving the state before running it. Another might be to recreate one particularly interesting iteration in a Monte-Carlo simulation, having saved the state before that iteration. Another might be to recreate just one call to rand that happened deep in your code, so that you can recreate that call elsewhere, perhaps for debugging. As long as you have saved the state at the appropriate point, you can "jump to" anywhere in the sequence of random values. That's the advantage over resetting or seeding a stream, but it's the catch too: you have to have saved the right state beforehand.

As an aside, the internal states of the various generators available in MATLAB have different sizes and formats, and you can see what they are by looking at the State property of different streams. But I recommend not getting too familiar with, or modifying, the inner workings of these state vectors. Read 'n' write 'em, yes, perhaps even rope 'n' brand 'em, but don't try and understand 'em.

Where Not To Do Any Of This

Sometimes, discretion is the better part of valor. Be careful to understand why and when you do any of this. Remember, resetting or replacing the default stream, or writing to its state, or jumping to a substream affects all subsequent calls to rand, randn, and randi, and is probably not something you want to do deep in your code - too easy to forget that it is in there. Replacing the default stream, especially, is probably not something you need to do a lot of.

Next Time

We've seen four ways to reproduce results from MATLAB's random number generators.

  • Resetting a stream
  • Creating a new stream with a fixed seed
  • Substreams
  • Explicitly saving and restoring the generator state

But sometimes what's needed is the "opposite of reproducibility" - you want to make sure that different runs of a simulation use different random inputs, because you want to be able to assume that the different runs are statistically independent. Until next time ...

Other Situations?

Can you think of any cases where you've needed to reproduce random numbers that can't be handled by the examples I've shown? Let me know here.


Get the MATLAB code

Published with MATLAB® 7.7

October 31st, 2008

Olympic Rings

Well, I am very late on this challenge. In August, Mike posted some code to reproduce the Olympic rings and mentioned that I might have a more clever way to create the same plot. Here's my attempt.

Contents

Create X and Y Values for the 5 Rings

First, I'll follow Mike's parameters and code to create the initial five rings.

N = 1000;
angle = linspace(pi/4,9*pi/4,N);

xb = cos(angle) * 0.9;
yb = sin(angle) * 0.9;

xy = cos(angle) * 0.9 + 1;
yy = sin(angle) * 0.9 - 1;

xk = cos(angle) * 0.9 + 2;
yk = sin(angle) * 0.9;

xg = cos(angle) * 0.9 + 3;
yg = sin(angle) * 0.9 - 1;

xr = cos(angle) * 0.9 + 4;
yr = sin(angle) * 0.9;

Mike's Rings

What Mike does next is to break rings into segments for plotting so the last ring plotted in any given location is the color that should be on top.

h1 = figure;
hold on
plot(xb(1:3*N/4),yb(1:3*N/4),'b','linewidth',5);
plot(xy(N/4:N),yy(N/4:N),'y','linewidth',5)

plot(xk(1:3*N/4),yk(1:3*N/4),'k','linewidth',5);
plot(xy(1:N/4),yy(1:N/4),'y','linewidth',5);
plot(xb(3*N/4:end),yb(3*N/4:end),'b','linewidth',5);

plot(xr(1:N/2),yr(1:N/2),'r','linewidth',5);
plot(xg(1:N),yg(1:N),'g','linewidth',5);

plot(xk(3*N/4:N),yk(3*N/4:N),'k','linewidth',5);
plot(xr(N/2:N),yr(N/2:N),'r','linewidth',5);

% make the axis pretty
axis equal
axis off
xlim([-1.2 5.2])
set(h1,'Color',[1 1 1])
hold off

My Solution

My "cleverness" is to offset the rings in the Z-plane.

First I noticed that all the top rings have an axis about which I can "tilt" them ever so slightly to get the most of the over/under behavior for the rings. Add small Z values to each ring.

thetab = -pi/4;
thetak = -pi/4;
thetar = -pi/4;

Now I create Z values for the rings in the upper rows by tilting them about the axes defined by the angles above (which happen to be identical).

zb = cos(angle + thetab) * 0.1;
zk = cos(angle + thetak) * 0.1;
zr = cos(angle + thetar) * 0.1;

Next I deal with the lower rings in a lazy way, introducing a discontinuity in Z. If I had more patience today, I could have come up with an undulation in Z that was more than a tilted plane or a step.

I find the angles where I need to depress the bottom rings relative to the top and set appropriate Z values for these to be less than 0.

mask = angle >= 2*pi;
zg = zeros(size(xb));
zy = zeros(size(xb));
zg(mask) = -0.1;
zy(mask) = -0.1;

Loren's Rings

Here's my rendition of the rings. My plot commands are a bit less complicated than Mike's. I do the work earlier by setting Z values. Also, because of my laziness, if you were to rotate my version of rings in 3-D, you'd see the bottom two aren't really rings, but have discontinuities.

h2 = figure;
hold on
plot3(xb,yb,zb,'b.','markersize',10);
plot3(xy,yy,zy,'y.','markersize',10);
plot3(xk,yk,zk,'k.','markersize',10);
plot3(xg,yg,zg,'g.','markersize',10);
plot3(xr,yr,zr,'r.','markersize',10);
% make the axis pretty
axis equal
axis off
xlim([-1.2 5.2])
set(h2,'Color',[1 1 1])
hold off

Any Other Methods?

Mike and I show 2 methods to generate interlocking rings. Do you have any other ideas for doing this? Let me know here.


Get the MATLAB code

Published with MATLAB® 7.7

October 13th, 2008

Vectorizing the Notion of Colon (:)

The other day, one of my MathWorks buddies, Peter, asked me if I knew any tricks for vectorizing the notion of the colon (:) operator. For me, this sort of inquiry sometimes has an effect similar to waving a flag in front of a bull! The challenge was on.

I know people have posted methods in the past on the MathWorks newsgroup. I quickly had a few more ideas and implemented them.

Contents

Problem Statement

Suppose I have matching pairs of integers depicting the starting and stopping indices for selecting points from another vector. I've shown in other posts that vectorizing the indexing can have tremendous performance benefits for your code. What I'd like to do is create one full list of indices so I ultimately only index into the data vector once.

Solution Variants

Here are a few ways to solve the problem.

  • Build up the array of indices, looping (for) over each pair, one at a time.
  • Write the indices to a string and have MATLAB evaluate the various : expressions.
  • Use arrayfun (two algorithms) as the engine for creating the output.
  • Modify the idea of run-length decoding, inspired by Peter Acklam's run length decoder. MathWorks Peter used this as a basis for the rld-based algorithm.

Test the Solutions

Let's prove that the solutions all get the same answers first before delving into the algorithm details. I'll give a vector of starting indices followed by a vector of ending indices. If an ending value is smaller than a starting value, that pair is effectively skipped. Duplicate starting points are allowed.

starts = [1 7 7 15 17];
ends = [3 8 6 15 19];
expectedAnswer = [1 2 3 7 8 15 17 18 19];

Round up the files to run.

d = dir('coloncat*.m');
d.name
ans =
coloncatarrf1.m
ans =
coloncatarrf2.m
ans =
coloncatfor.m
ans =
coloncatrld.m
ans =
coloncatstr.m

Run each file with the test data.

for k = 1:length(d)
    func = str2func(d(k).name(1:end-2));
    outputs{k} = func(starts, ends);
end

Validate that all the methods calculated the expected results.

rightAnswers = isequal(expectedAnswer,outputs{:})
rightAnswers =
     1

Look at the Algorithms

Let's look at the variety of algorithms and coding techniques I used. If you look at the first chunk of code in each file, you will see they are algorithmically identical - checking for sequences that have length of at least one (1) and deleting the other entries. What we really want to compare are the lines that follow that section.

Use a for Loop

This is perhaps the most straightforward method.

dbtype coloncatfor
1     function x = coloncatfor(start, stop)
2     % COLONCAT Concatenate colon expressions
3     %    X = COLONCAT(START,STOP) returns a vector containing the values
4     %    [START(1):STOP(1) START(2):STOP(2) START(END):STOP(END)].
5     
6     len = stop - start + 1;
7     
8     % keep only sequences whose length is positive
9     pos = len > 0;
10    start = start(pos);
11    stop = stop(pos);
12    len = len(pos);
13    if isempty(len)
14        x = [];
15        return;
16    end
17    
18    % find end and beginning indices for each chunk
19    partialLength = cumsum(len);
20    cumStart = [1 partialLength(1:end-1)+1];
21    
22    % preallocate output
23    % then loop through start/stop pairs, in order, and fill
24    numtot = sum(len);
25    x = zeros(1,numtot);
26    for k = 1:length(start)
27        x(cumStart(k):partialLength(k)) = start(k):stop(k);
28    end

After culling the start and stop indices, I calculate where each successive sequence ends (partialLength) and where they each begin (cumStart). I next preallocate the output array to its final size. And finally, as I loop through the matched pairs of start and stop values, I generate the colon sequence and fill these values into the correct output locations.

Use sprintf and eval

This method uses sprintf which allows me to gather all of the index start and stop values into a string without looping. I use eval evaluate the string, thereby computing the output.

dbtype coloncatstr
1     function x = coloncatstr(start, stop)
2     % COLONCAT Concatenate colon expressions
3     %    X = COLONCAT(START,STOP) returns a vector containing the values
4     %    [START(1):STOP(1) START(2):STOP(2) START(END):STOP(END)].
5     
6     len = stop - start + 1;
7     
8     % keep only sequences whose length is positive
9     pos = len > 0;
10    start = start(pos);
11    stop = stop(pos);
12    len = len(pos);
13    if isempty(len)
14        x = [];
15        return;
16    end
17    
18    % create list of indices for : and evaluate
19    indices = [start; stop];
20    list = sprintf('%d:%d ',indices);
21    x = eval(['[', list, ']']);

Use arrayfun - Method 1

The next two methods use arrayfun to do the work. Here's the first approach.

dbtype coloncatarrf1
1     function x = coloncatarrf1(start, stop)
2     % COLONCAT Concatenate colon expressions
3     %    X = COLONCAT(START,STOP) returns a vector containing the values
4     %    [START(1):STOP(1) START(2):STOP(2) START(END):STOP(END)].
5     
6     len = stop - start + 1;
7     
8     % keep only sequences whose length is positive
9     pos = len > 0;
10    start = start(pos);
11    stop = stop(pos);
12    len = len(pos);
13    if isempty(len)
14        x = [];
15        return;
16    end
17    
18    % put entries into a cell array for later use by arrayfun
19    cstart = num2cell(start);
20    cstop = num2cell(stop);
21    vals = arrayfun(@(x,y){x{1}:y{1}},cstart,cstop);
22    x = [vals{:}];

Place the start and stop values into cell arrays to feed to arrayfun. The entries for the two arrays act in pairs, and the function that I have arrayfun apply places in a cell array the sequence of values between the numbers in the paired cells. Finally, take all the output cells in the cell array and string out the values into a single numeric vector.

Use arrayfun - Method 2

Here's the second approach using arrayfun. It involves a for loop as well.

dbtype coloncatarrf2
1     function x = coloncatarrf2(start, stop)
2     % COLONCAT Concatenate colon expressions
3     %    X = COLONCAT(START,STOP) returns a vector containing the values
4     %    [START(1):STOP(1) START(2):STOP(2) START(END):STOP(END)].
5     
6     len = stop - start + 1;
7     
8     % keep only sequences whose length is positive
9     pos = len > 0;
10    start = start(pos);
11    stop = stop(pos);
12    len = len(pos);
13    if isempty(len)
14        x = [];
15        return;
16    end
17    
18    % Find unique sequence lengths
19    uniqlens = unique(len);
20    
21    % preallocate space for cell outputs of arrayfun
22    chunks = cell(1,length(len));
23    
24    % Now loop over the lengths, being sure to place the 
25    % outputs in the right positions.
26    for ilen = uniqlens(:)'
27        idxlen = find(len==ilen);
28        chunks(idxlen) = arrayfun(@(xx,yy)xx:yy,...
29            start(idxlen),stop(idxlen),'UniformOutput',false);
30    end
31    % reassemble chunks into output
32    x = [chunks{:}];

The concept here is to loop over sequences that have the same length. At most, this for loop is the same length as the number of values in the initial start vector. If the length of the inputs is long, however, it might be advantageous to loop over sequences that generate the same length (commensurate size) output. Placing the output into cells allows me to not worry that I might be intermingling output vectors of different sizes. Once all the output sequences are collected in the cell array, I flatten it into a vector the same way I did with method 1.

Use Run-Length Decoding Idea

Here's an algorithm based on the idea of run-length decoding.

dbtype coloncatrld
1     function x = coloncatrld(start, stop)
2     % COLONCAT Concatenate colon expressions
3     %    X = COLONCAT(START,STOP) returns a vector containing the values
4     %    [START(1):STOP(1) START(2):STOP(2) START(END):STOP(END)].
5     
6     % Based on Peter Acklam's code for run length decoding.
7     len = stop - start + 1;
8     
9     % keep only sequences whose length is positive
10    pos = len > 0;
11    start = start(pos);
12    stop = stop(pos);
13    len = len(pos);
14    if isempty(len)
15        x = [];
16        return;
17    end
18    
19    % expand out the colon expressions
20    endlocs = cumsum(len);  
21    incr = ones(1, endlocs(end));  
22    jumps = start(2:end) - stop(1:end-1);  
23    incr(endlocs(1:end-1)+1) = jumps;
24    incr(1) = start(1);
25    x = cumsum(incr);

The idea is to expand the colon sequences out. I know the lengths of each sequence so I know the starting points in the output array. Fill the values after the start values with 1s. Then I figure out how much to jump from the end of one sequence to the beginning of the next one. If there are repeated start values, the jumps might be negative. Once this array is filled, the output is simply the cumulative sum or cumsum of the sequence.

Other Methods?

These methods I've shown, for vectorizing the notion of the colon operator, are ones that came very quickly. I haven't spent concerted time looking for other algorithms though I am positive some others, including some clever ones, probably exist. Can you help me and provide other possible algorithmic ideas? Please post them here.


Get the MATLAB code

Published with MATLAB® 7.7

October 3rd, 2008

MATLAB for Teaching

I occasionally get involved in activities at MathWorks that aren't strictly focused on development. These activities often include interacting with customers, giving seminars, writing articles, for example. Early this summer, for instance, I spent a little bit of time creating a webinar about using MATLAB for Teaching. I tried to design it with the goal of reaching people already familiar with MATLAB as well as those who are not yet acquainted with it.

For those of you already using MATLAB, whether for teaching or not, the earlier demo in the webinar is fairly rudimentary, since I give an introductory demo of MATLAB, including numerical calculations, visualization, finding information, and using the editor to create an M-file.

Even for those of you not new to MATLAB, you may find the latter part interesting. I show a rich example, motivated by the work of Professor Joseph M. Mahaffy of San Diego State University. The example illustrates how to estimate the physical characteristics of a spring pendulum. It's an example of how you might connect work in the laboratory with analyzing the collected data, incorporating mathematical modeling and visualization, common denominators for MATLAB users. Along the way I also use cell mode execution, a technique that allows the student to interactively test out various parameters. And I show the published report, a way to share the results without writing a separate report.

There are additional resources for those of you interested in incorporating MATLAB into your curricula. I list a few of them here.

And here are resources for your students, especially if you want to have your students gain familiarity with MATLAB in advance of your classes.


Get the MATLAB code

Published with MATLAB® 7.6

September 25th, 2008

Timing Extraction of Parts of an Array

In Sarah's blog, Dan asked about speed of removing elements. There are a number of ways of deleting elements in MATLAB. So, what's the "best" way?

Contents

Vector Data

I only work with vector, or 1-D data for this blog. And I use linear indexing (i.e., indexing with only 1 number). First let me generate some points, and perform an fft.

nPts = 1e7;
cutoff = round(nPts/2);
A = rand(nPts,1);
B = fft(A);

I want to keep a copy of B off to the side so I can do more experiments on it later. I want to be sure the copy doesn't share memory with the array being used later on, so I modify the first element (keeping it the same value).

Borig = B;
Borig(1) = Borig(1) + sin(0);

Methods That Come to Mind

Let me list the methods of paring down an array that I can think of quickly.

  • Find the values to keep and assign them to a new array.
  • Find the values to delete and set them to empty ([]) using end.
  • Find the values to delete and specify them explicitly, i.e., without using end.
  • Identify the original values and place them in a new output array.
  • Test Various Methods

    t1 = tic;
    B=B(1:cutoff);
    toc(t1)
    B = Borig;
    Borig(1) = Borig(1) + sin(0);
    t2 = tic;
    B(cutoff+1:end)=[];
    toc(t2)
    B = Borig;
    Borig(1) = Borig(1) + sin(0);
    t3 = tic;
    B(cutoff+1:nPts)=[];
    toc(t3)
    B = Borig;
    Borig(1) = Borig(1) + sin(0);
    t4 = tic;
    C = B(1:cutoff);
    toc(t4)
    Elapsed time is 0.131063 seconds.
    Elapsed time is 0.322803 seconds.
    Elapsed time is 0.327507 seconds.
    Elapsed time is 0.125420 seconds.
    

    It doesn't seem like using end vs. the actual number for doing the indexing makes nearly as much difference as whether or not I copy the elements to keep or delete the elements to remove.

    Keep Most Elements

    Let's repeat the analysis but retain most of the elements this time.

    B = Borig;
    Borig(1) = Borig(1) + sin(0);
    cutoff  = round(0.95*nPts)
    cutoff =
         9500000
    

    Test the various methods.

    t1 = tic;
    B=B(1:cutoff);
    toc(t1)
    B = Borig;
    Borig(1) = Borig(1) + sin(0);
    t2 = tic;
    B(cutoff+1:end)=[];
    toc(t2)
    B = Borig;
    Borig(1) = Borig(1) + sin(0);
    t3 = tic;
    B(cutoff+1:nPts)=[];
    toc(t3)
    B = Borig;
    Borig(1) = Borig(1) + sin(0);
    t4 = tic;
    C = B(1:cutoff);
    toc(t4)
    Elapsed time is 0.240744 seconds.
    Elapsed time is 0.510874 seconds.
    Elapsed time is 0.440639 seconds.
    Elapsed time is 0.243518 seconds.
    

    Interesing. I see roughly the same time pattern as earlier, when I saved half the elements.

    Keep Few Elements

    Finally, sticking to the vector case, keep very few of the elements this time.

    B = Borig;
    Borig(1) = Borig(1) + sin(0);
    cutoff  = round(0.05*nPts)
    cutoff =
          500000
    

    Test the various methods.

    t1 = tic;
    B=B(1:cutoff);
    toc(t1)
    B = Borig;
    Borig(1) = Borig(1) + sin(0);
    t2 = tic;
    B(cutoff+1:end)=[];
    toc(t2)
    B = Borig;
    Borig(1) = Borig(1) + sin(0);
    t3 = tic;
    B(cutoff+1:nPts)=[];
    toc(t3)
    B = Borig;
    Borig(1) = Borig(1) + sin(0);
    t4 = tic;
    C = B(1:cutoff);
    toc(t4)
    Elapsed time is 0.026702 seconds.
    Elapsed time is 0.194346 seconds.
    Elapsed time is 0.195731 seconds.
    Elapsed time is 0.026779 seconds.
    

    Commentary and Question

    I did this experiment with 10 million data elements. If I do it with orders of magnitude fewer points, the results are less dramatically different, though similar. My conclusion is that it is faster to keep the elements you want than to delete the ones you don't. However, it depends on how you reach the state of culling the data.

    Suppose I'm handed the indices of the unwanted elements. Am I better off just using these to delete values or taking the hit to identify the wanted ones and using the these to keep elements? Here's my thinking. If I'm given the unwanted indices, to create the desired ones, I need to make an array of all indices, and then deleted the unwanted ones from this larger array to get the keepers. Then I have do to the actual indexing for the values. So, if presented with indices already, whether ones to be kept or deleted, use them. If not, calculate the keepers and assign these values to the required array.

    Do you find yourself in situations where it is unnatural to calculate or specify the values you want to keep? If so, please post here because I am curious about those situations.


    Get the MATLAB code

    Published with MATLAB® 7.6

    September 17th, 2008

    Managing Non-Deployable Functions in a Compiled Application

    Guest blogger Peter Webb presents another post about building applications with the MATLAB Compiler. This week's topic: understanding the types of functions that cannot be deployed and managing those that behave differently when deployed.

    For an introduction to writing deployable code, please see the June 19th post.

    Contents

    When you're writing an application you intend to deploy with the MATLAB Compiler, you need to be aware that a subset of MATLAB's functions either cannot be deployed or behave differently when deployed.

    There are two classes of non-deployable functions: functions that are not licensed for deployment; and functions that the deployment runtime (the MATLAB Common Runtime or MCR) cannot support.

    Excluded Functions: Design Time Functions and GUIs

    The intent of the MATLAB Compiler and the Builders is to create a robust application with predictable behavior. In part, this means that the users of the deployed application should not be able to modify it. Preventing users from modifying deployed applications has several benefits:

    • Predictable performance and results: The application will always work just as it did when it was deployed.
    • Easier to support: The support organization does not have to cope with user-introduced bugs.
    • Protection of intellectual property: Making the code unreadable (and hence, unmodifiable) means users cannot copy the algorithms.
    • Increased security: An application distributed in binary, encrypted form is more tamper-resistant than one for which the source code is available.

    To prevent the modification of deployed applications, the Compiler includes only those M-files the application actually uses, encrypts the M-files (this also protects the intellectual property they contain), creates MEX authorization files, and truncates the MATLAB path. But editing M-files and modifying the MATLAB path is not the only way to change a MATLAB application. Certain MATLAB and toolbox functions can themselves change an application's behavior. For example, the MATLAB debugger can interactively modify the contents of variables and change the order in which the interpreter calls functions. Similarly, the figure property editor can modify figure callback strings. The Compiler prevents these types of modifications by refusing to compile the functions that perform them; these functions are excluded from the generated CTF archive.

    In general the Compiler excludes those functions used when developing an application. Most of these design time functions fall into one of two categories: those that permit the application to change the way it works, like dbstop and propedit; and those that are complete mini-applications in and of themselves, like imtool and guide. This includes, but is not limited to:

    • All types of debugging functions.
    • Most of the toolbox functions ending in tool, such as imtool and sptool.
    • Functions that generate or edit M-code directly or indirectly; this includes, for example, the Neural Network Toolbox network training functions such as train.

    Many of these restrictions have no workaround: debugging functions and functions which generate M-code simply cannot be deployed. There's no workaround for the debugging functions because they are builtins, and there's no effective way to debug MATLAB code using M-files. Functions which generate M-code will not work in a deployed application because they have no way of encrypting the generated M-code, which the MCR requires. You can, of course, write your own code to compensate for other types of excluded functions, but keep in mind that cloning MATLAB or toolbox functions violates your license agreement, which we don't encourage.

    The Compiler creates a list of excluded functions in the output directory. This list, mccExcludedFiles.log, which is often quite long, contains the full paths to the M-files that the Compiler removed from the list produced by depfun and a short explanation for why each function cannot be deployed.

    We publish a list of deployable functions; this list may change with each release of MATLAB and the deployable toolboxes.

    Unsupported Functions: Differences in the Execution Environment

    Certain MATLAB functions cannot be deployed because they act on objects that are not present in a deployed application. For example, since deployed applications have no command window, functions that modify the command window can't be deployed.

    Use of these functions in deployed application will result in mlint warnings in the MATLAB editor and runtime warnings in the deployed application. For example, calls to help generate this text when deployed:

    Warning: The HELP function cannot be used in compiled applications.

    You can prevent your deployed applications from displaying warnings like this via the isdeployed function. For more details, see the previous blog posting about path management in deployed applications.

    There are currently four non-deployable MATLAB functions:

    • help: Deployed applications typically do not include MATLAB or toolbox help files, so there's no need for a help function.
    • helpwin: helpwin displays help text in a new window; like help it is of very little use in deployed applications because deployed applications do not contain MATLAB's help files.
    • keyboard: Since a deployed component may be part of a graphical application with no keyboard input stream, the MATLAB Compiler does not not support deployment of the keyboard function.
    • savepath: A deployed application's path is fixed and unchangeable, which means there's no need to save the path.

    The versions of these functions that issue warnings override the versions in MATLAB because of their location in toolbox/compiler/deploy which always appears first on the path used by mcc and depfun.

    Functions that Work Differently

    Where the differences between MATLAB and the deployed environment are not insurmountable, the MATLAB Compiler and the Builder tools offer versions of MATLAB functions that have been modified to work in deployed applications.

    Eight important MATLAB functions work differently in deployed applications:

    • clc: In MATLAB, clc clears the command window. Since deployed applications do not display a command window, clc clears the shell or DOS window in which the application was started.
    • fopen: In a deployed application, fopen looks for the specified file in the unpacked CTF archive before looking for it in the current directory. This means that fopen will preferentially open files shipped with application, even if there's a file in the current directory with the same name. Note that preferring files contained in the CTF archive is most often exactly what you want the deployed fopen to do. Please don't be tempted to use a full path to work around this behavior as that may create further problems when the application is deployed. For more details on why, see the previous blog posting in this series.
    • home: In MATLAB, home returns the prompt to the upper left of the command window. In a deployed application, it moves the prompt to the upper left of the shell or DOS window from which the application was started. Unlike clc, home does not clear the existing text in the window.
    • input: The deployed version of input reads from the shell or DOS command window instead of the MATLAB command window. If there is no command window available, input will hang or fail.
    • loadlibrary: The version of loadlibrary which operates on C-style header interfaces is not available in deployed applications; but the Compiler does support loadlibrary with M-file prototypes.
    • pause: Like input, pause (without a timeout argument) reads from the shell or DOS command window in a deployed application. If no such window is available pause will pause forever.
    • print: A full accounting of the differences between printing in a deployed application and printing in MATLAB is beyond the scope of this posting. While you can still use print for generating bitmaps or JPEGs from your figures, for hardcopy you must use deployprint instead. For example:
            if isdeployed
                deployprint
            else
                print
            end
    • printdlg: Deployed applications only support the single argument version of printdlg, i.e., printdlg(fig).

    You'll find the code for the deployment-specific versions of these functions in toolbox/compiler/deploy. You can control if these functions get called in deployed applications using the isdeployed function. In the unlikely event that you need to change how these functions work, you can overload them by creating a function of the same name in your top-level application directory. The Compiler will set the path of the generated application so that your M-file gets called instead of the one in toolbox/compiler/deploy.

    Finally, please note that functions deployed as the main routine of a standalone executable may need to process their input arguments differently when deployed. When making a standalone executable from a MATLAB function that takes parameters, the MATLAB Compiler generates code to pass any user-supplied command line arguments to the compiled MATLAB function. However, the command line arguments are passed to the MATLAB function as strings, even if they are numeric. Therefore, any M-function serving as the entry point for a compiled MATLAB program should expect (or at least test for) string input. For example:

     function y = addten(x)
       if isstr(x)
           x = str2num(x);
       end
       y = x + 10;

    Deploying Your Applications

    Deploying your applications will be easier if you follow these guidelines:

    • Use isdeployed to protect non-deployable code from being called by deployed applications.
    • Remember that a deployed application has no MATLAB command window (or any of the other desktop tools, like the editor).
    • Don't change the path or rely on the current directory; deployed applications are supposed to be less flexible than MATLAB applications (this makes them more secure and easier to support).

    Most of the information in this series of postings will eventually end up in the documentation for the MATLAB Compiler. Please comment on anything you feel was unclear, and ask any questions you think were unanswered. And feel free to contact me if you have other questions about the MATLAB Compiler or the Builders or post your thoughts here.


    Get the MATLAB code

    Published with MATLAB® 7.6

    September 8th, 2008

    Finding Patterns in Arrays

    Recently, my colleague Jeff asked me if I would look at some code he wrote to find a pattern of numbers in a larger array. Without looking at his code, I asked if he had tried using strfind, despite his data not being strings. He found that it solved his problem and was faster than his M-file. In the meantime, I wanted to see what it took for me to write a simple algorithm I was thinking of in an M-file. I show and discuss the results here.

    Contents

    Simple Test Data

    Let me start off with really simple test data to be sure all algorithms are getting the right answers.

    a = [0 1 4 9 16 4 9];
    b = double('The year is 2003.');

    First Algorithm : findpattern

    Here's the first findpattern algorithm.

    type findpattern
    function idx = findpattern(in_array, pattern)
    %FINDPATTERN  Find a pattern in an array.
    %
    %    K = FINDPATTERN(ARRAY, PATTERN) returns the starting indices
    %    of any occurences of the PATTERN in the ARRAY.  ARRAY and PATTERN
    %    can be any mixture of character and numeric types.
    %
    %    Examples:
    %        a = [0 1 4 9 16 4 9];
    %        b = double('The year is 2003.');
    %        findpattern(a, [4 9])         returns [3 6]
    %        findpattern(a, [9 4])         returns []
    %        findpattern(b, '2003')        returns 13
    %        findpattern(b, uint8('2003')) returns 13
    %
    %    See also STRFIND, STRCMP, STRNCMP, STRMATCH.
    
    % Algorithm:
    % Find all of the occurrences of each number of the pattern.
    %
    % For an n element pattern, the result is an n element cell array.  The
    % i-th cell contains the positions in the input array that match the i-th
    % element of the pattern.
    %
    % When the pattern exists in the input stream, there will be a sequence
    % of consecutive integers in consecutive cells.
    
    % As currently implemented, this routine has poor performance for patterns
    % with more than half a dozen elements where the first element in the
    % pattern matches many positions in the array.
    
    locations = cell(1, numel(pattern));
    for p = 1:(numel(pattern))
        locations{p} = find(in_array == pattern(p));
    end
    
    % Find instances of the pattern in the array.
    idx = [];
    for p = 1:numel(locations{1})
        
        % Look for a consecutive progression of locations.
        start_value = locations{1}(p);
        for q = 2:numel(locations)
            
            found = true;
            
            if (~any((start_value + q - 1) == locations{q}))
                found = false;
                break;
            end
            
        end
        
        if (found)
            idx(end + 1) = locations{1}(p);
        end
        
    end
    

    You'll notice that Jeff chooses to store derived information on the pattern being present in a cell array, and then looks for consecutive locations.

    Here are some results using findpattern. First I set f to be a function handle to the function in question. Then I can reuse the same code for the other cases simply by redefining the function.

    f = @findpattern
    t(4) = false;
    t(1) = isequal(f(a, [4 9]), [3 6]);
    t(2) = isempty(f(a, [9 4]));
    t(3) = isequal(f(b, '2003'),13);
    t(4) = isequal(f(b, uint8('2003')),13);
    AOK = all(t==true)
    f = 
        @findpattern
    AOK =
         1
    

    My Homebrew Algorithm : findPattern2

    Here's my own algorithm. The idea here is to find possible pattern locations first, and winnow them out, marching through the pattern, which I am assuming is generally smaller, and often a lot smaller, than the data.

    type findPattern2
    function start = findPattern2(array, pattern)
    %findPattern2 Locate a pattern in an array.
    %
    %   indices = findPattern2(array, pattern) finds the starting indices of
    %   pattern within array.
    %
    %   Example:
    %   a = [0 1 4 9 16 4 9];
    %   patt = [4 9];
    %   indices = findPattern2(a,patt)
    %   indices =
    %        3     6
    
    % Let's assume for now that both the pattern and the array are non-empty
    % VECTORS, but there's no checking for this. 
    % For this algorithm, I loop over the pattern elements.
    len = length(pattern);
    % First, find candidate locations; i.e., match the first element in the
    % pattern.
    start = find(array==pattern(1));
    % Next remove start values that are too close to the end to possibly match
    % the pattern.
    endVals = start+len-1;
    start(endVals>length(array)) = [];
    % Next, loop over elements of pattern, usually much shorter than length of
    % array, to check which possible locations are valid still.
    for pattval = 2:len
        % check viable locations in array
        locs = pattern(pattval) == array(start+pattval-1);
        % delete false ones from indices
        start(~locs) = [];
    end
    

    Get results and time it.

    f = @findPattern2
    t(1) = isequal(f(a, [4 9]), [3 6]);
    t(2) = isempty(f(a, [9 4]));
    t(3) = isequal(f(b, '2003'),13);
    t(4) = isequal(f(b, uint8('2003')),13);
    AOK = all(t==true)
    f = 
        @findPattern2
    AOK =
         1
    

    Using strfind

    Next I test using the same data with strfind. Despite its name, strfind can happily handle non-character datatypes, and particularly doubles and integers.

    f = @strfind
    t(1) = isequal(f(a, [4 9]), [3 6]);
    t(2) = isempty(f(a, [9 4]));
    t(3) = isequal(f(b, '2003'),13);
    t(4) = isequal(f(b, uint8('2003')),13);
    AOK = all(t==true)
    f = 
        @strfind
    AOK =
         1
    

    Use Case and Performance

    Jeff described the problem he was solving in more detail. He has a file with multiple images in it, with data stored as uint8. The images are separated by a particular bit pattern. Let me show you one of the images in the sequence, after processing and extracting the frames.

    load forloren
    image(X(:,:,:,17)), axis off
    whos X
      Name      Size               Bytes  Class    Attributes
    
      X         4-D             26726400  uint8              
    
    

    Now let me show and time finding the pattern in the raw data. The data contain 29 images.

    clear
    load imrawdata
    whos
    pattern = [254 255 0 224];
    f = @()findpattern(rawdata, pattern);
    tfind = timeit(f);
    f = @()findPattern2(rawdata, pattern);
    tfind(2) = timeit(f);
    f = @()strfind(rawdata, pattern);
    tfind(3) = timeit(f)
      Name         Size                   Bytes  Class    Attributes
    
      rawdata      1x1259716            1259716  uint8              
    
    tfind =
          0.80941     0.011273     0.019194
    

    Puzzle and Next Steps

    In the case of the larger dataset, strfind is not the fastest algorithm, though I found with much smaller data, strfind outperformed findPattern2. Some possible reasons why findPattern2 is the fastest of the three algorithms:

  • it is not as general purpose and does no error checking,
  • it was only written to work on vectors,
  • it does nothing to handle cases where |NaN|s might be involved.
  • If I find out why, I will let you know. In the meantime, if you have any thoughts to add, please comment here.


    Get the MATLAB code

    Published with MATLAB® 7.6

    August 25th, 2008

    Piecewise Linear Interpolation

    John D'Errico is back today to talk about linear interpolation.

    Contents

    Introduction

    You saw in my previous blog that high order polynomials can have some problems. Why not go to the opposite extreme? Use a piecewise version of linear interpolation? I like to call it connect-the-dots, after the child's game of that name. This is really the simplest interpolation of all.

    In MATLAB, given a list of points, sampled from some functional relationship in one dimension, how would we perform piecewise linear interpolation? There are really two steps.

    For any point u, given a set of (x,y) pairs with a monotonic vector x (by monotonic, I mean that x(k) < x(k+1) ), first find the index k, such that

    Second, perform the linear interpolation to predict the value of y at x=u, between the pair of points (x(k),y(k)) and (x(k+1),y(k+1)).

    Each data point in the list of points becomes a point where the slope of the piecewise linear interpolant changes to a new value. However, the function is still continuous across those locations. So one might call these locations "knots" because at those points consecutive polynomial segments are tied together. Knots is a common term applied to splines for these locations; breaks is a common alternative name.

    Create Some Data to Interpolate

    x = linspace(0,2*pi,10)';
    y = sin(x);
    plot(x,y,'o')
    title 'A simple trig function'
    xlabel X
    ylabel Y

    Suppose I want to interpolate this function at some intermediate point, perhaps u == 1.3?

    u = 1.3;

    histc Solves the Binning Problem

    The first step I describe above is what I call binning. histc solves that problem efficiently.

    [k,k] = histc(u,x);
    k
    k =
         2
    

    As an aside, look at the way I took the output from histc. Since I only need the second output from histc but not the first output, rather than clutter up my workspace with a variable that I did not need, the output style [k,k] returns only the information I need.

    Next, it seems instructive to dive a little more deeply into binning, so let me offer a few alternatives to histc.

    Binning - A Loop With An Explicit Test

    Just a test inside a loop suffices.

    for k = 1:(length(x)-1)
      if (x(k) <= u) && (u < x(k+1))
        break
      end
    end
    x
    k
    x =
                0
          0.69813
           1.3963
           2.0944
           2.7925
           3.4907
           4.1888
           4.8869
           5.5851
           6.2832
    k =
         2
    

    Binning - A Semi-vectorized Test

    Do the binning with a single vectorized test. This works reasonably as long as I have only a scalar value for u, so I call this only a semi-vectorized solution.

    k = sum(u>=x)
    k =
         2
    

    When I want to place multiple points into their respective bins, this sum fails.

    There are other ways to place points in bins in MATLAB, including a sort, or with hash tables. You can find several different binning methods in bindex on the file exchange. It is useful mainly to those with older MATLAB releases, because histc became available with version 5.3 and later of MATLAB.

    Fully Vectorized Binning

    Next, I may, and often do, have a list of points to interpolate. This is a common event, where I wish to more finely resample a curve that is sampled only at some short list of points. This time, let me generate 1000 points at which to interpolate the sampled function.

    u = linspace(x(1),x(end),1000)';
    
    [k,k] = histc(u,x);

    This next line handles the very last point. Recall the definition of a bin as

    Note the strict inequality on the left. So that very last point (at x(end)) might be technically said to fall into the n|th bin when I have |n break points.

    On the other hand, it is more convenient to put anything that lies exactly at the very last break point into bin (n-1).

    By the way, any piecewise interpolation should worry about points that fall fully above or below the domain of the data. Will the code extrapolate or not? Should you extrapolate?

    n = length(x);
    k(k == n) = n - 1;

    Interpolation as a Linear Combination

    The final step is to interpolate between two points. Long ago, I recall from high school what was called a point-slope form for a line. If you know a pair of points that a line passes through, as (x(k),y(k)) and (x(k+1),y(k+1)), then the slope of the line is simple to compute. An equation for our line as a function of the parameter u is just:

    I can also rewrite this in a way that I like, by defining a parameter t as

    Then the interpolant is a simple linear combination of the function values at each end of the interval.

    Do the Interpolation and Plot the Result

    See how nicely this all worked, in a fully vectorized coding style.

    t = (u - x(k))./(x(k+1) - x(k));
    yu = (1-t).*y(k) + t.*y(k+1);
    
    plot(x,y,'bo',u,yu,'r-')
    xlabel X
    ylabel Y
    title 'The connect-the-dots interpolant'

    Use interp1 Instead

    It is always better to use a built-in tool to solve your problem than to do it yourself, so I might just as well have used interp1 to accomplish this interpolation.

    yinterp1 = interp1(x,y,u,'linear');
    
    plot(x,y,'bo',u,yinterp1,'r-')
    xlabel X
    ylabel Y
    title 'The connect-the-dots interpolant, using interp1'

    In my next blog I'll begin talking about piecewise cubic interpolants. Until then please tell me your ideas here. Are there some associated topics that I should cover?


    Get the MATLAB code

    Published with MATLAB® 7.6

    August 18th, 2008

    When to Create Classes in MATLAB

    I’m pleased to introduce today’s guest blogger, Nausheen Moulana. Nausheen manages the teams responsible for the design and development of the MATLAB language. She shares her thoughts on when you might want to create classes in MATLAB.

    Contents

    The thing I like most about the MATLAB environment is the flexibility I have as a programmer to choose the function/API that I think is required for the task at hand. And what's neat is that if I don't find or like what comes "in-the-box", I can create my own.

    The enhanced object-oriented programming capabilities in the R2008a release of MATLAB add some new tools to our programming toolbox. While I can still program with existing tools such as functions and scripts, here are some situations where creating a class makes good design sense.

    Create a New Data Type

    MATLAB, like most programming languages, has a set of primitive data types with a set of operations defined on them. However, there are situations when we may need to create our own data type. For example, I might want to use structure arrays to track employee information but want to restrict the addition of new fields or modification of the values associated with existing fields.

    employee.age = 23;
    employee.name = 'Jane';
    employee.ID = 17;
    employee(2) = struct('age',39,'name','Jerry','ID',45);
    employee
    employee = 
    1x2 struct array with fields:
        age
        name
        ID
    

    The problem I face is that the fields of structure arrays can be modified easily by assigning new values to them just like I can modify any variables that store primitive data types.

    On the other hand, if the variable employee is an object, only functions I choose (as creator of the class) can modify the fields.

    The ability to control access and modification of data is referred to as encapsulation. Designing a class facilitates encapsulation thereby improving the quality of my code since objects are robust to accidental modification.

    At this point, I'd like to clarify the distinction between classes and objects. Classes are a mechanism to describe interfaces to the functionality with which users will interact whereas objects are concrete instances of a class with their own data and related operations. Classes are designed and objects are used.

    Another reason I may want to create a new type is to extend or redefine operations on an existing type. For example, I may require integer arithmetic to wrap as opposed to saturate (which is what MATLAB does) on overflow. I may, in addition, require a sqrt function that works on integers which isn't implemented in MATLAB. I can address these requirements by subclassing from the appropriate integer class in MATLAB and overwriting the arithmetic operations such as +, -, .*, ./, .\ and .^ to wrap on overflow. I can also add a method to my new integer class that computes the sqrt for integer values. Subclassing facilitates code reuse thereby eliminating the need to create new capabilities from scratch.

    NOTE: If you do choose to override an arithmetic operator like plus, be aware that you may have to overload other functions that may use plus to ensure they give the right answer.

    Using MATLAB's external interface APIs and interoperability capabilities, I can connect to external systems be they external devices such as data acquisition tools or external libraries created using Java or Component Object Model(COM). A convenient way to interact with these systems is to design a class that manages data access and modification as well as provides sufficient control over the functionality in the external system that I'd like to expose in MATLAB. By designing a class, I will be able to add new behaviors as the libraries I interface with evolve without compromising compatibility. Also, if the external system is implicitly object-oriented then designing a class is the more natural way to present it in MATLAB.

    An example in MATLAB of a class which interfaces with an external library is the timer functionality which uses Java's timer object (java.util.Timer).

    % Create a timer object and set its
    % execution mode to schedule timer events.
    t = timer('TimerFcn','figure;imagesc(magic(5))','Period',1);
    t.ExecutionMode = 'fixedrate';
    % Set name of the timer
    t.Name = 'MyTimer';
    % Set the number of times to execute the callback function.
    % In this example, the image of magic squares is drawn twice,
    % once for each of the two events.
    t.TasksToExecute = 2;
    % Start the timer.
    start(t)
    % Stop and release the timer.
    stop(t)
    delete(t)

    In this example, designing a class allowed me to reuse timer functionality available in Java while giving me the flexibility to present APIs in a manner that is familiar to users interacting with objects in MATLAB.

    Note that when using new classes written in MATLAB 7.6, I don't need to explicitly call the delete method to remove an object from memory; MATLAB does this automatically when the object is no longer used. The ability for an object to clean up, via the destructor, when MATLAB determines that the object is no longer being used is just one of the many enhancements to the object-oriented capabilities in MATLAB 7.6.

    Manage the Lifecycle of Variables

    Generally MATLAB takes care of managing memory for me. For example, create a 50x50 matrix of doubles.

    a = rand(50);
    a = 5;

    With the second assignment to a, MATLAB takes care of freeing the memory allocated in the previous statement without any user intervention. Whenever variables go out of scope, are modified, or are no longer used, MATLAB automatically releases the memory associated with these variables when these variables happen to be primitive types. However, if the variables are resources such as file handles, fonts, or represent other types, then MATLAB needs to transfer control to that object to release resources when the object is no longer used.

    In some situations I may need to take additional action to explicitly manage the lifecycle of the variables that get created. By creating class destructors (the delete method), I can ensure that memory associated with a variable is freed up when the variable is no longer being used. For example, let's look at the memmapfile class in MATLAB. The memmapfile functionality creates an object that lets me map a file to memory whereby any operations that modify the object in memory have the effect of modifying the contents of the file on disk.

    Here's how I can map a file named records.dat to a series of unsigned 32-bit integers and set every other value to zero via the data property in memory.

       m = memmapfile('records.dat', 'format', 'uint32', ...
                      'writable', true);
       m.data(1:2:end) = 0;

    Since MATLAB does not explicitly allocate the memory for the data contained in the variable m, it cannot release the memory either (i.e., in this case, unmapping the mapped region in MATLAB's memory) when the memmapfile object goes out of scope or is modified via assignment, perhaps in a situation like this.

       m = rand(4);

    If I exposed the memory mapping functionality directly, I would risk inadvertent memory abuse if users of this class do not programmatically unmap mapped regions. However, an object-oriented approach mitigates this situation since it provides the class control over the creation and destruction of objects.

    Organize Code

    Keywords in MATLAB 7.6 allow me to define my class and organize my code in a single M-file. By having properties, methods, and events defined in the same file, I can gain an insight into class data and operations relatively quickly. With methods, I also have the flexibility of separating the function definition from implementation. I do so by defining the methods in the file containing the class definition and implementing the methods in separate files. Depending on the number of methods my class has, keeping method implementation in separate files can reduce clutter in the class definition.

    Summary

    MATLAB 7.6 has significant enhancements to object-oriented capabilities that simplify the process of designing classes while providing sufficient functionality to handle your most complex class design. When programming, if you find yourself needing to do any one of the above, then you should consider designing a class. I'd love to hear when you've found creating classes to be useful. Let me know here.


    Get the MATLAB code

    Published with MATLAB® 7.6


    Loren Shure works on design of the MATLAB language at The MathWorks. She writes here about once a week on MATLAB programming and related topics.

    • Loren: Jos- You’re right about the title! Ben- Thanks for another solution. –Loren
    • Jos: Again an interesting blog! However, the title of this blog is somewhat misleading. Shouldn’t the word...
    • Ben: I think Mike’s answer is probably the most intuitive, but I would have answered it like Jos only using...
    • Pär Lansåker: It is often better to use an obvious path for understanding. I would use a combinationd of diff(A)...
    • Joe P.: Paul R. Martin refers to “a chronic problem with cutting-and-pasting MatLab figures using Mac OS X...
    • Mike Koets: If we do want to allow answers of zero (I assumed we did not want that), then we can use this code...
    • Loren: Matt- Your solution is certainly viable as well! –Loren
    • matt fig: Here’s my crack at it. I don’t think it is too abstruce. zr = findstr(A,0); mx =...
    • Loren: Mike- I think it doesn’t give the right answer for this case. >> C = [1 2 -4 0 0]; >>...
    • Mike Koets: How about this: max(c(intersect(find (c),[find(c==0)+1 find(c==0)-1]))) The “intersect(fin d(c),...

    These postings are the author's and don't necessarily represent the opinions of The MathWorks.