# Cleve's Corner

## The Ardent Titan, part 2

Even though I am one of the founders of the MathWorks, I only acted as an advisor to the company for its first five years. During that time, from 1985 to 1989, I was trying my luck with two Silicon Valley computer startup companies. Both enterprises failed as businesses, but the experience taught me a great deal about the computer industry, and influenced how I viewed the eventual development of MATLAB. The second startup was known as Ardent Computer for most of its brief existence.

### Contents

#### MathWorks and Ardent

As soon as we could, we ported MATLAB to the Titan, and MathWorks established a working business relationship with Ardent Computer. On a trip to the West Coast, Jack Little and Jim Tung from MathWorks visited Ardent. In a meeting at Ardent, I gave Jack and Jim my Ardent business card and I gave the Ardent folks my MathWorks business card. The Ardent Titan eventually became the only computer in the world to ever have MATLAB delivered with its bundled software.

#### Matrix Laboratory

MATLAB and the Titan were excellent companions. The Titan's computational power came from its vector processor and, back then, MATLAB was still "Matrix Laboratory". We could demonstrate the Linpack Benchmark with one MATLAB command, or with one mouse click. The Linpack Benchmark provided the starting point of a deeper discussion of the capabilities of MATLAB.

MATLAB ran on top of Ardent's math libraries. Recall that Linpack was more than just a benchmark. Linpack and Eispack were the Fortran matrix libraries that got us started in this whole business. Randy Allen, who was one of Ken Kennedy's students, had come to Ardent to write vectorizing Fortran and C compilers. Vectorizing the Linpack code was easy because we had written it with vectorization in mind. Vectorizing the Eispack code was harder, both because the algorithms are more complex and because the coding style did not anticipate vector operations.

We also needed to have vectorized versions of the transcendental math functions. I had worked on the trig and exponential functions before Ardent because I had a research project at the University of New Mexico sponsored by IBM and a consulting gig at Convex that both involved vectorized math libraries. That work carried over readily to the MIPS chip that we were using on the Titan.

I had avoided the power function, $x^y$. That's harder because it involves two arguments. Which is the vector? I had rationalized not thinking about it by saying, "Nobody ever wants to call the power function on a vector." But then one day we had a visit from a potential customer -- a large, international bank that shall remain nameless, except to say their headquarters is in San Francisco. They were impressed with how easy it was to do their mortgage portfolio simulations in MATLAB, but performance was a concern. It turned out that most of the time was being spent in the power function on a vector. Fortunately, these were interest calculations of the form $s^{(n/365)}$, where $s$ is a scalar and $n$ is an integer vector containing numbers of days. That was easy to vectorize and add to the library.
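
For the record, here is a minimal sketch of that special case in today's MATLAB -- my reconstruction, not the Ardent library code. With a scalar base, the vector power reduces to one scalar logarithm followed by a vectorized exponential, which is exactly what made it easy to add to the library. The interest factor below is a made-up value.

   s = 1.07;                       % hypothetical annual interest factor
   n = (0:3650)';                  % vector of day counts
   r = exp((n/365)*log(s));        % one scalar log, then a vectorized exp
   max(abs(r - s.^(n/365)))        % agrees with the elementwise power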

#### Dore

Gustave Dore was a nineteenth century French engraver who illustrated works by Milton, Lord Byron, and Edgar Allan Poe, among others. Dore Avenue is an exit off the Bayshore Freeway on the way from Silicon Valley to the SFO Airport. When Michael Kaplan drove past Dore, the exit, it reminded him of Dore, the artist, and he decided to use the acronym for his Dynamic Object Rendering Environment.

I had met Michael in Oregon when he was trying to put Dore on the Intel Hypercube. That seemed like a good idea because ray tracing is computationally intensive and embarrassingly parallel. But the iPSC was pretty hopeless because, among other things, it didn't yet have a graphics output device. But Dore and the Titan were made for each other. The Titan had hardware support for all the tasks that Dore required, as well as a large frame buffer and a fast, high resolution display. Dore became the cornerstone of Ardent's graphics software environment.

In 1988 the graphics in MATLAB itself were still pretty basic. We did not yet have three-dimensional color in our shipping product, and would not have it until 1992, with MATLAB version 4.0. So by linking MATLAB to Dore on the Titan we were able to anticipate capabilities that would not be available on other machines for another three or four years.

The September, 1988, issue of Computer, published by the IEEE Computer Society, included an extensive technical article by Ardent engineers about the Titan. The graphic on the cover shown above was chosen by the journal's editors. It is a generalization of the Moebius strip and was computed in MATLAB and rendered by Dore. Today we can create this graphic in MATLAB itself by applying shading and lighting to a surf plot.
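
Here is a small sketch of the kind of surface involved, done entirely in today's MATLAB. The twisted band below is my stand-in for the cover graphic, not the original, and the number of half-twists is just a parameter I picked.

   k = 3;                                    % number of half-twists (assumed)
   [u,v] = meshgrid(linspace(0,2*pi,240),linspace(-0.4,0.4,40));
   x = (1 + v.*cos(k*u/2)).*cos(u);
   y = (1 + v.*cos(k*u/2)).*sin(u);
   z = v.*sin(k*u/2);
   surf(x,y,z)                               % shaded, lighted surface plot
   shading interp
   lighting gouraud
   camlight headlight
   axis equal off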

For me the even more exciting graphic is the vibrating membrane, an animation of the solution to the wave equation that underlies the MathWorks logo. This is my recreation in an animated .gif of something I first saw in Michael Kaplan's darkened office at Ardent sometime in 1988. It was the first time I had ever seen the membrane moving on a computer screen and it nearly brought tears to my eyes.

Much of what I know about computer graphics, I learned by accessing Dore from MATLAB while I was at Ardent. And many of the demos that Ardent used to show off the Titan at trade shows and customer visits involved MATLAB driving Dore. It was a terrific marriage -- while it lasted.

#### Competition

It took longer to bring the Titan to market than the original business plan called for. The machine was ultimately more expensive than the original business plan called for. It was bigger and heavier. It was noisier. And it required 220 volts.

It turned out to be harder to vectorize and much harder to parallelize most programs than everybody had hoped. The big-time applications on the big-time supercomputers like the Crays had been doing it for years, but most scientific and engineering applications that had been running on Vaxes and Sun workstations had not. We were beginning to see that Linpack performance was not so strongly correlated with the performance of more complicated programs as the machine architecture grew more complex. This is certainly true today.

Forest Baskett is a friend of mine. He has been at Stanford, at Los Alamos, and for a while he ran a Palo Alto research lab for DEC. He is now a venture capitalist in The Valley. But for a long time, including the time I was at Ardent, he was a VP at Silicon Graphics. In early 1989, he gave a seminar talk at Stanford about RISC with a deliberately provocative title like "Vector Computing is Dead". In the question period, I marched to the front of the seminar room and triumphantly produced my own overhead transparency with the latest performance numbers from the Titan. But Forest was right. We were the last of a dying breed.

#### Stellar

But lurking 2,686 miles to the east of Mountain View, California, in Waltham, Massachusetts, was Stellar Computer Corporation, founded by William Poduska. Poduska was a darling of the "Massachusetts Miracle", a lynchpin of Governor Michael Dukakis' unsuccessful campaign for president in 1988. Poduska had previously founded Prime Computer and Apollo Computer, which were both tremendous successes. (MathWorks recently acquired a landmark building in Natick that had previously been occupied by Prime Computer, as well as Carling Black Label and Boston Scientific. The street behind the building is named Prime Parkway.)

Poduska apparently now had the same idea as Al Michels. We weren't sure of the details, but we knew that Stellar was developing a machine that would compete head on with ours, and that they would probably announce it at about the same time.

One Sunday before the machines were announced, the Boston Globe ran a long feature piece about the two companies. It was an East Coast versus West Coast thing. They pointed out Poduska's earlier successful startups, his support of the Boston Ballet and Boston Lyric Opera, and his membership on their boards. They called him a White Knight. They didn't know as much about Michels. They said that Ardent Computer was a "Silicon Valley boot camp".

The battle was underway.

#### Graphics, Dore and MATLAB

The Titan was more than just a fast vector computer. It also had powerful, integrated graphics hardware, supported by sophisticated graphics software known as Dore, for Dynamic Object Rendering Environment. Running on top of all of this was MATLAB, which I will tell you about in part two.

Tom Diede, Carl F. Hagenmaier, Glen S. Miranker, Jonathan J. Rubinstein, William S. Worley, Jr., The Titan Graphics Supercomputer Architecture, Computer, IEEE Computer Society, September, 1988, vol. 21, no. 9, pp. 13-28, <http://doi.ieeecomputersociety.org/10.1109/2.14344>


## The Intel Hypercube, part 2, reposted

The first posting earlier today was truncated. Here is the complete post. Even though I am one of the founders of the MathWorks, I only acted as an advisor to the company for its first five years. During that time, from 1985 to 1989, I was trying my luck with two Silicon Valley computer startup companies. Both enterprises failed as businesses, but the experience taught me a great deal about the computer industry, and influenced how I viewed the eventual development of MATLAB. This post continues the discussion of the Intel Personal Supercomputer.

### Contents

#### Matrix Computation on the iPSC

As soon as I arrived at my new job in Oregon, I began to write programs for matrix computation on the iPSC, even though it would be a few months before we had a working machine. The programming language was Fortran, with calls to the operating system for message passing between the nodes of the Cube.

Computations were initiated and controlled from the front end Cube Manager. The matrices did not come from the Manager. They were test matrices, generated on the Cube, and distributed by columns. The actual numerical values rarely mattered. We were primarily interested in timing experiments for various operations.

We soon had communication routines like "gsend", global send, that sent a vector from one node to all the other nodes, and "gsum", global sum, that formed the sum of vectors, one from each node. These routines made use of the hypercube connectivity, but otherwise we could regard the nodes as fully interconnected.
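
Here is a serial sketch, written in today's MATLAB rather than the iPSC Fortran, of how a global sum can exploit the hypercube connectivity: in log2(p) exchange steps, each node pairs with the neighbor whose address differs in one bit. The details are my own illustration, not the actual library code.

   p = 8;                               % number of nodes
   x = (1:p)';                          % one value "per node"
   s = x;                               % each node's partial sum
   for k = 0:log2(p)-1
      partner = bitxor((0:p-1)',2^k) + 1;  % neighbor across dimension k
      s = s + s(partner);               % exchange with partner, then add
   end
   disp(s')                             % every node now holds sum(x) = 36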

As I described in my blog two weeks ago, Gaussian elimination would proceed in the following way. At the k -th step of the elimination, the node that held the k -th column would search it for the largest element. This is the k -th pivot. After dividing all the other elements in the column by the pivot to produce the multipliers, it would broadcast a message containing these multipliers to all the other nodes. Then, in the step that requires most of the arithmetic operations, all the nodes would apply the multipliers to their columns.

#### MATLAB and the iPSC

In 1985 and 1986 MATLAB was brand new. It had nothing in the way of parallel computing, and wouldn't for many years. So MATLAB could not run on the iPSC. The hardware on one node of the iPSC was the same as an IBM PC/AT, so in principle it could support MATLAB, but the bare bones operating system on the node could not, and there was no disc attached directly to a node.

The Cube Manager could not run MATLAB either. The Manager had an Intel 80286 CPU, but it ran Xenix, not Microsoft DOS, and MathWorks did not produce a release of MATLAB for this operating system.

Fortunately, our group had several Sun workstations. Acquiring them had created a bit of a stir within Intel because the Suns were based on Motorola chips and, at the time, Motorola was the competition in the microprocessor business. But we were independent of Mother Intel, so the Motorola ban did not apply to us. I could run MATLAB on the Sun on my desk, and using Sun's version of Unix, do a remote login to the iPSC Cube Manager. We did not yet have Mex files, so MATLAB could not communicate directly with the Cube. I ran timing experiments on the Cube, sent the results back to a file on the Manager, read the file into MATLAB, then modeled and plotted the data.

#### Knoxville Conference

In August, 1985, Mike Heath, who was then at Oak Ridge National Lab, organized a conference in Knoxville, Tennessee, on Hypercube Multiprocessors. Oak Ridge was going to be the next customer, after Yale, to receive an iPSC. They were also getting an nCUBE machine. The proceedings of the conference were published in 1986 by SIAM.

The Knoxville conference was the first public display of the iPSC. I gave a talk and published a paper in the proceedings entitled Matrix Computation on Distributed Memory Multiprocessors. I told the "megaflops per gallon" story about that talk in my blog about the LINPACK Benchmark last June.

The plots included in that paper are perhaps the first MATLAB plots published in any scientific paper. The entire paper was printed on a dot matrix printer and camera ready copy sent to SIAM. The plots are very crude by today's standards. Their resolution is less than a hundred dots per inch.

#### Another Look at Amdahl's Law

In the early days of parallel computing, Amdahl's law was widely cited as a fundamental limitation. It stated that if an experiment involving a fixed task was carried out on a parallel machine, then eventually the portion of the task that could not be parallelized would dominate the computation and increasing the number of processors would not significantly decrease the execution time.

But as I began to use the iPSC, it seemed sensible to vary the task by increasing the problem size as I increased the number of processors. In my Knoxville paper I wrote:

To fully utilize the system, we must consider problems involving many matrices, ..., or matrices of larger order. ... The performance is strongly dependent on the two parameters, n and p. For a given p, there is a maximum value of n determined by the amount of memory available.

$$n_{max} \approx \sqrt{pM}$$

Here $M$ is the memory available on each node, measured in matrix elements: the $n^2$ entries of an $n$-by-$n$ matrix must fit in the $pM$ words provided by $p$ nodes.

After the Knoxville paper, I wrote a technical report at Intel Scientific Computers that appeared twice, under two slightly different titles, Another Look at Amdahl's Law, and A Closer Look at Amdahl's Law. I lost my own copies of the report in a flood of my home office several years ago and none of the friends that I've asked can find their copies. If anybody reading this blog happens to have a copy, I would appreciate hearing from them.

I was able to find an overhead transparency made around that time that has the following MATLAB plot. The graph does not have good annotation. It should follow earlier graphs in a series. It is similar to graphs in the Knoxville/SIAM paper, although it involves a larger machine with more processors.

Each dashed line shows how the megaflop rate, f, for the LU decomposition of a matrix of a fixed size, varies as the number of processors, p, is increased. The first line is for a matrix of order n = 100. This line is almost level at p = 16. This is the order for the original LINPACK benchmark and is a good example of Amdahl's Law. For a 100-by-100 matrix, increasing the number of processors beyond 16 does not provide any significant parallel speedup.

The line with the circles filled in to make black dots is n = 1000. We are still seeing significant speedup when we reach the maximum number of processors, p = 128. The top circle, with no dashed line connected to it, is a matrix of order n = 1959. With room for only 30K double precision words per node, this is the largest matrix that can be distributed across the full machine.

The upper envelope of these curves begins to approach the $45^\circ$ line that represents perfect speedup. It previews the point that I would later make in the now lost technical report. If you increase the problem size to make use of the increased memory available with an increased number of processors, you can increase parallel efficiency and at least partially mitigate the losses modeled by Amdahl's Law. This phenomenon has come to be known as weak speedup or weak scaling.
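
Since the report is lost, here is a small illustrative model rather than its actual data. Assume a fixed serial fraction of the work; Amdahl's law limits the speedup of a fixed-size problem, while letting the problem grow with the number of processors -- what would now be called Gustafson or weak scaling -- keeps the speedup climbing. The serial fraction below is just an assumed value.

   s = 0.05;                       % assumed serial fraction of the work
   p = 2.^(0:7);                   % 1 to 128 processors
   amdahl = 1./(s + (1-s)./p);     % speedup, fixed problem size
   scaled = s + (1-s)*p;           % speedup, problem grows with p
   loglog(p,amdahl,'o--',p,scaled,'.-',p,p,':')
   legend('fixed size (Amdahl)','scaled size','perfect speedup', ...
      'location','northwest')
   xlabel('number of processors p'), ylabel('speedup')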

#### Embarrassingly Parallel

The term "embarrassingly parallel" is widely used today in parallel computing. I claim to be responsible for inventing that phrase. I intended to imply that it was embarrassing because it was so easy -- there was no parallel programming involved. Quoting a Web page by Ian Foster, Wikipedia defines an "embarrassingly parallel" problem as "one for which little or no effort is required to separate the problem into a number of parallel tasks. This is often the case where there exists no dependency (or communication) between those parallel tasks."

Wikipedia says that the etymology of the phrase "embarrassingly parallel" is not known, but they do reference my Knoxville Hypercube paper. Here is what I wrote.

One important way in which LINPACK and EISPACK will be used on such machines is in applications with many tasks involving matrices small enough to be stored on a single processor. The conventional subroutines can be used on the individual processors with no modification. We call such applications "embarrassingly parallel" to emphasize the fact that, while there is a high degree of parallelism and it is possible to make efficient use of many processors, the granularity is large enough that no cooperation between the processors is required within the matrix computations.

#### Mandelbrot Set

The cover of the August, 1985, issue of Scientific American featured an image that had a lasting impact on the world of computer graphics, parallel computing, and me -- the Mandelbrot set. An article by A. K. Dewdney entitled simply Computer Recreations was subtitled "A computer microscope zooms in for a look at the most complex object in mathematics". (By the way, Dewdney would later replace the legendary Martin Gardner and Douglas Hofstadter in authoring Scientific American's mathematics column.)

It was easy to write a parallel program for the iPSC to compute the Mandelbrot set. The code fit on one overhead transparency. And it was fast! The trouble was we couldn't display the results. It would take forever to send the image to the front end. Even then we didn't have any high resolution display. It would be a couple of years -- and access to a serious graphics supercomputer -- before I could do the Mandelbrot set justice.
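
For anyone who has not written one, here is a tiny escape-time computation in today's vectorized MATLAB -- a sketch of the idea, not the code that was on that transparency. Each grid point is independent of all the others, which is exactly what makes the problem embarrassingly parallel.

   [x,y] = meshgrid(linspace(-2.5,1,600),linspace(-1.25,1.25,400));
   c = x + 1i*y;
   z = zeros(size(c));
   count = zeros(size(c));
   for k = 1:64
      z = z.^2 + c;                   % the Mandelbrot iteration
      count = count + (abs(z) <= 2);  % how long each point survives
   end
   image(count), axis image off, colormap(jet(64))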

#### Legacy

I left Intel Scientific Computers in Oregon to join another startup in Silicon Valley itself in 1987. I'll tell you about that adventure in my next post.

The iPSC/1 that I have been describing was not a commercial success. It wasn't really a supercomputer. Look at the graph above. Three megaflops on the LINPACK benchmark with a half million dollar machine in 1985 just didn't cut it. And a half megabyte of memory per node was not nearly enough. We knew that, and by the time I left, iSC had introduced models with more memory, and with vector processors, at the expense of fewer nodes.

There was very little software on the iPSC/1. The operating system on the Cube was minimal. The development environment on the Manager was minimal. And while I was there, I was in charge of the group that would have developed applications, but I'm not sure it occurred to us to actually do that. We just did demos and math libraries.

I think I can say that the iPSC/1 was an important scientific milestone. It played a significant role in research at a number of universities and laboratories. Intel Scientific Computers went on to build other machines, based on other Intel chips, that were commercially successful, but they didn't involve me, and didn't involve MATLAB.

I learned a lot. At the time, and to some extent even today, many people thought of MATLAB as just "Matrix Laboratory". So, to get a "Parallel MATLAB", one just had to run on top of a parallelized matrix library. I soon realized that was the wrong level of granularity. On the iPSC, that would mean running MATLAB on the Cube Manager, generating a matrix there, sending it over the ethernet link to the Cube, doing the parallel computation on the Cube, and eventually sending the result back to the front end.

My experience with the iPSC in 1985 taught me that sending messages from MATLAB on a front end machine to a parallel matrix library on a back end machine is a bad idea. I said so in my "Why There Isn't A Parallel MATLAB" Cleve's Corner in 1995. But other people were still trying it with their versions of Parallel MATLAB in 2005.

Today, we have parallelism at several levels with MATLAB. There is automatic fine-grained multithreading in the vector and matrix libraries that most users never think about. There may also be many threads in a GPU on machines that happen to have one. But the real legacy of the iPSC is the clusters and clouds and MATLAB Distributed Computing Server. There, we are finding that, by far, the most popular feature is the parfor construct to generate embarrassingly parallel jobs.
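
Here is what that looks like now. The loop below is a minimal sketch of an embarrassingly parallel job with parfor; it assumes the Parallel Computing Toolbox is available and the body is just a placeholder computation.

   ntasks = 8;
   results = zeros(1,ntasks);
   parfor k = 1:ntasks
      rng(k);                      % independent random stream per task
      x = rand(1,1e6);
      results(k) = mean(x);        % each task ignores all the others
   end
   disp(results)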

#### References

Cleve Moler, Matrix Computation on Distributed Memory Multiprocessors, in Hypercube Multiprocessors, Proceedings of the First Conference on Hypercube Multiprocessors, edited by Michael T. Heath, Oak Ridge National Laboratory, SIAM, ISBN 0-89871-209-2, pp. 181-195, 1986. <http://www.amazon.com/Hypercube-Multiprocessors-1986-Michael-Heath/dp/0898712092>

Cleve Moler, Another Look at Amdahl's Law, Technical Report TN-02-0587-0288, Intel Scientific Computers, 1987.

Cleve Moler, A Closer Look at Amdahl's Law, Technical Report TN-02-0687, Intel Scientific Computers, 1987.

A. K. Dewdney, Computer Recreations, Scientific American, 1985, <http://www.scientificamerican.com/media/inline/blog/File/Dewdney_Mandelbrot.pdf>



## The Intel Hypercube, part 1

Even though I am one of the founders of the MathWorks, I only acted as an advisor to the company for its first five years. During that time, from 1985 to 1989, I was trying my luck with two Silicon Valley computer startup companies. Both enterprises failed as businesses, but the experience taught me a great deal about the computer industry, and influenced how I viewed the eventual development of MATLAB. The first of these startups developed the Intel Hypercube.

### Contents

#### Caltech Cosmic Cube

In 1981, Caltech Professor Chuck Seitz and his students developed one of the world's first parallel computers, which they called the Cosmic Cube. There were 64 nodes. Each node was a single board computer based on the Intel 8086 CPU and 8087 floating point coprocessor. These were the chips that were being used in the IBM PC at the time. There was room on the board of the Cosmic Cube for only 128 kilobytes of memory.

Seitz's group designed a chip to handle the communication between the nodes. It was not feasible to connect a node directly to each of the 63 other nodes. That would have required $64^2 = 4096$ connections. Instead, each node was connected to only six other nodes. That required only $6 \times 64 = 384$ connections. Express each node's address in binary. Since there are $2^6$ nodes this requires six bits. Connect a node to the nodes whose addresses differ by one bit. This corresponds to regarding the nodes as the vertices of a six-dimensional cube and making the connections along the edges of the cube. So, these machines are called "hypercubes".

The graphic shows the 16 nodes in a four-dimensional hypercube. Each node is connected to four others. For example, using binary, node 0101 is connected to nodes 0100, 0111, 0001, and 1101.
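
The neighbor rule is easy to check with a couple of lines of today's MATLAB: flipping each of the d address bits with bitxor produces exactly the neighbors listed above.

   d = 4;                              % hypercube dimension
   node = 5;                           % binary 0101
   neighbors = bitxor(node,2.^(0:d-1));
   disp(dec2bin(neighbors,d))          % 0100, 0111, 0001, 1101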

Caltech Professor Geoffrey Fox and his students developed applications for the Cube. Among the first were programs for the branch of high energy physics known as quantum chromodynamics (QCD) and for work in astrophysics modeling the formation of galaxies. They also developed a formidable code to play chess.

#### Silicon Forest

The first startup I joined wasn't actually in Silicon Valley, but in an offspring in Oregon whose boosters called the Silicon Forest. By 1984, Intel had been expanding its operations outside California and had developed a sizeable presence in Oregon. Gordon Moore, one of Intel's founders and then its CEO, was a Caltech alum and on its Board of Trustees. He saw a demo of the Cosmic Cube at a Board meeting and decided that Intel should develop a commercial version.

Two small groups of Intel engineers had already left the company and formed startups in Oregon. One of the co-founders of one of the companies was John Palmer, a former student of mine and one of the authors of the IEEE 754 floating point standard. Palmer's company, named nCUBE, was already developing a commercial hypercube.

Hoping to dissuade more breakoff startups, Intel formed two "intrapreneurial" operations in Beaverton, Oregon, near Portland. The Wikipedia dictionary defines intrapreneurship to be "the act of behaving like an entrepreneur while working within a large organization." Justin Rattner was appointed to head one of the new groups, Intel Scientific Computers, which would develop the iPSC, the Intel Personal Supercomputer.

UC Berkeley Professor Velvel Kahan was (and still is) a good friend of mine. He had been heavily involved with Intel (and Palmer) on the development of the floating point standard and the 8087 floating point chip. He recommended that Intel recruit me to join the iPSC group, which they did.

At the time in 1984, I had been chairman of the University of New Mexico Computer Science Department for almost five years. I did not see my future in academic administration. We had just founded The MathWorks, but Jack Little was quite capable of handling that by himself. I was excited by the prospect of being involved in a startup and learning more about parallel computing. So my wife, young daughter, and I moved to Oregon. This involved driving through Las Vegas and the story that I described in my Potted Palm blog post.

#### Distributed Arrays

As I drove north across Nevada towards Oregon, I was hundreds of miles from any computer, workstation or network connection. I could just think. I thought about how we should do matrix computation on distributed memory parallel computers when we got them working. I knew that the Cosmic Cube guys at Caltech had broken matrices into submatrices like the cells in their partial differential equations. But LINPACK and EISPACK and our fledgling MATLAB stored matrices by columns. If we preserved that column organization, it would be much easier to produce parallel versions of some of those programs.

So, I decided to create distributed arrays by dealing their columns like they were coming from a deck of playing cards. If there are p processors, then column j of the array would be stored on the processor with identification number mod(j, p).

Gaussian elimination, for example, would proceed in the following way. At the k -th step of the elimination, the node that held the k -th column would search it for the largest element. This is the k -th pivot. After dividing all the other elements in the column by the pivot to produce the multipliers, it would broadcast a message containing these multipliers to all the other nodes. Then, in the step that requires most of the arithmetic operations, all the nodes would apply the multipliers to their columns.

This column oriented approach could be used to produce distributed memory parallel versions of the key matrix algorithms in LINPACK and EISPACK.
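
Here is a serial sketch of that elimination in today's MATLAB. It mimics, on one machine, the column-dealt organization described above; the actual iPSC code was Fortran with explicit message passing, so the "broadcast" here survives only as a comment.

   n = 8;                           % a small test matrix
   A = randn(n);                    % columns dealt to nodes by mod(j,p)
   for k = 1:n-1
      % The node holding column k finds the pivot, swaps rows, and
      % divides to form the multipliers.
      [~,q] = max(abs(A(k:n,k)));  q = q + k - 1;
      A([k q],:) = A([q k],:);
      m = A(k+1:n,k)/A(k,k);
      A(k+1:n,k) = m;               % keep the multipliers in place
      % The multipliers are "broadcast"; each node then applies them to
      % the columns it owns (here, of course, one loop does them all).
      for j = k+1:n
         A(k+1:n,j) = A(k+1:n,j) - m*A(k,j);
      end
   end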

#### Intel Personal Supercomputer

Introduced in 1985, the iPSC was available in three models, the d5, d6, and d7, for 5, 6, and 7-dimensional hypercubes. The d5 had 32 nodes in the one cabinet pictured here. The d6 had 64 nodes in two of these cabinets, and the d7 had 128 nodes in four cabinets. The list prices ranged from 170 thousand dollars to just over half a million.

Each node had an Intel 80286 CPU and an 80287 floating point coprocessor. These were the chips used in the IBM PC/AT, the "Advanced Technology" personal computer that was the fastest available at the time. There was 512 kB, that's half a megabyte, of memory. A custom chip handled the hypercube communication with the other nodes, via the backplane within a cabinet and via ethernet between cabinets.

It was possible to replace half the nodes with boards that were populated with memory chips, 4 megabytes per board, to give 4.5 megabytes per node. That would turn one cabinet into a d4 with a total of 72 megabytes of memory. A year later another variant was announced that had boards with vector floating point processors.

A front end computer called the Cube Manager was an Intel-built PC/AT microcomputer with 4 MBytes of RAM, a 140 MByte disk, and a VT100-compatible "glass teletype". The Manager had direct ethernet connections to all the nodes. We usually accessed it by remote login from workstations at our desks. The Manager ran XENIX, a derivative of UNIX System III. There were Fortran and C compilers. We would compile code, build an executable image, and download it to the Cube.

There was a minimal operating system on the cube, which handled message passing between nodes. Messages sent between nodes that were not directly connected in the hypercube interconnect would have to pass through intermediate nodes. We soon had library functions that included operations like global broadcast and global sum.

#### The Lights

If you look carefully in the picture, you can see red and green LEDs on each board. These lights proved to be very useful. The green light was on when the node was doing useful computation and the red light was on when the node was waiting for something to do. You could watch the lights and get an idea of how a job was doing and even some idea of its efficiency.

One day I was watching the lights on the machine and I was able to say "There's something wrong with node 7. It's out of sync with the others." We removed the board and, sure enough, a jumper had been set incorrectly so the CPU's clock was operating at 2/3 its normal rate.

At my suggestion, a later model had a third, yellow, light that was on when the math coprocessor was being used. That way one could get an idea of arithmetic performance.

#### First Customer Ship

In the computer manufacturing business, the date of First Customer Ship, or FCS, is a Big Deal. It's like the birth of a child. Our first customer was the Computer Science Department at Yale University and, in fact, they had ordered a d7, the big, 128-node machine. When the scheduled FCS date grew near, the machine wasn't quite ready. So we had Bill Gropp, who was then a grad student at Yale and a leading researcher in their computer lab, fly out from Connecticut to Oregon and spend several days in our computer lab. So it was FCS all right, but of the customer, not of the equipment.

#### Star Wars

It was during this time that President Ronald Reagan's administration had proposed the Strategic Defense Initiative, SDI. The idea was to use both ground-based and space-based missiles to protect the US from attack by missiles from elsewhere. The proposal had come to be known by its detractors as the "Star Wars" system.

One of the defense contractors working on SDI believed that decentralized, parallel computing was the key to the command and control system. If it was decentralized, then it couldn't be knocked out with a single blow. They heard about our new machine and asked for a presentation. Rattner, I, and our head of marketing went to their offices near the Pentagon. We were ushered into a conference room with the biggest conference table I had ever seen. There were about 30 people around the table, half of them in military uniforms. The lights were dim because PowerPoint was happening.

I was about halfway through my presentation about how to use the iPSC when a young Air Force officer interrupts. He says, "Moler, Moler, I remember your name from someplace." I start to reply, "Well, I'm one of the authors of LINPACK and ..." He interrupts again, "No, I know that... It's something else." Pause. "Oh, yeh, nineteen dubious ways!" A few years earlier Charlie Van Loan and I had published "Nineteen dubious ways to compute the exponential of a matrix." It turns out that this officer had a Ph.D. in Mathematics and had been teaching control theory and systems theory at AFIT, the Air Force Institute of Technology in Ohio. So for the next few minutes we talked about eigenvalues and Jordan Canonical Forms while everybody else in the room rolled their eyes and looked at the ceiling.

#### Preview of part 2

In my next post, part 2, I will describe how this machine, the iPSC/1, influenced both MATLAB, and the broader technical computing community.

#### Reference

Caltech Cosmic Cube


## Complex Step Differentiation

Complex step differentiation is a technique that employs complex arithmetic to obtain the numerical value of the first derivative of a real valued analytic function of a real variable, avoiding the loss of precision inherent in traditional finite differences.

### Contents

#### Stimulation

Many months ago I met Ricardo (Pax) Paxson, head of our Computational Biology group, in the MathWorks cafeteria.

• Pax: Hey, Cleve, you ought to blog about that complex step algorithm you invented.
• Me: What are you talking about?
• Pax: You know, differentiation using complex arithmetic. It's hot stuff. We're using it in SimBiology.
• Me: I don't remember inventing any complex step differentiation algorithm.
• Pax: Well, an old paper of yours is referenced. Lots of people are using it today.

Then last February I met Joaquim Martins, a Professor of Aerospace Engineering at the University of Michigan, during the SIAM CS&E meeting. He gave me three papers he had written about the complex step method that he and his colleagues were using to compute derivatives of simulations that couple computational fluid dynamics with structural finite-element analysis for aircraft design applications.

Joaquim's papers refer to a paper on numerical differentiation using complex arithmetic that James Lyness and I had published in the SIAM Journal on Numerical Analysis in 1967. That's almost 50 years ago.

#### Lyness and Moler

James Lyness was a buddy of mine. We met at the ETH in Zurich when I was visiting there on my postdoc in 1965-66. He was an Englishman a few years older than I, visiting the ETH at roughly the same time. After that visit, he spent the rest of his professional career at Argonne Laboratory. He didn't work on LINPACK and EISPACK, but I would see him at Argonne when I visited there to work on those packages. The photo that I often show of the four LINPACK authors lounging on the back of Jack Dongarra's car was taken during a party at the Lyness home.

That year in Zurich was very productive research wise. James had lots of interesting ideas and we published several papers. One of them was "Numerical Differentiation of Analytic Functions". It introduces, I guess for the first time, the idea of using complex arithmetic on functions of a real variable. But the actual algorithms are pretty exotic. They are derived from Cauchy's Integral Theorem and involve things like Moebius numbers and the Euler Maclaurin summation formula. If you asked me to describe them in any detail today, I wouldn't be able to.

The complex step algorithm that I'm about to describe is much simpler than anything described in that 1967 paper. So, although we have some claim of precedence for the idea of using complex arithmetic on real functions, we certainly cannot claim to have invented today's algorithm.

The complex step algorithm was first described in a brief paper by William Squire and George Trapp in SIAM Review in 1998. Martins, together with his colleagues and students, offers an extensive guide in the papers I refer to below.

#### The Algorithm

We're concerned with an analytic function. Mathematically, that means the function is infinitely differentiable and can be smoothly extended into the complex plane. Computationally, it probably means that it is defined by a single "one line" formula, not a more extensive piece of code with if statements and for loops.

Let $F(z)$ be such a function, let $x_0$ be a point on the real axis, and let $h$ be a real parameter. Expand $F(z)$ in a Taylor series off the real axis.

$$F(x_0+ih) = F(x_0)+ihF'(x_0)-h^2F''(x_0)/2!-ih^3F^{(3)}(x_0)/3!+...$$

Take the imaginary part of both sides and divide by $h$.

$$F'(x_0) = Im(F(x_0+ih))/h + O(h^2)$$

Simply evaluating the function $F$ at the imaginary argument $x_0+ih$, and dividing by $h$, gives an approximation to the value of the derivative, $F'(x_0)$, that is accurate to order $O(h^2)$. We might as well choose $h=10^{-8}$. Then the error in the approximation is about the same size as the roundoff error involved in storing a double precision floating point value of $F'(x_0)$.

Here, then, is the complex step differentiation algorithm in its entirety:

$$h = 10^{-8}$$

$$F'(x_0) \approx Im(F(x_0+ih))/h$$

#### An Example

Here is one of my favorite examples. I used it in the 1967 paper and it has become a kind of standard in this business ever since.

$$F(x) = \frac{\mbox{e}^{x}}{(\cos{x})^3+(\sin{x})^3}$$

   F = @(x) exp(x)./((cos(x)).^3 + (sin(x)).^3)

F =
@(x)exp(x)./((cos(x)).^3+(sin(x)).^3)


Here's a plot over a portion of the $x$-axis, with a dot at $\pi/4$.

   ezplot(F,[-pi/4,pi/2])
axis([-pi/4,pi/2,0,6])
set(gca,'xtick',[-pi/4,0,pi/4,pi/2])
line([pi/4,pi/4],[F(pi/4),F(pi/4)],'marker','.','markersize',18)


Let's find the slope at the dot, that's $F'(\pi/4)$.

The complex step approximation where I have not yet fixed the step size h is

   Fpc = @(x,h) imag(F(x+i*h))/h;


The centered finite difference approximation with step size h is

   Fpd = @(x,h) (F(x+h) - F(x-h))/(2*h);


Compare them for a sequence of decreasing h.

   format long
disp(['           h             complex step     finite difference'])
for h = 10.^(-(1:16))
disp([h Fpc(pi/4,h) Fpd(pi/4,h)])
end

           h             complex step     finite difference
0.100000000000000   3.144276040634560   3.061511866568119
0.010000000000000   3.102180075411270   3.101352937655877
0.001000000000000   3.101770529535847   3.101762258158169
0.000100000000000   3.101766435192940   3.101766352480162
0.000010000000000   3.101766394249620   3.101766393398542
0.000001000000000   3.101766393840188   3.101766393509564
0.000000100000000   3.101766393836091   3.101766394841832
0.000000010000000   3.101766393836053   3.101766443691645
0.000000001000000   3.101766393836052   3.101766399282724
0.000000000100000   3.101766393836052   3.101763290658255
0.000000000010000   3.101766393836053   3.101785495118747
0.000000000001000   3.101766393836053   3.101963130802687
0.000000000000100   3.101766393836052   3.097522238704187
0.000000000000010   3.101766393836053   3.108624468950438
0.000000000000001   3.101766393836052   3.108624468950438
0.000000000000000   3.101766393836053   8.881784197001252


As predicted, complex step has obtained its full double precision result half way through the sequence of h's. On the other hand, the finite difference method never achieves more than half double precision accuracy and completely breaks down by the time h reaches the last value.

#### Symbolic Toolbox

Since we have access to the Symbolic Toolbox, we can get the exact answer. Redefine x and F to be symbolic.

   syms x
F = exp(x)/((cos(x))^3 + (sin(x))^3)

F =
exp(x)/(cos(x)^3 + sin(x)^3)


Differentiate and evaluate the derivative at $\pi/4$.

   Fps = simple(diff(F))
exact = subs(Fps,pi/4)
flexact = double(exact)

Fps =
(exp(x)*(cos(3*x) + sin(3*x)/2 + (3*sin(x))/2))/(cos(x)^3 + sin(x)^3)^2
exact =
2^(1/2)*exp(pi/4)
flexact =
3.101766393836051


This verifies the result we obtained with the complex step.

#### Error Plot

Now that we know the exact answer, we can plot the error from computing the first derivative by complex step and finite differences.

   h = 10.^(-(1:16)');
errs = zeros(16,2);
for k = 1:16
errs(k,1) = abs(Fpc(pi/4,h(k))-exact);
errs(k,2) = abs(Fpd(pi/4,h(k))-exact);
end
loglog(h,errs)
axis([eps 1 eps 1])
set(gca,'xdir','rev')
legend('complex step','finite difference','location','southwest')
xlabel('step size h')
ylabel('error')


This confirms that complex step is accurate for small step sizes whereas the finite difference approach never achieves full accuracy.

#### References

Lyness, James N., and Moler, Cleve B., Numerical Differentiation of Analytic Functions, SIAM J. Numerical Analysis 4, 1967, pp. 202-210. epubs.siam.org/doi/abs/10.1137/0704019

Squire, William, and Trapp, George, Using complex variables to estimate derivatives of real functions, SIAM Review 40, 1998, pp. 110-112. epubs.siam.org/doi/abs/10.1137/S003614459631241X

Martins, Joaquim R. R. A., Sturdza, Peter, and Alonso, Juan J., The Complex-Step Derivative Approximation, ACM Transactions on Mathematical Software 29, 2003, pp. 245-262, Martins2003CSD.pdf

Martins, Joaquim R. R. A., A Guide to the Complex-Step Derivative Approximation, complex step guide

Kenway, Gaetan K.W., Kennedy, Graeme J., and Martins, Joaquim R. R. A., A scalable parallel approach for high-fidelity steady-state aeroelastic analysis and adjoint derivative computations, AIAA Journal (In Press), kenway2013a.pdf


## Iterated Powers Fractal

Iterating powers of complex numbers can produce cycles. Colormaps of the lengths of these cycles produce fractal images.

### Contents

#### UWO Poster

In my blog a month ago I mentioned a poster produced at the University of Western Ontario about the Lambert W function. And in my blog two weeks ago about the tower of powers

$$z^{z^{z^z}}$$

I said that if the base $z$ is less than $\mbox{e}^{-\mbox{e}}$, which is about 0.066, then the iteration evaluating the tower settles into a stable two-cycle. Both of these posts lead to today's blog where $z$ moves in the complex plane.

#### The Iteration

The base of the iteration is a real or complex number $z$ that stays fixed throughout the iteration. Start with $y_0 = 1$ and simply iterate

$$y_{k+1} = z^{y_k}$$

If $z$ is too large, then $y_k$ will quickly blow up. As we saw in my last blog, this happens for real $z$ larger than $\mbox{e}^{1/\mbox{e}}$, which is about $1.444$.

For some $z$, the iterates $y_k$ approach a fixed point. An example is $z = \sqrt{2}$. The $y_k$ converge to $2$. We regard fixed points as cycles of length 1.

#### An Example

Here is an example of a complex $z$ that produces a cycle of length 7, $z = 2.5+4i$.

   format short
z = 2.5+4i
y = 1
for k = 1:7
y = z^y;
disp(y)
end

z =
2.5000 + 4.0000i
y =
1
2.5000 + 4.0000i
-0.6503 + 0.5363i
0.2087 + 0.0366i
1.2845 + 0.3528i
-1.4011 + 4.9361i
7.6883e-04 - 3.4362e-05i
1.0012 + 0.0007i


After 7 steps, we're not quite back at y = 1, but we're pretty close. Repeating the iteration a few dozen times brings us to a stable cycle of length 7 near this initial transient. Here is a picture.

#### Cycle Length

Finding cycle length involves saving the iterates on a stack and comparing each new iterate with the previous ones. The internal parameters of the function are the tolerance in the comparison, the stack size, and the maximum number of iterations. The cycle length is declared to be infinite if either the iterates go off to infinity or no match is found before the iteration limit is reached.

Here is the function.

   type cycle

function c = cycle(z)
% CYCLE  Cycle length of iterated power
% c = cycle(z)
tol = 1.e-4;   % Tolerance
N = 40;        % Height of stack
M = 2000;      % Maximum number of iterations
S = zeros(N,1);   % Stack
S(:) = NaN;
y = 1;
k = 0;
c = Inf;
while k < M
y = z^y;
k = k+1;
if isinf(y)
break
elseif any(abs(y-S) < tol)
c = min(find(abs(y-S) < tol));
break
end
S(2:N) = S(1:N-1);
S(1) = y;
end
end



Let's test the function on a few cases where we already know the answer.

   cycle(sqrt(2))  % Should be 1
cycle(1.5)      % Should be Inf
cycle(.05)      % Should be 2
cycle(2.5+4i)   % Should be 7

ans =
1
ans =
Inf
ans =
2
ans =
7


#### Colormaps

Good colormaps are an important aspect of generating good fractal images. I am using the colormaps from the Mandelbrot chapter of Experiments with MATLAB. These take small chunks of the colormaps from MATLAB itself and repeat them several times. The maps are called jets, hots, and cmyk and are included in the mandelbrot.m code. Here is one of them.

   type jets

function cmap = jets(m)
c = jet(16);
e = ones(m/16,1);
cmap = kron(e,c);
end



#### Generate an Image

To generate a fractal image, put a grid over a region of the complex plane, compute the cycle length for each grid point, and then use the MATLAB image function and one of the colormaps to produce a pseudocolor image. (This is an embarrassingly parallel task and has for many years been used to demonstrate parallel computers. Load balancing is a challenge. But I digress ...)

#### First Image

The image at the top of this post shows a large chunk of the complex plane. With the cmyk colormap, the infinite cycle lengths come out black. I can see that with a spy plot on my machine where I have saved the data.

   load view1
spy(isinf(u))
set(gca,'ydir','n')


While I am at it, let's check the density of Inf's.

   nnz(isinf(u))/prod(size(u))

ans =
0.2602


Over a quarter of the grid points have infinite cycle lengths.

#### Second Image

Zoom in on a smaller region in the plane, and use the jets colormap with its columns flipped. This is typical of the structure we see everywhere in this fractal, at all magnification levels. They are certainly not as rich as the Mandelbrot set, but I still find them interesting.

#### Third Image

Zoom in farther, and use the hots colormap. The large brown region in the northwest corner is not infinite cycle length. It happens to be cycle == 16.

Here is the code that generates this image. It takes about 5 minutes to run on my laptop, which is a couple of years old.

   type view3

x0 = 2.3825;
y0 = 3.7227;
d = 1.e-5;
nhalf = 200;
v = d*(-nhalf:nhalf);
[x,y] = meshgrid(x0+v,y0+v);
z = x+i*y;
u = zeros(size(z));

n = size(z,1)
tic
for k = 1:n
for j = 1:n
u(k,j) = cycle(z(k,j));
end
k
end
toc

image(u)
axis square
colormap(hots(128))
set(gca,'ydir','n')
set(gca,'xtick',[1,n/2,n])
set(gca,'ytick',[1,n/2,n])
set(gca,'xticklabel',{min(x(:)),num2str(x0),max(x(:))})
set(gca,'yticklabel',{min(y(:)),num2str(y0),max(y(:))})
xlabel('x')
ylabel('y')



#### Reference

The Lambert W Function, Poster, Ontario Research Centre for Computer Algebra, <http://www.orcca.on.ca/LambertW>


## Tower of Powers

I first investigated the tower of powers

$$z^{z^{z^z}}$$

when I was in junior high school and was about 14 or 15 years old. I return to it every few decades. This is another one of those episodes.

### Contents

#### Associativity

Does

$$2^{3^4}$$

mean

$${(2^3)}^4 = 8^4 = 2^{12}, \ \mbox{or}, \ 2^{(3^4)} = 2^{81}$$

It makes a big difference.

Addition and multiplication are associative. By that we mean

   2*3*4

ans =
24


could have been computed from either

   (2*3)*4

ans =
24


or

   2*(3*4)

ans =
24


They're both the same.

But subtraction, division, and exponentiation are not associative. MATLAB associates from left to right. So

   format rat
2/3/4

ans =
1/6


was computed by

   (2/3)/4

ans =
1/6


and not by

   2/(3/4)

ans =
8/3


#### Mathematical convention

The usual mathematical convention for exponentiation is right associativity.

$$a^{b^c} = a^{(b^c)}$$

This is because left associativity can be replaced by multiplication.

$${(a^b)}^c = a^{bc}$$

So, my tower of powers is

$$z^{z^{z^z}} = z^{(z^{(z^z)})}$$

And, I want to make this tower infinitely high.

#### WS Algorithm

I've been stretching my typographical luck with superscripts and parentheses here. Let's settle down.

For a given value of $z$, let $y$ denote the infinite extension of the tower at the end of the last paragraph. Then, since it's infinite, and associated to the right, we have

$$y = z^y$$

Now that's at least easier to typeset.

There is an obvious way to try to compute it. I call this the WS or World's Simplest algorithm.

$$y_1 = 1$$

$$\mbox{for} \ n = 1, ..., \ \ y_{n+1} = z^{y_n}$$

Of course, this is better known as the Fixed Point algorithm

$$y_{n+1} = f(y_n)$$

for the function

$$f(y) = z^y$$.

#### Two experiments

Let's try two different values of $z$, $1.25$ and $1.50$.

   format long
z = 1.25
y = 1
for n = 1:30
y = z^y;
disp(y)
end
snapnow

z =
1.250000000000000
y =
1
1.250000000000000
1.321714079300705
1.343034993578008
1.349439873733169
1.351369882459350
1.351952000918108
1.352127625454628
1.352180615675240
1.352196604529415
1.352201428918186
1.352202884606055
1.352203323838621
1.352203456370664
1.352203496360284
1.352203508426572
1.352203512067399
1.352203513165966
1.352203513497443
1.352203513597461
1.352203513627640
1.352203513636746
1.352203513639493
1.352203513640323
1.352203513640573
1.352203513640648
1.352203513640671
1.352203513640678
1.352203513640680
1.352203513640681
1.352203513640681

   format long
z = 1.5
y = 1
for n = 1:15
y = z^y;
disp(y)
end
snapnow

z =
1.500000000000000
y =
1
1.500000000000000
1.837117307087384
2.106203352148880
2.349005318611934
2.592025704907509
2.860441497460648
3.189324761899238
3.644283987904574
4.382546732246116
5.911914873330858
10.990982932689437
86.181891743310985
1.499263005586202e+15
Inf
Inf


It converges for $z = 1.25$, but not for $z = 1.5$. Something important is happening between $1.25$ and $1.5$.

#### sqrt(2)

Let's try $z = \sqrt{2}$

   format long
z = sqrt(2)
y = 1
for n = 1:30
y = z^y;
disp(y)
end

z =
1.414213562373095
y =
1
1.414213562373095
1.632526919438153
1.760839555880029
1.840910869291011
1.892712696828511
1.926999701847101
1.950034773805818
1.965664886517319
1.976341754409703
1.983668399303822
1.988711773413954
1.992190882947058
1.994594450712102
1.996256666265859
1.997407001141337
1.998203477508703
1.998755133084593
1.999137310119392
1.999402118324998
1.999585622935682
1.999712796329641
1.999800935492971
1.999862023757784
1.999904364443337
1.999933711582101
1.999954052897822
1.999968152149245
1.999977924873871
1.999984698747096
1.999989394007813


It's converging to $y = 2$, because that's a fixed point.

$$(\sqrt{2})^2 = 2$$

So the crucial point is between $1.4$ and $1.5$.

#### Highest tower

I remember being fascinated when I first learned that the largest possible value of $z$ is a value just a little larger than $\sqrt{2}$. It is the surprising quantity

$$\bar{\mbox{e}} = \mbox{e}^{1/\mbox{e}}$$

   ebar = exp(1)^exp(-1)

ebar =
1.444667861009766


This works because $\bar{\mbox{e}}$ is another fixed point.

$$\bar{\mbox{e}}^{\mbox{e}}=(\mbox{e}^{1/{\mbox{e}}})^{\mbox{e}}=\mbox{e}$$

Let's see why $\bar{\mbox{e}}$ is the largest possible value of $z$.

#### Analysis

Analysis of the WS algorithm is provided by the mean value theorem.

$$f(y_n) = f(y_{n-1}) + f'(\xi_n)(y_n-y_{n-1})$$

for some $\xi_n$ between $y_n$ and $y_{n-1}$.

In our case, $f'(y) = (\ln{z})\, z^y$, so

$$y_{n+1} = y_n + (\ln{z})(z^{\xi_n})(y_n-y_{n-1})$$

This tells us that

$$|y_{n+1}-y_n| \le C |y_n-y_{n-1}|$$

for the quantity

$$C = \mbox{max} |\ln{z}||z^y|$$

where the maximum is taken over the $y$ s and $z$ s in a region including the iterates. If $C$ is less than 1, the iterates get closer together, and so the iteration is a contraction that converges.

Let's do a numerical search by making a contour plot of $(\ln{z})(z^y)$. Use green for the region where $C$ is less than $1$, and red and blue for the regions where it is larger than 1.

   [Z,Y] = ndgrid(.01:.01:2.0,.01:.01:4.0);
R = log(Z).*Z.^Y;
contourf(Z,Y,R,[-5:4:-1 1:20:21]);
colormap([0 0 .75;0 .75 .25;1 0 0]);
e = exp(1);
line([ebar ebar],[e e],'marker','.','markersize',24,'color','yellow')
set(gca,'xtick',[.01 1/e 1 e^(1/e) 2])
set(gca,'xticklabel',{'0','1/e','1','e^(1/e)','2'})
set(gca,'ytick',[.01 1 2 e 4])
set(gca,'yticklabel',{'0','1','2','e','4'})
xlabel('z'), ylabel('y')
axis square


#### Stable Towers

If we stay in the green region, where $C$ is less than 1, the WS iteration converges, and the tower is stable. That includes all values of $z$ in the interval

$$1/\mbox{e} \le z \le \mbox{e}^{1/\mbox{e}}$$

#### Towers Topple

The crucial value that I encountered in junior high is the yellow dot, $z=\bar{\mbox{e}}=\mbox{e}^{1/\mbox{e}}$ with $y=\mbox{e}$. As we approach the dot from the stable region, the derivative $f'(y)=(\ln{z})\,z^y$ approaches $1$ and the WS algorithm converges incredibly slowly.

To the right of $z=\bar{\mbox{e}}$, the iteration encounters values of $y$ in the red danger region where C is larger than 1, so it diverges rapidly and the tower collapses quickly, as we saw at $z=1.5$.

#### Small z

Values of $z$ less than $1/\mbox{e}$, which is about $0.37$, are a little more delicate. We might be in the blue region, or might not. It turns out that values of $z$ greater than $\mbox{e}^{\mbox{-e}}$, which is about $0.066$, still encounter values of $y$ large enough to keep the iteration in the safe green region, so it converges.

But now try $z = 0.05$ for a few iterations.

   z = 0.05
y = 1
for k = 1:20, y = z^y; disp(y), end

z =
0.050000000000000
y =
1
0.050000000000000
0.860891659331735
0.075849745545982
0.796741072475616
0.091921259400909
0.759290007184149
0.102834993569945
0.734866735033395
0.110641061786732
0.717881331845427
0.116416584607138
0.705567440560545
0.120791283462031
0.696381005849372
0.124161635028397
0.689385252311344
0.126791198828070
0.683975974940914
0.128862555680671
0.679744887323325


It doesn't look like it wants to converge. Run 300 more iterations without any output.

   for k = 1:300, y = z^y; end


Now check on how we're doing after a total of 320 iterations.

   for k = 1:10, y = z^y; disp(y), end

   0.137359395737956
0.662660838900549
0.137359395737956
0.662660838900549
0.137359395737956
0.662660838900549
0.137359395737956
0.662660838900549
0.137359395737956
0.662660838900549


It has settled into a pattern, oscillating back and forth between two values. This is a simple example of a period doubling bifurcation.

This is the behavior for all $z$ less than $\mbox{e}^{\mbox{-e}}$. The values of $y$ encountered put us in the blue region. The fixed point iteration is not a contraction. It settles into a stable two-cycle. The distance between the two limit points increases as $z$ gets smaller.

A tower based on a value of $z$ less than $\mbox{e}^{\mbox{-e}}$ will have its odd numbered floors one height and its even numbered floors another.

#### Enter Lambert W

Here is another way to look at the tower of powers. It involves the Lambert W function from my previous blog.

$$y = z^y = \mbox{e}^{y\ln{z}}$$

$$y \mbox{e}^{-y\ln{z}} = 1$$

$$-y \ln{z} \ \mbox{e}^{-y\ln{z}} = -\ln{z}$$

$$-y \ln{z} = W(-\ln{z})$$

$$y = \frac{W(-\ln{z})}{-\ln{z}}$$

So, the Lambert W function provides a value of the infinite tower for any value of $z$. I didn't know this when I was in junior high school.

#### The Tower function

Let's try this formulation for a few values of $z$, including some smaller than $\mbox{e}^{\mbox{-e}}$.

  T = @(z) Lambert_W(-log(z))./(-log(z))
z = [1.25 sqrt(2) ebar 1.5 e^(-e) 0.05 10.^(-(3:3:15))]';
disp('         z                    T(z)')
disp([z T(z)])

T =
@(z)Lambert_W(-log(z))./(-log(z))
z                    T(z)
1.250000000000000   1.352203513640681
1.414213562373095   2.000000000000000
1.444667861009766   2.718281787975775
1.500000000000000                 NaN
0.065988035845313   0.367879441171442
0.050000000000000   0.350224852743194
0.001000000000000   0.219513151627722
0.000001000000000   0.141526856553182
0.000000001000000   0.107583717152717
0.000000000001000   0.087971500166559
0.000000000000001   0.074997053328098


#### Plot

And now a plot.

  ezplot(T,[0 ebar])
axis([0 ebar 0 e])
title('Tower of Powers')


#### Unstable

The value of $y$ obtained from the Lambert W formulation is a fixed point of $f(y) = z^y$, but a possibly unstable one. If we have a value of $z$ less than $\mbox{e}^{\mbox{-e}}$ and we start the fixed point algorithm at the value of $y$ given by the Lambert W formula exactly, it will stay there. But if we start with any other value of $y$, the iterates will eventually find their way to the pair of bifurcation attractors.


## The Lambert W Function

The Lambert W function deserves to be better known. It pops up in all sorts of places. And our MATLAB function for evaluating it is a beautiful use of Halley's method.

### Contents

#### Johann Lambert

Johann Heinrich Lambert was born in 1728 in Mulhouse, which was then in Switzerland, but is now in France. He was a contemporary of the famous Swiss mathematician Leonhard Euler. He had incredibly wide ranging interests. He was the first to prove that $\pi$ is irrational. He introduced the hyperbolic trig functions. He made such important contributions to optics that a photometric unit for luminance is named in his honor. He also made contributions in physics, astronomy, logic, and philosophy.

Lambert did not explicitly describe the function that is now named after him. He considered various equations, including ones of the form $x = q + x^m$. Euler followed up with papers about Lambert's work. The ideas involved in analyzing such equations underlie what we today call the Lambert W function.

#### The Maple Connection

In the early 1990s developers of the Maple symbolic algebra software, including Rob Corless, Gaston Gonnet, D.E.G. Hare, and David Jeffrey, came to realize that this function had been rediscovered many times over the years. They christened the function "W", published a note in the Maple Technical Newsletter, and began to develop a bibliography. Don Knuth joined the group by contributing applications involving enumerations of trees. In 1996 the five of them wrote a 32 page paper that has become the definitive reference on the Lambert W function. Their bibliography contains 72 entries.

#### An elementary function

Let's plot an elementary function, using $w$ as the independent variable. The function is $w \mbox{e}^w$.

   f = @(w) w.*exp(w);
   ezplot(f,[-4 1])
   xlabel('w')
   axis([-4 1 -.5 1])
   axis square


You can see the function reaches a global minimum at $w = -1$. The value of this minimum plays an important role in this story, so let's give it a name.

$$\bar{\mbox{e}} = -1/\mbox{e}$$

   ebar = -1/exp(1)
   line([-1,-1],[ebar,ebar],'marker','.','markersize',18,'color','black')
   set(gca,'ytick',[ebar,0,0.5,1])
   set(gca,'yticklabel',{'-1/e','0','0.5','1'})

ebar =
-0.367879441171442


#### Functional inverse

The Lambert W function, $W(x)$, is the functional inverse of $w \mbox{e}^w$. In other words, $w = W(x)$ is the solution of the equation

$$x = w \mbox{e}^w$$

The graph of $W(x)$ is conceptually obtained by labeling the vertical axis of the above graph with an 'x' and then interchanging the horizontal and vertical axes. (You can sneak a peek at the actual plot below, generated after we see how to compute the function.)

The point $x = \bar{\mbox{e}} = -1/\mbox{e}$ is crucial. On the interval $\bar{\mbox{e}} \le x < 0$ there are two real solutions and so $W(x)$ is double valued. One solution is increasing, is defined for all $x \ge \bar{\mbox{e}}$, is known as the primary branch, and is denoted by $W_0(x)$. The other solution is decreasing, is defined only for negative $x$, has a pole at 0, and is denoted by $W_{-1}(x)$. (There are other, complex-valued, branches, $W_k(x)$, which we are not considering here.)

#### Halley's method

Halley's method is a method for finding a zero of a real-valued function with a continuous and easily computed second derivative. It is named after Edmond Halley, the British astronomer who is better known for the comet that bears his name.

Newton's method approximates a function locally by a linear function with the same slope and steps to the zero of that approximation. Halley's method approximates the function locally by a hyperbola with the same slope and curvature and steps to the nearest zero of that approximation. The iteration is

$$w_{n+1}=w_n-f(w_n)/\left[f'(w_n)-\frac{f(w_n)f''(w_n)}{2f'(w_n)}\right]$$

You can see Newton's method by ignoring the second term in the square brackets. You can see that Halley's method is particularly efficient if the ratio $f''(w)/f'(w)$ happens to take a simple form. But we do not regard Halley's method as a general purpose method because it does require knowledge of the second derivative.
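
As a small side example, consider applying Halley's method to $f(w) = w^2 - a$, where the ratio $f''(w)/f'(w) = 1/w$ is about as simple as it gets. The cubic convergence shows up after only a couple of steps.

   % Halley's method for sqrt(a), with f(w) = w^2 - a, f'(w) = 2w, f''(w) = 2.
   a = 2;
   w = 1;                           % crude starting guess
   for k = 1:4
      f = w^2 - a;
      w = w - f/(2*w - f/(2*w));    % Halley step
      disp(w)
   end
   % reaches sqrt(2) = 1.414213562373095 to full precision in three steps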

#### Application to Lambert W

Computing $W(x)$ for a given value of $x$ requires solving the equation

$$f(w) = w \mbox{e}^w - x = 0$$

for $w$. We can use Halley's method because the required derivatives are readily available.

$$f'(w) = w \mbox{e}^w + \mbox{e}^w = \mbox{e}^w (w + 1)$$

$$f''(w) = w \mbox{e}^w + w \mbox{e}^w + \mbox{e}^w = \mbox{e}^w (w + 2)$$

Note that

$$f''(w)/f'(w) = (w+2)/(w+1)$$

This gives us the following iterative step for computing Lambert W at a given value of $x$.

$$f_n = w_n \mbox{e}^{w_n} - x$$

$$w_{n+1}=w_n-f_n/\left[\mbox{e}^{w_n}(w_n+1)-(w_n+2)f_n/(2 w_n+2)\right]$$
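
To make the step concrete, here is a scalar sketch of this iteration for $x = 1$, whose solution $W(1)$ is the omega constant, $0.567143...$.

   % Halley iteration for W(1), starting from w = 1.
   x = 1;
   w = 1;
   for k = 1:6
      f = w*exp(w) - x;
      w = w - f/(exp(w)*(w+1) - (w+2)*f/(2*w+2));
      disp(w)
   end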

#### Starting values

Various people before me have written programs to compute Lambert W using Halley's method. They have paid quite a bit of attention to starting values that reduce the eventual number of iterations. I have decided to take a different approach. I am willing to have the computation take a few more iterations if the program can be simplified. In fact, I use the simplest possible initial values.

For the primary branch, start with $w = 1$ for any $x$. Do not try to approximate $W(x)$ at all. The iteration turns out to converge to double precision floating point accuracy in about a half dozen iterations for any $x$, except values very near the branch point at $\bar{\mbox{e}}$. More accurate starting values could cut the number of iterations in half, but I prefer the simpler code.

For the lower branch, start with any value less than $-1$. I have chosen $w = -2$, but I don't think it matters very much. Again, about a half dozen iterations are required unless $x$ is very close to the branch point at $\bar{\mbox{e}}$ or the pole at $0$.

#### Stopping values

Halley's method is cubically convergent near a simple root, which this is. Once we're close, we get closer very fast. If two successive iterates agree to 1.0e-8, then you can be pretty sure the second one is accurate to 1.0e-16. I could probably even replace 1.e-8 by eps^(1/3) = 6e-6.

#### Lambert_W.m

Here is the entire M-file for function Lambert_W. This will be available from the MATLAB Central File Exchange in due course. (Added 9/16/2013: See this link)

   type Lambert_W

function w = Lambert_W(x,branch)
% Lambert_W  Functional inverse of x = w*exp(w).
% w = Lambert_W(x), same as Lambert_W(x,0)
% w = Lambert_W(x,0)  Primary or upper branch, W_0(x)
% w = Lambert_W(x,-1)  Lower branch, W_{-1}(x)
%
% See: http://blogs.mathworks.com/cleve/2013/09/02/the-lambert-w-function/

% Copyright 2013 The MathWorks, Inc.

% Effective starting guess
if nargin < 2 || branch ~= -1
   w = ones(size(x));      % Start above -1
else
   w = -2*ones(size(x));   % Start below -1
end
v = inf*w;

% Halley's method
while any(abs(w - v)./abs(w) > 1.e-8)
   v = w;
   e = exp(w);
   f = w.*e - x;  % Iterate to make this quantity zero
   w = w - f./((e.*(w+1) - (w+2).*f./(2*w+2)));
end
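
Here is one quick way to sanity check the function against a few values that are easy to verify by hand; the comments quote only a few digits.

   Lambert_W(exp(1))     % exactly 1, since 1*exp(1) = e
   Lambert_W(1)          % the omega constant, about 0.567143
   Lambert_W(-0.2,0)     % upper branch, about -0.2592
   Lambert_W(-0.2,-1)    % lower branch, about -2.5426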



#### Plot Lambert W

We can now use this code to produce a plot of both branches of the Lambert W function.

   % Primary branch

   x = ebar:.01:1;
   plot(x,Lambert_W(x,0))

   % Lower branch

   hold on
   x = ebar:.01:0;
   plot(x,Lambert_W(x,-1),'r')
   hold off

   % Annotation

   axis([-.5 1 -4 1])
   axis square
   line([ebar ebar],[-1 -1],'marker','.','markersize',18,'color','black')
   set(gca,'xtick',[ebar 0 .5 1])
   set(gca,'xticklabel',{'-1/e','0','0.5','1'})
   set(gca,'ytick',-4:1)
   xlabel('x')
   ylabel('w')
   title('Lambert W')
   legend({'W_0','W_{-1}'},'location','southeast')


#### Acknowledgements

Thanks to Peter Costa for reawakening my interest and to Rob Corless for keeping me up to date as always.

#### Lambert W Poster

Follow this link to a poster about the Lambert W function produced by the group at the University of Western Ontario led by Corless and Jeffrey.

Corless, R., Gonnet, G., Hare, D., Jeffrey, D., Knuth, D. (1996). "On the Lambert W function". Advances in Computational Mathematics (Berlin, New York: Springer-Verlag) 5: 329-359. <http://link.springer.com/article/10.1007%2FBF02124750> PDF preprint: <https://cs.uwaterloo.ca/research/tr/1993/03/W.pdf>

Wikipedia article. <http://en.wikipedia.org/wiki/Lambert_W_function>

NIST Digital Library of Mathematical Functions. <http://dlmf.nist.gov/4.13>

Halley's Method: Dahlquist, G., Bjorck, A. Numerical Methods in Scientific Computing, Volume I, page 648, SIAM, Philadelphia, 2008.

Get the MATLAB code

Published with MATLAB® R2013b

## Backslash

The backslash operator has come to represent both the matrix origins and the usability of MATLAB.

### Contents

#### T-shirts

At the recent SIAM Annual Meeting in San Diego we offered T-shirts at the MathWorks booth that featured the iconic MATLAB statement

x = A\b;

They prompted a friend to ask me, "When you were developing MATLAB, how did you come to choose the backslash character for the solution of linear equations?" The answer takes us back to the beginnings, because solving linear equations is certainly the first nontrivial operation MATLAB had to be able to do.

#### Father of ASCII

Robert W. (Bob) Bemer was an early computer pioneer at Rand Corporation and at IBM who was a prominent participant in industry standards committees. Among his many accomplishments, he was one of the first people to propose an approach to the Y2K problem. His 1960 design of the character set for the IBM STRETCH computer included the backslash, as well as curly braces and square brackets. He later added the escape character.

Bemer was an advocate of the backslash character because it could be combined with the forward slash to form the logic operators /\ and \/, which were part of the ALGOL 60 programming language. In the early 1960s, Bemer was instrumental in having backslash, as well as the braces, brackets, and escape, included in the American Standard Code for Information Interchange character set. That's ASCII for short, and it is, of course, the character set that all computers use today.

When Bemer proposed backslash to the ASCII committee, it did not meet with immediate acceptance. But he says, "So I asked for a character more important to have. After much discussion they could not agree on a better candidate."

#### BCD and EBCDIC

It took more than a decade for the ASCII character set to be universally adopted. In the mid to late 1970s we still had to deal with computers and computer terminals that had different character sets. Many central mainframes had operating systems with 6-bit character sets and so could not handle both upper and lower case. Different machines had different sets of special characters.

The IBM mainframes at Argonne National Laboratory, where we were working on EISPACK and LINPACK, initially had a 6-bit, upper case only character set called BCD, for Binary Coded Decimal. New machines brought with them 8-bit EBCDIC, which had upper and lower case and was a kind of transition from BCD to ASCII. Other big machines from other manufacturers, at other national laboratories and at university computer centers, had other operating systems and other character sets. It was a veritable zoo.

#### Choosing backslash

But, perhaps surprisingly, all these systems happened to have one thing in common -- they all included backslash among their collection of special characters. Thinking about it now as I write this blog, I can only surmise that it was because of Bob Bemer and in anticipation of the emerging ASCII standard.

Since matrix multiplication is not commutative, solving the linear system $Ax = b$ is not the same thing as solving $xA = b$. I needed two different symbols for the two different solution operations. You can think of the process as dividing by $A$, so it was natural to choose the backward slash and the forward slash for the two symbols. This gives us the two different MATLAB statements

x = A\b;

x = b/A;

The first of these is by far the more important.
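
A small sketch makes the distinction concrete; any nonsingular matrix will do here.

   % Backslash solves A*x = b; forward slash solves x*A = b.
   A = magic(3);          % a convenient nonsingular matrix
   b = [1; 2; 3];
   x1 = A\b;              % solution of A*x1 = b
   x2 = b'/A;             % solution of x2*A = b'
   norm(A*x1 - b)         % residuals are at roundoff level
   norm(x2*A - b')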

#### Kurt Hensel, 1928

Nick Higham somehow found a paper written in 1928 by a famous German mathematician, Kurt Hensel, "Über den Zusammenhang zwischen den Systemen und ihren Determinanten". Hensel introduces the backslash notation for the solution of a system of simultaneous linear equations in the same way we have, in fact with the same letters, $A$, $X$, and $B$. He then uses the notation once on the fourth page of the paper and never again. As the title indicates, most of the paper is about determinants.

As far as we know, no one else used backslash in this way until I did fifty years later.

#### 19th Century notation

Nick also describes notation devised in the 19th century, before matrices were called matrices. Nick found that Arthur Cayley suggested

$$\frac{B |}{| A} \ \ \mbox{and} \ \ \frac{| B}{A |}$$

for $BA^{-1}$ and $A^{-1}B$ in the context of groups. Fortunately, this notation did not catch on.

#### Acknowledgement

Thanks to Nick Higham for help with this week's post.

R. M. Bemer, "How ASCII Got Its Backslash", <http://www.trailing-edge.com/~bobbemer/BACSLASH.HTM>

K. Hensel, "Uber den Zusammenhang zwischen den Systemen und ihren Determinanten", Journal fur die reine und angewandte Mathematik (Crelle's Journal). 1928, volume 159, Pages 246-254. <http://resolver.sub.uni-goettingen.de/purl?PPN243919689_0159/dmdlog27>

N. J. Higham, "Cayley, Sylvester, and early matrix theory", Linear Algebra Appl., 428:39-43, 2008. <http://eprints.ma.man.ac.uk/954/01/covered/MIMS_ep2007_119.pdf>

Get the MATLAB code

Published with MATLAB® R2013b

Cleve Moler is the author of the first MATLAB, one of the founders of MathWorks, and is currently Chief Mathematician at the company. He is the author of two books about MATLAB that are available online. He writes here about MATLAB, scientific computing and interesting mathematics.

These postings are the author's and don't necessarily represent the opinions of MathWorks.