My erstwhile colleague Steve Eddins recently retired from MathWorks after a long and illustrious career. And once he was finally free of the office, what did he long to do? Maybe a few rounds of golf... read more >>

]]>I know all this because Steve has already set up a personal MATLAB blog: Matrix Values.

I was especially interested in his second post, Initialize a MATLAB Toolbox. In it, he describes the process of first adapting, then automating our toolbox design guidelines. You may remember a post about those guidelines that we did last year: What Is a Toolbox? New Guidelines for Authors.

Take a look at Steve's post. He does a nice job summarizing the toolbox design document and then talking you through the parts that he finds valuable as well as the parts he doesn't need to worry about. And since he is a self-avowed automation junkie, he wasn't going to stop until he wrote some tooling to help him out. Steve puts it like this:

As I learned during my MathWorks career, a set of recommended practices is much more likely to take hold if it can be automated.

Amen to that. And I have good news! You too can take advantage of his helpful automation, because he's sharing it with the world in general and you in particular. You can find the tool he wrote, inittbx, on the File Exchange and GitHub.

Steve uses many of our guidelines, and also throws in a few wrinkles of his own. I like the checklist file, CHECKLIST.md. He says it "reminds me of all the steps needed to complete the job." It's a lighter weight approach than opening a bunch of issues for yourself.

So how does it work? Well, I'm about to start work on my magnum opus, a Sudoku solver called the Sudokunator. All I have to do is ...

`> inittbx("Sudokunator")`

and I'm on my way with beautiful standards-compliant folders and files, all ready to go. Now all I have to do is, you know, write the darn code.

So thanks, Steve, for making yourself useful even as you ride off into the sunset!

]]>Recently I've seen a number of fractal images created with Newton's method. For instance, Daniel Pereira had two spectacular animations in the recent Flipbook Mini-Hack contest. Rotating Newton's... read more >>

]]>Here's another example of the kind of picture I'm talking about.

How is such an image made? What does it represent? It's one of those things where complexity is hiding behind something that, on its face, seems straightforward.

To motivate this, consider this question: How do you find the roots of this polynomial?

x^{3} - 2·x^{2} - 11·x + 12.

That is, for what values of x is this equation true? x^{3} - 2·x^{2} - 11·x + 12 = 0

There are a number of ways to answer this question. If you’re good at factoring polynomials, you might recognize that this is (x-4)(x-1)(x+3). Another answer is to use a little MATLAB magic. MATLAB represents polynomials as vectors containing the coefficients ordered by descending powers. Here is that same polynomial in vector form.

>> p = [1 -2 -11 12]

Once in the right form, there’s a command ROOTS that will answer our question directly.

>> roots(p) ans = -3.0000 4.0000 1.0000

But let's say that your polynomial won't factor so easily, and you don't have a computer handy. Let’s imagine you're Isaac Newton and you want to find the roots of this polynomial. Now what?

As it happens, Newton is credited with a nifty iterative method for finding the roots in a situation like this. You start with a guess, and you use the slope of the curve at that point to make a better guess. Keep repeating this process until you get as close as you want to zero.

Let's start with a guess at -1.9.

x=-1.9000, f(x)=18.8210 x=-4.4331, f(x)=-65.6622 x=-3.4335, f(x)=-14.2877 x=-3.0585, f(x)=-1.6770 x=-3.0013, f(x)=-0.0364 x=-3.0000, f(x)=-0.0000

If we start at -1.9, we end up finding the root at -3 in around six iterations. But wait! There are three roots to this polynomial. It turns out that different starting locations will converge to different roots. So for every starting guess, there is a corresponding root. This is where things get interesting.

This iterative process maps an initial guess to a root. You can think of the resulting map as being divided into basins. Different basins drain into different roots. We can color these basins according to the root to which that starting point will eventually converge. Additionally, we'll keep track of how many iterations it takes to converge (that is, to reach the same tolerance). The curve in the top graph is colored according its corresponding root. Each colored region represents a different basin.

Curiously, these basins aren't always well defined. As we zoom into the boundaries between two basins, we keep finding smaller and smaller regions that converge to different roots. This back-and-forth convergence continues no matter how much you zoom in. This phenomenon is sometimes called a fractal basin boundary.

Now that we've demonstrated this for the real number line, you may wonder: can we generalize to the complex plane? Yes we can, because MATLAB is awesome! Since complex numbers are a native data type, you can use exactly the same code as in the one dimensional case.

So let's examine a more complicated polynomial. This one has complex roots. We'll use the process shown above to find the root basins on the complex plane.

What are the roots of this polynomial?

8·x^{7} + x^{6} + 3·x^{5} + 7·x^{4} + 2·x^{3} + 4·x^{2} + 5·x + 6

Since it is a seventh order polynomial, there are seven roots to this equation. Every point on the complex plane maps to one of them. So every single pixel in this plot is the result of Newton's method. But the borders between these regions are demented fractals.

Look how simple the code is! There's only one line right in the middle that's doing all the heavy lifting.

p = [8 1 3 7 2 4 5 6]; % Make anonymous functions for the polynomial and its derivative. f = @(x) polyval(p,x); df = @(x) polyval(polyder(p),x); span = linspace(-1.5,1.5,300); [x,y] = meshgrid(span,span); z = x + 1i*y; % Apply Newton's method for k = 1:50 z = z - f(z)./df(z); end % Use ANGLE as a proxy to display the root imagesc(span,span,angle(z)) axis square % Show the roots hold on plot(roots(p), ... LineStyle="none", ... Marker="pentagram", ... MarkerFaceColor="red", ... MarkerSize=18) hold off

And how many iterations did it take at each point to converge to that root? That's the picture we started with at the top of the post.

Now that you know the basics, it’s easy to make some eye-popping animations. Here's one of my own additions to the recent Flipbook contest.

If you want to learn more, take a look at Cleve's post on this topic: Fractal Global Behavior of Newton’s Method.

]]>The MATLAB AI Chat Playground is ready for you to experiment with Generative AI, answer questions, and write initial draft MATLAB® code. All of us on the community team share a guiding principle. We... read more >>

]]>All of us on the community team share a guiding principle. We wake up every day thinking about how we can help our users accelerate their pace of learning and building. ChatGPT, Generative AI, and large-language model (LLM) technology have been sweeping forces, and we see this amazing technology as a tool that, if applied correctly, can help users find answers, get initial drafts of code, and help break down complex engineering problems.

We spent this year listening to you, experimenting with Generative AI ourselves, and building a space for our users to experiment. We have been working with our top community users and refining the experience, and I am thrilled to announce that we are now opening the MATLAB AI Chat Playground to all users.

To get access right now, go to the MATLAB Central community and select AI Chat Playground.

The AI Chat Playground is optimized to assist with MATLAB-related input, including toolboxes for deep learning, statistics and machine learning, optimization, control systems, and signal processing. Currently, the system has limited knowledge of Simulink and other MathWorks products.

Open the AI Chat Playground in a desktop browser.

The playground features a chat panel next to a lightweight MATLAB code editor. Use the chat panel to enter natural language prompts to return explanations and code. You can keep chatting with the AI to refine the results or make changes to the output.

Try one of the onboarding prompts if you don’t know where to start.

The AI replies with an explanation, some code, and follow-up prompts to keep the conversation going. At any point, you can enter your own input and explore an endless number of directions. Every time I use the playground, I learn about features of MATLAB, ways to interface with functions, or new concepts. I am always picking up new things.

If you want to try the MATLAB code, select the insert icon next to the code and then click the run button. This will execute the code and show the results. If you want to copy the code and bring it into the MATLAB editor or Visual Studio Code, select the copy button.

Please rate the output of the AI with the thumbs up or thumbs down buttons. This is an important signal for us to figure out how to improve the output and refine the experience over time.

Here are some use cases for the playground that others have found helpful.

A good place to start is to ask basic technical computing questions.

- What's an eigenvalue?
- How do I customize a line plot?

If you don’t have a specific question, you can also use the AI Chat Playground to generate ideas on how to do something.

- Give me some ideas on how to visualize large amounts of rainfall data.
- My data is noisy. What can I do to fix it?

A powerful way to use the playground is to get MATLAB code. Try entering some high-level instructions.

- Create some data, construct a grid of query points, interpolate the grid, and plot the results.
- Roll two six-sided dice 1000 times and plot the sum of each roll.

I look forward to seeing what you learn and build with the MATLAB AI Chat Playground. Please share your experiences here and in the Generative AI discussion area of the MATLAB Central community. It’s important that we inspire each other and collectively learn how to harness this powerful technology. *We are all in this together.*

Today we're launching a new contest here on MATLAB Central: the MATLAB Flipbook Mini Hack. Why flipbook? It's an animation contest in which each animation gets exactly 48 frames, so it's like... read more >>

]]>Today we're launching a new contest here on MATLAB Central: the MATLAB Flipbook Mini Hack. Why flipbook? It's an animation contest in which each animation gets exactly 48 frames, so it's like flipping through a 48-page pad of paper with slightly different drawings on each page. Check it out: entries are already pouring in!

In this post, I want to demonstrate some coding techniques that can make your animations easier. Suppose, for example, you wanted to animate a dot moving in a circle. Here the variable f is the frame number. It will take on the values from 1 to 48.

f = 1:48;

We want to use f to help us make an dot move in a circle. As f increments up to 48, we'll make the variable theta go from 0 to 2 pi. A perfect job for INTERP1! The INTERP1 function is a convenient way to do a linear map from one domain to another.

theta = interp1([0 48],[0 2*pi],f);

plot(f,theta,'.-',MarkerSize=12)

Let's pretty up this plot.

xlim([1 48])

ylim([0 2*pi])

set(gca,XTick=0:8:48)

set(gca,YTick=pi*(0:0.5:2),YTickLabel={"0","\pi/2","\pi","3\pi/2","2\pi"})

xlabel("Frame Number")

ylabel("Theta")

grid on

This shows you how theta varies across all frames. It's all pretty straightforward, but I just love how INTERP1 washes away the hassle of working out the linear mapping.

Now it's time to build one frame of our animation. So instead of looking at all values of f, we'll just consider one. Appropriately, we'll start with frame 1.

f = 1;

Now we use our interpolation trick to map frame number into theta.

theta = interp1([0 48],[0 2*pi],f);

With theta defined, it's time to plot our moving dot. A little trig will put the dot in the right place.

plot(cos(theta),sin(theta),".", ...

MarkerSize=24)

Not much to look at so far! Let's make a circular race track that doesn't move so we can follow the action more clearly.

t = linspace(0,2*pi);

line(cos(t),sin(t))

And we'll square up the axis and reset the limits so the dot isn't bumping into the edges.

axis square

axis([-2 2 -2 2])

Now we have the code that we put into the contest entry page. This code will create any of the frames 1 through 48 on demand. When you push the "Create Animation" button, the contest machinery does the rest of the magic. It generates all 48 frames in succession and stitches them into a single GIF image.

Here is the result.

There you have it: a two-second 48-frame movie masterpiece! I call it "Cutting a Hole in the Wall with a Blue Laser". Or "Blue Polar Bear Circles the North Pole". What would you call it?

That was a simple way to get started, but I feel we can do better. Let's do something more interesting with this circular motion.

First set up a landscape.

% Set up a landscape

peaks

shading flat

material dull

axis vis3d off

Nice! Now we're going to create a light and move it around above this landscape. We can use our interpolation trick.

% Just showing frame 1 here

f = 1;

% Map onto theta

theta = interp1([0 48],[0 2*pi],f);

% Set up the light

ht = 4*sin(theta) + 4;

pos = [2*cos(theta) 2*sin(theta) ht];

light(Position=pos)

% The actual light is invisible, so we'll mark its position with a yellow dot

line(pos(1),pos(2),pos(3), ...

Marker=".",MarkerSize=24,Color="y")

set(gcf,Color=0.8*[1 1 1])

Now as f changes, the light orbits around the landscape in three dimensions. Here's the final animation. Pretty cool, eh? It's certainly better than our first animation.

I have found this contest to be a lot of fun, and very addictive. Maybe you will to. I encourage you to give it a try. Now that you know the INTERP1 trick, you have the secret to making your images dance.

]]>I like to play Cody because I always learn something. After I've solved a problem, I spend a little time with the Solution Map looking at other solutions. It doesn't take long to find something... read more >>

]]>For instance, have you ever heard of the function ISSORTED? It's a simple function, but it was new to me. Let me tell you how I first learned about it. Along the way, I'll tell you about a new trick for finding good Cody solutions.

Let's consider Cody Problem 10, which is about monotonically increasing vectors. Here's the problem statement: Return true if each element in the input vector is larger than its predecessor.

At the moment I write this, Problem 10 has over 67,319 solutions. Quick! Which one is the best? It's a daunting request. But the best solution out of a set that big is probably pretty good. Certainly worth a look. But which one is it?

We can start with a simple distinction: of all these solutions, only 20,813 are correct. So that's progress! The best solution is likely to be in this subset. Which of these correct answers is the best? Cody uses a metric, called "size," to rank the solutions. A solution's size is the number of nodes in the parse tree for a given solution, and lower is considered better. It's a useful metric, but far from perfect. We all know that the shortest solutions aren't always the best solutions. Indeed, because of some weird hacks that people have worked out (*cough cough*, regexp), the shortest solution is often a terrible solution in real life.

If size isn't a great metric, can we come up with a better one? In general, it's difficult, maybe impossible, to come up with an automated system that can figure out which solution is the best. What criteria are you using? Size? Speed? Efficiency? Readability? And how do you weight each of these? Would you sacrifice readability for the sake of speed, or vice versa? Everybody has a different notion of goodness.

Ultimately, what most of us want to know is this: *Which solution do most people prefer?* It's a utilitarian answer that relies on the wisdom of the crowd. We're acknowledging that judging code is subjective, and we're asking our friends and neighbors to apply their subjectivity, biases and all. This turns out to work pretty well.

Now that we've decided to make a subjective judgment system, we have to work through some practical issues. How do you get people to give fair and distinct scores to thousands of solutions? Here is the key insight: whereas it's difficult to grade individual entries consistently, it is generally straightforward to compare two entries and say which one is better. And once we've built up enough binary comparisons, we can use statistics to work out scores for all the solutions. Here we rely on techniques (the Bradley-Terry algorithm, in this case) developed for ranking players in sports and games. Think about how a collection of chess or tennis matches can be aggregated into one global ranking.

A few months ago, Cody added a crowd-scoring feature based on this head-to-head comparison approach. Now we can tell you not only which code is the smallest, but also which code is preferred by the people who are playing.

Let's look at a set of solutions to Problem 10 that have all been ranked. Here is a histogram by size. The median size is 37.

Things get more interesting when we plot crowd score vs. size. Crowd score is what we're calling the subjective ranking applied by players like you. This number is the output of the Bradley-Terry algorithm, and it's been normalized here so that the best crowd score is 100.

There is an obvious skew to the distribution: preferred solutions tend to be shorter.

We should note here that the great majority of solutions have not been given a crowd score yet, so we can't expect the results to be comprehensive and consistent. Similar entries may receive different ratings. Nevertheless, the plot is instructive. It extracts good entries from the vast amount of background noise.

The red arrow indicates the current crowd favorite:

In English we might say: in all cases, the difference between two successive elements should be positive. Compact and easy to read! This use of DIFF is a nice example of vectorized code. You don't need a loop to go through all the elements of x. But you can if you want to. Here is a longer, less preferred solution.

It still gets the correct answer, but the code is longer (and arguably, harder to read) because it explicitly loops through the vector x.

Here is a plot that distinguishes between those solutions that use the vectorized DIFF approach and those that use the looping LENGTH approach. Vectorization leads to shorter, and generally more preferred code.

There are many variants on the DIFF theme. For instance, this one is slightly more complicated than the leader.

Here's one that is still vectorized, but it does the DIFF calculation explicitly.

There are some other intriguing entries from the list of preferred solutions. I like this one. Here the function UNIQUE is enforcing the problem criteria: if the vector is already in the order that would be returned by UNIQUE, then you're good.

And finally, here's the one from farther down the list that took me by surprise.

I didn't even know about the ISSORTED function, but it's been around for a long time, and it offers the shortest, most direct solution to the problem.

Which one of these is your favorite, and why?

I love solution tourism in Cody. Where else can you be exposed to so many novel functions and solution methods? Even so, the full list of Cody solutions can feel like an impenetrable jungle. Where do you start? That's why we introduced solution voting: to find the treasures hidden in the thicket.

And if you like this feature, remember that it's only as good as the votes going into it. Please spend a little time voting on answers yourself. It only takes a little effort, and everybody wins!

]]>Today's guest article is by Wesley Hamilton, a STEM Outreach engineer here at MathWorks. See his earlier piece on solving Sudoku puzzles. In this blog post we'll explore the fascinating world of... read more >>

]]>In this blog post we'll explore the fascinating world of continued fractions, like $ \sqrt{2} = 1 + \frac{1}{2 + \frac{1}{2 + \frac{1}{2 + \cdots}}} $. While these objects have been studied for several centuries, their presence in modern math classes is relatively rare (unless you're studying number theory, probably). One of the historical uses of these objects has been computing rational approximations of irrational numbers, such as the 5th century approximations for π of $ \frac{22}{7} $ and $ \frac{355}{113} $ by Chinese astronomer Zu Chongzhi. Here, we'll focus on something even less prevalent: their computational aspects. The goal is to start exploring statistical properties of continued fractions, generating conjectures, and then pointing to several topics that can serve as computational playgrounds to get one's hands dirty and explore even more.

Before defining what a continued fraction even is, we'll start by defining a few "important" numbers to use as examples throughout. Next we'll define continued fractions and compute some by hand, before introducing the computational algorithm to compute these things. Afterwards we'll explore their statistical properties, before discussing some conjectures and neat facts, and round out the discussion by including links to other interesting computational topics to explore.

Table of Contents

So that we have a rich class of examples, let's set up a few "important" numbers to use throughout the rest of this blog. Some of the numbers are important in some area(s) of math, though the whole collection should give us a nice, diverse set of examples to explore with. Also some of the names used are likely not standard, but describe how the numbers are constructed fairly well.

First, let's set up some parameters that'll determine the precision we'll be using throughout, along with a random number seed to ensure reproducibility no matter which machine (and when) these examples are run.

d = 10000; %set the computational precision "very" high

rng(1); %set the seed for the random number generator

And with that, let's set up our collection of numbers. For those interested in learning about each number, check out the code block at the end of this post. If you're content with knowing these are just being used as examples, feel free to skip seeing the definitions.

[num1, num2, num3, num4, num5, num6] = generateExamples(d);

So what exactly is a continued fraction? As the name implies, a continued fraction is a fraction that continues. What are fractions that don't continue? Most fractions you might think of, such as $ \frac{11}{13} $.

This isn't the only way to write this fraction though. One way, using Egyptian fractions, is $ \frac{11}{13} = \frac{1}{2} + \frac{1}{3} + \frac{1}{78} $, which we can verify using MATLAB's Symbolic Math Toolbox (which will also be used significantly later on):

eFrac = sym(1/2) + sym(1/3) + sym(1/78)

Another way is $ \frac{11}{13} = \frac{1}{1 + \frac{1}{ 5 + \frac{1}{2}}} $. We can check this just as we did above:

cFrac = 1/(1 + 1/(5 + sym(1/2)))

This is what we call a finite (simple) continued fraction: continued fraction because it continues, finite because there are finitely many terms, and simple because the numerators that appear are all 1. We also note that all of the coefficients we want should be positive integers; variations with negative integers also exist and the resulting continued fractions will have different properties.

This begs two natural questions:

- Are there continued fractions that have infinitely many terms?
- Are there continued fractions with numerators other than 1?

Yes and yes, which we'll demonstrate by example:

- The golden ratio $ \phi = \frac{1 + \sqrt{5}}{2} = 1+ \frac{1}{1 + \frac{1}{1 + \cdots}} \approx 1.618... $,
- $ \frac{4}{\pi} = 1 + \frac{1^2}{2 + \frac{3^2}{2 + \frac{5^2}{2 + \cdots}}} \approx 1.2732... $.

In this post we'll focus on infinite continued fractions, which we'll refer to as continued fractions for simplicity. Likewise, we won't touch on non-simple continued fractions, but rest assured they exist and often express interesting patterns and are fascinating mathematical objects to explore; for simplicity, whenever we say continued fraction we're talking about a simple continued fraction.

So how does one compute the continued fraction for a number? The general algorithm is a sort of Euclid's algorithm, wherein we take off the integer portion of a number, invert the fraction, and repeat. Let's see how this works with $ \frac{11}{13} $:

- $ \frac{11}{13} $ has no integer part, so we have $ \frac{11}{13} = 0 + \frac{11}{13} $.
- Next we invert the fractional part of the last expression, giving $ \frac{11}{13} = 0 + \frac{1}{\frac{13}{11}} $.
- For the newest "denominator", we repeat the process of taking out an integer part: since $ \frac{13}{11} = 1 + \frac{2}{11} $, we get $ \frac{11}{13} = 0 + \frac{1}{1 + \frac{2}{11}} $.
- Having isolated the integer part, we now invert the new fractional part $ \frac{2}{11} $, giving $ \frac{11}{13} = 0 + \frac{1}{1 + \frac{1}{\frac{11}{2}}} $.
- Since $ \frac{11}{2} = 5 + \frac{1}{2} $, we substitute this in to get the continued fraction $ \frac{11}{13} = 0 + \frac{1}{1 + \frac{1}{5 + \frac{1}{2}}} $.

It shouldn't be too surprising that any finite continued fraction corresponds to a rational number (just apply the algorithm backwards from the end of the continued fraction), but it may not be obvious that every rational number has a finite continued fraction. As a comparison, the decimal representation of $ \frac{1}{2} $ is $ 0.5 $, while the decimal representation of $ \frac{1}{3} $ is $ 0.333... $. It turns out that for continued fractions, there will be infinitely many coefficients if the number is irrational, and likewise if the number is irrational its continued fraction will have infinitely coefficients.

We've already seen one example of an infinite continued fraction, $ \phi = 1+ \frac{1}{1 + \frac{1}{1 + \cdots}} $. There are two ways to find the coefficients: one involves the algorithm from above, while the other leverages the fact that ϕ is a solution to $ 0 = \phi^2 - \phi - 1 $. Let's see both methods in action:

- Using the approximation $ \phi =1.618033988749895 $, we get $ \phi = 1 +0.618033988749895 $. Inverting the fraction gives $ \phi = 1 + \frac{1}{1.618033988749894} $, since $ \frac{1}{0.618033988749895} \approx 1.618033988749894 $. Notice that this new denominator is incredibly close to our original approximation for ϕ, and only differs in the last digit we've recorded. Continuing on, we get $ \phi = 1 + \frac{1}{1 + \frac{1}{\frac{1}{0.618033988749894}}} $, etc.
- The second method doesn't rely on approximations, but includes some of the observations in the last method. In particular, we saw that $ \frac{1}{\phi - 1} \approx \phi $, which upon rearranging gives $ 1 = \phi^2 - \phi $, which is precisely the equation $ 0 = \phi^2 - \phi - 1 $. What we'll do now is start with the quadratic equation for ϕ and rearrange it to get us the continued fraction fairly easily. So, starting with $ 0 = \phi^2 - \phi - 1 $, we rearrange to get $ \phi^2 = \phi + 1 $, and divide by ϕ to end up with $ \phi = \frac{\phi + 1}{\phi} $. Cleaning up the right-hand side of this last equality gives $ \phi = 1 + \frac{1}{\phi} $. What this expression suggests is that, since we have $ \phi = $ something, let's plug that something in to the left-hand side: $ \phi = 1+ \frac{1}{1 + \frac{1}{\phi}} $. Repeating this again gives $ \phi =1 + \frac{1}{1 + \frac{1}{1+\frac{1}{\phi}}} $, and so on.

This second method can be used for a lot of roots of quadratic equations, though for other irrational numbers there may or may not be some nice properties we can exploit. Before discussing how the general algorithm works in general, and implementing it in MATLAB, we'll see one more example of finding a continued fraction.

Let's find the continued fraction for $ \sqrt{2} $. First, recall that $ \sqrt{2} $ is a solution to the quadratic equation $ x^2 - 2 = 0 $. We can try to write this in the same form that worked for ϕ, namely $ x = \frac{2}{x} $, but this won't actually get us any closer. Instead, we're going to add 0 to $ \sqrt{2} $ and hope for the best: $ \sqrt{2} = \sqrt{2} + 1 - 1 = 1 + \left( \sqrt{2} - 1\right) $. This step is the first step of the general algorithm, since $ \sqrt{2} \approx 1.414 $, so we've successfully isolated the integer part. Next we invert the fractional part to get $ \frac{1}{\sqrt{2}-1} = \frac{\sqrt{2}+1}{(\sqrt{2}-1)(\sqrt{2}+1)} = \sqrt{2} + 1 $. Thus, our continued fraction should look like $ \sqrt{2} = 1 + \frac{1}{1 + \sqrt{2} } $. At this point we have the recurrence we need, and we can keep substituting $ \sqrt{2} $ back in on the left-hand side to get $ \sqrt{2} = 1 + \frac{1}{1 + 1 + \frac{1}{1 + \sqrt{2}} } = 1 + \frac{1}{2 + \frac{1}{1 + \sqrt{2}}} = 1 + \frac{1}{2 + \frac{1}{2 + \frac{1}{ 1 + \sqrt{2}}}} $, and so on.

As you may have noticed, writing out a continued fraction can get fairly unwieldly. Since we're working with simple continued fractions, the only coefficients we have to keep track of are the "integer pieces" in each level of the continued fraction. For $ \sqrt{2} $ the integer piece is 1, but all of the subsequent coefficients are 2. So that we don't have to write out $ \sqrt{2} = 1 + \frac{1}{ 2 + \frac{1}{ 2 + \frac{1}{2 + \cdots} } } $ each time, we'll use the condensed notation $ \sqrt{2} = [1; 2,2,...] $. The semicolon just indicates that the first coefficient is "special" since it's the actual integer part. As another example, $ \phi =1 + \frac{1}{1 + \frac{1}{1+\frac{1}{1 + \cdots}}} = [1; 1, 1, ...] $.

As a third example, $ \pi = [3; 7, 15, 1, 292, 1, 1,...] = 3 + \frac{1}{7 + \frac{1}{15 + \frac{1}{1 + \frac{1}{292 + \frac{1}{1 + \cdots}}}}} $. Next, we'll implement the algorithm for computing continued fractions and verify this last representation.

More generally, if $ a_0, a_1, a_2, ... $ are positive integers, then the continued fraction $ a_0 + \frac{1}{a_1 + \frac{1}{a_2 + \cdots}} $ is written $ [a_0; a_1, a_2, ...] $.

As a quick reminder, the algorithm to find the coefficients for a continued fraction involves iteratively (1) taking out the integer part of a quantity, (2) inverting the fractional part, and repeating this indefinitely; in the case of using a computer, we repeat this process until our level of precision is met or exceeded. Let's see what this looks for ϕ and π.

As a test case to verify the algorithm, and its accuracy, we'll start with ϕ's continued fraction coefficients. Recall that $ \phi = [1; 1, 1, ...] $, so as soon as the algorithm returns a coefficient that is not 1, we'll know that's around the limitation of what we can accurately compute.

phi = vpa((1 + sqrt(sym(5)))/2,d) % define phi as a symbolic variable

tempR = phi; % tempR is the number we have at the latest stage of the algorithm

numTerms = 100; % let's start with 1000 coefficients computed

%initialize an array of zeros so we have space allocated at the start for

%everything

A = zeros(1,numTerms);

%the main for loop where we compute the coefficients

for ii = 1:numTerms

%compute the integer piece of wherever we're at in the process

A(ii) = floor(tempR); % floor finds the nearest integer below tempR

%set up to get the next integer piece, etc.

tempR = 1/(tempR - floor(tempR)); % remove the integer piece and invert the fraction

end

% let's see what was computed

A

The 96th coefficient is 2, not 1, so with this precision we're getting ~100 coefficients accurately. While this may not seem like too many coefficients, using the continued fraction with these 95 coefficients would let us compute ϕ to within about 40 digits. This follow's from Loch's theorem, since for ϕ it takes 2.39 continued fraction coefficients to approximate 1 digit of accuracy for ϕ.

Next, let's do the same with $ \pi = [3; 7, 15, 1, 292, 1, 1,...] $.

symPi = vpa(sym(pi),d)

tempR = symPi; % tempR is the number we have at the latest stage of the algorithm

numTerms = 100; % let's start with 100 coefficients computed

%initialize an array of zeros so we have space allocated at the start for

%everything

A = zeros(1,numTerms);

%the main for loop where we compute the coefficients

for ii = 1:numTerms

%compute the integer piece of wherever we're at in the process

A(ii) = floor(tempR); % floor finds the nearest integer below tempR

%set up to get the next integer piece, etc.

tempR = 1/(tempR - floor(tempR)); % remove the integer piece and invert the fraction

end

% let's see what was computed

A

Just as expected! In case there's any doubt, the continued fraction coefficients for π can be externally verified using the Online Encyclopedia of Integer Sequences as A001203.

Since we'll use the algorithm over and over, it's been encoded in the function getCF available at the end of this post.

We can also visualize these coefficients using bar graphs, where the bars are only used a quick graphical aid to get a sense of the distribution, and in particular how large, the coefficients can get. Here we're using a higher precision than above since we're going to start going big with the number of coefficients we compute.

piCF = getCF(vpa(sym(pi),d),100);

% first 10 coefficients

bar(piCF(1:10))

title("First 10 continued fraction coefficients of \pi")

%first 100 coefficients

bar(piCF)

title("First 100 continued fraction coefficients of \pi")

Before getting to the core of this blog post - statistical properties of continued fraction coefficients - let's see what some of the other numbers we defined at the start of this post look like.

tic

num1CF = getCF(num1,100);

toc

bar(num1CF)

title("First 100 continued fraction coefficients of the Fibonacci constant")

tic

num2CF = getCF(num2,100);

toc

bar(num2CF)

title("First 100 continued fraction coefficients of Champernowne's constant")

tic

num3CF = getCF(num3,100);

toc

bar(num3CF)

title("First 100 continued fraction coefficients of the Prime constant")

tic

num4CF = getCF(num4,100);

toc

bar(num4CF)

title("First 100 continued fraction coefficients of the Squares constant")

tic

num5CF = getCF(num5,100);

toc

bar(num5CF)

title("First 100 continued fraction coefficients of Liouville's constant")

tic

num6CF = getCF(num6,100);

toc

bar(num6CF)

title("First 100 continued fraction coefficients of a random decimal")

Some of the coefficients get quite large, but most tend to stay closer (relatively speaking) to 1. While many of these coefficients may seem random, they have some pretty remarkable statistical properties as we'll explore next.

Since we've mentioned statistics, let's start by taking means of the coefficients and seeing what pops out. In particular, since we've been constructing continued fractions one coefficient at a time, we're going to take running means of the coefficients and see what their long-term behaviour is. Recall that the mean of numbers $ a_1, a_2, ..., a_n $ is $ \frac{a_1 + \cdots + a_n}{n} $.

We'll start by considering the running means of π's coefficients.

Means = zeros(1,100); % initialize an array to hold the means

for ii = 1:100

Means(ii) = mean(piCF(1:ii));

end

Means

Let's visualize these results with a line plot:

plot(Means)

The plot suggests that the means are tending towards the value 10, and inspecting the array of means tells us that the actual, apparent, limiting value is about 8. Let's do this same procedure for the other continued fractions we computed to use as examples and plot them all on the same graph.

Means = zeros(7,100); % initialize an array to hold the means

% this time 7 rows since we have pi and 6 other examples

% add the means for pi

for ii = 1:100

Means(1,ii) = mean(piCF(1:ii));

end

% add the means for the Fibonacci constant

for ii = 1:100

Means(2,ii) = mean(num1CF(1:ii));

end

% add the means for Champernowne's constant

for ii = 1:100

Means(3,ii) = mean(num2CF(1:ii));

end

% add the means for the Prime constant

for ii = 1:100

Means(4,ii) = mean(num3CF(1:ii));

end

% add the means for the Squares constant

for ii = 1:100

Means(5,ii) = mean(num4CF(1:ii));

end

% add the means for Liouville's constant

for ii = 1:100

Means(6,ii) = mean(num5CF(1:ii));

end

% add the means for random decimal

for ii = 1:100

Means(7,ii) = mean(num6CF(1:ii));

end

% plot all lines together

plot(Means')

ylim([0 100])

We can generate even more examples, and evidence for conjectures, by constructing more random decimals and plotting their running means.

numExperiments = 100; % the number of random numbers to generate

Means = zeros(numExperiments,100); % initialize an array to hold the means

tic % to time the computation

for jj = 1:numExperiments

%copying the code from above that generates a random decimal

tempDigits = randi(10,[1 d]) - 1; %generate an array of randomly selected digits between 1 to 9

temp = num2str(tempDigits); %convert the array to a string

temp = strrep(temp," ",""); %remove empty space in the string

randDec = vpa("0."+temp,d);

% compute its CF

randCF = getCF(randDec,100);

% compute its running means

for ii = 1:100

Means(jj,ii) = mean(randCF(1:ii));

end

end

toc % display how much time elapsed for the data generation

plot(Means')

ylim([0 100])

For most of the continued fractions, the means seem to tend to a value between 5 and 20, and all seem to be greater than 2.5. This is interesting, and with more coefficients we may generate more evidence for this observation - we may even generate a

Conjecture: the running means of (most) continued fractions coefficients is at least 2.5

(and prove it, later on).

Here though we'll pivot, and instead of exploring means, we'll explore geometric means of the coefficients. For numbers $ a_1, a_2, ..., a_n $, their geometric mean is the quantity $ (a_1a_2 \cdots a_n)^{1/n} $.

Let's start by computing the running geometric means of the continued fraction coefficients of π.

Geo = ones(1,99); % initialize the array to store geometric means

% we're taking 1 less than the size of the CF since we're ignoring the

% initial integer part

%the main for loop, where we loop through and compute the geometric means

%of the sequences

for ii = 2:length(piCF)

Geo(ii-1) = prod(nthroot(piCF(2:ii),ii)); %note the indexing

end

Geo

plot(Geo)

The coefficients seem to converge to a value just below 3. Let's see what the other example numbers indicate.

Geo = ones(7,99); % initialize an array to hold the geometric means

% this time 7 rows since we have pi and 6 other examples

% add the geometric means for pi

for ii = 1:100

Geo(1,ii) = prod(nthroot(piCF(2:ii),ii));

end

% add the geometric means for the Fibonacci constant

for ii = 1:100

Geo(2,ii) = prod(nthroot(num1CF(2:ii),ii));

end

% add the geometric means for Champernowne's constant

for ii = 1:100

Geo(3,ii) = prod(nthroot(num2CF(2:ii),ii));

end

% add the geometric means for the Prime constant

for ii = 1:100

Geo(4,ii) = prod(nthroot(num3CF(2:ii),ii));

end

% add the geometric means for the Squares constant

for ii = 1:100

Geo(5,ii) = prod(nthroot(num4CF(2:ii),ii));

end

% add the geometric means for Liouville's constant

for ii = 1:100

Geo(6,ii) = prod(nthroot(num5CF(2:ii),ii));

end

% add the geometric means for random decimal

for ii = 1:100

Geo(7,ii) = prod(nthroot(num6CF(2:ii),ii));

end

% plot all lines together

plot(Geo')

ylim([0 10])

We can also run the same experiment as above, generating a bunch of random decimals and seeing what their running geometric means are.

numExperiments = 100; % the number of random numbers to generate

Geo = zeros(numExperiments,99); % initialize an array to hold the geometric means

tic % to time the computation

for jj = 1:numExperiments

%copying the code from above that generates a random decimal

tempDigits = randi(10,[1 d]) - 1; %generate an array of randomly selected digits between 1 to 9

temp = num2str(tempDigits); %convert the array to a string

temp = strrep(temp," ",""); %remove empty space in the string

randDec = vpa("0."+temp,d);

% compute its CF

randCF = getCF(randDec,100);

% compute its running geometric means

for ii = 2:100

Geo(jj,ii-1) = prod(nthroot(randCF(2:ii),ii));

end

end

toc % display how much time elapsed for the data generation

plot(Geo')

ylim([0 10])

Remarkably, the running geometric means tend towards a value between 2 and 3 much more noticeably than the running means did. Of course with more continued fraction coefficients we'd get more data, and hopefully stronger evidence, for a conjecture, like

Conjecture: the running geometric means of (most) continued fraction coefficients converge to 2.5.

The value 2.5 was chosen as the visual mean of the lines in our experiments. So the question remains: is our conjecture true?

Unfortunately it's not actually true. The actual value is $ \approx 2.68545... $

This result is known as Khinchin's constant, and was actually conjectured and proven in the 1930s. What's even wilder is the exact value is known, and is the infinite product$ \prod_{r=1}^\infty \left( 1 + \frac{1}{r(r+2)}\right)^{\log_2 r} $. The proof of this goes well beyond this post and involves more advanced mathematical techniques developed to study dynamical systems, but the linked Wikipedia article is a great starting point for intrepid explorers curious to learn more.

How does this connect to the exploration of means? Well, the AM-GM inequality tells us that the arithmetic mean (the usual mean) is always greater than or equal to the geometric mean of the same set of numbers. Since Khinchin's constant is what most running geometric means converge to, the running (arithmetic) means of continued fraction coefficients should converge to values greater than Khinchin's constant.

This discussion ignored many (arguably) more standard topics when it comes to continued fractions, like:

- convergents, or what you get when you truncate a continued fraction after finitely many terms,
- rates of convergence for convergents, or why $ \frac{22}{7} $ and $ \frac{355}{113} $ are excellent approximations of π,
- the fact that continued fractions are the "best" rational approximants for numbers,
- MATLAB's built-in rational approximation tools, which can find continued fractions with possibly negative exponents,
- Loch's theorem, which informs how many coefficients (or levels) of a continued fraction you need for another digit of accuracy (~1 coefficient per digit),
- how roots of quadratic polynomials have periodic continued fractions (the coefficients repeat), and are the only kinds of irrational numbers with periodic coefficients,

and so on. Indeed, when looking at convergents a result similar to Khinchin's constant exists called Levy's constant.

There even exist (non-simple) continued fraction expansions for some well-known functions, like

$ \sin(x) = \frac{x}{1 + \frac{x^2}{2\cdot 3 - x^2 + \frac{2\cdot 3 x^2}{4\cdot 5 - x^2 + \frac{4\cdot 5 x^2}{6\cdot 7 - x^2 + ...}} } } $. How this expression is obtained goes far beyond the discussion here, but hopefully serves as motivation to keep exploring these fascinating objects (the Wikipedia page on Euler's continued fraction might be a good starting point).

Happy exploring!

function [num1, num2, num3, num4, num5, num6] = generateExamples(d)

%%%%%%%%%%

%Number - Fibonacci's Constant

%0.1123581321..., constructed by concatenating the Fibonacci numbers 1, 1, 2, 3, 5...

numTerms = 70; %set the number of terms in the Fibonacci sequence to use

tempDigits = zeros(1,numTerms); %generate an empty array to store the first d terms of the Fibonacci sequence

tempDigits(1) = 1;%initialize the initial condition

tempDigits(2) = 1;%initialize the initial condition

%the main for loop where we fill in the Fib. sequence

for ii = 3:numTerms

tempDigits(ii) = tempDigits(ii-1) + tempDigits(ii-2);

end

temp = num2str(tempDigits); %convert the array to a string

temp = strrep(temp," ",""); %remove empty space in the string

num1 = vpa("0."+temp,d); % tell MATLAB to use Variable Precision Arithmetic (vpa) for high precision

%%%%%%%%%%

%Number 2 - Champernowne's Constant

%0.1234567891011..., constructed by concatenating the digits 1, 2, 3, ...

tempDigits = 1:d; %generate an array of the first d digits

temp = num2str(tempDigits); %convert the array to a string

temp = strrep(temp," ",""); %remove empty space in the string

num2 = vpa("0."+temp,d); % tell MATLAB to use Variable Precision Arithmetic (vpa) for high precision

%%%%%%%%%%

%Number 3 - Prime Constant

%0.235711..., formed by concatenating the prime numbers 2, 3, 5, 7, 11...

tempDigits = primes(d); %generate an array of the primes below d

temp = num2str(tempDigits); %convert the array to a string

temp = strrep(temp," ",""); %remove empty space in the string

num3 = vpa("0."+temp,d); % tell MATLAB to use Variable Precision Arithmetic (vpa) for high precision

%%%%%%%%%%

%Number 4 - Squares Concatenated Constant

%0.1491625..., formed by concatenating the integers squared 1, 4, 9, 16...

tempDigits = 1:d; %generate an array of the first d digits

tempDigits = tempDigits.^2; %square each of the terms in the array (hence the .^)

temp = num2str(tempDigits); %convert the array to a string

temp = strrep(temp," ",""); %remove empty space in the string

num4 = vpa("0."+temp,d); % tell MATLAB to use Variable Precision Arithmetic (vpa) for high precision

%%%%%%%%%%

%Number 5 - Liouville's Constant

%0.1100010000000000000000010... formed by adding a 1 in the decimal place and zeros elsewhere.

numTerms = 8; %number of factorial terms to use

LiouvilleIndeces = factorial(1:numTerms); %first identify where the ones should go

tempDigits = zeros(1,max(LiouvilleIndeces)+1); %generate an array of the first d digits

tempDigits(LiouvilleIndeces) = 1; %square each of the terms in the array (hence the .^)

temp = num2str(tempDigits); %convert the array to a string

temp = strrep(temp," ",""); %remove empty space in the string

num5 = vpa("0."+temp,d); % tell MATLAB to use Variable Precision Arithmetic (vpa) for high precision

%%%%%%%%%%

% Number 6 - Random Decimal

% A number whose decimal digits are all uniformly chosen between

tempDigits = randi(10,[1 d]) - 1; %generate an array of randomly selected digits between 1 to 9

temp = num2str(tempDigits); %convert the array to a string

temp = strrep(temp," ",""); %remove empty space in the string

num6 = vpa("0."+temp,d); % tell MATLAB to use Variable Precision Arithmetic (vpa) for high precision

end

function A = getCF(r,numTerms)

% get the continued fraction coefficients

%

% the input r is the number we want the CF of

% numTerms is however many coefficients we want to compute

%

tempR = r; % tempR is the number we have at the latest stage of the algorithm

% if numTerms isn't specified, default to computing 10 coefficients

if ~exist('numTerms','var')

numTerms = 10;

end

%initialize an array of zeros so we have space allocated at the start for

%everything

A = zeros(1,numTerms);

%the main for loop where we compute the coefficients

for ii = 1:numTerms

%compute the integer piece of wherever we're at in the process

A(ii) = floor(tempR);

%set up to get the next integer piece, etc.

tempR = 1/(tempR - floor(tempR)); % remove the integer piece and invert the fraction

end

end

The File Exchange provides access to more than 46,000 free MATLAB projects. For 33,854 of these projects, the authors included the word "toolbox" in the title. "Toolbox" is clearly a useful... read more >>

]]>Let's try to clarify things here. Consider the words project and toolbox. What is the difference?

A project is the entirety of the thing that you work on. These days, it often lives in a GitHub repository (so "project" and "repo" are sometimes used interchangeably). A toolbox is the thing you distribute to people who will then install and use it. So toolboxes derive from projects, but in general they aren't the same thing. A simple working project might be identical with its distributable toolbox. But as you add tests and other supporting material that won't ship with the toolbox, the project becomes the larger containing entity.

Here's how I sometimes picture it: the project is the garage and the toolbox is the car.

Now let me put on top of this picture some of the related words we use.

That's the basic idea of what a toolbox is. But let's zoom in a little. What should a modern standard toolbox look like?

For years we've been fielding requests about what constitutes a good toolbox. People want to know: How should I organize my toolbox code? How can I avoid the common pitfalls of distributing code? How can I encourage others to collaborate with me?

To answer these questions, we recently rolled out guidance on MATLAB Toolbox Best Practices. You can read all the details as published in this GitHub repo:

**mathworks/toolboxdesign: Best practices for creating MATLAB toolboxes.**

Or if you prefer, you can browse through this simple example instance of a standard toolbox, the Arithmetic Toolbox.

**mathworks/arithmetic: Example toolbox created to showcase MATLAB Toolbox Best Practices.**

Here is a short list of things we recommend.

- use GitHub
- include a README
- make it a MATLAB Project
- put all the "working code" into a toolbox/ subfolder
- package with MLTBX files
- create examples with Live Scripts (MLX files)
- create tests that can be run using GitHub Actions

We don't claim you have to do all these things in order to be a good person. These are conventions that we recommend. If you follow this advice (or even just some of it), it will be easier for others to find and use your code. The tools you create will have more reach and more impact.

I'll close by observing that this is an early version of a living standard, so you can expect it to evolve over time. In fact, we hope you will help us evolve it by opening issues or participating in discussions. We hope it will be useful to you as you make and share tools in MATLAB.

]]>Today's guest article is by Hans Scharler. Live Interview on Discord Recently, I had the privilege of engaging in a captivating conversation with Heather Gorr, Ph.D., a senior MATLAB Product Manager... read more >>

]]>Recently, I had the privilege of engaging in a captivating conversation with Heather Gorr, Ph.D., a senior MATLAB Product Manager and Data Science expert at MathWorks. Our discussion covered a broad range of topics, from her daily work routine to her personal journey into the world of data science.

I held the interview live on the MATLAB Discord Server. I even kicked off the AMA with a Lex Fridman-inspired podcast introduction.

Today we're privileged to be joined by Heather, a force in the world of data science, a Senior MATLAB Product Manager at MathWorks. Trained as a materials science engineer and physicist, Heather's intellectual journey is one marked by a fierce hunger for knowledge and an inexhaustible curiosity that sees no boundaries. She has shown us that the path from the academic world to the tech industry isn't a straight line, but a beautiful winding road of discovery and continuous learning. Today, Heather stands not only at the edge of today's technological innovation, but also at the dawn of tomorrow's potential, creating a space where technology and human ingenuity dance in a never-before-seen ballet. As we dive into this conversation, remember that it is through the insights of brilliant minds like Heather's that we get a glimpse into the fascinating future that lies ahead of us.

Buckle up, and let's go on this remarkable journey together.

Here, I'll share nine insights that emerged from our conversation—each one offering a unique perspective on the field of data science and Heather's approach to her work.

I asked Heather to describe one day from the past two weeks to give us some insight into what she may face on a typical day. Heather happened to choose the same day as the Discord AMA. Heather's description of that day reveals the diverse set of tasks and responsibilities that come with her role at MathWorks. From collaborating on a gorilla facial recognition project to hosting webinars and engaging in live streams, her role is dynamic and versatile, offering intellectual stimulation at every turn.

Heather's drive to embrace a wide array of projects and tasks can be traced back to her insatiable curiosity and pursuit of knowledge. She puts an emphasis on continuous learning as it is a critical facet of the data science field, which is always evolving and expanding.

Why MathWorks? Teaching others, engaging with users, discovering cool stuff, and there's always something new to learn.

- Heather Gorr

Heather started studying physics, spent time in Nashville being a musician, and then found her passion in engineering and data science. This career journey serves as a reminder that career paths are not always straightforward—it's okay, and even beneficial, to take time to find what truly motivates and interests you. Heather now plays in a band and gigs out almost every weekend in New England.

Heather joined Mathworks in January 2013. An early but defining moment in her career was responding to a call to action. Someone at MathWorks needed help on a deep learning project and Heather volunteered to help out and apply her experience from her Ph.D. work. That someone turned out to be Cleve Moler, the creator of MATLAB. Heather's journey is all about responding to opportunities. Her response to Cleve Moler's call to action and the resulting projects, collaborations, and learning experiences illustrate the potential value of seizing opportunities. Being open to unexpected opportunities can lead to rewarding experiences that stretch far beyond your comfort zone.

During the interview, Heather opened up about her dyslexia and ADHD and how she navigated academic and professional challenges providing an important perspective on the intersection of personal experiences and professional growth. Personal experiences, including the challenges we face, can shape our professional journey in significant ways. She felt it was important to share personal struggles and to be real about who we are.

Heather's role at MathWorks involves a mix of highly technical work, such as working on machine learning projects, and communication-focused tasks, such as teaching and hosting webinars. This balance underscores the importance of both technical expertise and communication skills in a data science role.

I asked Heather about a specific machine learning project that she worked on and she told us about a gorilla facial recognition project. What was fascinating about this project is that Heather was not trained specifically for this project, but she was prepared to handle it by applying her knowledge from other deep learning projects. Heather's work on the gorilla facial recognition project showcases the innovative applications of data science and machine learning and how important it is to work on diverse projects. You never know what you will be working on and how your experience will be impactful.

Heather frequently mentions collaborations in her work, such as her projects with Cleve Moler and others at MathWorks. She wanted to highlight the importance of collaborative work in data science, where different team members can bring unique perspectives and skills to solve complex problems.

Heather made it clear that she loved teaching others. Her teaching experience and her enthusiasm for sharing her work through webinars and blog posts highlight her passion for education. It's a reminder of the importance of knowledge sharing and mentorship, which not only helps others learn but also fosters a strong, collaborative community.

My conversation with Heather Gorr not only shed light on her work and philosophy but also provided unique insights into the ever-evolving world of data science. Her journey, experiences, and perspectives have a lot to offer to anyone interested in the field. To follow Heather's work and contributions, you can find her on LinkedIn or visit her MathWorks community profile.

]]>Today's guest article is by Wesley Hamilton, a STEM Outreach engineer here at MathWorks. Wesley's roles involve popularizing the use of MATLAB and Simulink for younger folk, especially towards... read more >>

]]>Sudoku is a popular puzzle in which puzzlers fill in a 9x9 grid with the digits 1-9 such that a number of conditions are met: each 3x3 subgrid, row, and column must contain all digits. While Sudoku is usually referred to as a logic puzzle, it is actually possible to recast these puzzles as a (mathematical) constrainted optimization problem, wherein the optimal solution gets interpreted as the inputs for a completed Sudoku puzzle. What's crazy about this formulation though is that the function we optimize will be the zero function, so that any input is optimal; all of the legwork is in satisfying the optimization problem's constraints. Interested? Read on to learn more!

The presentation and code in this blog are adapted from a workshop developed by Dr. Tim Chartier, based on code developed in collaboration with Dr. Amy Langville of the College of Charleston and her students.

Table of Contents

Rules and History of Sudoku

Solving Sudoku Puzzles, part 1

Linear Programming

Sudoku as a Linear Program

Box constraints

Row constraints

Column constraints

Subgrid constraints

Solving Sudoku Puzzles, part 2

Ordering of decision variables

Box constraints

Row constraints

Column constraints

Subgrid constraints

The givens

Further directions

Just the code

Solving Sudoku Puzzles, part 1

Linear Programming

Sudoku as a Linear Program

Box constraints

Row constraints

Column constraints

Subgrid constraints

Solving Sudoku Puzzles, part 2

Ordering of decision variables

Box constraints

Row constraints

Column constraints

Subgrid constraints

The givens

Further directions

Just the code

The classic version of Sudoku most people are familiar with consists of a 9x9 grid divided into nine 3x3 subgrids. The grid will start with some boxes filled in with the digits 1-9, and the goal of the puzzle is to fill in each box with one of the 9 digits in a way that the following three conditions are satisfied:

- Every row of the grid contains each of the digits 1-9, and each digit only appears once,
- Every column of the grid contains each of the digits 1-9, and each digit only appears once,
- Every 3x3 subgrid contains each of the digits 1-9, and each digit only appears once.

Here is an example of a 9x9 Sudoku puzzle, retrieved from websudoku.com on May 3, 2023.

One variation of the puzzle is the 4x4 version, in which a 4x4 grid is split into four 2x2 subgrids and the three conditions are modified so that only the digits 1-4 are used. This version is often used as an easier version than the 9x9, and makes visualizing what happens in the computational side of things a bit more tractable as well.

Here is an example of a 4x4 Sudoku puzzle, generated by Dr. Tim Chartier, MoMath's 2022–2023 Distinguished Visiting Professor for the Public Dissemination of Mathematics.

Sudoku was popularized in Japan in the late 1980s, before grabbing worldwide attention in the early 2000s. Unbeknownst to many, very early versions of Sudoku appeared in France in the late 1800s as puzzle-variations of magic squares; see the Wikipedia page and its references for more information.

The next section walks through the basics of solving Sudoku puzzles. If you're already very familiar with Sudoku, feel free to skip down to the text right after the completed 4x4 puzzle.

So how does one solve a Sudoku puzzle? In practice, we iteratively apply the three conditions to identify which numbers go in which boxes until all boxes are filled in. In harder puzzles we may be forced to make a choice between two equally likely entries; in these situations we'll often try writing in one of the options and seeing what other numbers can get filled in based on that, and then either run into a contradiction, or solve the board.

Let's see this process in action with Tim's 4x4 puzzle. For clarity, the three conditions we'll be iteratively checking are:

- Every row of the grid contains each of the digits 1-4, and each digit only appears once,
- Every column of the grid contains each of the digits 1-4, and each digit only appears once,
- Every 2x2 subgrid contains each of the digits 1-4, and each digit only appears once.

First we'll focus on the upper-right 2x2 subgrid and identify where we may be able to fill in a number, in either the red circle or the blue square.

Applying condition 3., we know that the digits in these empty grid boxes have to be 2 and 3. So which digit goes where? By condition 2., each column of the grid can only contain each digit exactly once, and the fourth column (containing the blue square) already contains a 2, so it is not possible to write the digit 2 in the blue square (similarly, by condition 1. the row with the blue square already has the digit 2 somewhere in the row). Thus, the digit 2 goes in the square with the red circle.

Applying condition 3. to the same subgrid, we conclude the remaining grid box must contain the digit 3.

Continuing in this way we'll eventually be able to fill in every box; for example, the next move might be writing in the digit 4 in the only remaining box in the second row. Feel free to try solving this 4x4 on your own, and check your solution against ours:

Awesome! But what if we want to have MATLAB solve the puzzle for us? An initial approach might be to code (in some way) these three constraints and have our computer iterate through each row, column, and subgrid writing in digits one-by-one. What happens if there are two digits that might work in a grid box? Are there other logical steps that might occur in a harder puzzle that we can't anticipate?

Instead of taking this approach, we'll take a very different, and possibly cleverer, approach making use of some deep and powerful computational mathematics known as Linear Programming.

Linear programming is, simply put, an optimization problem in which the objective function and constraints are all linear. Let's break down what each of these pieces mean:

- an optimization problem is a problem in which one seeks the maximum (or minimum) of an objective function, i.e. find a value for x that maximizes a given function $ f(x) $;
- an objective function is the function we want to maximize or minimize. We could just call it a function, but often in optimization problems multiple functions come in to play, including functions that add constraints to the problem, in addition to the function we're optimizing;
- a linear objective function is an objective function that is linear, i.e. takes the form $ f(x) = a_1 x_1 + \cdots + a_n x_n $, for a vector $ x = \pmatrix{x_1 \cr \vdots \cr x_n} $. Writing $ a = \pmatrix{a_1 \cr \vdots \cr a_n} $, this function can also be expressed $ f(x) = a^t x $, where the t superscript means we're taking the matrix transpose of a vector.
- a constraint is a rule for which values of x are feasible, i.e. x must be non-negative, or if x is a vector then all of the components of x must be non-negative;
- a linear constraint is a constraint that's written in the form of a linear equation. For example, if we're working with a vector $ x = \pmatrix{x_1 \cr x_2 \cr x_3} $, then a linear constraint might look like $ x_1 - x_2 + 3 x_3 \leq 0 $. If we write $ b=\pmatrix{1 \cr -1 \cr 3} $, then this linear constraint can also be written $ b^t x \leq 0 $, where the t superscript means we're taking the matrix transpose of a vector.

Linear constraints can actually come in two forms: linear inequalities ($ Ax\leq b $) and linear equalities ($ A_{eq} x = b_{eq} $). Both kinds of constraints The kinds of linear programs we'll work with here can all be expressed in the following form:

find a vector x that maximizes $ f(x) = a^t x $ subject to $ Ax \leq b $, $ A_{eq} x = b_{eq} $, and $ x\geq 0 $,

where $ a, b, b_{eq} $ are given vectors, and $ A, A_{eq} $ are given matrices encoding the linear constraints. Note that if $ x = \pmatrix{x_1 \cr \vdots \cr x_n} $, then the notation $ x\geq 0 $ means $ x_i \geq 0 $ for $ i=1, ..., n $. The components $ x_i $ of the vector x are often called decision variables, since these are the values the linear program needs to decide, or find values for.

In this post we'll be using MATLAB's intlinprog function, which you can treat as a black box linear program solver that finds integer solutions to a linear program. This solver automatically selects which algorithm to use and works to find a solution, so the only work we have to do is write down the linear program in a way it can understand. We'll do that next.

Keep in mind that we're using a matrix multiplication formulation to express the constraints. For background on matrix multiplication and basic linear algebra concepts, check out our interactive courseware module on Matrix Methods of Linear Algebra.

To set up a Sudoku puzzle as a linear program, we need to identify a few things:

- What are the variables that we're solving for?
- What are the constraints we want to impose?
- What is the function we want to maximize?

Why not identify the function before the constraints? Well, turns out we don't actually care what the function is, which will be clear by the end of this section.

So what should the variables be? For a 4x4 grid we have 16 total boxes to fill in, and an initial approach might be to use variables $ x_1, ..., x_{16} $ where each $ x_i $ should take a value between 1 and 4. While reasonable, there's actually a computationally easier approach.

Instead of using the variables to tell us which digit is in each box, let's use variables to tell us whether or not a digit is in a box. By this we mean using variables $ x_{ijk} $, where the pair $ (i,j) $ tells us which box we're looking at, and k specifies which digit we care about. If $ x_{ijk} = 0 $, then digit k should not be written in box $ (i,j) $, while if $ x_{ijk} = 1 $, then digit k belongs in box $ (i,j) $.

As an example, consider the 4x4 grid we considered earlier (with some of the box indeces added in the top-left corners for clarity):

To number the grid boxes, $ (1,1) $ will be the top-left box, $ (2,1) $ will be the box beneath it, and $ (1,2) $ will be the box to the right of $ (1,1) $. In other words, the first index i in $ (i,j) $ counts how many rows down we go, and the index j counts how many columns to the right we go.

So with this notation, we have the digit 4 in the $ (1,4) $ box, the digit 1 in the $ (2,3) $ box, and the digit 2 in the $ (3,4) $ box at the start (among others). In particular, the digits $ 1,2,3 $ do not appear in the $ (1,4) $ box. Using our variables to encode this information, we'd have $ x_{1,4,4}=1 $ and $ x_{1,4,1} = x_{1,4,2} = x_{1,4,3} = 0 $. At the start of the puzzle we did not know which digit went in the $ (1,3) $ box, so at the start $ x_{1,3,k} $ is not specified for $ k=1,...,4 $. Using the rules of Sudoku, we identified that a 2 belongs in that box, so by using the constraints we concluded that $ x_{1,3,2} = 1 $ and $ x_{1,3,1}=x_{1,3,3} = x_{1,3,4} = 0 $. Now that we have our variables specified, we're in a position to specify the constraints.

So how do we encode the Sudoku constraints into this framework? Keep in mind there are three constraints from the original puzzle which we'll need to encode in turn. There's actually a fourth constraint we need to encode, as discussed just above, in which each box can only have a single digit written in it.

The first constraint we need to encode is that each box can only have one digit written in it. So, for the $ (i,j) $ box, as soon as the digit 1 is written in it (for example), we get $ x_{i,j,1} = 1 $ and $ x_{i,j,2} = x_{i,j,3} = x_{i,j,4} = 0 $. The trick here is that each variable should take the value of either 0 or 1 (and nothing else), so if we sum them up and set the sum equal to 1, that ensures only one of the variables can take the value 1. Written out explicitly, for each of the boxes $ (i,j) $ we require:

$ x_{i,j,1} + x_{i,j,2} + x_{i,j,3} + x_{i,j,4} = 1 $.

The first of the puzzle constraints is that each row of the grid has each of the digits, and that each digit only appears once. Since we're looking at the rows, we're going to be focusing on sets of variables $ x_{i,1,k}, x_{i,2,k}, x_{i,3,k}, x_{i,4,k} $ for $ i = 1,...,4 $ and $ k=1,...,4 $. The trick here is the same as above, in that each variable should take the value of either 0 or 1 (and nothing else), so if we sum them up then, in a solved puzzle, we should get exactly 1:

$ x_{i,1,k} + x_{i,2,k} + x_{i,3,k} + x_{i,4,k} = 1 $.

As soon as one of the $ x_{i,j,k} $ takes the value 1, the rest are forced to be 0. This ensures that, in the ith row, the digit k only appears once. This constraint should hold for every possible row i and possible digit k.

The column constraint is set up exactly the same way as the row constraint, except here we're summing over the row indeces instead of the column indeces:

$ x_{1,j,k} + x_{2,j,k} + x_{3,j,k} + x_{4,j,k} = 1 $,

and this equality should hold for every possible column j and digit k.

The last of the Sudoku puzzle constraints to implement is the subgrid rule, in which each of the four digits is present in each of the 2x2 subgrids. As a specific example to guide our work, consider the subgrid in the top-left consisting of the boxes $ (1,1), (1,2), (2,1), (2,2) $. The subgrid constraint specifies that each digit k appears just once among these four boxes, so we'll actually have four constraints for this subgrid:

$ x_{1,1,k} + x_{1,2,k} + x_{2,1,k} + x_{2,2,k} = 1 $, $ k=1,...,4 $.

There are four of these subgrids though, so we also need to incorporate those constraints:

$ x_{1,3,k} + x_{1,4,k} + x_{2,3,k} + x_{2,4,k} = 1 $, $ k=1,...,4 $,

$ x_{3,1,k} + x_{3,2,k} + x_{4,1,k} + x_{4,2,k} = 1 $, $ k=1,...,4 $,

$ x_{3,3,k} + x_{3,4,k} + x_{4,3,k} + x_{4,4,k} = 1 $, $ k=1,...,4 $.

That's a lot! Good thing we don't have to work with all of these equations by hand and have MATLAB to do all the heavy lifting for us!

The last thing we need to do is specify what our objective function is, the thing we're actually minimizing. The thing is, though, we've already encoded everything we want to keep track of through the variables and constraints, so there isn't any need for a specific function. This means that, for our purposes, we're free to set the function to be $ f(x) = 0 $. In this case, any vector x maximizes the function, so all of the work goes into finding a vector x that satisfies the constraints. If this still seems weird, you're not alone; if it helps, you're welcome to just have faith that we'll get a solution using this weird function, that's perfectly fine.

To summarize our work in this section, we're solving the following linear program:

find a vector $ x = \pmatrix{x_{1,1,1}\cr x_{1,1,2}\cr \vdots \cr x_{4,4,4}} $ that maximizes the function $ f(x) = 0 $, subject to the constraints

- $ x_{i,j,1} + x_{i,j,2} + x_{i,j,3} + x_{i,j,4} = 1 $ for $ i = 1,...,4 $ and $ j=1,...,4 $, (box constraint)
- $ x_{i,1,k} + x_{i,2,k} + x_{i,3,k} + x_{i,4,k} = 1 $ for $ i = 1,...,4 $ and $ k=1,...,4 $, (row constraint)
- $ x_{1,j,k} + x_{2,j,k} + x_{3,j,k} + x_{4,j,k} = 1 $ for $ j = 1,...,4 $ and $ k=1,...,4 $, (column constraint)
- $ x_{1,1,k} + x_{1,2,k} + x_{2,1,k} + x_{2,2,k} = 1 $ for $ k=1,...,4 $, (top-left subgrid constraint)
- $ x_{1,3,k} + x_{1,4,k} + x_{2,3,k} + x_{2,4,k} = 1 $ for $ k=1,...,4 $, (top-right subgrid constraint)
- $ x_{3,1,k} + x_{3,2,k} + x_{4,1,k} + x_{4,2,k} = 1 $ for $ k=1,...,4 $, (bottom-left subgrid constraint)
- $ x_{3,3,k} + x_{3,4,k} + x_{4,3,k} + x_{4,4,k} = 1 $ for $ k=1,...,4 $. (bottom-right subgrid constraint)

Note that all of this work was just for the 4x4 version of Sudoku; how would we modify everything for the original 9x9 version? Take a minute to think about it, and then read on for a solution... Afterwards we'll implement everything in MATLAB and actually solve some puzzles.

For the 9x9 version, the straightforward changes are that the sums should include 9 terms everywhere, and the indeces $ i,j,k $ should each range from 1 to 9 everywhere. The more-involved modification is that there will be nine subgrid constraints instead of four, since the 9x9 grid contains nine 3x3 subgrids. The top-left 3x3 subgrid will have the constraint

$ x_{1,1,k} + x_{1,2,k} + x_{1,3,k} + x_{2,1,k} + x_{2,2,k} + x_{2,3,k} + x_{3,1,k} + x_{3,2,k} + x_{3,3,k} = 1 $, for $ k = 1, ..., 9 $.

The other subgrid constraints can be written out in a similar (slightly cumbersome) way.

In this section, we'll encode the constraints we described above and actually solve a Sudoku puzzle using linear programming.

There's one other set of constraints that we need to impose, that are independent of our framework and the rules of Sudoku. Remember that a Sudoku puzzle starts with some of the grid boxes filled in, which we can specify by a matrix G specifying the row number, column number, and digit of each of the initial entries. As a concrete example, for our 4x4 example we have $ G = \pmatrix{1 & 4 & 4 \cr 2 & 1 & 2\cr 2 & 3 & 1\cr 3 & 1 & 3 \cr 3 & 4 & 2} $; the first row is $ (1\, 4\, 4) $ since we start with the digit 4 in the first row and fourth column, etc.

The first thing to do is specify the size of the Sudoku grid where working with:

n=4; % for 4 by 4 sudoku problem

%n=9; % for 9 by 9 sudoku problem

Keep in mind that the vector of variables, specifying which digits appear where in the grid, will have $ n^3 $ entries: $ n\times n $ entries corresponding to each box in the grid, and another n for the n possible digits. One thing we need to be careful about is what order our decision variables appear in our vector. Three (plus one) reasonable options are:

- Order the variables by incrementing the digit index first, then the column index, then the row index: $ x_{1,1,1}, x_{1,1,2}, ..., x_{1,1,4}, x_{1,2,1}, x_{1,2,2}, ... $ This enumeration will list all the digit design variables first before going to a new box.
- Order the variables by incrementing the row index first, then the column index, then the digit index: $ x_{1,1,1}, x_{2,1,1}, ..., x_{4,1,1}, x_{1,2,1}, x_{2,2,1}, ... $ This enumeration will iterate through the entire grid by going down columns then across rows, before increment the digit index and iterating through again.
- Order the variables by incrementing the column index first, then the row index, then the digit index: $ x_{1,1,1}, x_{1,2,1}, ..., x_{1,4,1}, x_{2,1,1}, x_{2,2,1}, ... $ This is similar to the previous possible enumeration, except we traverse the grid along rows first, then along columns, then along digits.
- Order the variables randomly.

The fourth option isn't actually reasonable, since you'll have significantly more bookkeeping to manage (it is, technically, possible to implement though!). Here we'll take option two, so that every set of $ n^2 $ components of the decision vector x correspond to the same digit enumerated along columns and then rows.

Namely, for our implementation here, x(1) tells us whether or not digit 1 is in the $ (1,1) $ box, x(n^2+1) tells us if the digit 2 is in the $ (1,1) $ box, etc. Likewise, x(2) tells us if the digit 1 is in the $ (2,1) $ box, x(3) tells us if the digit 1 is in the $ (3,1) $ box, x(n^2+2) tells us if the digit 2 is in the $ (2,1) $ box, etc.

Extra Challenge: implement this Sudoku solver using the other two (reasonable) methods of ordering the design variables. Extra bonus points for implementing the random ordering approach.

Next let's encode the constraints as a matrix $ A_{eq} $, so that in our linear program we'll have $ A_{eq} x = b $ as our constraint. The matrix $ A_{eq} $ should be a $ C\times n^3 $ matrix, where C is the number of constraints. The vector b will be a vector of 1s, being the right-hand side of each of our constraints.

What is C in this case? The constraints we identified are:

- for each of the $ n^2 $ grid boxes we have a single constraint, totalling $ n^2 $ constraints,
- for each of the n rows, and each of the n digits, we have a constraint, totalling $ n^2 $ constraints,
- for each of the n columns, and each of the n digits, we have a constraint, totalling $ n^2 $ constraints,
- for each of the n subgrids, we have n constraints telling us each digit must appear in that subgrid, totalling $ n^2 $ constraints,
- we start with $ |G| = \# $of rows of G entries already filled in, totalling $ |G| $ constraints.

Summing these numbers up, we get $ C = 4n^2 + G $. In our approach, we'll define $ A_{eq} $ with $ C = 4n^2 $ first and fill in the first four constraints in the above bulleted list. Once finished, we'll add $ |G| $ more rows to $ A_{eq} $ and incorporate the givens. So let's start by initializing $ A_{eq} $:

Aeq=zeros(4*n^2, n^3);

Here's a sketch of the matrix we'll be filling in during the rest of this section:

At the top we've indicated how the decision variables are ordered: the first $ n^2 $ columns correspond to whether or not digit 1 is written in the boxes $ (1,1), (2,1), ..., (n,n) $, the next $ n^2 $ columns correspond to digit 2, and so on, exactly as we specified when we decided to use option 2 for ordering above. The rows are split based on the constraints we're encoding as well as the given entries of the initial Sudoku puzzle.

In the next few subsections we'll iterate through this matrix and write-in each of constraints. The subsections showcase what the matrix will look like (i.e. where the 0's and 1's go), before displaying the MATLAB code that accomplishes this (with plenty of comments).

%% Fill in Aeq for the n^2 constraints requiring that each element

% of the matrix must contain an integer 1:n

for i=1:n % for each row (of the Sudoku grid)

for j=1:n % for each column (of the Sudoku grid)

for k=1:n % increment through each digit

% the (i,j) grid box is associated to row (i-1)*n+j of A_eq

% incrementing the digit k comes with n^2 entries along the array

% though the box index (i,j) stays the same as we incorporate all

% digits

Aeq((i-1)*n+j,(i-1)*n+j + (k-1)*n^2)=1;

end

end

end

% if we had ordered the design variables so that we increment through

% digits first, we could replace this code with

%

% for i=1:n^2

% Aeq(i,(i-1)*n+1:i*n)=ones(1,n);

% end

%% Fill in Aeq for the n^2 constraints requiring that each row

% can have only 1 of each integer 1:n

for k=1:n % for each digit

for i=1:n % fix the row of the Sudoku grid

for j=1:n % increment the column indeces we're summing up

% note that we've already added the n^2 box constraints

% so updating A_eq starts at the n^2 row

Aeq(n^2+i+(k-1)*n,1+(i-1)+(j-1)*n+(k-1)*n^2)=1;

end

end

end

% if we had ordered the design variables so that we increment through

% rows first, we could replace this code with

%

% for i=1:n^2

% Aeq(n^2+i,(i-1)*n+1:i*n)=ones(1,n);

% end

%% Fill in Aeq for the n^2 constraints requiring that each column

% can have only 1 of each integer 1:n

for i=1:n^2 % quick trick since the column constraints correspond to rows of adjacent 1s

% we've added the n^2 box constraints and n^2 row constraints

% so these constraints start on row 2*n^2

% since our variable indexing goes along columns first,

% when incorporating these constraints

Aeq(2*n^2 + i,(i-1)*n+1:i*n)=ones(1,n);

end

%% Fill in Aeq for the n^2 constraints requiring that each submatrix

% can have only 1 of each integer 1:n

% note that n is a square, so sqrt(n) is an integer

% m and l are indeces for the subgrids - m=1, l=1 is the top-left subgrid

for m=1:sqrt(n)

for l=1:sqrt(n) % for the (m,l) subgrid

for j=1:n

for k=1:sqrt(n)

% added n^2 box, n^2 row, and n^2 column constraints

% the subgrid constraints should be added starting at row 3*n^2

Aeq(3*n^2+(m-1)*sqrt(n)*n+(l-1)*n+j,(j-1)*n^2+(l-1)*sqrt(n)+...

(m-1)*sqrt(n)*n+(k-1)*n+1:...

(j-1)*n^2+(l-1)*sqrt(n)+(m-1)*sqrt(n)*n+(k-1)*n+sqrt(n))=1;

end

end

end

end

Next, we need to incorporate the information that some of the entries are already filled in. We'll do this by specifying the givens in the form of the matrix mentioned at the start of this section, and then writing this information into $ A_{eq} $ in newly added rows below the information we've already incorporated.

For this example, we'll use the same matrix G for our favourite 4x4 example:

% If any values of the Sudoku matrix are given, add these as

% constraints

% Enter each given value in matrix as triplet (row, col, integer)

% Example for 4-by-4 sudoku

Givens=[1 4 4;

2 1 2;

2 3 1;

3 1 3;

3 4 2];

With this matrix specified, let's now append rows to $ A_{eq} $ and incorporate these constraints specifying our given digits:

% turn these given elements into their appropriate position in the x vector

% of decision variables.

g=size(Givens,1);

for i=1:g % for each of the givens

% we've added in 4*n^2 constraints already, so the first of the givens

% should be added at row 4*n^2 + 1, etc.

% the column entry to write the given constraint is at

% rowId + colId*n + digitId*n^2

% based on how we've organized the decision variables

Aeq(4*n^2+i,Givens(i,1)+(Givens(i,2)-1)*n+(Givens(i,3)-1)*n^2)=1;

end

This concludes writing out $ A_{eq} $. The final matrix should resemble this sketch:

Note the 1 at the matrix element corresponding to the given $ (2,1,2) $.

Since all of the constraints are set-up to equal 1, the vector b in $ A_{eq}x = b $ should be a vector of 1s:

beq=ones(4*n^2+g,1);

With the variables and constraints in place, we're almost ready to have MATLAB generate our solution. Looking at the documentation for intlinprog, we need to explicitly specify some other pieces:

- intcon, or the variables in x that should be integers. We want all of the components of x to be integers, so this should be the array 1:numVariables,
- numVariables, or the length of the variable x ($ n^3 $ in this case),
- lb, the vector of lower bounds for each of the variables. Since each $ x_{ijk} $ is 0 or 1, lb will be a vector of 0s,
- ub, the vector of upper bounds for each of the variables. Since each $ x_{ijk} $ is 0 or 1, ub will be a vector of 1s.

Let's go ahead and specify each of these variables:

%% run matlab's "intlinprog" command

numVariables = size(Aeq,2);

intcon = 1:numVariables;

lb = zeros(1,numVariables);

ub = ones(1,numVariables);

With everything in place, let's generate a solution to our 4x4 puzzle.

[x,fval,exitflag,output]=intlinprog(zeros(n^3,1),intcon,[],[],Aeq,beq,lb,ub);

% note the two arguments of []

% these inputs correspond to inequality constraints

% since we're not using inequality constraints, we leave these inputs empty

exitflag;

output;

x

For our 4x4 example, x is a vector with $ 4^3 = 64 $ entries, each of which is 0 or 1. While this is a solution to the linear program, it's not yet helpful for our purposes; what would be nice is if we could convert this vector into a grid of digits 1 to 4, corresponding to a Sudoku solution. Let's do that now and see the result:

%% turn matlab's outputed solution vector x into sudoku terms

S=zeros(n,n); %initialize the grid of solutions

for k=1:n % for each digit

% find the entries in the solutions vector x corresponding to digit k

% that have a value of 1 (i.e., the digit should be written in that box)

subx=find(x((k-1)*n^2+1:k*n^2));

for j=1:n % loop through

row=mod(subx(j),n);

if row==0

row=n;

end

S(row,j)=k;

end

end

% Sudoku Matrix S

S

Compare this to the solution we generated by hand earlier, and see that MATLAB succeeded in finding the same solution!

Next up is to try this with your own puzzles. The code has been copied below in one section for your convenience, with the sample 9x9 puzzle shown at the start encoded in the given matrix. Check that MATLAB can solve that as well, and try your own!

Interested in exploring some other interesting questions related to this post? Here are some further directions to get you started:

- What happens if we don't add any givens to the linear program, i.e. we start with an empty Sudoku board? We should end up with a solution - is it unique? How might we interpret this solution? (The solver may take a bit longer to find the solution, but it will find a solution.)
- How might you impose other constraints to transform the Sudoku puzzle into something possibly more difficult? Two options might be: (a) requiring that each diagonal of the Sudoku grid contains each of the digits 1-9 once and only once, and (b) requiring that the middle boxes of each subgrid form their own subgrid requiring each of the digits 1-9 once and only once.
- Pick another variant of Sudoku and implement those constraints, such as if the subgrids are shaped differently.

%n=4; % for 4 by 4 sudoku problem

n=9; % for 9 by 9 sudoku problem

Aeq=zeros(4*n^2, n^3);

%% Fill in Aeq for the n^2 constraints requiring that each element

% of the matrix must contain an integer 1:n

for i=1:n

for j=1:n

for k=1:n

Aeq((i-1)*n+j,(i-1)*n+j + (k-1)*n^2)=1;

end

end

end

%% Fill in Aeq for the n^2 constraints requiring that each row

% can have only 1 of each integer 1:n

for k=1:n

for i=1:n

for j=1:n

Aeq(n^2+i+(k-1)*n,1+(i-1)+(j-1)*n+(k-1)*n^2)=1;

end

end

end

%% Fill in Aeq for the n^2 constraints requiring that each column

% can have only 1 of each integer 1:n

for i=1:n^2

Aeq(2*n^2+i,(i-1)*n+1:i*n)=ones(1,n);

end

%% Fill in Aeq for the n^2 constraints requiring that each submatrix

% can have only 1 of each integer 1:n

for m=1:sqrt(n)

for l=1:sqrt(n)

for j=1:n

for k=1:sqrt(n)

Aeq(3*n^2+(m-1)*sqrt(n)*n+(l-1)*n+j,(j-1)*n^2+(l-1)*sqrt(n)+...

(m-1)*sqrt(n)*n+(k-1)*n+1:...

(j-1)*n^2+(l-1)*sqrt(n)+(m-1)*sqrt(n)*n+(k-1)*n+sqrt(n))=1;

end

end

end

end

% If any values of the Sudoku matrix are given, add these as

% constraints

% Enter each given value in matrix as triplet (row, col, integer)

% Example for 9-by-9 sudoku

Givens=[1 1 4;

1 2 9;

1 6 5;

1 9 1;

2 2 5;

2 7 7;

2 8 2;

2 9 9;

3 3 2;

3 6 8;

3 8 4;

3 9 5;

4 1 2;

4 3 8;

4 5 4;

4 6 3;

5 2 7;

5 4 8;

5 6 6;

5 8 3;

6 4 5;

6 5 9;

6 7 1;

6 9 8;

7 1 7;

7 2 8;

7 4 6;

7 7 4;

8 1 3;

8 2 4;

8 3 9;

8 8 6;

9 1 5;

9 4 3;

9 8 1;

9 9 7];

% turn these given elements into their appropriate position in the x vector

% of decision variables.

g=size(Givens,1);

for i=1:g

Aeq(4*n^2+i,Givens(i,1)+(Givens(i,2)-1)*n+(Givens(i,3)-1)*n^2)=1;

end

% beq=rhs vector (3n^2 by 1) for equality contraints

beq=ones(4*n^2+g,1);

%% run matlab's "intlinprog" command

numVariables = size(Aeq,2);

intcon = 1:numVariables;

lb = zeros(1,numVariables);

ub = ones(1,numVariables);

[x,fval,exitflag,output]=intlinprog(zeros(n^3,1),intcon,[],[],Aeq,beq,lb,ub);

exitflag;

output;

%% turn matlab's outputed solution vector x into sudoku terms

S=zeros(n,n);

for k=1:n

subx=find(x((k-1)*n^2+1:k*n^2));

for j=1:n

row=mod(subx(j),n);

if row==0

row=n;

end

S(row,j)=k;

end

end

% Sudoku Matrix S

S

]]>

Earlier this year I wrote about solving word ladders with MATLAB. There was a lot of interest in that post, so I thought I'd share my investigations regarding another word-based app. In this script,... read more >>

]]>Earlier this year I wrote about solving word ladders with MATLAB. There was a lot of interest in that post, so I thought I'd share my investigations regarding another word-based app. In this script, I'm trying to create a puzzle modeled on the NY Times Spelling Bee game. Here is the premise: you are given seven letters, and your job is to find all possible words you can make with those seven letters. You can use letters more than once, but they must be four or more letters long, and they must all include the special center letter. Every Spelling Bee puzzle has at least one "pangram", which is a word that uses all seven letters.

As an example, suppose you were given this puzzle.

From this puzzle you could create PORE, ADORE, ADOPT, as well as the pangram OPERATED. But you couldn't create POD (too short) or PAPER (doesn't include the center letter "O").

Here I'm not so much interested in solving the puzzle. I want to write a script that can create a Spelling Bee puzzle.

We need some ground truth for all the legal words. Here's Google's 10,000 word English dictionary. It's small, but good enough for our purposes here.

url = 'https://raw.githubusercontent.com/first20hours/google-10000-english/master/google-10000-english.txt';

wordList = webread(url);

Split the word list into strings.

words = split(string(wordList));

words = lower(words);

The Spelling Bee puzzle avoids the letter S so that it never has to deal with plurals. Let's use logical indexing to keep only the words that don't contain an S.

keep = ~words.contains("s");

words = words(keep);

Of these, keep only the words that are four letters or longer.

keep = words.strlength >= 4;

words = words(keep);

I like how these string commands are easy to read and understand! Just like for us humans, code that communicates clearly keeps its job the longest. We don't want to employ fussy, high-maintenance code.

How many words do we end up with?

fprintf("The adjusted dictionary now contains %d words\n",length(words))

We've thrown out about half the words in our dictionary.

We'll be converting from ASCII representation to alphabet number a few times, so let's invest in a little anonymous helper function. It'll make the code below a little easier to read.

% An anonymous function to translate ASCII values to A=1, B=2, ... Z=26.

ascii2letter = @(chars) abs(lower(chars))-96;

Try it out

ascii2letter('ABCXYZ')

Looks good!

Build a 26-column sparse matrix to represent unique letter usage for each word. Each word in the dictionary gets one row. Every letter that appears in the word gets a 1.

dictionaryLength = numel(words);

a = sparse(dictionaryLength,26);

for i = 1:dictionaryLength

uniqueLetters = unique(words(i).char);

a(i,ascii2letter(uniqueLetters)) = 1;

end

nnz(a)/numel(a)

At 22% full, this matrix isn't terribly sparse as sparse matrices go. But it's a fun excuse to play with SPARSE, so let's carry on. Here's a SPY plot of the matrix.

spy(a)

set(gca, ...

XTick=1:26, ...

XTickLabel=num2cell('A':'Z'), ...

XTickLabelRotation=0, ...

XAxisLocation="top", ...

PlotBoxAspectRatio=[1 1 1])

To make much out of this, we need to zoom in. Let's see how the word LABEL is encoded.

ix = 1002;

% In case you're wondering, I like to use "ix" as short for "index"

words(ix)

full(a(ix,:))

spy(a(1001:1010,:))

xline([1 2 5 12])

yline(2,Color="red")

set(gca, ...

XTick=1:26, ...

XTickLabel=num2cell('A':'Z'), ...

XTickLabelRotation=0, ...

XAxisLocation="top", ...

YTick=1:10, ...

YTickLabels=words(1001:1010))

Note that even though the L appears twice in the word, it only has a one in the matrix.

Just for fun, let's do a histogram of the letters. This can be a useful sanity check that we're on the right track.

allLetters = ascii2letter(char(join(words,'')));

histogram(allLetters)

set(gca, ...

XTick=1:26, ...

XTickLabel=num2cell('A':'Z'), ...

XTickLabelRotation=0)

As expected, E is the most commonly appearing letter. Q is the least, but for the fact that we completely wiped out S.

How many words are composed of exactly seven distinct letters? That is, how many are potential pangrams for a game?

% Summing letter-use across the rows

% Use FIND to build an index to all the words that use exactly seven letters

ix7LetterWords = find(sum(a,2)==7);

length(ix7LetterWords)

Let's look at a few of these seven-letter words.

num7LetterWords = numel(ix7LetterWords);

ix = ix7LetterWords(1:10);

disp(words(ix(1:10)))

Pick a good one as the "seed" pangram for our game.

ix = ix(9);

disp(words(ix))

How many words can be spelled from those same 7 letters? Find all the words that use nothing but the letters in the seed word

lettersInSeedWord = a(ix,:);

% Find all the words that have none of the letters from the seed word.

% We're making sure that all contributions of letters other than

% the seed letters sum to zero.

ixWordsFromSeedWord = find(sum(a(:,~lettersInSeedWord),2) == 0);

fprintf('%d words use only the seven letters appearing in the seed word.\n',numel(ixWordsFromSeedWord))

Which letters appear most frequently?

bar(sum(a(ixWordsFromSeedWord,:)))

set(gca, ...

XTick=1:26, ...

XTickLabel=num2cell('A':'Z'), ...

XTickLabelRotation=0)

We need to designate one letter as the "center letter" that must appear in every word. Let's pick the letter that appears least frequently.

vals = full(sum(a(ixWordsFromSeedWord,:)));

vals(vals==0) = inf;

[~,ixlet] = min(vals);

centerLetter = char(ixlet+96);

Remove all the words that don't have this letter

ixRequired = find(a(:,ixlet));

ix2a = intersect(ixWordsFromSeedWord,ixRequired);

This is the original seed word:

fprintf("%s\n", words(ix))

Here are the seven unique letters:

sevenLetters = unique(char(words(ix)));

fprintf("%s\n", sevenLetters);

The required center letter is:

fprintf("%s\n", upper(centerLetter));

Here are the pangrams

ix3 = find(sum(a(ix2a,:),2)==7);

for i = 1:numel(ix3)

fprintf("%s\n",words(ix2a(ix3(i))));

end

Here is the complete list of words

for i = 1:numel(ix2a)

fprintf("%s\n",words(ix2a(i)));

end

Some of those are little odd, but we have our dictionary to thank for that.

sixLetters = setdiff(sevenLetters,centerLetter);

sixLetters = sixLetters(randperm(6));

Now for the graphics! Make the plot

% The perimeter hex cells are centered from pi/6 to 11*pi/6 radians

t = 2*pi*(0:5)/6 + 2*pi/12;

r = 1.08*sqrt(3);

x = r*cos(t);

y = r*sin(t);

cla

hexplot(0,0,upper(centerLetter),"#F7DA21")

hold on

for i = 1:numel(x)

hexplot(x(i),y(i),upper(sixLetters(i)),"#E6E6E6")

end

hold off

axis equal

axis off

And there you have it! It's easy to see how valuable a good algorithmically generated game can be. Once you've worked out the logic, you just need to run it once every day and your puzzle is ready to go. Compare this with the labor that goes into a crossword puzzle. Although who knows? Maybe crossword puzzles will be yet another of those things that AI proves to be adept at.

function hexplot(x,y,letter,color)

t = linspace(0,2*pi,7);

patch(cos(t)+x,sin(t)+y,"white", ...

EdgeColor="none", ...

FaceColor=color)

text(x,y,letter, ...

FontSize=24, ...

FontWeight="bold", ...

HorizontalAlignment="center", ...

VerticalAlignment="middle")

end