# A Prime Case Study for Making MATLAB Code Go Faster

Today I'd like to welcome a guest blogger, Mike Croucher, who recently joined MathWorks as a Customer Success Engineer after a long career spent supporting academic computational research.

### Introduction

Most of my career has been dedicated to collaborating with scientists and engineers to improve their computational workflows. There are many things that could be done including encouraging the use of version control or unit testing but, by far, the most frequent request has been 'Can you make this thing I made go any faster?'
More than twenty years of being asked this simple question has led me to explore many areas of technical computing such as parallel computing, numerical analysis, HPC and cloud computing, GPU computing, mathematical optimization, automatic differentiation and multiple languages to name just a few.
At its core, the art of making a program faster comes down to iterating over two simple steps.
1. Find out exactly what is slow.
2. Try something different to see if it's any faster while still giving the right answer
As you might imagine, there's a world of complexity hidden in the above. 'Trying something different' could mean using language features to optimise the algorithm I've written, trying out parallelisation or even switching to a completely different algorithm althogether. I usually add an additional constraint to point 2 which is 'expend as little effort as possible' because I am fundamentally lazy.
MATLAB's profiler, which recently received an interface overhaul to provide flame graph views of code execution time, helps take care of the first point. Exploring the profiler would provide a rich seam of bloggable content as would exploring all of the algorithmic performance enhancements we make from release to release.
Thanks to The MATLAB Execution engine (discussed previously on this blog here and here) and various other optimisations, modern MATLAB is faster than ever and, as shown in the chart below, we are constantly making improvements in this area.
Of course there are always times when we need to optimise our code for speed and I wanted to share an example that was recently discussed internally.

### Don't Fear the Looper

Inspired by this post, a colleague wrote the following in MATLAB to count prime numbers.
type myCountPrimes.m
% Count how many primes there are up to and including max_n function res = myCountPrimes(max_n) res = 0; for i = 2:max_n if MyIsPrime(i) res = res+1; end end end % Function to check if n is prime function result = MyIsPrime(n) if n <= 1 result = false; return end q = floor(sqrt(n)); for i = 2:q if mod(n,i) == 0 result = false; return end end result = true; end
numPrimes = myCountPrimes(1000000)
numPrimes = 78498
This code is very heavy on loops - exactly the sort of thing I was trained to avoid in my early MATLAB days but thanks to the Execution Engine, it's not too bad. (See this recent post for more on the speed of loops in MATLAB)
f = @() myCountPrimes(1000000);
originalSpeed = timeit(f)
originalSpeed = 3.7878
Even so, there are many things we could try to speed up this particular piece of code. We could take advantage of the fact that every iteration of the loop in myCountPrimes is independent and use a parallel parfor loop. This is easy to do but might be heavy on computational resources. Once you realise you can leverage parfor, it becomes tempting to try to make your problem go away by throwing as many CPU cores as you possible at it. My favourite way to do this these days is to fire up a Cloud Center cluster and ramp up the core count as high as makes sense.

### Changing mod to floor

The profiler showed me that a lot of time was spent in the line
if mod(n,i) == 0
so one approach is to ask 'Can we rewrite that line to be any faster?'. It's not always obvious which alternatives might work so I often simply figure out a different way to do something and give it a try. An alternative way to write the expression mod(n,i)==0 is floor(n/i) == n/i and it turns out to be much faster! I did this in the floorMyCountPrimes function, checked that it gives the right result, and demonstrated that it is indeed much faster.
numPrimes = floorMyCountPrimes(1000000)
numPrimes = 78498
f = @() floorMyCountPrimes(1000000);
floorSpeed = timeit(f)
floorSpeed = 0.5296

### Trying Integer Types for an Integer Algorithm

By default, all numbers in MATLAB are floating point of type double which is great for a lot of numerical computing but here we are dealing with a purely integer algorithm. It can sometimes pay dividends to make this explicit in the code and this is one of those times! It turns out that the mod function has to do a lot more work when dealing with floating point numbers than is strictly necessary for this particular example. As the programmer, I know that my algorithm is completely integer based so I can switch to using uint32 for some variables. That is, I change the main loop of MyIsPrime to
q = floor(sqrt(n));
n = uint32(n);
for i = uint32(2):q
if mod(n,i) == 0
result = false;
return
end
end
This allows MATLAB to use a specialised and faster version of the mod function since it now knows that we are dealing with integers instead of doubles.
I did this in the uint32MyIsPrime function defined at the end of this post and called it via uint32MyCountPrimes. First let's check that the result agrees with the original function and then time it
numPrimes = uint32MyCountPrimes(1000000)
numPrimes = 78498
f = @() uint32MyCountPrimes(1000000);
uint32Speed = timeit(f)
uint32Speed = 0.5660
Using either this approach or the floor-based approach, we achieved a significant performance boost for relatively little work and techniques such as parfor parallelisation are still on the table if we need even more speed.

### MATLAB as a Repository of Algorithms

Finally, we could recognise that the fundamental algorithm we are using to count primes is not optimal and use a different one instead. Taking this to the extreme, we could remember that MATLAB is a repository of numerical algorithms as well as a programming language and so find the one that directly solves the problem we want to solve rather than rolling our own. A quick search of the documentation reveals the primes function that returns an array of primes. We produce this array and count the elements as follows.
numPrimes = numel(primes(1000000))
numPrimes = 78498
f = @() numel(primes(1000000));
builtinSpeed = timeit(f)
builtinSpeed = 0.0029
More than fast enough for my purposes!

### System Information

If you run this script on your own machine, you will almost certainly get different results to me. Using the version function and the excellent CPU Info by Ben Tordoff, here are the details of my machine and MATLAB version.
versionInfo = ['MATLAB Release R' version('-release')]
versionInfo = 'MATLAB Release R2021a'
cpuinfo()
ans = struct with fields:
Name: 'Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz' Clock: '2112 MHz' Cache: '1024 KB' NumProcessors: 4 OSType: 'Windows' OSVersion: 'Microsoft Windows 10 Pro'

MATLAB is faster than it has ever been and it will continue to get faster in the future. This, combined with improved hardware, will help you to answer larger and more complex questions than you ever have before. However, there will always be occasions where we need even more. What MATLAB speed-up techniques do you find useful and what do you wish was faster in MATLAB? Let us know here.
Helper functions
function result = floorMyIsPrime(n)
% Function to check if n is prime
if n <= 1
result = false;
return
end
q = floor(sqrt(n));
for i = 2:q
if floor(n/i) == n/i
result = false;
return
end
end
result = true;
end
function res = floorMyCountPrimes(max_n)
% Count how many primes there are up to and including max_n
res = 0;
for i = 2:max_n
if floorMyIsPrime(i)
res = res+1;
end
end
end
function result = uint32MyIsPrime(n)
% Function to check if n is prime
if n <= 1
result = false;
return
end
q = floor(sqrt(n));
n = uint32(n);
for i = uint32(2):q
if mod(n,i) == 0
result = false;
return
end
end
result = true;
end
function res = uint32MyCountPrimes(max_n)
% Count how many primes there are up to and including max_n
res = 0;
for i = 2:max_n
if uint32MyIsPrime(i)
res = res+1;
end
end
end