Sometime in 2021, I was doing some High Performance Computing (HPC) consultancy with a university in the south of England. This involved various things such as getting a research group's code to scale across multiple nodes, reducing its memory requirements, and optimising for speed. It was a big success: I got code that originally couldn't run at all on their HPC system to not only run but scale quite nicely, and then threw in a 6x speed increase as a bonus by optimising the user's code.
I eventually got stuck with MATLAB's fzero function. The user's code was calling it billions of times on rather simple functions, which revealed overheads that nobody had reported before.
I had a chat with the development team, who made some fixes, and in R2022a fzero became around 3x faster for simple cases like this. This was nice, but a member of the core maths team, Bobby Cheng, wanted to investigate further. He made a few more optimisations and also discovered that any function using varargin incurred additional overheads that could be resolved via the JIT compiler. The team behind JIT compilation took care of this, and now many functions that use varargin are a little faster.
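To make the varargin point concrete, here is a minimal sketch of the kind of argument-forwarding function that used to pay a small extra cost on every call. This is my own illustrative example, not MathWorks code:

```matlab
function y = forwardArgs(f, varargin)
% Forward any number of extra arguments to the function handle f.
% Functions written in this style previously incurred a small
% per-call overhead, which the JIT improvements reduced.
y = f(varargin{:});
end
```

When such a function is called billions of times, as in the HPC case above, even a tiny per-call cost becomes significant.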
These fixes made fzero even faster on the simple type of function that was being used in my HPC case. The release notes always compare against the previous MATLAB version, but to show the cumulative gains I prefer to compare this benchmark against R2021b, which is when I first noticed issues with this function.
out(i) = fzero(@(x)fzeroFun(x,levels(i)),[0 2]);
- R2021b: 28.52 seconds
- R2022a: 10.25 seconds
- R2023a: 1.67 seconds
That's almost a 17x speedup across just a few releases, and users did not need to change their code at all. The cumulative effect of all of these optimisations really adds up over time.
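For completeness, the timed line above sat inside a loop along the following lines. This is a minimal reconstruction: the contents of levels and the number of iterations are assumptions, not the original benchmark script:

```matlab
% Minimal benchmark sketch (reconstructed; levels and the iteration
% count are assumptions, not the original script)
levels = linspace(0.1, 0.9, 1e5);   % assumed set of target levels
out = zeros(size(levels));
tic
for i = 1:numel(levels)
    % Each call solves a simple scalar root-finding problem on [0, 2]
    out(i) = fzero(@(x)fzeroFun(x,levels(i)),[0 2]);
end
toc
```

The point is that each individual fzero call is cheap, so fixed per-call overheads dominate the total runtime.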
function u = fzeroFun(x,lv)
% The function to be solved by fzero (illustrative body: the original
% was not shown; any simple function with a root in [0 2] will do)
u = cos(x) - lv;
end