Steve on Image Processing and MATLAB

Concepts, algorithms & MATLAB

Documenting performance improvements

Posted by Steve Eddins,

I recently noticed a change in the way we write some of our product release notes, and I wanted to mention it to you.

In my quarter century at MathWorks doing toolbox and MATLAB development, there have been a few areas of focus that have been remarkably consistent over that entire time. One of those areas is performance. Specifically, computation speed.

If you have used MATLAB for more than five years, it is likely that something in MATLAB you use a lot has been completely reimplemented to make it go faster in our ever-evolving computational environments.

Maybe it was new algorithms, like image resizing or Gaussian filtering. Maybe the memory access patterns were modified to exploit changing memory cache architectures, like image resizing (again), transposition (and permute), conv2, and even the seemingly straightforward sum function.

Possibly the functions you rely upon were modified to adapt to new core libraries, such as LAPACK or FFTW.

Many, many, many functions and operators were completely overhauled when multicore computers became common. Then they were modified again to exploit extended processor instruction sets for instruction word parallelism.

Finally, the very foundations for MATLAB language execution were completely overhauled in 2015 to make everything go faster. Since then, the MATLAB execution engine continues to be refined with almost every release to add new types of optimizations.

The curious thing about all this effort, over so many years, is how ... well ... vague we typically have been in describing performance improvements in our release notes.

For example, here is a snippet from the R2018b Release Notes for the Image Processing Toolbox:

Like I said: it's vague.

It was never our intent to be obscure. It's just that performance measurements are almost always challenging to report with accuracy and precision, and the experiences of individual users will almost always vary, sometimes considerably. Part of our company culture here is that we are allergic to making statements that could be perceived as inaccurate. I think that's what has been behind the history of vague statements about performance improvements in release notes. (OK, I should state this explicitly: this is my personal opinion, and not a statement of what company policy is or has been.)

Well, things are starting to change. Our documentation writers now have a new standard to follow when writing release notes about performance. Here is a sample from R2019b, which was released last month:

The release note describes what operation has been improved, how it was timed, what the times were for specific releases, and details about the computer used to measure the performance.
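To make that concrete, here is a minimal sketch (not the benchmark from the release note itself) of the kind of measurement such a note reports, using timeit to time a single operation. The imresize call and the array size are placeholder choices, not the operation actually documented.

% Time one operation with timeit, which calls the function repeatedly
% and returns a representative execution time in seconds.
A = rand(2000, 2000);            % placeholder input array
f = @() imresize(A, 0.5);        % operation under test (Image Processing Toolbox)
t = timeit(f);
fprintf('imresize, 2000-by-2000 input: %.4f seconds\n', t)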

Look for more performance changes to be reported with this level of detail in the future. I think this is a great improvement!



Published with MATLAB® R2019b

3 Comments

Michal Kvasnicka replied on : 1 of 3
There is another problem, too. With the execution engine development over the last few years, it is less and less clear what kind of programming is optimal from a performance point of view. Ten years ago the situation was much simpler: vectorized code was nearly always faster than the classic form of programming (a loop over indices, for example). Now the situation is more difficult, because in some cases vectorized code is significantly slower than its classic counterpart. So there is no longer one universal paradigm in which vectorization is usually the best way to optimize code performance. Moreover, profiling is now less meaningful, because profiling results are less representative of the real performance of the code running in non-profiling mode. Finally, MATLAB programmers are now in an unpleasant situation, because profiling is no longer definitively the best way to find performance bottlenecks in code. Are there any plans to address this situation?
Yair Altman replied on : 2 of 3
The new documentation standard is indeed a welcome improvement. Previously, nobody could know under which circumstances the speedup occurred, and to which degree (10% speedup? 2x? 10x?). Now, when something (e.g. VideoReader) is known to have improved by 4x, there is a much greater incentive to modify the code to use the improved function, or simply to upgrade the Matlab release.
Michal—Well, I guess I feel better about the situation than that. I think that most users have seen their code execution times steadily decrease over time without having to do anything to their code. In a small number of cases where there have been unanticipated performance degradations, we have fixed them as soon as possible after hearing about them. If you have specific cases where your code is not performing as you expect, please let me know. In general, I would advise anyone to write code initially with the primary goals of correctness and clarity, and then revise it for performance only as suggested by a tool such as the profiler. I have spoken with language team developers about your comments regarding the profiler, and they are not aware of a problem like the one you describe. The profiler was overhauled at the same time as the execution engine, with the intent of making it just as useful as before. If you have situations where that is not the case, then please let me know, and we will try to address whatever the issues are.
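For anyone unfamiliar with that workflow, here is a minimal sketch of profiling a piece of code to find its hot spots; myAnalysis and inputData are hypothetical stand-ins for your own function and data.

profile on                % start collecting profiling statistics
myAnalysis(inputData);    % run the code you want to examine (hypothetical call)
profile off               % stop collecting
profile viewer            % open the report; sort by self-time to find bottlenecks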
Michal Kvasnicka replied on : 4 of 3
@Steve ... I just tested my problematic code examples with R2019b, and I must say that the problems mentioned above (with R2016a/b and R2017a) have been eliminated. Thanks...
Michal Kvasnicka replied on : 5 of 3
@Steve Let's look at the following code:
function [tloop,tvec] = prodeval(neval,nround)
% neval  ... size of problem
% nround ... number of rounds

prodval = 1;
prodval2 = 1;
x = ones(1,neval);

% for-loop product
tic;
for k = 1:nround
    for i = 1:neval
        prodval = prodval*x(i);
    end
end
tloop = toc;

% vectorized product
tic;
for k = 1:nround
    prodval2 = prodval2*prod(x);
end
tvec = toc;

end
MATLAB R2019b produces the following results:
>> profile on
>> [tloop,tvec] = prodeval(1e1,1e8)

tloop = 61.1934
tvec = 6.6288
With the profiler activated, the vectorized code is significantly faster than the for-loop code. But with the profiler deactivated, the results are completely different:
>> profile off
>> [tloop,tvec] = prodeval(1e1,1e8)

tloop = 0.8389
tvec = 1.1821
So, finally, the profiler in this case produces results that do not correspond to the results with the profiler deactivated.
Michal—Thanks very much for providing a detailed reproduction case for the problem you are seeing with the profiler. I will show this to the language development team.

Michal—I have experimented with your code sample, and I've talked with other MATLAB language developers about your profiler experience.

I ran your code, with and without the profiler on, in both R2015a and R2019b. R2015a is the last version of MATLAB before the new execution engine was introduced. Some conclusions:

  • You have a good point about the profiler sometimes skewing the comparison between different implementations.
  • The skew is a lot less now than before the new execution engine.
  • Vectorization just for the sake of performance is often no longer useful.
  • R2019b is a LOT faster than R2015a for your example, for both the looped and the prod versions.

The looped version is 8 times faster in R2019b than in R2015a. The prod version is 36 times faster. As a result of these speedups, the loop and prod versions are about equally fast in R2019b.

In R2015a, turning on the profiler slows the loop version down by 320x, and it slows the prod version down by 8x, resulting in a relative measurement skew of 40x.

In R2019b, turning on the profiler slows the loop version down by 50x, and it slows the prod version down by 6x, resulting in a relative measurement skew of 8x.

Other MATLAB language developers tell me your loop code is pretty much the worst case for profiler overhead: a small, tight loop containing only scalar indexing and simple arithmetic. For now, there's no workaround that I know of. It is just a profiler characteristic to be aware of. Perhaps we'll be able to improve upon this in a future release. In the meantime, enjoy the speedups, and don't worry as much about vectorizing everything in sight.
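If you want a profiler-free comparison of the two implementations, one option (a minimal sketch, not an official recommendation) is to time each version with timeit, which repeats the call itself and adds no per-line instrumentation, so the nround loop from prodeval is not needed. The prodLoop helper below is hypothetical and simply mirrors the looped body of prodeval.

function compare_prod_timings
% Compare the looped and vectorized products without the profiler.
x = ones(1, 10);
tloop = timeit(@() prodLoop(x));   % looped scalar product
tvec  = timeit(@() prod(x));       % vectorized product
fprintf('loop: %.3g s   prod: %.3g s\n', tloop, tvec)
end

function p = prodLoop(x)
p = 1;
for i = 1:numel(x)
    p = p*x(i);
end
end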