Just Keep Swimming

Remember Dory?

Image Credit: Silvio Tanaka [ CC BY 2.0 ], via Wikimedia Commons

The model of persistence in the face of difficult circumstances, the hilarious and free-spirited fish of the vast ocean expanse, the adopted aunt of our lovable Nemo, that Dory?

Well, she was ahead of her time. Who knew the wisdom of her sage advice: just keep swimming.

We need a little bit of that sometimes when we are performance testing. Specifically, we need to just keep swimming (so to speak) when we are measuring code that is just too fast. For example, let's revisit the code from the last post (CQ's matrix library). The performance tests we wrote there look like this:

classdef tMatrixLibrary < matlab.perftest.TestCase
    
    properties(TestParameter)
        TestMatrix = struct('midSize', magic(600),...
            'largeSize', magic(1000));
    end
    
    methods(Test)
        function testSum(testCase, TestMatrix)
            matrix_sum(TestMatrix);
        end
        
        function testMean(testCase, TestMatrix)
            matrix_mean(TestMatrix);
        end
        
        function testEig(testCase, TestMatrix)
            
            testCase.assertReturnsTrue(@() size(TestMatrix,1) == size(TestMatrix,2), ...
                'Eig only works on square matrices');
            testCase.startMeasuring;
            matrix_eig(TestMatrix);
            testCase.stopMeasuring;
            
        end
    end
end
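
As a refresher, matrix_sum, matrix_mean, and matrix_eig come from the last post's matrix library. If you don't have it handy, here is a minimal sketch of plausible stand-ins (these are my guesses, not CQ's actual implementations):

function s = matrix_sum(matrix)
% Hypothetical stand-in: sum of all of the matrix elements
s = sum(matrix(:));
end

function m = matrix_mean(matrix)
% Hypothetical stand-in: mean of all of the matrix elements
m = mean(matrix(:));
end

function lambda = matrix_eig(matrix)
% Hypothetical stand-in: eigenvalues of a (square) matrix
lambda = eig(matrix);
end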

Here you can see that we tested against a "medium size" problem and a "large size" problem. (Un)conveniently missing, however, is a "small size" problem. Why is this? Well, why don't we add one...

classdef tMatrixLibrary_v2 < matlab.perftest.TestCase
    
    properties(TestParameter)
        TestMatrix = struct('smallSize', magic(100), 'midSize', magic(600),...
            'largeSize', magic(1000));
    end
    
    methods(Test)
        function testSum(testCase, TestMatrix)
            matrix_sum(TestMatrix);
        end
        
        function testMean(testCase, TestMatrix)
            matrix_mean(TestMatrix);
        end
        
        function testEig(testCase, TestMatrix)
            
            testCase.assertReturnsTrue(@() size(TestMatrix,1) == size(TestMatrix,2), ...
                'Eig only works on square matrices');
            testCase.startMeasuring;
            matrix_eig(TestMatrix);
            testCase.stopMeasuring;
            
        end
    end
end

...along with a quick function to check the validity of the results, and we'll find out:

function checkResults(results)
    % Report whether every performance measurement in the results was valid
    disp(newline)
    dispFrame
    if ~all([results.Valid])
        disp('Oh no Dory, some measurements were invalid!')
    else
        disp('Thanks Dory, you''re the best! All our measurements are good.')
    end
    dispFrame
end

function dispFrame
    disp(':::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::');
end

results = runperf('tMatrixLibrary_v2');
checkResults(results)
Running tMatrixLibrary_v2
........
================================================================================
tMatrixLibrary_v2/testSum(TestMatrix=smallSize) was filtered.
    Test Diagnostic: The MeasuredTime should not be too close to the precision of the framework.
================================================================================
.. .......... .......... .......... .......
================================================================================
tMatrixLibrary_v2/testMean(TestMatrix=smallSize) was filtered.
    Test Diagnostic: The MeasuredTime should not be too close to the precision of the framework.
================================================================================
...
.......... .......... .......... .......... .Warning: Target Relative Margin of Error not met after running the MaxSamples
for tMatrixLibrary_v2/testMean(TestMatrix=largeSize). 
.........
.......... .....
Done tMatrixLibrary_v2
__________

Failure Summary:

     Name                                              Failed  Incomplete  Reason(s)
    ===============================================================================================
     tMatrixLibrary_v2/testSum(TestMatrix=smallSize)               X       Filtered by assumption.
    -----------------------------------------------------------------------------------------------
     tMatrixLibrary_v2/testMean(TestMatrix=smallSize)              X       Filtered by assumption.


:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Oh no Dory, some measurements were invalid!
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Uh oh, that doesn't look ideal. Some of our small-size measurements weren't valid because the code's execution time was too close to the measurable precision of the framework. Rather than risk handing back garbage data, the framework proactively filtered the tests that were too fast and marked their results as invalid. Well, I still want to measure this fast case, so what do we do? Typically, what we see happening is people wrapping their code in a for loop with a fixed iteration count, like so:

classdef tMatrixLibrary_v3 < matlab.perftest.TestCase
    
    properties(TestParameter)
        TestMatrix = struct('smallSize', magic(100), 'midSize', magic(600),...
            'largeSize', magic(1000));
    end
    
    methods(Test)
        function testSum(testCase, TestMatrix)
            for idx = 1:1000
                matrix_sum(TestMatrix);
            end
        end
        
        function testMean(testCase, TestMatrix)
            for idx = 1:1000
                matrix_mean(TestMatrix);
            end
        end
        
        function testEig(testCase, TestMatrix)
            
            testCase.assertReturnsTrue(@() size(TestMatrix,1) == size(TestMatrix,2), ...
                'Eig only works on square matrices');
            testCase.startMeasuring;
            matrix_eig(TestMatrix);
            testCase.stopMeasuring;
            
        end
    end
end

results = runperf('tMatrixLibrary_v3');
checkResults(results)
Running tMatrixLibrary_v3
.......... .......... .......... .......... ..........
.......... .......... ..
Done tMatrixLibrary_v3
__________



:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Thanks Dory, you're the best! All our measurements are good.
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Well, the results look good at least. Think of it like measuring the weight of a feather using a kitchen scale. Measuring one feather will give you a bunk measurement, so instead you gather together 1000 or more feathers, put them in a box, and weigh the whole thing to get a good idea of the average weight of a feather.
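
Keep in mind, though, that the numbers runperf reports for the looped tests are the weight of the whole box. To estimate the time of a single call you have to divide by the loop count yourself; here is a minimal sketch (assuming the Name column of the summary table converts cleanly to string, and remembering that testEig was never looped):

% Divide out the static loop's scaling factor to estimate per-call times.
% Only testSum and testMean were wrapped in the 1000-iteration loop.
T = sampleSummary(results);
looped = contains(string(T.Name), ["testSum", "testMean"]);
T.Mean(looped) = T.Mean(looped) / 1000;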

Problem solved? Well, not really. There are some big drawbacks to this approach. Let's enumerate:

  1. I had to choose a number of iterations to test against, and that choice was arbitrary. How do I know I am not right on the edge of framework precision? If I am, I am likely to see tests that are sporadically too fast to measure! In addition, the framework precision is machine dependent, so different machines will have different precision thresholds. There is no one good number to go off of, so it's anyone's guess. Not comforting.
  2. #ohmygoodness was this slow! Why was it slow? Because I had to run 1000 iterations of everything, including the larger matrix sizes. These larger sizes don't need to be run in a loop, but in order to maintain an apples-to-apples comparison they need to be if the smaller sizes are.
  3. This approach falls flat when comparing different algorithms against each other. If one algorithm needs 1000 iterations but the other only needs 750, we quickly get into comparing apples and oranges and we lose insight into our true performance.
  4. Let's say we diligently track our code performance over time, and furthermore we do a bang-up job optimizing our critical code and make vast improvements in its performance. Well, this improvement may require that we "up" the iteration count, since 1000 iterations may suddenly become too fast to measure as a result of our optimizations, and we now need to measure 10,000 iterations. Once we do this, however, all future measurements are on a different scale than our historical data. Lame.
  5. Finally, and perhaps as a root cause of some of the apples/oranges troubles discussed above, it is simply not a true measurement of the code's execution time.

So, the takeaway: once your code hits the limits of how fast the framework can measure, you can add a static for loop and maybe hobble along, but it's really not the best experience.

Enter R2018b.

In R2018b, the performance testing framework has a new keepMeasuring method on the matlab.perftest.TestCase class to support measuring faster code. How is it used? Put it in a while loop and let the framework determine the right number of iterations:

classdef tMatrixLibrary_final < matlab.perftest.TestCase
    
    properties(TestParameter)
        TestMatrix = struct('smallSize', magic(100), 'midSize', magic(600),...
            'largeSize', magic(1000));
    end
    
    methods(Test)
        function testSum(testCase, TestMatrix)
            while testCase.keepMeasuring
                matrix_sum(TestMatrix);
            end
        end
        
        function testMean(testCase, TestMatrix)
            while testCase.keepMeasuring
                matrix_mean(TestMatrix);
            end
        end
        
        function testEig(testCase, TestMatrix)
            
            testCase.assertReturnsTrue(@() size(TestMatrix,1) == size(TestMatrix,2), ...
                'Eig only works on square matrices');
            while testCase.keepMeasuring
                matrix_eig(TestMatrix);
            end
            
        end
    end
end

results = runperf('tMatrixLibrary_final');
checkResults(results)
Running tMatrixLibrary_final
.......... .......... .......... .......... ..........
.......... .......... ..
Done tMatrixLibrary_final
__________



:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Thanks Dory, you're the best! All our measurements are good.
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Ah! Isn't that lovely? I didn't need to hard-code any for loop bounds, we were able to accurately measure significantly faster code, and the values returned are the real times taken rather than times inflated by an arbitrary scaling factor. Also, the code that needs more iterations is the fast code, so I didn't even notice a difference in total test time, whereas the slower tests didn't need any extra iterations at all for an accurate measurement. So nice. Look at the sample summary and you can see the real time taken:

sampleSummary(results)
ans =

  9×7 table

                           Name                            SampleSize       Mean       StandardDeviation       Min          Median         Max    
    ___________________________________________________    __________    __________    _________________    __________    __________    __________

    tMatrixLibrary_final/testSum(TestMatrix=smallSize)         4         1.4886e-05        1.242e-07        1.4807e-05    1.4833e-05    1.5071e-05
    tMatrixLibrary_final/testSum(TestMatrix=midSize)           4         0.00060309       1.4084e-05        0.00058393    0.00060658    0.00061526
    tMatrixLibrary_final/testSum(TestMatrix=largeSize)         4           0.003275       6.2535e-05         0.0031897     0.0032854     0.0033395
    tMatrixLibrary_final/testMean(TestMatrix=smallSize)        4         1.5534e-05       3.4959e-07        1.5221e-05    1.5535e-05    1.5847e-05
    tMatrixLibrary_final/testMean(TestMatrix=midSize)          4         0.00059933       9.2749e-06        0.00058845    0.00060012    0.00060865
    tMatrixLibrary_final/testMean(TestMatrix=largeSize)        4          0.0032668       4.8834e-05         0.0031958     0.0032859     0.0032997
    tMatrixLibrary_final/testEig(TestMatrix=smallSize)         4           0.003086       6.1447e-05           0.00304     0.0030639     0.0031762
    tMatrixLibrary_final/testEig(TestMatrix=midSize)           4            0.16333         0.001356           0.16232       0.16287       0.16527
    tMatrixLibrary_final/testEig(TestMatrix=largeSize)         4            0.39709       0.00096197           0.39613       0.39704       0.39813
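
If the summary isn't detailed enough, each result also carries its raw samples, so you can poke at the individual measurements yourself. For example (the indexing assumes the first result is the smallSize testSum run, matching the order above):

% Look at the raw samples behind the first measurement result
firstResult = results(1);   % assumed: testSum with the smallSize matrix
disp(firstResult.Samples)   % one row per sample, including MeasuredTime
fprintf('Mean measured time: %g seconds\n', ...
    mean(firstResult.Samples.MeasuredTime))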

It is worth noting that this is not a silver bullet. There is still some framework overhead in the keepMeasuring method that prevents us from measuring really fast code. Think of it like measuring a group of feathers where each feather comes individually wrapped in a small packet: there comes a point where we are measuring the overhead of the packet rather than the feather itself. So, while some code will still be too fast to measure (don't expect a valid measurement of 1+1, please), using the keepMeasuring method as shown opened up two orders of magnitude in allowable precision in our experiments.
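
For a feel of where that limit sits, here is a hypothetical test method (my own sketch, not from the original post) that I would still expect the framework to filter on most machines, keepMeasuring or not:

function testScalarAddition(testCase)
    % Scalar addition is dominated by the per-iteration overhead of
    % keepMeasuring itself, so expect this measurement to be filtered.
    while testCase.keepMeasuring
        x = 1 + 1; %#ok<NASGU>
    end
end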

Have fun, and like Dory, just keep measuring, y'all!




Published with MATLAB® R2018b
