Just Keep Swimming
Remember Dory?
Image Credit: Silvio Tanaka [ CC BY 2.0 ], via Wikimedia Commons
The model of persistence in the face of difficult circumstances, the hilarious and free-spirited fish of the vast ocean expanse, the adopted aunt of our lovable Nemo, that Dory?
Well, she was ahead of her time. Who knew how wise her sage advice was: just keep swimming.
We need a little bit of that sometimes when we are performance testing. Specifically, we need to just keep swimming (so to speak) when we are measuring code that is just too fast. For example, let's take the example from the last post (CQ's matrix library). The performance tests we wrote there look like this:
classdef tMatrixLibrary < matlab.perftest.TestCase

    properties(TestParameter)
        TestMatrix = struct('midSize', magic(600),...
            'largeSize', magic(1000));
    end

    methods(Test)
        function testSum(testCase, TestMatrix)
            matrix_sum(TestMatrix);
        end

        function testMean(testCase, TestMatrix)
            matrix_mean(TestMatrix);
        end

        function testEig(testCase, TestMatrix)
            testCase.assertReturnsTrue(@() size(TestMatrix,1) == size(TestMatrix,2), ...
                'Eig only works on square matrix');

            testCase.startMeasuring;
            matrix_eig(TestMatrix);
            testCase.stopMeasuring;
        end
    end
end
Here you can see that we tested against a "medium size" problem and a "large size" problem. (Un)conveniently missing, however, is a "small size" problem. Why is this? Well, why don't we add one...
classdef tMatrixLibrary_v2 < matlab.perftest.TestCase

    properties(TestParameter)
        TestMatrix = struct('smallSize', magic(100), 'midSize', magic(600),...
            'largeSize', magic(1000));
    end

    methods(Test)
        function testSum(testCase, TestMatrix)
            matrix_sum(TestMatrix);
        end

        function testMean(testCase, TestMatrix)
            matrix_mean(TestMatrix);
        end

        function testEig(testCase, TestMatrix)
            testCase.assertReturnsTrue(@() size(TestMatrix,1) == size(TestMatrix,2), ...
                'Eig only works on square matrix');

            testCase.startMeasuring;
            matrix_eig(TestMatrix);
            testCase.stopMeasuring;
        end
    end
end
...along with a quick function to check the validity of the result and we'll find out:
function checkResults(results)
disp(newline)
dispFrame
if ~all([results.Valid])
    disp('Oh no Dory, some measurements were invalid!')
else
    disp('Thanks Dory, you''re the best! All our measurements are good.')
end
dispFrame
end

function dispFrame
disp(':::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::');
end
results = runperf('tMatrixLibrary_v2');
checkResults(results)
Running tMatrixLibrary_v2
........
================================================================================
tMatrixLibrary_v2/testSum(TestMatrix=smallSize) was filtered.
    Test Diagnostic: The MeasuredTime should not be too close to the precision of the framework.
================================================================================
..
.......... .......... .......... .......
================================================================================
tMatrixLibrary_v2/testMean(TestMatrix=smallSize) was filtered.
    Test Diagnostic: The MeasuredTime should not be too close to the precision of the framework.
================================================================================
...
.......... .......... .......... .......... .Warning: Target Relative Margin of Error not met after running the MaxSamples for tMatrixLibrary_v2/testMean(TestMatrix=largeSize).
.........
.......... .....
Done tMatrixLibrary_v2
__________

Failure Summary:

     Name                                              Failed  Incomplete  Reason(s)
    ===============================================================================================
     tMatrixLibrary_v2/testSum(TestMatrix=smallSize)             X         Filtered by assumption.
    -----------------------------------------------------------------------------------------------
     tMatrixLibrary_v2/testMean(TestMatrix=smallSize)            X         Filtered by assumption.

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Oh no Dory, some measurements were invalid!
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
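If you would rather get the names of the flagged measurements programmatically than scan the console output, something like the following should do it. This is a minimal sketch: listInvalidResults is a hypothetical helper name of my own, and it assumes only that each element of the results array exposes the Valid property used in checkResults plus a Name property.

```matlab
function invalidNames = listInvalidResults(results)
% listInvalidResults  Return the names of measurements marked invalid.
%   Hypothetical helper (not part of the framework). Assumes each element
%   of results has a logical Valid property and a Name property.
isValid = [results.Valid];                  % one logical entry per test
invalidNames = {results(~isValid).Name}';   % column cell array of names
fprintf('%d of %d measurements were invalid.\n', ...
    nnz(~isValid), numel(results));
end
```

Calling listInvalidResults(results) after runperf would then hand back the names to look into.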
Uh oh, that doesn't look ideal. It looks like some of our small sized measurements weren't valid. They weren't valid because the framework recognized that the code execution time was too close to the measurable precision of the framework. The framework was able to determine that it would have been a garbage measurement, so rather than risk providing bad data as a result it has proactively filtered the tests that were too fast and marked the result as invalid. Well, I still want to measure this fast case, so what do we do? Typically, what we see happening is people wrapping their code with a static for loop, like so:
classdef tMatrixLibrary_v3 < matlab.perftest.TestCase

    properties(TestParameter)
        TestMatrix = struct('smallSize', magic(100), 'midSize', magic(600),...
            'largeSize', magic(1000));
    end

    methods(Test)
        function testSum(testCase, TestMatrix)
            for idx = 1:1000
                matrix_sum(TestMatrix);
            end
        end

        function testMean(testCase, TestMatrix)
            for idx = 1:1000
                matrix_mean(TestMatrix);
            end
        end

        function testEig(testCase, TestMatrix)
            testCase.assertReturnsTrue(@() size(TestMatrix,1) == size(TestMatrix,2), ...
                'Eig only works on square matrix');

            testCase.startMeasuring;
            matrix_eig(TestMatrix);
            testCase.stopMeasuring;
        end
    end
end
results = runperf('tMatrixLibrary_v3');
checkResults(results)
Running tMatrixLibrary_v3
.......... .......... .......... .......... ..........
.......... .......... ..
Done tMatrixLibrary_v3
__________

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Thanks Dory, you're the best! All our measurements are good.
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Well the results look good at least. Think about this like you were measuring the weight of a feather using a kitchen scale. Measuring one feather will give you a bunk measurement, so instead you gather together 1000 or more feathers, put them in a box, and measure the whole thing to get a good idea of the average weight of a feather.
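Outside the test framework, the box-of-feathers idea boils down to timing a whole batch and dividing by the count. Here is a minimal sketch using plain tic and toc; timePerCall is a hypothetical helper name, and the function handle and iteration count are whatever you choose to pass in.

```matlab
function perCall = timePerCall(fcn, numIterations)
% timePerCall  Average time of one call, estimated from a batch of calls.
%   Hypothetical helper for illustration. The result also includes the
%   loop and function-handle overhead, so it is only an approximation.
t = tic;
for idx = 1:numIterations
    fcn();                          % weigh the whole box of feathers...
end
perCall = toc(t) / numIterations;   % ...then divide by the feather count
end
```

For example, with A = magic(100), calling timePerCall(@() sum(A(:)), 1000) gives a rough per-call time for a stand-in summation (sum here is a substitute for the matrix_sum function from the post, not the same code).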
Problem solved? Well not really. There are some big drawbacks to this approach. Let's enumerate:
- I had to choose a number of iterations to test against. What's more, this choice was arbitrary. How do I know that I am not right on the edge of framework precision? If so I am likely to experience tests that sporadically are too fast to measure! In addition, the framework precision is machine dependent, so different machines will have different precision thresholds. There is no one good number to go off of so it's anyone's guess. Not comforting.
- #ohmygoodness was this slow! Why was it slow? Because I had to run 1000 iterations for everything, including the larger matrix sizes. These larger sizes don't need to be run in a loop, but to keep the comparison apples to apples they have to be if the smaller sizes are.
- This approach falls flat when comparing other algorithms against each other. If one algorithm needs 1000 iterations, but the other only needs 750, we quickly get into comparing apples and oranges and we lose insight into our true performance.
- Let's say we diligently track our code performance over time, and furthermore we do a bang-up job optimizing our critical code and make vast improvements in the code performance. Well this improvement may require that we "up" the iteration count, since 1000 iterations may suddenly become too fast to measure as a result of our code optimizations and we need to now measure 10,000 iterations. Once we do this however, all future measurements are on a different scale than our historical data. Lame.
- Finally, and perhaps as a root cause of some of these apples/oranges troubles discussed above, it is simply not the true measurement of the code's execution time.
So the takeaway: once your code hits the limit of how fast we can measure something, we can add a static for loop and maybe hobble along, but it's really not the best experience.
Enter R2018b.
As of R2018b, the performance testing framework has a new keepMeasuring method on the matlab.perftest.TestCase class to support faster code measurement workflows. How is this used? Put it in a while loop and let the framework determine the right number of iterations:
classdef tMatrixLibrary_final < matlab.perftest.TestCase

    properties(TestParameter)
        TestMatrix = struct('smallSize', magic(100), 'midSize', magic(600),...
            'largeSize', magic(1000));
    end

    methods(Test)
        function testSum(testCase, TestMatrix)
            while testCase.keepMeasuring
                matrix_sum(TestMatrix);
            end
        end

        function testMean(testCase, TestMatrix)
            while testCase.keepMeasuring
                matrix_mean(TestMatrix);
            end
        end

        function testEig(testCase, TestMatrix)
            testCase.assertReturnsTrue(@() size(TestMatrix,1) == size(TestMatrix,2), ...
                'Eig only works on square matrix');

            while testCase.keepMeasuring
                matrix_eig(TestMatrix);
            end
        end
    end
end
results = runperf('tMatrixLibrary_final');
checkResults(results)
Running tMatrixLibrary_final
.......... .......... .......... .......... ..........
.......... .......... ..
Done tMatrixLibrary_final
__________

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Thanks Dory, you're the best! All our measurements are good.
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Ah! Isn't that lovely? I didn't need to hard code any static for loop values, we were able to accurately measure significantly faster code, and the measured values returned actually are the real times taken rather than the values offset by an arbitrary scaling factor. Also, the code that needs more iterations is the fast code, so I didn't even notice any difference in test time, whereas the slower tests didn't need any more iterations at all for an accurate measurement. So nice. Look at the sample summary and you can see the real time taken:
sampleSummary(results)
ans =

  9×7 table

    Name                                                   SampleSize    Mean          StandardDeviation    Min           Median        Max
    ___________________________________________________    __________    __________    _________________    __________    __________    __________

    tMatrixLibrary_final/testSum(TestMatrix=smallSize)     4             1.4886e-05    1.242e-07            1.4807e-05    1.4833e-05    1.5071e-05
    tMatrixLibrary_final/testSum(TestMatrix=midSize)       4             0.00060309    1.4084e-05           0.00058393    0.00060658    0.00061526
    tMatrixLibrary_final/testSum(TestMatrix=largeSize)     4             0.003275      6.2535e-05           0.0031897     0.0032854     0.0033395
    tMatrixLibrary_final/testMean(TestMatrix=smallSize)    4             1.5534e-05    3.4959e-07           1.5221e-05    1.5535e-05    1.5847e-05
    tMatrixLibrary_final/testMean(TestMatrix=midSize)      4             0.00059933    9.2749e-06           0.00058845    0.00060012    0.00060865
    tMatrixLibrary_final/testMean(TestMatrix=largeSize)    4             0.0032668     4.8834e-05           0.0031958     0.0032859     0.0032997
    tMatrixLibrary_final/testEig(TestMatrix=smallSize)     4             0.003086      6.1447e-05           0.00304       0.0030639     0.0031762
    tMatrixLibrary_final/testEig(TestMatrix=midSize)       4             0.16333       0.001356             0.16232       0.16287       0.16527
    tMatrixLibrary_final/testEig(TestMatrix=largeSize)     4             0.39709       0.00096197           0.39613       0.39704       0.39813
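For intuition about how an iteration count can be picked automatically instead of hard coded, here is one simple strategy: keep doubling the batch size until the whole batch takes comfortably longer than the timer can resolve, then divide. To be clear, this is my own illustrative sketch and not how keepMeasuring is implemented; timeAdaptively and the minTotalTime threshold are hypothetical names.

```matlab
function [perCall, numIterations] = timeAdaptively(fcn, minTotalTime)
% timeAdaptively  Estimate per-call time with a self-chosen batch size.
%   Illustrative only; NOT the framework's algorithm. minTotalTime is the
%   smallest batch duration (in seconds) we are willing to trust.
numIterations = 1;
while true
    t = tic;
    for idx = 1:numIterations
        fcn();
    end
    elapsed = toc(t);
    if elapsed >= minTotalTime
        break                             % batch is long enough to trust
    end
    numIterations = numIterations * 2;    % too fast, try a bigger batch
end
perCall = elapsed / numIterations;        % report time per single call
end
```

Slow code exits after a single iteration while fast code gets as many as it needs, which is the same behavior I noticed in the test run above.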
It is worth noting that this is not a silver bullet. There still is some framework overhead in the keepMeasuring method that prevents us from measuring some really fast code. Think about it like measuring a group of feathers, but if each feather comes individually wrapped in a small packet, there comes a point where we are measuring the overhead of the packet rather than the actual feather. So, while there is still some code that will be too fast to measure (don't expect a valid measurement of 1+1 please), using the keepMeasuring method as shown opened up 2 orders of magnitude in allowable precision in our experiments.
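You can get a feel for the packet-versus-feather problem with a plain loop, no framework needed. The sketch below is an analogy only: it measures function-handle and loop overhead, not the overhead of keepMeasuring itself, and the noop and addOnce handles and the iteration count are arbitrary choices of mine.

```matlab
numIterations = 1e5;   % arbitrary; large enough to average out timer noise

% "Empty packet": call a handle that does essentially no work.
noop = @() [];
t = tic;
for idx = 1:numIterations
    noop();
end
emptyPerCall = toc(t) / numIterations;

% "Packet plus feather": the same loop around a very cheap operation.
a = 1;
b = 1;
addOnce = @() a + b;
t = tic;
for idx = 1:numIterations
    addOnce();
end
addPerCall = toc(t) / numIterations;

fprintf('no-op: %.3g s per call, a+b: %.3g s per call\n', ...
    emptyPerCall, addPerCall);
```

If the two numbers come out close to each other on your machine, most of what the second loop measured was packet rather than feather.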
Have fun, and like Dory, just keep measuring y'all!