I'm feeling pensive today as I participate in development activities at MathWorks for Release 2006b. We are following a 6-month release schedule, and the naming convention for the twice-yearly releases is the year followed by first "a" and then "b". This schedule requires discipline on many fronts, including for developers. Over the years, we have arrived at some truisms regarding the code we ship; they are not all novel in the industry, and they have served us and our users well.
People develop their own best practices over time, ranging from cosmetic issues to ones affecting the behavior or viability of a program. Tools such as mlint, which I wrote about here, provide one way to audit code. You can see more examples of code metrics used on MATLAB Central's File Exchange.
At MathWorks, we have several, sometimes competing, goals with respect to our code, and especially to M-files that we ship. Here's a list of some ideas that work well for us.
- Minimize the amount of code you write.
- Minimize the complexity of the code you write.
- Avoid speculative generality.
- Avoid premature optimization.
- Use components as they were designed to be used.
- Create functions instead of scripts.
- Make the code general and document ranges and exceptions.
- Write and run tests.
A classic trade-off in MATLAB is time versus memory. You can often vectorize code, but doing so adds a memory burden.
On the MATLAB newsgroup, we've seen discussions about avoiding repmat and we also had similar discussions in these two blog articles: Scalar Expansion and More and More on expansion: arrayfun. What's the reason? It's because, in order to fully vectorize the code, potentially really large intermediate arrays are created, used, and tossed away.
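Here's a small, hypothetical illustration of the trade-off (the variable names are mine, not from any shipping code): centering the columns of a matrix with repmat builds a temporary array the same size as the original, while an explicit loop over columns avoids that intermediate entirely.

```matlab
% Hypothetical example: subtract each column's mean from that column of A.
A  = rand(10000, 100);
mu = mean(A);

% Fully vectorized: repmat creates a 10000-by-100 temporary array
% just so the subtraction can be done in one shot.
B1 = A - repmat(mu, size(A, 1), 1);

% Column-by-column loop: no large intermediate array is created.
B2 = A;
for k = 1:size(A, 2)
    B2(:, k) = A(:, k) - mu(k);
end

% Both produce the same answer; they differ in peak memory use.
max(abs(B1(:) - B2(:)))
```

Which version is faster depends on the sizes involved and on your machine, which is exactly why these newsgroup discussions keep recurring.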
When Steve Eddins, author of Steve on Image Processing, worked on the function inpolygon, he found that he could fully vectorize the code. Had he done so, though, plenty of realistic inputs would have caused MATLAB to run out of memory. To avoid this, Steve chunked up the code, and you can see this comment:
dbtype inpolygon 51:52
51    % Choose block_length to keep memory usage of vec_inpolygon around
52    % 10 Megabytes.
Within the 10 MB memory limit, the code is vectorized (see lines 53:83, from which the subfunction vec_inpolygon is called).
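The same chunking pattern applies to lots of problems. Here is a sketch (my own toy example, not the actual inpolygon code) of processing a large array in fixed-size blocks so that any temporaries stay within a chosen memory budget:

```matlab
% Sketch of the chunking pattern: apply a vectorized operation to x
% one block at a time, so temporary arrays never exceed block_length
% elements.
x = rand(1e6, 1);
block_length = 5e4;   % pick this to bound peak memory use

% Stand-in for a vectorized computation that creates temporaries
% proportional to its input size.
vectorizedOp = @(v) sin(v).^2 + cos(v).^2;

y = zeros(size(x));
for first = 1:block_length:numel(x)
    last = min(first + block_length - 1, numel(x));
    y(first:last) = vectorizedOp(x(first:last));
end
```

Each block is fully vectorized internally, so you keep most of the speed of vectorization while capping the size of the intermediate arrays.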
I remember another case from 1988 or 1989, near the time when we introduced MEX-files. It illustrates well for me why I should not rely on my intuition, but should instead use tools to help me measure the actual effects I am hoping to influence. Despite my best efforts to think through problems as thoroughly as I can, I often find that, left to my own devices, I focus on the central issues, and it's often the more tangential ones that surprisingly consume the CPU cycles. So, much better to measure than to guess.
We were trying to speed up function functions, without the benefit of the MATLAB profiler. So we did experiments, first converting one of the ODE solvers to C, and then turning the actual equations into C. Using only the MEX-function for the solver bought us a factor of 2. Using only the MEX-function for the equations also bought us a factor of 2. Can you guess what using both MEX-files simultaneously bought us? You might guess a factor of 4. That's not what we found. We found a factor of 15 instead.
MATLAB has changed a lot since version 3, so what was true then with respect to performance is not, in detail, relevant today.
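That's all the more reason to measure on the version you actually run. A simple way to compare two candidate implementations is tic and toc (the names below are my own example, not a benchmark we ship):

```matlab
% Measure, don't guess: compare two ways of computing a sum of squares.
n = 1e6;
x = rand(n, 1);

tic
s1 = sum(x.^2);        % vectorized
t_vec = toc;

tic
s2 = 0;
for k = 1:n            % explicit loop
    s2 = s2 + x(k)^2;
end
t_loop = toc;

fprintf('vectorized: %g s    loop: %g s\n', t_vec, t_loop)
```

For anything more serious, the MATLAB profiler (profile on / profile report) will point you at the tangential costs that intuition tends to miss.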
I've barely scratched the surface of what characteristics of your code might be the most important in various situations. What's important to you? Let me know.
Published with MATLAB® 7.2