Loren on the Art of MATLAB

January 13th, 2012

Best Practices for Programming MATLAB

I thought I would share my go-to list of things I try to do when I write MATLAB code. After checking with other MathWorks folks whose code I admire, I found they use basically the same mental list that I do. You can find blog posts on all of these topics by selecting relevant categories from the right side of The Art of MATLAB blog site.

My List of Best Practices

Clearly (at least to me), this is not everything you generally need to do. You still need to comment the code, add good help information and examples, etc. But these are the main coding practices and tools I always rely on.

  1. Vectorize (but sensibly).
  2. Use bsxfun in lieu of repmat where possible.
  3. When looping through an array, loop down columns to access memory in the same order that MATLAB stores the data in.
  4. Profile the code. I am often surprised about what is taking up the time.
  5. Pay attention to messages from the Code Analyzer.
  6. Use functions instead of scripts.
  7. Don't "poof" variables into any workspaces. Translation: don't use load without a left-hand side; avoid eval, evalin, and assignin.
  8. Use logical indexing instead of find.
  9. Avoid global variables.
  10. Don't use equality checks with floating point values.
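
A few of these items are easier to see in code. The following is a minimal sketch, not taken from the original post: the matrix A, the field-naming scheme, and all variable names are made up for illustration. It touches items 3, 8, and 7.

```matlab
A = magic(4);                       % hypothetical data

% Item 3: loop down columns. MATLAB stores arrays in column-major
% order, so the inner loop runs over the row index.
total = 0;
for col = 1:size(A, 2)
    for row = 1:size(A, 1)
        total = total + A(row, col);
    end
end

% Item 8: index with the logical mask directly instead of calling find.
mask = A > 10;
bigValues = A(mask);                % same values as A(find(A > 10))
nBig = nnz(mask);                   % count of matches, no find needed

% Item 7: store generated names as struct fields rather than using
% eval to "poof" variables such as run1, run2, run3 into the workspace.
results = struct();
for k = 1:3
    name = sprintf('run%d', k);     % hypothetical naming scheme
    results.(name) = k^2;           % instead of eval([name ' = k^2;'])
end
```

Every variable here appears on the left-hand side of an assignment, so both a reader and the Code Analyzer can see where it comes from.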

Missing from Your List? Additions to My List?

What's on my list that you don't currently do? Do you have a major addition to my list (there can't be too many, or I won't remember to do them all!)? Let me know here.


Published with MATLAB® 7.13

33 Responses to “Best Practices for Programming MATLAB”

  1. JamesL replied on :

    Here’s the biggest thing you missed:

    Pay A LOT of attention to code clarity and code design. In 97% of the code you write, tiny improvements in execution speed don’t matter, and yet you (and your colleagues/coworkers) will spend 10-20 times as much time reading your code as you spent typing it.

    Having recently moved to a MATLAB environment from other scientific scripting environments, I am stunned by the lackadaisical attitude MATLAB programmers seem to have to code quality from the point of view of legibility and clarity of thought expressed in code through good design. At the same time they frequently reject some well-established software engineering best practices on the grounds that they may cost microseconds in execution time.

    The hardware that runs MATLAB now is roughly 100,000 times faster than when MATLAB was first released. Software development practices have changed to recognize the fact that programmer time is vastly more valuable than machine time in almost all cases. MATLAB programmers seem to be behind the curve in realizing this.

  2. Loren Shure replied on :

    JamesL-

    I agree with you on writing for clarity, esp. if the speed isn’t horribly impacted. I don’t know a tool to specifically reinforce that though – in looking for specifics, I believe profiling and vectorize sensibly cover the intent of what you are saying, but without the background you provided. Thanks for sharing.

    –Loren

  3. Iain replied on :

    That’s a nice list. One more I think about:

    Does the code deal with corner cases gracefully? What if a) one or more dimensions of an input array are of length 1, or b) some inputs are Inf, NaN or complex?

    I have an assortment of more thoughts embedded in a more general document that’s far too long for your short list criterion: http://homepages.inf.ed.ac.uk/imurray2/compnotes/matlab_octave_efficiency.html

  4. Eric replied on :

    Thorough error checking of function input parameters is one of my best practices. By using inputParser and assert(), you can provide future users some guidance at least on the form of variables that are provided as inputs.

    This is true throughout the rest of the function as well. I try to explicitly check for as much a priori information as I can. When assertions fail, provide useful information regarding the assumption.
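
One way the inputParser and assert combination might look is sketched below. The function scaleVector, its arguments, and the error identifier are hypothetical and exist only for this illustration; the checks shown are one possible choice, not a recommended set.

```matlab
function y = scaleVector(x, factor)
%SCALEVECTOR Multiply a numeric vector by a positive scalar.
%   Hypothetical example function used to illustrate input checking.

% Declare what each input must look like; parse errors out with a
% message naming the offending argument if a check fails.
p = inputParser;
p.FunctionName = 'scaleVector';
addRequired(p, 'x', @(v) isnumeric(v) && isvector(v));
addRequired(p, 'factor', @(v) isnumeric(v) && isscalar(v) && v > 0);
parse(p, x, factor);

y = p.Results.x * p.Results.factor;

% State an assumption about the result and explain it when it fails.
assert(all(isfinite(y)), 'scaleVector:nonFinite', ...
    'Result contains Inf or NaN; check the inputs.');
end
```

Calling scaleVector([1 2 3], 2) passes both checks, while scaleVector('abc', 2) fails in parse before any arithmetic runs.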

  5. Loren Shure replied on :

    Iain-

    Yes, all your items are important for creating robust code. It’s very hard to keep the list small, as you said.

    –Loren

  6. Loren Shure replied on :

    Eric-

    Yes, error checking is usually important. Interestingly, I have found cases where the error that would occur without checking explicitly was quite clear.

    –Loren

  7. chaowei chen replied on :

    Instead of doing

    for k=1:N,
        array(k)=k;   % array grows by one element on every pass
    end

    implicitly pre-allocate array by

    for k=N:-1:1,
        array(k)=k;   % first pass assigns array(N), sizing the array once
    end
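
An explicit alternative to the countdown loop is to allocate the array with zeros before looping. A minimal sketch, with N chosen arbitrarily for illustration:

```matlab
N = 5;                    % hypothetical size
array = zeros(1, N);      % allocate the full array once, up front
for k = 1:N
    array(k) = k;         % fills existing elements; no resizing
end
```

This version states the intended size directly, which can help a reader who does not know the countdown idiom.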

  8. Brian Emery replied on :

    “Use bsxfun in lieu of repmat where possible. “

    This one is new to me, can you explain why, and how it’s used?

    I like the pragmatic theme of today’s post, by the way.

  9. Andrew Newell replied on :

    Use unit tests (the earlier the better) and use the Profiler to make sure that every line of code gets exercised.

  10. Hans replied on :

    I’m curious why you prefer functions over scripts. I find that scripts allow me to do things that I cannot do in a function such as have access to all the data created within the script once the script has finished executing. With a function, once the function is finished, the data is gone. Sometimes that’s desirable, other times it’s not.

  11. Martin replied on :

    Loren,

    thanks for a nice and suitably short list. There were a couple of best practices that I was not aware of:

    2) Using bsxfun instead of repmat. Do you have any example of when bsxfun is better than repmat? What can be gained in terms of processing time / memory usage?

    10) Not checking for equality with floating point values. Could you give an example here, too? I may be guilty of doing this quite often, since the default class in Matlab is double (floating-point).

  12. Brad Stiritz replied on :

    Hi Loren,

    Best wishes for 2012! Very interesting post, thank you. A few comments I might add from my own experience..

    Additions to Loren’s list:

    1) As multi-function projects get larger & more complex, coding standards & best practices must be applied ever more rigorously.

    2) When developing class hierarchies : plan on several development generations & much refactoring before stable, high-quality code is achieved.

    3) Within all function M-files : use Hungarian Notation prefixes or suffixes to clarify variable types.

    4) Always step through newly-written function M-files & visually check values of every single variable, via datatip, command-line, or Variable Editor.

    Qualifications to Loren’s list:

    7. Don’t “poof” variables into any workspaces. (unless meta-programming is an integral part of your project design; example upon request)

    8. Use logical indexing instead of find. (unless you explicitly want to generate a vector of non-zero indexes)

    any comments appreciated,
    brad

  13. Jan Simon replied on :

    Thanks, Loren! A valuable list which should be considered by all Matlab users and developers. Some additions – there is no reason to keep the list small:

    * Documentation is required to make a working function usable.
    Inputs, outputs and the applied procedure must be explained.
    Because bugs are everywhere, changes to the code must be accompanied by a version history. Without documentation it is impossible to reuse code for other projects.

    * Create a unit-test function, which compares the output of the function with a set of known answers. Automatic unit-tests are more powerful than manual tests, because they can run reproducibly after each bug fix and when a new Matlab release is used.

    * Keep it simple stupid.
    This is one of the rare cases, where D.E. Knuth and Conan the Barbarian agree.
    I’m still amused by this line from SAVEPATH:
    mlr_dirs = cellfun(@(x) ismember(1,x),strfind(dirnames,mlroot))
    How funny. What about:
    mlr_dirs = strncmp(dirnames, mlroot, length(mlroot));

    * program time = design time + programming time + debug time + runtime
    Premature optimization of speed will have negative effects on the debug time. A bad design increases the programming time due to necessary refactoring. Spending an hour to save a millisecond of runtime pays off only if the program runs more than 3.6e6 times (one hour is 3.6e6 milliseconds).
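
The two SAVEPATH expressions quoted in the comment above can be compared on sample data. This is only a sketch: the values of mlroot and dirnames are invented stand-ins, not what SAVEPATH actually sees.

```matlab
% Hypothetical stand-ins for the variables used in SAVEPATH.
mlroot   = '/opt/matlab';
dirnames = {'/opt/matlab/toolbox/local', '/home/user/code', '/opt/matlab'};

% Quoted form: strfind returns the match positions for each name, and
% ismember asks whether position 1 is among them.
viaStrfind = cellfun(@(x) ismember(1, x), strfind(dirnames, mlroot));

% Suggested form: compare only the first length(mlroot) characters.
viaStrncmp = strncmp(dirnames, mlroot, length(mlroot));
```

On this sample both forms flag the first and third entries; the strncmp version says "starts with mlroot" in a single call.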

  14. Jonatan Wulcan replied on :

    11 Disable the Code Analyzer. The Code Analyzer frequently advises you to micro-optimize your code without considering the impact on readability. If you don’t disable it, you’re left with the following bad options:
    A Do the optimization and make your code less readable for almost no speed improvement.
    B Clutter your code with %#ok
    C Ignore the message and let the code be cluttered with underlines easily masking more serious errors such as syntax errors.

  15. David Young replied on :

    Two things come to mind:

    1. Don’t get too buried in the code to think about the algorithm and how it can be improved. If you don’t step back, it’s all too easy to write things like sqrt(v) < r when v < r.^2 could be a lot faster if v is a big vector.

    2. Try to make your code as general and reusable as possible. (Writing functions rather than scripts is an aspect of this.) This may mean splitting up big functions into smaller ones, making sure that all cases are covered, thinking about whether the arguments give the user all the control they might want, and providing good documentation always.
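
The sqrt rewrite in point 1 relies on an assumption worth stating: the two tests agree when v and r are real and nonnegative. A small sketch with made-up values:

```matlab
v = [0 1 4 9 16 25];      % hypothetical nonnegative data
r = 3;                    % hypothetical nonnegative threshold

viaSqrt   = sqrt(v) < r;  % one square root per element of v
viaSquare = v < r.^2;     % one squaring of the scalar r, then a compare
```

If r could be negative, the two tests differ (sqrt(v) < r is never true for real v, while v < r.^2 can be), so the rewrite needs that precondition checked or documented.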

  16. Tony Sturges replied on :

    Loren,
    in keeping with reply # 1, my main addition would be that LOTS of comments are good. When I come back to see a routine, a year later, even with lots of comments I sometimes wonder what a particular snippet was for.

    And altho’ I agree that “poofing” (is that a standard term?) is not good, I do not know for sure how to do the equivalent of putting a left-hand-side in the code or on the command line when I do an “import data,” other than a comment line.

    good stuff, all, and thanks
    tony

  17. matt fig replied on :

    My list is basically the same, except I also add:

    Comment! If I vectorize for speed and it makes the code hard to read I often leave the original, non-vectorized code as a comment (highlight, ctrl+r). This gets the best of both worlds. Anyone reading the code then sees what the vectorized version was meant to do and can uncomment the simple code for debugging or whatever purpose. Other than that, spending the time to comment every step or block can save much time later when performing maintenance, even if it is me looking at my own code. Comments are free, be generous with them.

    Test! Depending on how critical the code is, I will spend significant time trying to break it by doing dumb things to the code. Passing in the wrong types of variables, or doing things out of the expected order (especially important for GUIs).

    Help! Always spend the time to write good, clear help. This helps the user know (remember) what the code does, even if it is me!

    Don’t Mask! Pay attention when assigning variable (and function M-file) names such that built-in functions are not overwritten.

  18. Loren Shure replied on :

    Thanks for all the great feedback. I will only address a few items, at least right now.

    1) why a short list? because people need it in their heads, I believe. I don’t think most people have incorporated consulting long lists as part of most of their processes.

    2) for bsxfun, please see posts in the vectorization category. At least 2 talk about bsxfun.

    3) similar to #2, please see the category about numerical accuracy for the issues relating to comparing floating point numbers.

    4) functions generally execute more quickly than scripts because they have a contained, known workspace and can be more fully analyzed by the language machinery.

    Thanks again!
    –Loren
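
For readers who asked for examples of items 2 and 3 in the reply above, here is a minimal sketch. The data, the variable names, and the tolerance are assumptions chosen for illustration and do not come from the referenced blog posts.

```matlab
A = [1 2 3; 4 5 6];                   % hypothetical 2-by-3 data
colMeans = mean(A, 1);                % 1-by-3 row vector of column means

% repmat builds a full 2-by-3 copy of the means before subtracting.
viaRepmat = A - repmat(colMeans, size(A, 1), 1);

% bsxfun expands the singleton dimension without that intermediate copy.
viaBsxfun = bsxfun(@minus, A, colMeans);

% Floating point: 0.1 + 0.2 is not exactly 0.3 in double precision,
% so compare against a tolerance instead of using ==.
x = 0.1 + 0.2;
exactMatch = (x == 0.3);
tol = 10 * eps(0.3);                  % tolerance picked for this example
closeMatch = abs(x - 0.3) <= tol;
```

Here exactMatch is false and closeMatch is true. The right tolerance depends on the scale of the data and on how the values were computed, so 10*eps(0.3) should be read as a placeholder. In MATLAB R2016b and later, implicit expansion also allows A - colMeans to be written directly.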

  19. Richard Johnson replied on :

    Loren-

    Thanks for the thoughtful list. A blog that triggers 17 (and counting) responses is on target.

    Your list concentrates on coding practices (8 of 10 items). Many of the responses suggest development practices (test, document, etc). My experience in interacting with over 100 MATLAB users a year is that they are interested in and benefit from both.

    Of course you know where my favorite list is…

    -Richard

  20. Joseph Kirk replied on :

    This comment is not exactly “code performance” related, but the importance of clearly named variables/functions should not be overlooked. In my code I try to prefix variable names in a way that makes it immediately obvious what the variable contains. Here are some examples:

    (1) Use an “n” or “m” in front of a variable name that stores the number of “things”:
    i.e. nRecords = length(recordData);
    [nRows,nCols] = size(dataArray);

    (2) Use an “i” (or “j” or “k”) in front of a variable name that is incremented in a for-loop to make it clear what the loop is indexing through:
    i.e. for iRecord = 1:nRecords
    recordData{iRecord} = …;
    end

    (3) Use “is” as a prefix for logical arrays:
    i.e. isDetected = measurementData > threshold;
    isInPolygon = inpolygon(x,y,xv,yv);

    (4) Use “h” in front of variables that contain handles:
    i.e. hAx = axes;
    hFig = get(hAx,'Parent');

    Also, it is extremely useful to attach unit identifiers to variable names to avoid (potentially disastrous amounts of) confusion. I like to use an underscore to separate the variable name from the units for clarity:
    i.e. elevationAngle_deg = …
    terrainHeight_ft = …
    vehicleSpeed_mps = …

  21. Sarah Zaranek replied on :

    I love this – great post!

    What is your thought on using try/catch – and where would you caution people against using it?

    I have my own thoughts – but I would like yours :)

  22. Loren Shure replied on :

    Richard, Joseph, and Sarah,

    1) Thanks Richard. Indeed I was focusing primarily on actual coding and not the rest of the development process. That is, of course, a very important topic and your book addresses it well.

    2) Joseph- I agree about having good naming conventions. Exactly how to agree on conventions can be difficult. I prefer MixedCase or camelCase vs. using an extra character with _ generally, for example. Others don’t share my aversion to _!

    3) Sarah-
    I like and think try/catch should be used in general. It didn’t make my top list because… not sure… maybe because I am not sure most people are writing code that needs to be robust to the extent of supporting a large group of users. Also, it may be richer than they often need. Sometimes simply issuing an error is enough. Only if you have to do something special after the error is try/catch *required*. If used, I believe it’s best used with the MException syntax. What do you think?

    –Loren
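
A minimal sketch of try/catch with the MException syntax mentioned above. The data and the fallback value are hypothetical; the point is only that the catch block receives an exception object that can be inspected before deciding what to do.

```matlab
data = [1 2 3];                       % hypothetical data
try
    value = data(5);                  % errors: index is past the end
catch err                             % err is an MException object
    % "Do something special": report the problem, then fall back.
    fprintf('Caught %s: %s\n', err.identifier, err.message);
    value = NaN;                      % placeholder result
end
```

If no recovery step is needed, letting the original error propagate is usually simpler than catching it.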

  23. Jiro Doke replied on :

    Great post, Loren!

    @Joseph, I was reading your list, and I noticed that I have the exact same convention as you have.

    I agree with many of these points. The one that resonated with me was about error checking. Aside from demos, I tend to create utility functions, so I always make sure that I have solid error checking, using functions like “assert”, “inputParser”, “validateattributes”, and “validatestring”. Writing extremely detailed help is another thing I do. Especially if I’m creating a class, I make sure I format it appropriately so that the HELP/DOC commands will render it correctly.

  24. KE replied on :

    For #7, what do you think about the following form of load which specifies what variables are appearing in the workspace?
    load('my_file_name.mat', 'myVariable'); % No poof!
    Great blog topic. Would love to see a future one on best practices for managing a code project in Matlab, such as many linked functions to do a complicated analysis task.

  25. Chad Gilbert replied on :

    11. Organize functions into packages or classes when a work directory gets too large, or lots of “helper” functions start appearing.

    For other languages, that probably wouldn’t need mention. But I see some very ugly MATLAB directories, and frequently run into namespace collisions when using code from others.

  26. Loren Shure replied on :

    KE-

    You are still poofing the variable. You never “see” it in the file until it is used on the right hand side as a variable. So I don’t care for that usage.

    –Loren
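
The left-hand-side form of load that avoids poofing can be sketched as follows. The file is created in the temporary directory so that the example runs on its own; the file name and variable name are the hypothetical ones from the question above.

```matlab
% Create a small MAT-file so the sketch is self-contained.
fileName = fullfile(tempdir, 'my_file_name.mat');
myVariable = 42;
save(fileName, 'myVariable');
clear myVariable

% With a left-hand side, load returns a struct whose fields are the
% saved variables, instead of creating them in the workspace.
S = load(fileName, 'myVariable');
myVariable = S.myVariable;            % explicit, visible assignment

delete(fileName);                     % clean up the temporary file
```

The variable now first appears on the left of an assignment, so anyone reading the code can see where it was defined.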

  27. Loren Shure replied on :

    Chad-

    True, but not in my top group as in my experience, most users are not making large applications.

    –Loren

  28. Troy Haskin replied on :

    Excellent practices and information from everyone so far. One thing I have done in the past that I try to avoid (mainly for my sanity):

    o Do not use nested functions; use a driver function and subfunctions (and please close all of the subfunctions with “end”).

    This is mainly to avoid accidental workspace collisions/overwrites from parent functions. And not closing all subfunctions with “end” is just a huge pet peeve (thanks to the indoctrination of F90).

  29. Loren Shure replied on :

    Troy-

    I don’t happen to agree with you about never using nested functions. The shared workspace, when used correctly, can massively reduce memory needs, for one. Also, nested functions can have much simpler and more intuitive APIs.

    My $0.02,
    Loren

  30. Eleftheria replied on :

    Hi and many thanks for this post.
    Could you please explain why one should avoid eval (#7)

    Best,
    Eleftheria

  31. Loren Shure replied on :

    Eleftheria-

    Please see these 2 blog posts, in addition to what I mentioned – it is “unfriendly” to have new variables showing up in MATLAB when they are not created from an assignment.

    http://blogs.mathworks.com/loren/2005/12/28/evading-eval/

    http://blogs.mathworks.com/loren/2006/01/04/more-in-eval/

    –Loren

  32. Troy Haskin replied on :

    Loren:

    Hm, I never considered that, but it makes sense. I guess I’ve become spoiled with 8GB of memory at work.

    And I guess the last time I used nested functions was for one of the largest projects I’ve ever done (an incompressible, 2D CFD simulation with a 200×200 grid). I’ll consider this in the future.

    - Troy Haskin

  33. Mark replied on :

    With regard to post http://blogs.mathworks.com/loren/2012/01/13/best-practices-for-programming-matlab/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+mathworks%2Floren+%28Loren+on+the+Art+of+MATLAB%29&utm_content=Google+International#comment-32892

    Loren, what you ideally want is a plugin architecture like Visual Studio whereby someone can write a ReSharper addin. Possibly one of the single most productivity and code-clarity beneficial addins I have ever used. Although the “beast of Redmond” is much derided I would urge any language developer to look at the functionality and usability of their IDE.

MathWorks
Loren Shure works on design of the MATLAB language at MathWorks. She writes here about once a week on MATLAB programming and related topics.

These postings are the author's and don't necessarily represent the opinions of The MathWorks.