Last fall Loren wrote a blog post about a new syntax in R2009b for ignoring function inputs or function outputs. For example, suppose you call sort and only want the second output. In R2009b you can do that using this syntax:
[~, idx] = sort(A);
The tilde character here is used as a placeholder to indicate that you don't need the first output argument of sort.
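Here is a minimal sketch of that call in context. The sample vector and the variable name sortedA are made up for illustration; the syntax itself requires R2009b or later.

```matlab
% Keep only the permutation indices returned by sort.
A = [30 10 20];          % illustrative data
[~, idx] = sort(A);      % the sorted values (first output) are discarded
sortedA = A(idx);        % idx can still be used to reorder A

disp(idx)                % idx is [2 3 1]
```

The indices alone are enough to recover the sorted values later, which is a common reason for wanting only the second output.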
For another example, suppose you write a function that has to take three input arguments (because another function is always going to pass it three arguments), but your function doesn't need the second argument. Then you can write the first line of your function this way:
function out = myfun(A, ~, C)
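As a rough sketch of that situation, the file below pairs myfun with a hypothetical caller, callWithThree, that always passes three arguments. The driver name demoIgnoredInput, the caller, and the body of myfun are all assumptions made for illustration.

```matlab
function result = demoIgnoredInput
% Hypothetical driver: hands myfun to a caller with a fixed
% three-argument calling convention.
result = callWithThree(@myfun);
end

function out = callWithThree(fcn)
% Stand-in for "another function" that always passes three arguments.
out = fcn(1, 'not needed', 10);
end

function out = myfun(A, ~, C)
% The second input is accepted but never named or stored.
out = A + C;
end
```

Calling demoIgnoredInput returns 11: the string passed in the second position is simply ignored.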
This new syntax has drawn a startling amount of discussion, both in the comments on Loren's post and in the comp.soft-sys.matlab newsgroup. (See the thread "Getting indexes of rows of matrix with more than n repetitions," for example.)
Responses to this new syntax have fallen into roughly four categories:
- "Finally, I've been waiting for this!"
- Clarification questions about how it works and how to use it.
- Complaints about the specific syntax chosen and suggestions for alternatives.
- "The intro of the tilde op was nothing but a big, useless blunder."
The passionate arguments in comp.soft-sys.matlab caught my eye and have prompted me to add my two cents here. Although I have no particular expectation of changing anyone's mind, I thought it might be interesting to address one particular question that was expressed well by Matt Fig: "If something already works, why complicate things?"
Matt means that we already have at least a couple of ways to ignore an output variable. The first is to use a variable name that (hopefully) makes the programmer's intent clear:
[unused, idx] = sort(A);
The second technique takes advantage of the way MATLAB assigns function outputs from left to right:
[idx, idx] = sort(A);
So why go to the trouble of introducing a new syntax?
Well, you'll probably get a somewhat different answer from every MATLAB developer you ask. Some might even agree with Matt. (I'll note, however, that this was one of the least-controversial MATLAB syntax proposals ever considered by the MATLAB language team.)
The proposed syntax was originally considered years ago, sometime around 2001. The proposal was approved internally at that time but then didn't get implemented right away because of competing priorities.
To understand why we eventually did decide to go ahead and implement it, you have to understand how this fairly minor syntactic issue is connected to our long-term efforts to do something much more important: provide automatic (and helpful!) advice to MATLAB users about their code. If you've used the MATLAB Editor at all over the last few years, you've probably observed how it marks code sections and makes suggestions to you. These suggestions fall into several categories, such as errors, possible errors, performance, new features, etc.
One of the things we try to flag is potential programming errors. A coding pattern that very often indicates a programming error is when you save a computed value by assigning it to a variable, but then you never use that saved value. At the very least, that pattern may indicate "dead code," or code that was doing something useful at one time but is now just cruft.
Both of the conventions mentioned above ([unused,idx] = sort(A) and [idx,idx] = sort(A)) exhibit this pattern of computing and saving a value and then never using it.
That might not be obvious for the [idx,idx] = sort(A) case, but it's true. The first output of sort is assigned to the variable idx. Then the second output is assigned to idx, causing the first output value to be discarded without ever being used.
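The overwrite can be checked directly. This is a minimal sketch that relies on the left-to-right assignment order described above; the sample data and the name idxTilde are made up for illustration.

```matlab
A = [30 10 20];              % illustrative data

[idx, idx] = sort(A);        % idx first receives the sorted values,
                             % then is overwritten by the indices
[~, idxTilde] = sort(A);     % R2009b+: first output never assigned

sameResult = isequal(idx, idxTilde);   % compare the two conventions
```

Under that assignment order, both forms leave the same indices in the workspace; the difference is only in whether the sorted values are ever stored.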
These cases aren't programming errors, though, because we've given you no other way to ignore outputs. We can't automatically and reliably distinguish between the intentional [idx,idx] = sort(A) and the similar-looking [y,y] = foobar(x) that is the result of a typo.
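To make the ambiguity concrete, here is a small sketch using max. The vector v and the variable names are illustrative only, and the comments describe what a human reader can infer, not the output of any particular analysis tool.

```matlab
v = [4 9 2];          % illustrative data

% Was this written on purpose to keep only the index, or is it a
% typo for [m, i] = max(v)? The code alone does not say.
[m, m] = max(v);

% With the placeholder, the intent is stated in the syntax itself:
% only the index of the maximum is wanted.
[~, i] = max(v);
```

In the second form there is no variable that is assigned and then never used, so nothing needs to be guessed about what the author meant.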
I think it's important for the programmer to communicate his or her intent very clearly (which is why I tend to prefer the [unused,idx] = sort(A) convention). It is useful to have a way to communicate intent syntactically instead of by convention. And the new syntax helps us, in a small way, progress toward our long-term goal of helping MATLAB users write better MATLAB code.
OK, fire away!
Published with MATLAB® 7.9
16 Comments
I like the new syntax, and I agree that it is better to be able to communicate your coding intentions to the compiler/interpreter so that it can help you. In this case, the function can know when a return value won’t be used and could opt to not generate it to improve performance.
First you (TMW) introduced JIT compiling which diminished the value of years of vectorizing experience. Now you upgrade your syntax and enhance your editor to detect errors and make MATLAB programming accessible to anyone. I now see that there is a systematic plan, a conspiracy, to deprive MATLAB wizards of their livelihood.
I think that it’s a very nice little addition to MATLAB syntax since it makes it very clear to the reader that you don’t want that part of the output and it saves memory as a bonus.
All of the alternatives I have seen are not as good in my opinion. For example
[dummy, ind] = sort(X)
wastes memory, and
[ind, ind] = sort(X)
looks weird whereas
[~, ind] = sort(X)
makes the intent of the author more obvious IMHO. Thanks for adding it.
What Michael said. But [ind ind] = sort(x) is worse than weird. It smells like making your code dependent on a particular language implementation that could change.
Citing Cris Luengo in comment #42 to Loren’s post on this topic:
“Tom Elmer (post 5) complained about the reuse of symbols. … A lot more difficult to distinguish are the two uses of the quote character. …”
and your response (#43):
“Chris: Yes, many things about MATLAB code would be simpler if we had used " for strings way back when.”
Will the output ~ produce a similar hindsight ‘way into the future’?
I prefer as few duplications of symbol usages as possible for a somewhat different reason. While a computer may be able to parse a code segment into a unique meaning, we humans typically prefer to *read* the source code. And the further away from common language and the more special symbols there are (especially when reused), the less code reading and the more code parsing it becomes. After all, we have source code because it is easier for humans to *read*.
BTW, what was the problem with a syntax where the ~ is replaced by [ ] as no-output-argument placeholder? That would be consistent with the input argument list syntax.
Lars—Ouch, another place where I misspelled Cris’ name.
Regarding your speculation about possibly regretting the use of ~ way into the future – I doubt it. The MATLAB language team now collectively has decades of experience in language parsing, interpretation, code generation, execution, etc. Team members are very sensitive to aspects of the language definition that are, shall we say, “challenging.” No one flagged the use of ~ for this purpose.
I do not recall if [ ] was discussed as a possible placeholder.
Well, [ ] seems to me like the obvious candidate because of consistency in syntax with the input argument list.
If it actually is possible from a parsing point of view to use [ ] I would eagerly like to suggest that you introduce it as an alternative syntax.
I do not use ~ because the performance is poor compared to other syntaxes ([IDX,IDX] = … or [DUMMY, IDX] = …)
Poor performance was also underlined in the previous contest…tildes were replaced.
I think that the tilde will become an interesting feature if the computation of that precise argument can be completely avoided…which is, always according to my opinion, very improbable.
Lars—I do not understand your comment about consistency with the input argument list. This is not legal syntax:
function y = foobar(a,,c)
On the other hand, the new tilde syntax is consistently available for both the input and the output argument list.
In my opinion, there is no chance that this particular syntax choice will be revisited.
Oleg—The performance difference is merely an implementation issue, because the two forms are semantically equivalent. I expect the performance difference will go away in a future release.
I like the new format. It makes the programmer’s intention crystal clear.
But how much of a performance hit are we talking about…
I don’t see much of one on my system.
Matt—I don’t really know. The use cases we’re talking about here come up fairly rarely in my own programming, and usually not in bottleneck code that’s worth optimizing.
Thank you for addressing this issue. I voiced an opinion when Loren discussed this last autumn, but I think I was a little too reluctant about the syntax at the time. Having had a few months’ experience with it, I find I more frequently reach for the tilde when I wish to ignore one or more outputs. I personally think that using the tilde symbol as a placeholder clearly shows intent. This was the right thing to do in my opinion.
Still, backwards compatibility dictates that I don’t use this feature in any code intended for a purpose other than “has to run on my workstation right now”. I don’t foresee being able to use this in code I distribute to others for at least another couple of years. In portable code I tend to prefer the
[ind, ind] = sort(X)
Thank you again for delving into some of the history of this feature.
It was nice that you wrote, “I think it’s important for the programmer to communicate his or her intent very clearly . . .”
Over 40 years ago, Dijkstra went even further by saying in effect that the programmer’s primary task is to write the code in a way that demonstrates that the code is correct, just by reading it.
I like the new ~ syntax since it makes it easier to produce quiet MLint reports.
But, like others have mentioned, I would have used the [ ] rather than the ~.
While I agree this is a great feature in MATLAB, I must say that the pushback you are getting from your community is not surprising to me. It comes down to the fundamental fact that, in my experience, engineers make really horrible computer scientists/programmers. They don’t care that they have to overwrite a variable, let it sit unused, or wait for the junk collector to take it away for them (in modern languages, of course). They do what works in the most inelegant and kludgy way possible that still computes the needed answer, and then they move on. I applaud the efforts language architects make to make programs easier to read, because we all know that the programmer isn’t going to take the 60 seconds to write a decent header and add a few key comments.
Matt—Getting pushback on this did not surprise me at all. Language syntax always draws contradictory feedback. Also, internal MathWorks discussions on language features tend to be much more “spirited” than anything I’ve seen here.
I don’t share your disparaging view of engineers. They don’t typically regard themselves as programmers, and they typically don’t have substantial programmer training. They want to use MATLAB as a tool to get a task done, and for them the primary deliverable is often NOT the code. Our job at MathWorks is to help them get their task done.
Along the way we are also trying to improve the MATLAB language and environment for use by programmers.