I've always been a release notes nerd as I think that you can learn a lot from the raft of enhancements that come with every new release of software like MATLAB. Even before I worked here, when a new version of MATLAB dropped I would spend ages picking through the release notes looking for anything that might be useful or just straight-up interesting. I like to think about the stories behind each feature. Why did the developers choose to work on this thing and why now? What problems does it solve? Why is it designed as it is? Why are there obvious limitations...didn't the developers notice these? How might I use it?
This brings me to the new combinations function in R2023a. When I first read the release notes, I didn't really understand the point of it, since I could think of a bunch of other ways to achieve what it did. I reached out to the head of the development team who built combinations and asked for the story. Here's what I learned.
What does combinations do?
The doc tells us that combinations will "Generate all element combinations of arrays". The output is always a table. A couple of examples speak better than 1000 words:
T = combinations([1 8 6],[9 3 2])
T = 9×2 table
|   | Var1 | Var2 |
|---|------|------|
| 1 | 1 | 9 |
| 2 | 1 | 3 |
| 3 | 1 | 2 |
| 4 | 8 | 9 |
| 5 | 8 | 3 |
| 6 | 8 | 2 |
| 7 | 6 | 9 |
| 8 | 6 | 3 |
| 9 | 6 | 2 |
Every row is a combination with the first element coming from the first vector and the second element from the second vector.
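As a quick sanity check on that description, here is a minimal sketch (assuming R2023a or later, since that is when combinations arrived). The variable names a, b, nRows and expected are mine, chosen purely for illustration. The number of rows should equal the product of the input lengths, and the first input is held fixed while the second cycles through its values.

```matlab
% Minimal sketch; requires R2023a or later for combinations.
a = [1 8 6];
b = [9 3 2];
T = combinations(a, b);

% One row per pairing, so the height is the product of the input lengths.
nRows = height(T);
expected = numel(a) * numel(b);
fprintf("%d rows, expected %d\n", nRows, expected);

% Passing named variables means the table columns are called a and b.
% The first input stays fixed while the second cycles through its values.
disp(T.a')
disp(T.b')
```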
The combinations function can take as many input arguments as you want and you can mix data types. The fact that you can mix data types is why the output is always a table since tables allow the input datatypes to be conserved.
experimentID = [1 2 3];
method = ["kmeans" "dbscan" "kmedoids"];
date = categorical(["small" "large"]);
T = combinations(experimentID,method,date)
T = 18×3 table
|   | experimentID | method | date |
|---|--------------|--------|------|
| 1 | 1 | "kmeans" | small |
| 2 | 1 | "kmeans" | large |
| 3 | 1 | "dbscan" | small |
| 4 | 1 | "dbscan" | large |
| 5 | 1 | "kmedoids" | small |
| 6 | 1 | "kmedoids" | large |
| 7 | 2 | "kmeans" | small |
| 8 | 2 | "kmeans" | large |
| 9 | 2 | "dbscan" | small |
| 10 | 2 | "dbscan" | large |
| 11 | 2 | "kmedoids" | small |
| 12 | 2 | "kmedoids" | large |
| 13 | 3 | "kmeans" | small |
| 14 | 3 | "kmeans" | large |
| ⋮ | | | |
What problem does combinations solve?
Element combinations are commonly used for parameter sweeps. For example, imagine I have 3 experiments with IDs 1, 2 and 3.
experimentID = [1 2 3];
I'm going to cluster the data using one of three methods:
method = ["kmeans" "dbscan" "kmedoids"];
I have data from these experiments conducted on different days:
date = datetime(["15-Oct-2013","20-Nov-2014"]);
I can form all possible combinations of these input variables:
T = combinations(experimentID,method,date)
T = 18×3 table
|   | experimentID | method | date |
|---|--------------|--------|------|
| 1 | 1 | "kmeans" | 15-Oct-2013 |
| 2 | 1 | "kmeans" | 20-Nov-2014 |
| 3 | 1 | "dbscan" | 15-Oct-2013 |
| 4 | 1 | "dbscan" | 20-Nov-2014 |
| 5 | 1 | "kmedoids" | 15-Oct-2013 |
| 6 | 1 | "kmedoids" | 20-Nov-2014 |
| 7 | 2 | "kmeans" | 15-Oct-2013 |
| 8 | 2 | "kmeans" | 20-Nov-2014 |
| 9 | 2 | "dbscan" | 15-Oct-2013 |
| 10 | 2 | "dbscan" | 20-Nov-2014 |
| 11 | 2 | "kmedoids" | 15-Oct-2013 |
| 12 | 2 | "kmedoids" | 20-Nov-2014 |
| 13 | 3 | "kmeans" | 15-Oct-2013 |
| 14 | 3 | "kmeans" | 20-Nov-2014 |
| ⋮ | | | |
Once I have all of these combinations in a table, I can run my analysis function on all of them. One way to do this would be to use rowfun which uses the contents of each row of the input table as the arguments to my function. I have defined a trivial myAnalysis function at the end of this article to show how this would work.
results = rowfun(@myAnalysis,T);
Working on ID=1 dated Tuesday October 15 2013 wih method kmeans
Working on ID=1 dated Thursday November 20 2014 wih method kmeans
Working on ID=1 dated Tuesday October 15 2013 wih method dbscan
Working on ID=1 dated Thursday November 20 2014 wih method dbscan
Working on ID=1 dated Tuesday October 15 2013 wih method kmedoids
Working on ID=1 dated Thursday November 20 2014 wih method kmedoids
Working on ID=2 dated Tuesday October 15 2013 wih method kmeans
Working on ID=2 dated Thursday November 20 2014 wih method kmeans
Working on ID=2 dated Tuesday October 15 2013 wih method dbscan
Working on ID=2 dated Thursday November 20 2014 wih method dbscan
Working on ID=2 dated Tuesday October 15 2013 wih method kmedoids
Working on ID=2 dated Thursday November 20 2014 wih method kmedoids
Working on ID=3 dated Tuesday October 15 2013 wih method kmeans
Working on ID=3 dated Thursday November 20 2014 wih method kmeans
Working on ID=3 dated Tuesday October 15 2013 wih method dbscan
Working on ID=3 dated Thursday November 20 2014 wih method dbscan
Working on ID=3 dated Tuesday October 15 2013 wih method kmedoids
Working on ID=3 dated Thursday November 20 2014 wih method kmedoids
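By default rowfun returns a table, so the results can be named and attached back onto the sweep. The following is a minimal sketch rather than the analysis used in this post: scoreFcn is a hypothetical stand-in that returns one number per parameter combination, and the column name score is my own choice.

```matlab
% Minimal sketch; requires R2023a or later for combinations.
experimentID = [1 2 3];
method = ["kmeans" "dbscan" "kmedoids"];
T = combinations(experimentID, method);

% Hypothetical stand-in for real work: one scalar per parameter combination.
scoreFcn = @(id, m) id * strlength(m);

% Each row of T supplies the arguments (id, m); name the output column.
results = rowfun(scoreFcn, T, "OutputVariableNames", "score");

% Keep parameters and results side by side in one table.
T.score = results.score;
disp(T)
```

Keeping the parameters and the result in the same table makes it straightforward to sort or filter the sweep afterwards.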
Now that we've seen an example of the new workflows that combinations allows, I thought that it would be fun to explore some of the thinking behind its design.
Older solutions #1 - combvec
Of course, people have been doing parameter sweeps for a long time and there are a range of solutions in common use. There are issues with all of these, however, that led to us deciding to create something new.
One such function is combvec in Deep Learning Toolbox, a function so old that it was in the toolbox before deep learning was cool, back when we called it Neural Network Toolbox.
a1 = [1 2 3; 4 5 6];
a2 = [7 8; 9 10];
a4 = combvec(a1,a2)
1 2 3 1 2 3
4 5 6 4 5 6
7 7 7 8 8 8
9 9 9 10 10 10
Seems to do the job! I can have as many input vectors as I like and each combination is a column. One issue, however, is that this requires a license for Deep Learning Toolbox, which is unhelpful for those who want to generate parameter sweeps for anything other than deep learning. We did consider simply moving combvec to core MATLAB, but there are aspects of the design we'd do differently today, starting with the fact that we wouldn't call it combvec since it works with more than just vectors. combvec also doesn't support nonnumeric data:
v4 = combvec(v1,v2,v3)
1 7 4 1 7 4 1 7 4 1 7 4 1 7 4 1 7 4
9 9 9 42 42 42 8 8 8 9 9 9 42 42 42 8 8 8
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
To bring this into core MATLAB in a way that would satisfy the user requirements we've been collecting would require quite a hefty redesign. This would break backwards compatibility and so we decided against it. This decision kicked off our thought processes though; we wanted to do something! If not move combvec into core MATLAB then what? Let's look around at other ways people solve the combinations problem.
Older Solutions #2 - meshgrid, ndgrid and allcomb
Three other functions are frequently recommended on MATLAB Answers to solve this problem: meshgrid, ndgrid and allcomb. The first two are built into MATLAB while allcomb is a popular function on File Exchange. allcomb is a superb piece of work that has been downloaded almost 25,000 times. Several of the reviews asked why such a useful piece of functionality wasn't part of core MATLAB. Quite!
All of these work just fine in certain circumstances. For example, consider these inputs:
v1 = [1 2];
v2 = [3 4];
I could use meshgrid like this
[am, bm] = meshgrid(v1,v2);
ndgrid like this
[an, bn] = ndgrid(v1,v2);
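For all-numeric inputs, the ndgrid approach and combinations produce the same set of rows, just in a different order. The sketch below assumes two small numeric vectors (the values 1, 2 and 3, 4 are taken from the table shown later in this post) and uses variable names of my own, fromNdgrid and fromCombinations.

```matlab
% Minimal sketch; requires R2023a or later for combinations.
v1 = [1 2];
v2 = [3 4];

% ndgrid: flatten each grid into a column; the FIRST input varies fastest.
[an, bn] = ndgrid(v1, v2);
fromNdgrid = [an(:), bn(:)];

% combinations: the LAST input varies fastest.
fromCombinations = table2array(combinations(v1, v2));

disp(fromNdgrid)
disp(fromCombinations)

% Same rows once the ordering difference is removed.
sameRows = isequal(sortrows(fromNdgrid), sortrows(fromCombinations));
disp(sameRows)
```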
Finally, File Exchange's allcomb gives us
So far so good, but if I add the following two vectors
v3 = ["a" "c"];
v4 = categorical(["a" "c"]);
The above solutions no longer work:
[am, bm, cm, dm] = meshgrid(v1,v2, v3, v4);
Error using meshgrid
Too many input arguments.
[an, bn, cn, dn] = ndgrid(v1,v2,v3,v4);
result = [an(:), bn(:), cn(:) dn(:)]
Error using categorical/horzcat
Unable to concatenate a double array and a categorical array.
allcomb(v1,v2,v3,v4)
Error using categorical/cat
Unable to concatenate a double array and a categorical array.
Error in allcomb (line 111)
A = reshape(cat(NC+1,A{:}), [], NC) ;
The new combinations function works just fine, though:
combinations(v1,v2,v3,v4)
ans = 16×4 table
|   | v1 | v2 | v3 | v4 |
|---|----|----|----|----|
| 1 | 1 | 3 | "a" | a |
| 2 | 1 | 3 | "a" | c |
| 3 | 1 | 3 | "c" | a |
| 4 | 1 | 3 | "c" | c |
| 5 | 1 | 4 | "a" | a |
| 6 | 1 | 4 | "a" | c |
| 7 | 1 | 4 | "c" | a |
| 8 | 1 | 4 | "c" | c |
| 9 | 2 | 3 | "a" | a |
| 10 | 2 | 3 | "a" | c |
| 11 | 2 | 3 | "c" | a |
| 12 | 2 | 3 | "c" | c |
| 13 | 2 | 4 | "a" | a |
| 14 | 2 | 4 | "a" | c |
| ⋮ | | | | |
So far so good but I have to confess to you that some aspects of this function triggered me in ways that I'll need to discuss with my therapist so I tortured development even further with my questions.
Why is the output a table?
Returning the result as a table allows input datatypes to be preserved in the output since each column can hold a different datatype. We could have also chosen a cell array but decided against it because we find that most users consider using cell arrays to be advanced maneuvers. Also, it was very strange to see a cell array as output when all of the inputs are double! Most people on the design team preferred tables.
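To see the type preservation for yourself, here is a minimal sketch that asks each column of the output for its class. The inputs are illustrative values of my own, and varfun with "OutputFormat","cell" is just one convenient way to collect the answers.

```matlab
% Minimal sketch; requires R2023a or later for combinations.
T = combinations([1 2], ["a" "c"], categorical(["a" "c"]));

% Apply class to every table variable and collect the results in a cell array.
types = varfun(@class, T, "OutputFormat", "cell");
disp(types)
```

Each column keeps the class of the input it came from, which a single numeric or string array could not do.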
Why not allow the user to change output format?
Although I appreciate the elegance of a table format, I can imagine times when I'd prefer my output to be something else. The most obvious being an array when all inputs are numeric, or a cell array if I'm the kind of person who prefers them to tables. Why not just allow an option such as "OutputFormat","array" after the inputs?
After accepting my $5 donation to the "just" jar, development told me that they considered this idea but the problem lies with the ambiguity between inputs and name-value pairs. Given the pair "OutputFormat","array", the question becomes "is that pair defining an option or two additional inputs"? We could do exact matching; that is, if an input is "OutputFormat" then consider it an option switch, but what if you want the input to be "OutputFormat"?
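The ambiguity is easy to demonstrate with the function as shipped. In this minimal sketch (assuming R2023a behavior, where as far as I know combinations has no output-format option), the two strings are simply treated as two more inputs to combine.

```matlab
% Minimal sketch; requires R2023a or later for combinations.
% Strings are legitimate inputs, so this call has three inputs and no options.
T = combinations([1 2], "OutputFormat", "array");
disp(T)

% Two rows (2 x 1 x 1 combinations) and three columns, one per input.
fprintf("%d rows, %d columns\n", height(T), width(T));
```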
Since R2021a, MATLAB has supported a different way of defining name-value arguments, so we could have insisted that ONLY the new format be accepted, that is, OutputFormat="array" rather than "OutputFormat","array".
The issue here is that a huge number of our users have been using the traditional "name","value" pairs for many years. At the moment, it doesn't matter which method you use in most functions. In those cases where it does matter, it will likely be only the old syntax that's supported because the developer hasn't gotten around to supporting the new method yet. Starting to introduce functions in core MATLAB that only support the newer name="value" syntax might result in annoying more people than this solution satisfies.
OK, someone countered, why not just pack the inputs in curly braces {} to avoid ambiguity, so that anything outside the braces can only be an option?
Another $5 in the 'just' jar. The reason we didn't go with this may be contentious. Personally, I like it! It solves the problem by removing ambiguity between inputs and options defined as name-value pairs. However, over the years MathWorks has learned that always forcing users to pack inputs in a cell array makes the function signature a little unusual, which can be confusing to our users. This consideration won out and the design was discarded.
One more option: We could have introduced a required FORMAT argument that every call to the function has to supply.
The main issue with this is that users will always need to type the required argument. When most people end up using "table", there will be cries of "Why didn't you make "table" the default?" It's also not extensible: there could never be any additional options. Not many people liked it!
Eventually, it was decided that we just won't provide the option to choose output type. Tables support the majority of the use cases we were targeting and utility functions exist to convert tables to other datatypes.
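As a minimal sketch of those conversions, table2array and table2cell cover the two cases mentioned earlier. The inputs and the names Tnum, Tmix, A and C are illustrative choices of mine.

```matlab
% Minimal sketch; requires R2023a or later for combinations.

% All-numeric sweep: convert the table to an ordinary matrix.
Tnum = combinations([1 2], [3 4]);
A = table2array(Tnum);
disp(A)

% Mixed types: convert the table to a cell array instead.
Tmix = combinations([1 2], ["a" "c"]);
C = table2cell(Tmix);
disp(C)
```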
If you have a workflow where this causes an issue for you, do let us know!
Combinations and rowfun -- your new combination for parameter sweeps
I hope you enjoyed this peek behind the curtain...and it really is just a peek! I've attempted to summarise a huge amount of discussion here and I apologise to my colleagues in development if I have misrepresented their thinking in any way. When all is said and done, I think we have a beautiful new way to support conducting parameter sweeps in MATLAB and I hope you enjoy it.
% Create arguments to sweep over
experimentID = [1 2 3];
method = ["kmeans" "dbscan" "kmedoids"];
date = datetime(["15-Oct-2013","20-Nov-2014"]);
% Form all combinations of the arguments
T = combinations(experimentID,method,date);
% Perform analysis using all arguments
results = rowfun(@myAnalysis,T);
Helper functions
function result = myAnalysis(ID,method,date)
    % Show the date as weekday, month, day and year
    date.Format = "eeee MMMM d yyyy";
    fprintf("Working on ID=%d dated %s wih method %s\n",ID,date,method);
    result = rand(); % A proxy for real work
end