Dictionaries are one of the many new features of MATLAB R2022b, which was released yesterday. Today I'll take a look at some of the details of this new datatype.
A dictionary is a collection of key-value pairs.
A dictionary (also known as an associative array in some programming languages) is a collection of key-value pairs where each unique key maps onto a value. For example, imagine I am keeping track of the weight of three people called Mike, Dave and Bob and I want to use a dictionary called gym to do this. How would I create this in MATLAB?
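A minimal sketch of one way to build it. The weights and the variable names names and weights are assumptions made for illustration.

```matlab
% Illustrative data: the weights are assumed values
names = ["Mike", "Dave", "Bob"];
weights = [89, 75, 68];

% Pair each name with the weight at the same position
gym = dictionary(names, weights)
```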
Now that I have this dictionary object, I can look up the weight of any entry by supplying the relevant key, which in this case is a person's name.
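For instance, a look-up might be sketched like this, again with assumed weights:

```matlab
gym = dictionary(["Mike", "Dave", "Bob"], [89, 75, 68]);  % assumed weights

% Index with the key rather than a position
mikesWeight = gym("Mike")
```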
This look-up operation, which relies on unique keys rather than position, is the key (no pun intended!) to the usefulness of dictionaries. They are completely optimised for fast look-up and do so in constant time (or O(1) if you are familiar with big O notation) which essentially means that it's as fast to look up an item in a 10 element dictionary as it is in a 10 million element dictionary (in theory at least!).
It is vital to understand that keys in a dictionary are unique. If I make another assignment to "Bob" in this dictionary, it will overwrite the value that "Bob" used to map to rather than creating a second entry called "Bob".
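A small sketch of that overwrite behaviour, using assumed weights and a made-up new value for "Bob":

```matlab
gym = dictionary(["Mike", "Dave", "Bob"], [89, 75, 68]);  % assumed weights

gym("Bob") = 70;               % replaces the value stored under "Bob"
bobsWeight = gym("Bob")
entryCount = numEntries(gym)   % the number of entries is unchanged
```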
There can be only one of each key!
What if a key does not exist in a dictionary?
We can check to see if a key exists in a dictionary using the isKey() function. For example, the key "Michelle" does not exist in the dictionary gym and this is confirmed by isKey().
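As a sketch, with the gym dictionary rebuilt from assumed weights:

```matlab
gym = dictionary(["Mike", "Dave", "Bob"], [89, 75, 68]);  % assumed weights

hasMichelle = isKey(gym, "Michelle")   % not in the dictionary
hasMike = isKey(gym, "Mike")           % present
```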
Attempting to look up a key that does not exist results in an error.
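One way to see this without halting a script is to wrap the look-up in try/catch. This is only a sketch: the weights are assumed, and the exact wording of the error message is whatever MATLAB reports.

```matlab
gym = dictionary(["Mike", "Dave", "Bob"], [89, 75, 68]);  % assumed weights

try
    weight = gym("Michelle");   % "Michelle" is not a key, so this throws
catch err
    disp("Lookup failed: " + err.message)
end
```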
However, assigning to a key that does not exist will insert that key-value pair into the dictionary and return the result.
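A sketch of such an insertion, where Michelle's weight is a made-up value:

```matlab
gym = dictionary(["Mike", "Dave", "Bob"], [89, 75, 68]);  % assumed weights

% No semicolon, so the updated dictionary is displayed
gym("Michelle") = 65
```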
Returning all of the keys and values of a dictionary as arrays
We can get all of the keys out of a dictionary using the keys function. In this case, since our keys were strings, the result is an array of strings.
Similarly, we can get an array of values using the values function.
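Both calls together might look like the sketch below. The weights are assumed; gymkeys and gymvalues are the names used in the surrounding text.

```matlab
gym = dictionary(["Mike", "Dave", "Bob"], [89, 75, 68]);  % assumed weights

gymkeys = keys(gym)       % string array of keys
gymvalues = values(gym)   % numeric array of values

% Elements at the same position belong to the same entry
thirdKey = gymkeys(3)
thirdValue = gymvalues(3)
```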
Note that each element of the gymkeys array corresponds to each element of the gymvalues array. For example, "Bob" is the 3rd element of gymkeys and his weight is the 3rd element of gymvalues. Furthermore, keys and values are returned in the same order that they were inserted into the dictionary. In theory, a dictionary is an unordered object but in MATLAB's implementation, insertion order is maintained.
3 ways of creating dictionaries in MATLAB
There are currently three ways of creating a dictionary in MATLAB. We've already seen the first one, which is a very MATLAB-y, vectorised way of doing it. The idea is to create an array of keys then an array of values and pass them both to dictionary() which then forms a dictionary from the set of elementwise pairs.
Vectorised dictionary creation
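To make the elementwise pairing concrete, here is a sketch with assumed weights and a hypothetical variable name d1. The loop is only there to print each pair.

```matlab
names = ["Mike", "Dave", "Bob"];
weights = [89, 75, 68];            % assumed values
d1 = dictionary(names, weights);

% Each key maps to the value at the same position in the input arrays
for i = 1:numel(names)
    fprintf("%s -> %g\n", names(i), d1(names(i)));
end
```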
Dictionary creation using interleaved keys and values
An alternative method is to interleave keys and values like this
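A sketch of the interleaved form, with assumed weights and a hypothetical variable name d2. The ... continuation markers put one pair on each line.

```matlab
d2 = dictionary("Mike", 89, ...
                "Dave", 75, ...
                "Bob",  68);
```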
The line breaks were just for readability; I could just as easily have written it all on one line.
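The single-line equivalent, with the same assumed weights:

```matlab
d2 = dictionary("Mike", 89, "Dave", 75, "Bob", 68);
```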
Dictionary creation starting from an empty dictionary
The final method is to start with an empty dictionary
and then add key-value pairs, one at a time
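Put together, the two steps could be sketched as follows. The weights are assumed; d3 is the name the text uses for this dictionary later on.

```matlab
d3 = dictionary;     % empty, with no key or value types yet

% The first assignment fixes the key and value types
d3("Mike") = 89;
d3("Dave") = 75;
d3("Bob") = 68;
```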
All of these methods give equivalent results although performance might vary.
Counting the number of entries in a dictionary: (Hint - You don't use numel)
To return the number of entries in a dictionary, we use the new numEntries() function
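For example, on a three-entry dictionary built from assumed weights:

```matlab
d3 = dictionary(["Mike", "Dave", "Bob"], [89, 75, 68]);  % assumed weights

entryCount = numEntries(d3)   % counts the key-value pairs
```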
When I first started using dictionaries in MATLAB, I tried to use numel instead and was rather surprised by the result: it returned 1.
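A sketch contrasting the two functions on the same dictionary (assumed weights):

```matlab
d3 = dictionary(["Mike", "Dave", "Bob"], [89, 75, 68]);  % assumed weights

elementCount = numel(d3)     % counts dictionary objects, not entries
entryCount = numEntries(d3)  % counts key-value pairs
```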
I actually reported this as a bug when I used early versions of dictionary, but it's all consistent. The thinking here is that you only have one dictionary, d3, which itself contains the entries.
To expand on this a little further, MathWorks think that the size, numel, isempty, indexing, and cat functions should be consistent and follow normal MATLAB array behavior. This isn't possible if those functions operate on the entries of a dictionary, since the set of entries doesn't have a shape and assigning to a duplicate key overwrites the existing entry rather than adding a new one.
More about empty dictionaries: isConfigured()
Let's look closer at an empty dictionary.
Not only is newDict empty but the key and value types are both unset. We say that this dictionary is both empty and unconfigured. We can check configuration status using the isConfigured() function.
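A short sketch of that check:

```matlab
newDict = dictionary              % empty: no entries and no key or value types

configured = isConfigured(newDict)
entryCount = numEntries(newDict)
```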
You may ask "Why does this matter?". Well, there are certain things you can't do with unconfigured dictionaries. You can't ask if a key exists, for example.
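The sketch below wraps the call in try/catch so that it runs either way. The error is the R2022b behaviour described in the text; other releases may respond differently, so both outcomes are handled.

```matlab
newDict = dictionary;     % empty and unconfigured

try
    found = isKey(newDict, "Mike");
    disp("isKey returned " + found)
catch err
    disp("isKey failed: " + err.message)
end
```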
To configure an empty dictionary in R2022b, you simply add an entry.
This has added a single entry and configured the newDict dictionary so that its keys are of type datetime and its values are of type string. isConfigured now returns true.
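As a sketch, with a made-up entry (the string "Leg day" is purely illustrative):

```matlab
newDict = dictionary;
newDict(datetime("today")) = "Leg day";   % hypothetical entry

configured = isConfigured(newDict)
keyType = class(keys(newDict))       % type of the stored keys
valueType = class(values(newDict))   % type of the stored values
```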
The next question you might ask is, how can I have an empty but configured dictionary? In R2022b, the answer is to create the dictionary with empty arrays of the types you want.
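A sketch for datetime keys and string values; the variable name emptyDict is hypothetical.

```matlab
% Empty arrays carry the types but contribute no entries
emptyDict = dictionary(datetime.empty, string.empty);

configured = isConfigured(emptyDict)
entryCount = numEntries(emptyDict)

% Because the types are set, key queries are now allowed
hasToday = isKey(emptyDict, datetime("today"))
```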
Dictionaries are vectorised
"Of course they are vectorised, this is MATLAB!" was the answer I got from development when I asked the question. Let's see what this means.
We've already seen vectorised construction of dictionaries
We can also perform vectorised assignment. Let's change Mike's and Bob's weights simultaneously
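For example, with assumed starting weights and made-up new ones:

```matlab
gym = dictionary(["Mike", "Dave", "Bob"], [89, 75, 68]);  % assumed weights

% One assignment updates both entries; values pair up with keys by position
gym(["Mike", "Bob"]) = [87, 66];
```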
Scalar expansion also works. Twins join the gym and, since they have the same weight, I only need to supply the value once.
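A sketch in which the twins' names and weight are invented for illustration:

```matlab
gym = dictionary(["Mike", "Dave", "Bob"], [89, 75, 68]);  % assumed weights

% The single value is applied to every key on the left-hand side
gym(["Anna", "Amy"]) = 60;   % hypothetical twins
```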
It's also possible to do vectorised look-up. The shape of the returned values array will be the same as the shape of the array of keys queried.
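A sketch that queries the same two keys as a row and as a column, with assumed weights:

```matlab
gym = dictionary(["Mike", "Dave", "Bob"], [89, 75, 68]);  % assumed weights

rowResult = gym(["Mike", "Bob"])   % keys passed as a row
colResult = gym(["Mike"; "Bob"])   % keys passed as a column

rowSize = size(rowResult)
colSize = size(colResult)
```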
This naturally leads to the question: "What happens if any one of the input keys does not exist?".
We get an error and no results are returned.
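The all-or-nothing behaviour can be sketched with try/catch, using assumed weights:

```matlab
gym = dictionary(["Mike", "Dave", "Bob"], [89, 75, 68]);  % assumed weights

try
    result = gym(["Mike", "Michelle"]);   % "Michelle" is missing
catch err
    disp("Lookup failed: " + err.message)
end

% The assignment never completed, so result was not created
gotResult = exist("result", "var") == 1
```

If partial results are wanted, one option is to filter the query with isKey before indexing.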
Exporting and converting dictionaries
Like many other MATLAB objects, dictionaries can be serialised in a .mat file. All you need to do is save it as you would a matrix or whatever
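A round-trip sketch; the file name and the use of tempdir are assumptions chosen to keep the example tidy.

```matlab
gym = dictionary(["Mike", "Dave", "Bob"], [89, 75, 68]);  % assumed weights

matFile = fullfile(tempdir, "gym.mat");   % hypothetical file name
save(matFile, "gym")

% Load it back into a struct and pull the dictionary out
loaded = load(matFile);
restored = loaded.gym;
```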
If you wanted to export your dictionary to a .csv file, the workflow is to first convert it to a table and then export that.
The entries function provides conversion functionality from dictionaries to tables, structures or cells. Here, we use a table.
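A sketch of the three output formats, with assumed weights:

```matlab
gym = dictionary(["Mike", "Dave", "Bob"], [89, 75, 68]);  % assumed weights

T = entries(gym)              % table, the default format
S = entries(gym, "struct");   % struct array
C = entries(gym, "cell");     % cell array
```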
Note that the entries of the table are in the order they were inserted in the original dictionary. Let's now write this to a .csv file
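Writing the table out might be sketched like this. The file name and location are assumptions.

```matlab
gym = dictionary(["Mike", "Dave", "Bob"], [89, 75, 68]);  % assumed weights

T = entries(gym);                         % dictionary -> table
csvFile = fullfile(tempdir, "gym.csv");   % hypothetical file name
writetable(T, csvFile)                    % one row per entry
```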
An example application for dictionaries: counting word occurrences
One of the classical uses for dictionaries is to count occurrences of words in a file. Let's do this using the sonnets.txt file that ships with MATLAB that contains 'The Sonnets' by William Shakespeare.
First we load the text, get rid of unnecessary punctuation and convert everything to lowercase.
Split the string into an array of words
Let's see how many words we have and how many of them are unique
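The preparation steps could be sketched as below. The punctuation list and the exact cleaning are assumptions, so the counts this produces may not match the figures quoted in the text.

```matlab
% Load the file as a single string and normalise case
sonnets = lower(string(fileread("sonnets.txt")));

% Replace an assumed set of punctuation characters with spaces
punctuation = [".", ",", ";", ":", "?", "!", "(", ")"];
sonnets = replace(sonnets, punctuation, " ");

% Split at whitespace and drop any empty strings
words = split(strip(sonnets));
words(words == "") = [];

totalWords = numel(words)
uniqueWords = numel(unique(words))
```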
To count the number of occurrences of each word, we first create a suitable empty dictionary where the keys are strings and the values are doubles
Now iterate over all of the words. For each word, we check to see if it already exists in our dictionary. If it does, increase the value by one. If it doesn't, create a new entry with the value of 1.
Our dictionary has 3436 entries, which is equal to the number of unique words we found earlier. This gives us some confidence that we are heading in the right direction. As another check, we could sum all of the values and see that they equal the total number of words we found earlier.
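The counting loop and both sanity checks can be sketched on a tiny sample. The sample line (from Sonnet 116) and the variable names are assumptions; swapping in the full array of words from the sonnets gives the real analysis.

```matlab
sample = "love is not love which alters when it alteration finds";
words = split(sample);

% Empty but configured dictionary: string keys, double values
counts = dictionary(string.empty, double.empty);

for i = 1:numel(words)
    w = words(i);
    if isKey(counts, w)
        counts(w) = counts(w) + 1;   % seen before: increment
    else
        counts(w) = 1;               % first occurrence: new entry
    end
end

% Check 1: one entry per unique word
entriesMatch = numEntries(counts) == numel(unique(words))
% Check 2: the counts add up to the total number of words
totalsMatch = sum(values(counts)) == numel(words)

loveCount = counts("love")
```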
It turns out that Mr Shakespeare talked about love a lot more than hate!
That's it for our first look at this new datatype in MATLAB. This is the first of three posts on the subject of dictionaries. In the next two posts I'll take a deep dive into the different types you can use in dictionaries, including user-defined classes, and will also compare the new functionality with the venerable containers.Map. Let me know what you think and enjoy playing with the new version of MATLAB.