# An introduction to dictionaries (associative arrays) in MATLAB

Dictionaries are one of the many new features of MATLAB R2022b, which was released yesterday. Today I'll take a look at some of the details of this new datatype in MATLAB.

### A dictionary is a collection of key-value pairs.

A dictionary (also known as an associative array in some programming languages) is a collection of key-value pairs where each unique key maps onto a value. For example, imagine I am keeping track of the weight of three people called Mike, Dave and Bob and I want to use a dictionary called gym to do this. How would I create this in MATLAB?
names = ["Mike","Dave","Bob"];
weights = [89,75,68]; % weight in kilograms
gym = dictionary(names,weights) % Vectorised constructor. Performs elementwise mapping.
gym =
  dictionary (string ⟼ double) with 3 entries:
    "Mike" ⟼ 89
    "Dave" ⟼ 75
    "Bob"  ⟼ 68
Now that I have this dictionary object, I can look up the weight of any entry by supplying the relevant key, which in this case is a person's name.
gym("Dave")
ans = 75
gym("Bob")
ans = 68
This look-up operation, which relies on unique keys rather than position, is the key (no pun intended!) to the usefulness of dictionaries. Dictionaries are optimised for fast look-up, which takes constant time (O(1) if you are familiar with big O notation). This essentially means that it's as fast to look up an item in a 10-element dictionary as it is in a 10-million-element dictionary (in theory at least!).
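If you want to get a feel for the constant-time claim yourself, here is a rough sketch that times a single look-up in a small and a large dictionary. The sizes, key names and variable names are my own choices for illustration, and this is not a rigorous benchmark: the timings will vary by machine and MATLAB version.

```matlab
% Rough sketch: time one look-up in a small and a large dictionary.
% The sizes (10 and 1e5) and the "key<N>" naming scheme are arbitrary choices.
nSmall = 10;
nLarge = 1e5;

smallKeys = "key" + (1:nSmall);   % ["key1", "key2", ..., "key10"]
largeKeys = "key" + (1:nLarge);

dSmall = dictionary(smallKeys, 1:nSmall);
dLarge = dictionary(largeKeys, 1:nLarge);

% timeit runs the function handle several times and returns a typical time
tSmall = timeit(@() dSmall("key5"));
tLarge = timeit(@() dLarge("key5"));

fprintf("Small dictionary: %.2e s, large dictionary: %.2e s\n", tSmall, tLarge);
```

If look-up really is O(1), the two numbers should be of the same order of magnitude even though one dictionary is 10,000 times bigger than the other.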
It is vital to understand that keys in a dictionary are unique. If I make another assignment to "Bob" in this dictionary, it will overwrite the value that "Bob" used to map to rather than creating a second entry called "Bob".
There can be only one of each key!
gym("Bob")=110
gym =
  dictionary (string ⟼ double) with 3 entries:
    "Mike" ⟼ 89
    "Dave" ⟼ 75
    "Bob"  ⟼ 110
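A quick way to convince yourself that assignment to an existing key overwrites rather than appends is to count the entries before and after. This is a small sketch using a throwaway dictionary called gymDemo (a name I've made up so that it doesn't disturb gym) and the numEntries function, which is covered in more detail later in this post.

```matlab
% Sketch: overwriting an existing key leaves the number of entries unchanged.
% gymDemo is a hypothetical copy of the data used above.
gymDemo = dictionary(["Mike","Dave","Bob"], [89,75,68]);

countBefore = numEntries(gymDemo);   % number of key-value pairs
gymDemo("Bob") = 110;                % "Bob" already exists, so this overwrites
countAfter = numEntries(gymDemo);

fprintf("Entries before: %d, after: %d, Bob now maps to %d\n", ...
    countBefore, countAfter, gymDemo("Bob"));
```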

### What if a key does not exist in a dictionary?

We can check to see if a key exists in a dictionary using the isKey() function. For example, the key "Michelle" does not exist in the dictionary gym and this is confirmed by isKey()
isKey(gym,"Michelle")
ans = logical
0
Attempting to look up a key that does not exist results in an error.
gym("Michelle")
Error using ()
However, assigning to a key that does not exist will insert that key-value pair into the dictionary and return the result.
gym("Michelle") = 70
gym =
  dictionary (string ⟼ double) with 4 entries:
    "Mike"     ⟼ 89
    "Dave"     ⟼ 75
    "Bob"      ⟼ 110
    "Michelle" ⟼ 70
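Since looking up a missing key is an error, one common pattern is to guard the look-up with isKey and fall back to a default value. Here is a minimal sketch of that idea. The names gymDemo and defaultWeight, and the choice of NaN as the fallback, are my own assumptions for illustration.

```matlab
% Sketch: look up a key, falling back to a default when it is missing.
gymDemo = dictionary(["Mike","Dave","Bob"], [89,75,110]);

name = "Michelle";
defaultWeight = NaN;            % hypothetical sentinel meaning "not recorded"

if isKey(gymDemo, name)
    weight = gymDemo(name);     % safe: the key is known to exist
else
    weight = defaultWeight;     % avoids the error from a failed look-up
end

fprintf("Weight recorded for %s: %g\n", name, weight);
```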

### Returning all of the keys and values of a dictionary as arrays

We can get all of the keys out of a dictionary using the keys function. In this case, since our keys were strings, the result is an array of strings.
gymkeys = keys(gym)
gymkeys = 4×1 string
"Mike"
"Dave"
"Bob"
"Michelle"
Similarly, we get an array of values as follows
gymvalues = values(gym)
gymvalues = 4×1
    89
    75
   110
    70
Note that each element of the gymkeys array corresponds to each element of the gymvalues array. For example, "Bob" is the 3rd element of gymkeys and his weight is the 3rd element of gymvalues. Furthermore, keys and values are returned in the same order that they were inserted into the dictionary. In theory, a dictionary is an unordered object but in MATLAB's implementation, insertion order is maintained.
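Because the two arrays line up element by element, you can walk through them together or use an index found in one to pick from the other. The following sketch prints every pair and then finds the person with the largest value. The variable names are mine, chosen for illustration.

```matlab
% Sketch: keys(d) and values(d) correspond element by element.
gymDemo = dictionary(["Mike","Dave","Bob","Michelle"], [89,75,110,70]);

k = keys(gymDemo);      % column array of strings
v = values(gymDemo);    % column array of doubles

% Walk through the pairs using a shared index
for i = 1:numel(k)
    fprintf("%s weighs %d kg\n", k(i), v(i));
end

% Use the position of the largest value to find the matching key
[maxWeight, idx] = max(v);
heaviest = k(idx);
fprintf("Heaviest: %s (%d kg)\n", heaviest, maxWeight);
```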

### 3 ways of creating dictionaries in MATLAB

There are currently three ways of creating a dictionary in MATLAB. We've already seen the first one, which is a very MATLAB-y, vectorised way of doing it. The idea is to create an array of keys then an array of values and pass them both to dictionary() which then forms a dictionary from the set of elementwise pairs.

#### Vectorised dictionary creation

fruits = ["Apple","Pear","Banana"];
colours = ["Red","Green","Yellow"];
d1 = dictionary(fruits,colours)
d1 =
  dictionary (string ⟼ string) with 3 entries:
    "Apple"  ⟼ "Red"
    "Pear"   ⟼ "Green"
    "Banana" ⟼ "Yellow"

#### Dictionary creation using interleaved keys and values

An alternative method is to interleave keys and values like this
d2 = dictionary("Apple","Red", ...
"Pear","Green", ...
"Banana","Yellow")
d2 =
  dictionary (string ⟼ string) with 3 entries:
    "Apple"  ⟼ "Red"
    "Pear"   ⟼ "Green"
    "Banana" ⟼ "Yellow"
The line breaks were just for readability. I could just as easily have done this:
d2 = dictionary("Apple","Red","Pear","Green","Banana","Yellow")
d2 =
  dictionary (string ⟼ string) with 3 entries:
    "Apple"  ⟼ "Red"
    "Pear"   ⟼ "Green"
    "Banana" ⟼ "Yellow"

#### Dictionary creation starting from an empty dictionary

The third method is to create an empty dictionary
d3 = dictionary()
d3 =
dictionary with unset key and value types.
We then add key-value pairs, one at a time
d3("Apple") = "Red"
d3 =
  dictionary (string ⟼ string) with 1 entry:
    "Apple" ⟼ "Red"
d3("Pear") = "Green"
d3 =
  dictionary (string ⟼ string) with 2 entries:
    "Apple" ⟼ "Red"
    "Pear"  ⟼ "Green"
d3("Banana") = "Yellow"
d3 =
  dictionary (string ⟼ string) with 3 entries:
    "Apple"  ⟼ "Red"
    "Pear"   ⟼ "Green"
    "Banana" ⟼ "Yellow"
All of these methods give equivalent results although performance might vary.
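If you'd like to check the equivalence for yourself, one simple approach is to compare the keys and values arrays of the three dictionaries. This sketch rebuilds all three and compares them. It relies on the insertion-order behaviour described earlier, and the variable names sameKeys and sameValues are my own.

```matlab
% Sketch: build the same dictionary three ways and compare the results.
fruits  = ["Apple","Pear","Banana"];
colours = ["Red","Green","Yellow"];

% 1. Vectorised construction
d1 = dictionary(fruits, colours);

% 2. Interleaved keys and values
d2 = dictionary("Apple","Red","Pear","Green","Banana","Yellow");

% 3. Start empty and add one entry at a time
d3 = dictionary();
for i = 1:numel(fruits)
    d3(fruits(i)) = colours(i);
end

% Compare keys and values; all three were filled in the same order
sameKeys   = isequal(keys(d1), keys(d2), keys(d3));
sameValues = isequal(values(d1), values(d2), values(d3));
fprintf("Same keys: %d, same values: %d\n", sameKeys, sameValues);
```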

### Counting the number of entries in a dictionary: (Hint - You don't use numel)

To return the number of entries in a dictionary, we use the new numEntries() function
numEntries(d3)
ans = 3
When I first started using dictionaries in MATLAB, I tried to use numel instead and was rather surprised by the result
numel(d3)
ans = 1
I actually reported this as a bug when I used early versions of dictionary, but it's all consistent. The thinking here is that you only have one dictionary, d3, which itself contains the entries.
To expand on this a little further, MathWorks think that the size, numel, isempty, indexing, and cat functions should be consistent with each other and follow normal MATLAB array behavior. This isn't possible if they act on the entries of a dictionary, since the set of entries doesn't have a shape and duplicate keys overwrite.

### More about empty dictionaries: isConfigured()

Let's look closer at an empty dictionary.
newDict = dictionary()
newDict =
dictionary with unset key and value types.
Not only is newDict empty but the key and value types are both unset. We say that this dictionary is both empty and unconfigured. We can check configuration status using the isConfigured() function.
isConfigured(newDict)
ans = logical
0
You may ask "Why does this matter?". Well, there are certain things you can't do with unconfigured dictionaries. For example, you can't ask if a key exists
isKey(newDict,"SomeKey")
Error using isKey
Unable to perform a lookup in a dictionary with unset key and value types. Add
entries to the dictionary.
To configure an empty dictionary in R2022b, you simply add an entry.
newDict(datetime("06-Sep-2022")) = "today"
newDict =
  dictionary (datetime ⟼ string) with 1 entry:
    06-Sep-2022 ⟼ "today"
This has added a single entry and configured the newDict dictionary so that its keys are of type datetime and its values are of type string. isConfigured now returns true.
isConfigured(newDict)
ans = logical
1
The next question you might ask is, how can I have an empty but configured dictionary? In R2022b, the answer is to create the dictionary with empty arrays of the types you want
emptyAndConfigured = dictionary(string.empty,double.empty)
emptyAndConfigured =
  dictionary (string ⟼ double) with no entries.
isConfigured(emptyAndConfigured)
ans = logical
1
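If your code might be handed an unconfigured dictionary, one way to avoid the isKey error is to check isConfigured first. Here is a minimal sketch of that guard. The variable names are mine, and the short-circuit && is what ensures isKey is never called on an unconfigured dictionary.

```matlab
% Sketch: guard isKey with isConfigured so unconfigured dictionaries don't error.
key = "SomeKey";

% Unconfigured: isConfigured is false, so isKey is never evaluated
newDict = dictionary();
foundInUnconfigured = isConfigured(newDict) && isKey(newDict, key);

% Empty but configured: isKey runs and reports that the key is missing
emptyAndConfigured = dictionary(string.empty, double.empty);
foundInEmpty = isConfigured(emptyAndConfigured) && isKey(emptyAndConfigured, key);

% After adding the key, the same check succeeds
emptyAndConfigured(key) = 42;
foundAfterInsert = isConfigured(emptyAndConfigured) && isKey(emptyAndConfigured, key);

fprintf("%d %d %d\n", foundInUnconfigured, foundInEmpty, foundAfterInsert);
```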

### Dictionaries are vectorised

"Of course they are vectorised, this is MATLAB!" was the answer I got from development when I asked the question. Let's see what this means.
We've already seen vectorised construction of dictionaries
names = ["Mike","Dave","Bob"];
weights = [89,75,68]; % weight in kilograms
gym = dictionary(names,weights) % Vectorised constructor. Performs elementwise mapping.
gym =
  dictionary (string ⟼ double) with 3 entries:
    "Mike" ⟼ 89
    "Dave" ⟼ 75
    "Bob"  ⟼ 68
We can also perform vectorised assignment. Let's change Mike's and Bob's weights simultaneously
gym(["Mike","Bob"]) = [80,73]
gym =
  dictionary (string ⟼ double) with 3 entries:
    "Mike" ⟼ 80
    "Dave" ⟼ 75
    "Bob"  ⟼ 73
Scalar expansion also works. Two twins join the gym and since they have the same weight, I only need to supply the value once
gym(["Twin 1","Twin 2"]) = 100
gym =
  dictionary (string ⟼ double) with 5 entries:
    "Mike"   ⟼ 80
    "Dave"   ⟼ 75
    "Bob"    ⟼ 73
    "Twin 1" ⟼ 100
    "Twin 2" ⟼ 100
It's also possible to do vectorised look-up. The shape of the returned values array will be the same as the shape of the array of keys queried
gym(["Mike","Bob";"Twin 1","Dave"])
ans = 2×2
    80    73
   100    75
This naturally leads to the question: "What happens if any one of the input keys does not exist?".
gym(["Mike","Bob";"DoesNotExist","Dave"])
The whole look-up fails with an error, just as it does for a single missing key.
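One way to handle a batch of keys when some of them might be missing is to use the fact that isKey is vectorised too. The sketch below looks up only the keys that exist and leaves NaN in the positions of the ones that don't. The names gymDemo, queries, present and result, and the use of NaN as a placeholder, are my own choices for illustration.

```matlab
% Sketch: vectorised look-up that tolerates missing keys.
gymDemo = dictionary(["Mike","Dave","Bob"], [80,75,73]);
queries = ["Mike","Bob";"DoesNotExist","Dave"];

% isKey returns a logical array with the same shape as queries
present = isKey(gymDemo, queries);

% Pre-fill with NaN, then look up only the keys that exist
result = nan(size(queries));
result(present) = gymDemo(queries(present));

disp(result)
```

The result has the same shape as the array of queries, so it is easy to see which requests could not be satisfied.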

### Exporting and converting dictionaries

Like many other MATLAB objects, dictionaries can be serialised in a .mat file. All you need to do is save it as you would a matrix or whatever
%Save the dictionary called gym to a file called mydictionary.mat
save("mydictionary",'gym')
If you wanted to export your dictionary to a .csv file, the workflow is to first convert it to a table and then export that.
The entries function provides conversion functionality from dictionaries to tables, structures or cells. Here, we use a table.
gymTable = entries(gym,"table")
gymTable = 3×2 table
         Key       Value
    1    "Mike"       89
    2    "Dave"       75
    3    "Bob"       110
Note that the entries of the table are in the order they were inserted in the original dictionary. Let's now write this to a .csv file
writetable(gymTable,"gym.csv")
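Going the other way, from a .csv file back to a dictionary, can be done by reading the file into a table and passing its two columns to dictionary(). This is a sketch of one possible round trip rather than an official workflow. It writes to a temporary file of my choosing, and it assumes the table columns are called Key and Value, as in the output above.

```matlab
% Sketch: round trip a dictionary through a .csv file.
gymDemo = dictionary(["Mike","Dave","Bob"], [89,75,110]);

% Write to a temporary file (the file name is arbitrary)
csvFile = [tempname '.csv'];
writetable(entries(gymDemo, "table"), csvFile);

% Read the text column back in as strings rather than cell arrays of char
T = readtable(csvFile, "TextType", "string", "ReadVariableNames", true);

% Rebuild the dictionary from the Key and Value columns
restored = dictionary(T.Key, T.Value);

delete(csvFile);   % clean up the temporary file
disp(restored)
```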

### An example application for dictionaries: counting word occurrences

One of the classical uses for dictionaries is to count occurrences of words in a file. Let's do this using the sonnets.txt file that ships with MATLAB that contains 'The Sonnets' by William Shakespeare.
First we load the text, get rid of unnecessary punctuation and convert everything to lowercase.
sonnets = string(fileread("sonnets.txt")); % Load the whole file as a single string
punctuationCharacters = ["." "?" "!" "," ";" ":"];
sonnets = replace(sonnets,punctuationCharacters," ");
sonnets = lower(sonnets);
Split the string into an array of words
words = split(sonnets);
Let's see how many words we have and how many of them are unique
numberOfWords = numel(words)
numberOfWords = 17712
numberOfUniqueWords = numel(unique(words))
numberOfUniqueWords = 3436
To count the number of occurrences of each word, we first create a suitable empty dictionary where the keys are strings and the values are doubles
d = dictionary(string.empty,double.empty);
Now iterate over all of the words. For each word, we check to see if it already exists in our dictionary. If it does, increase the value by one. If it doesn't, create a new entry with the value of 1.
for word = words'
    if isKey(d,word) % If this word exists in the dictionary
        d(word) = d(word) + 1; % Increment the value associated with that word by 1.
    else
        d(word) = 1; % Initialise a new word in the dictionary with the value set to 1.
    end
end
d
d =
Our dictionary has 3436 entries which is equal to the number of unique words we found earlier. This gives us some confidence that we are heading in the right direction. As another check, we could sum all of the values and see that they equal the total number of words we found earlier
numberOfWords=sum(values(d))
numberOfWords = 17712
It turns out that Mr Shakespeare talked about love a lot more than hate!
d("love")
ans = 160
d("hate")
ans = 13
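The loop above is the classic way of doing this, but since dictionary construction is vectorised there is also a loop-free alternative. The sketch below is a different technique from the one used in this post: it counts words with unique and accumarray and then builds the dictionary in one go. I've used a short made-up sentence instead of the sonnets so that the result is easy to check by eye.

```matlab
% Sketch: loop-free word counting using unique and accumarray.
sample = "the cat and the dog and the bird";   % made-up example text
sampleWords = split(sample);                   % column array of words

% idx(i) is the position of sampleWords(i) within uniqueWords
[uniqueWords, ~, idx] = unique(sampleWords);

% Add 1 into the bin of each word's unique index
counts = accumarray(idx, 1);

% Vectorised construction: one key per unique word
wordCounts = dictionary(uniqueWords, counts);

% Find the most frequent word by sorting the values
k = keys(wordCounts);
[~, order] = sort(values(wordCounts), "descend");
mostFrequent = k(order(1));

fprintf("Most frequent word: %s (%d occurrences)\n", ...
    mostFrequent, wordCounts(mostFrequent));
```

The same sanity check as before applies here: the sum of the values should equal the total number of words.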

### Next time

That's it for our first look at this new datatype in MATLAB. This is the first of three posts on the subject of dictionaries. In the next two posts I'll take a deep dive into the different types you can use in dictionaries, including user-defined classes, and will also compare the new functionality with the venerable containers.Map. Let me know what you think and enjoy playing with the new version of MATLAB.