Loren on the Art of MATLAB

Turn ideas into MATLAB

Stable Matching Problem and the Algorithm that Won a Nobel Prize

Many here in Massachusetts started social distancing about a month ago and we have no end in sight yet. If you live alone, maybe you are ready to match up with someone after you get through this hardship. Today's guest blogger, Toshi Takeuchi, has a perfect algorithm for you. I love that this was inspired by a problem that, at first glance, doesn't seem very technical or relevant. But it is!

Contents

It Began with a Facebook Post

One day I came across a Facebook post in one of those groups that had sprung up to help people cope with social distancing. It began:

"A coding challenge and a puzzle to solve! The Nobel prize for Economics (my subject) was recently awarded for work on matching markets, where you care not just about demand and supply in general but about precisely who gets what. Sound familiar? Well that’s a milonga, right?" (a milonga is an Argentine Tango dance event)

That's how I learned about the Stable Matching Problem (SMP). The original poster wanted people to apply the SMP to solve the problem of matching people for a partner dance. That sounded trivial, but the actual problem had many consequential applications.

And indeed the algorithm that is 100% guaranteed to find solutions to the SMP won the 2012 Nobel Prize in Economics.

Problem Definition

Given two sets of participants of equal size to match, such as medical students and residency programs, and with their fixed preferences, such as:

S1 = {R1 R2 R3}  R1 = {S1 S3 S2}
S2 = {R3 R1 R2}  R2 = {S1 S2 S3}
S3 = {R1 R3 R2}  R3 = {S3 S1 S2}

Can we find stable matches? Stability in this case means:

  • no one is left unmatched, and
  • no one can find a better match who is willing to switch

A solution may look like this:

(S1, R1), (S2, R2), (S3, R3)

This is stable because:

  • S1 is matched to R1, and they are the first choices for each other. Yay!
  • S2 prefers R3 to R2, but R3 is matched to S3, its first choice, and therefore R3 is not willing to switch
  • S3 prefers R1 to R3, but R1 is matched to S1, its first choice, and therefore R1 is not willing to switch

Now what is this algorithm that is guaranteed to generate solutions to the SMP?

Gale Shapley Algorithm

This remarkable algorithm is called the Gale-Shapley algorithm and it is surprisingly straight-forward.

  • One set of participants always proposes their matches to the other set of participants
  • Those who propose (proposers) list their most to least favorite
  • Those who get proposed to (acceptors) always accept the most favorable proposal if not currently matched
  • Matched acceptors can switch to a new suitor if the new suitor is preferable
  • The algorithm terminates when everyone is matched

The pseudocode from the Wikipedia page is as follows:

algorithm stable_matching is
    Initialize all m  M and w  W to free
    while ∃ free man m who still has a woman w to propose to do
        w := first woman on m's list to whom m has not yet proposed
        if w is free then
            (m, w) become engaged
        else some pair (m', w) already exists
            if w prefers m to m' then
                m' becomes free
                (m, w) become engaged
            else
                (m', w) remain engaged
            end if
        end if
    repeat

Here is the sample data from the Wikipedia page. There are 4 proposers and 4 acceptors. Rows represent either proposers or acceptors depending on the table, and columns their ranked preferences, where numbers are indices to their preferred matches.

prefProposers = [2,1,3,4;4,1,2,3;1,3,2,4;2,3,1,4]
prefAcceptors = [1,3,2,4;3,4,1,2;4,2,3,1;3,2,1,4]
wikiSolution = [1,1;2,4;3,3;4,2];
prefProposers =
     2     1     3     4
     4     1     2     3
     1     3     2     4
     2     3     1     4
prefAcceptors =
     1     3     2     4
     3     4     1     2
     4     2     3     1
     3     2     1     4

Unfortunately, it is not very intuitive to use numbers to walk through this example. To avoid a cliché, I tried to come up with a non-gender-based example, but I couldn't come up with a compelling one. For example, Loren asked me to apply the SMP to matching people to their favorite food, but I can't imagine pizza or sushi rejecting me (not sure if I can handle that).

This is not very original but at least I can reverse the traditional gender roles in our little Jane Austen drama, and let girls ask out boys.

proposers = ["Charlotte","Jane","Lizzy","Lydia"];
acceptors = ["Bingley","Collins","Darcy","Wickham"];

The first row of prefProposers represents Charlottes' preferences, and her first choice is 2, or Collins. Her second is choice 1, or Bingley, etc.

prefCharlotte = prefProposers(proposers == "Charlotte",:)
prefCharlotte =
     2     1     3     4

The pseudocode is implemented literally as the function galeShapley1 (see the Functions section at the bottom), which sequentially iterates over proposers to find their matches. The routine inside the while loop is in the subfunction findMatch.

It uses the vector matched to keep track of engagements, which is initially all zeros. Rows 1 to 4 map to respective proposers. If all elements are zeros, you know all proposers are free.

matched = [0;0;0;0]
matched =
     0
     0
     0
     0

When Charlotte proposes to Collins and he accepts, the first element is replaced with 2, which maps to Collins. Now they are engaged. We continue this way until Lydia proposes to Collins, who is already engaged to Charlotte.

matched = [2;4;1;0]
matched =
     2
     4
     1
     0

Because Collins prefers Lydia to Charlotte, he switches, and Charlotte becomes free. We know this because the first element is zero.

matched = [0;4;1;2]
matched =
     0
     4
     1
     2

Now we tried to find a match for Charlotte. Since she was already rejected by her first choice, we go with her second, Bingley, who is engaged to Lizzy. Bingley likes Charlotte better. Now Lizzy becomes free.

matched = [1;4;0;2]
matched =
     1
     4
     0
     2

Lizzy's second choice is Darcy, and Darcy is free. Now they are engaged.

matched = [1;4;3;2]
matched =
     1
     4
     3
     2

Now everyone is engaged, and the algorithm terminates.

Let's run the MATLAB code.

stableMatch1 = galeShapley1(prefProposers,prefAcceptors);

And here is the outcome.

namedMatch = strings(size(stableMatch1,1),1);
namedMatch(:,1) = proposers';
for ii = 1:length(acceptors)
    namedMatch(stableMatch1(:,2) == ii,2) = acceptors(ii);
end
namedMatch
namedMatch = 
  4×2 string array
    "Charlotte"    "Bingley"
    "Jane"         "Wickham"
    "Lizzy"        "Darcy"  
    "Lydia"        "Collins"

And here is the result in numbers, which matches to the solution in the GIF animation from the Wikipedia page.

doesMatch = isequal(wikiSolution,stableMatch1)
doesMatch =
  logical
   1

Improved Implementation

We know that the sequential approach is not very efficient. In the first round, no one is rejected or matched yet. The acceptors are supposed to accept any first proposals. Therefore, we don't have to iterate over each proposer in this round. We could just copy their first choices.

matched = [2;4;1;2]
matched =
     2
     4
     1
     2

Then we need to check if any acceptors have received multiple proposals, such as 2 (Collins) in this example, and deal with them.

Function galeShapley2 implements this idea. From the second round on, we use the original routine. We can do this because in the Gale-Shapley algorithm, the order of execution doesn't affect the outcome, as you can see from the output below.

[stableMatch2,scores] = galeShapley2(prefProposers,prefAcceptors);

The output matches the previous result.

doesMatch = isequal(stableMatch1,stableMatch2)
doesMatch =
  logical
   1

How Happy Are They?

With galeShapley2, the second return variable scores shows how satisfied the participants are with the matches. It contains the rankings of their final matches. The first column is for proposers and the second column acceptors.

scores
scores =
     2     1
     1     2
     2     3
     1     2
  • Charlotte is matched to her #2 choice, Jane her #1, Lizzy her #2, and Lydia her #1.
  • Bingley his #1, Collins his #2, Darcy his #3, and Wickham his #2.

We can sum the rankings to see who did better. The smaller the score the better.

totalScores = sum(scores)
totalScores =
     6     8

It appears that the proposers got better matches than acceptors.

Reversing the Roles

Let's see what happens if we switch the proposers and acceptors.

[stableMatch3,scores] = galeShapley2(prefAcceptors,prefProposers);

And here is the outcome. You see that the outcome is not the same as before.

stableMatch3
stableMatch3 =
     1     1
     2     3
     3     4
     4     2

And here are the scores.

scores
scores =
     1     2
     1     1
     1     3
     2     2
  • Bingley is matched to his #1 choice, Collins his #1, Darcy his #1, and Wickham his #2.
  • Charlotte her #2, Jane her #1, Lizzy her #3, and Lydia her #2.
totalScores = sum(scores)
totalScores =
     5     8

Again, those who proposed got better matches than those who accepted.

Simulations Over 100 Trials

Under the Gale-Shapley algorithm, proposers generally do better than acceptors.

  • Proposers start out with their first choices and they can only go down from there
  • Acceptors can only improve the match through rejection, and they don't have control over how many proposals they get

Let's generate a dataset randomly using the function generateSMPdataset and plot the scores of proposers and acceptors. You can see that the scores of proposers are generally lower than those of acceptors and therefore they are getting the better matches.

iter = 100;
totalScores = zeros(iter,2);
for ii = 1:iter
    [prefProposers,prefAcceptors] = generateSMPdataset(4,4);
    [~, scores] = galeShapley2(prefProposers,prefAcceptors);
    totalScores(ii,:) = sum(scores);
end
figure
plot(1:iter,totalScores(:,1))
hold on
plot(1:iter,totalScores(:,2))
hold off
xlabel("Trials")
ylabel("Scores - the Lower the Better")
title(["Gale-Shapley Algorithm - Scores by Role";"The Lower the Score, the Better"])
legend("Proposers","Acceptors","Location","best")

It's Better to Be Proposers Than Acceptors

Naively, it looks better to be acceptors because they can say no and don't have to face humiliating rejections. But, in exchange for the pain of rejection, proposers get better choices and that leads to the better outcomes. It is also known that if acceptors reject more by lying about their preferences, then they can game the system in their favor.

Let this be your life lesson: initiate and risk rejections because that gives you better choices.

It is also very important to consider who will be the proposers when we design a system that affects who the system benefits the most.

  • In the case of NRMP, the medical students are the proposers. The system favors the medical students over the residency programs
  • NYC High School Choice program, the students are the proposers. The system favors the students over the high schools they are applying for
  • The edge servers are the proposers. The system favors the edge servers over the web requests

The system doesn't have to be set up this way, but they are what they are because they are designed to benefit one type of participants over the other.

Back to the Original Challenge

Now that we are familiar with the basic formulation of the SMP and the Gale-Shapley algorithm, let's go back to the challenge the original poster gave us.

In a milonga (dance event), we have 4 followers and 4 leaders.

namesF = ["Anna","Brigitte","Caroline","Dominique"];
namesL = ["Oliver","Thomas","Henri","Frank"];

Their rankings are as follows:

prefF = [1,2,3,4;1,3,2,4;2,4,1,3;3,1,4,2]
prefL = [4,3,2,1;3,4,1,2;2,3,4,1;3,1,4,2]
prefF =
     1     2     3     4
     1     3     2     4
     2     4     1     3
     3     1     4     2
prefL =
     4     3     2     1
     3     4     1     2
     2     3     4     1
     3     1     4     2
  • Only leaders can invite followers and followers cannot reject a leader if they are free
  • There is only time for 3 tandas during the milonga
  • A tanda is a set of 3 or 4 songs and you dance with the same partner during a tanda
  • In a new tanda, however, no one can dance with a previous partner

The puzzle

  1. Who does leader 1 (Oliver) dance with on the third tanda?
  2. Which dancers don't get to dance with their first choices during the milonga?

The Coding Challenge

  1. Write a program in your favorite language to find the GS outcome for a single tanda (Done: galeShapley2)
  2. Extend the program to find the solution for a milonga of t tandas. Use your program to obtain the solution to the puzzle above.

Here is the function milongaMatch to meet the coding challenge. The only new rule we need to deal with is that people are not allowed to dance with the same person again. Please note that this variant no longer 100% guarantees that everyone gets a partner (dancers, take notice). It stops when free proposers run out of acceptors to propose to.

tandas = milongaMatch(prefL,prefF,3);

Let's assign names to the output.

solution = strings(size(tandas));
for ii = 1:length(namesF)
    solution(tandas == ii) = namesF(ii);
end
solution = array2table(solution,"RowNames",namesL, ...
    "VariableNames","Tanda" + (1:size(tandas,2)))
solution =
  4×3 table
                Tanda1         Tanda2         Tanda3   
              ___________    ___________    ___________
    Oliver    "Dominique"    "Brigitte"     "Caroline" 
    Thomas    "Caroline"     "Anna"         "Brigitte" 
    Henri     "Brigitte"     "Dominique"    "Anna"     
    Frank     "Anna"         "Caroline"     "Dominique"

The answer to the puzzle

  1. Oliver (row 1) dances with Caroline (3) in the third tanda (column 3)
  2. Anna (1) doesn't get to dance with her first choice Oliver (row 1), but everyone else dances with their respective first choice at some point

One complaint I have with Gale-Shapley is that it requires the two types of participants be in equal numbers. When it comes to social dancing, that never happens - usually we get more followers than leaders. I came up with a variant flexibleGS that can handle when proposers and acceptors are not equal in numbers. Give it a try!

prefProposers = [2,1,3,4,5;4,1,2,3,5;1,3,2,4,5;2,3,1,4,5];
prefAcceptors = [1,3,2,4;3,4,1,2;4,2,3,1;3,2,1,4;1,3,2,4];
stableMatch4 = flexibleGS(prefProposers,prefAcceptors);

Summary

Do you like puzzles and coding challenges? Did you find the SMP interesting? Can you think of fun examples to apply the SMP to? I am still thinking of a food example. There are restaurants in Japan that reject first time customers, unless accompanied by a regular who can vouch for you that your taste buds are worthy of their cuisine. Is that a SMP though? Share your fun SMP ideas here.

Functions

galeShapley1

function matched = galeShapley1(prefP,prefA)
    % Initialize all proposers and acceptors to free
    matched = zeros(size(prefP,1),1);
    proposed = false(size(prefP,1));

    % while there exist a free proposer |p| ...
    while any(matched == 0)
        % find a match for |p|
        [matched,proposed] = findMatch(prefP,prefA,matched,proposed);
    end
    matched = [(1:size(prefP,1))',matched];
end

findMatch

function [matched,proposed] = findMatch(prefP,prefA,matched,proposed)
    p = find(matched == 0,1);
    % with |p| who still has acceptors to propose to
    acceptors = prefP(p,~proposed(p,:));

    for ii = 1:length(acceptors)
        % |a| is |p|'s highest ranked such acceptor to whom |p| has not yet
        % proposed
        a = acceptors(ii);
        proposed(p,prefP(p,:) == a) = true;

        % if |a| is free
        if ~any(matched == a)
            % (p, a) become engaged
            matched(p) = a;
            break
        else
            % else some pair (p', a) already exists
            p_ = find(matched == a);
            [ranking,~] = find(prefA(a,:) == [p;p_]);

            % if |a| prefers |p| to |p'|
            if  ranking(1) < ranking(2)
                % |p'| becomes free
                matched(p_) = 0;
                % (p, a) become engaged
                matched(p) = a;
                break
            else
                % else (p', a) remain engaged
            end
        end
    end
end

galeShapley2

function [matched,varargout] = galeShapley2(prefP,prefA)
    matched = zeros(size(prefP,1),1);
    proposed = false(size(prefP,1));

    while any(matched == 0)
        % This handles round 1
        if all(matched == 0)
            matched = prefP(:,1);
            proposed(:,1) = true;
            [acceptors,~,idx] = unique(matched);
            a = acceptors(accumarray(idx,1) > 1);
            for ii = 1:length(a)
                p = find(ismember(matched,a(ii)));
                matched(p) = 0;
                [ranking,~] = find(prefA(a(ii),:) == p);
                [~,idx] = min(ranking);
                matched(p(idx)) = a(ii);
            end
        else
            % round 2 and onwards
            [matched,proposed] = findMatch(prefP,prefA,matched,proposed);
        end
    end
    matched = [(1:size(prefP,1))',matched];

    % scoring the matching result
    scores = zeros(size(matched));
    for ii = 1:size(matched,1)
        p = matched(ii,1);
        a = matched(ii,2);
        scores(p,1) = find(prefP(p,:) == a);
        scores(a,2) = find(prefA(a,:) == p);
    end
    varargout{1} = scores;
end

generateSMPdataset

function [prefP,prefA] = generateSMPdataset(N,M)
    prefP = zeros(N,M);
    prefA = zeros(M,N);
    for ii = 1:N
        prefP(ii,:) = randperm(M);
    end
    for ii = 1:M
        prefA(ii,:) = randperm(N);
    end
end

milongaMatch

function tandas = milongaMatch(prefL,prefF,t)
    tandas = zeros(size(prefL,1),t);
    for ii = 1:t
        tanda = zeros(size(prefL,1),1);
        proposed = false(size(prefL,1));

        % make sure no one dances with a previous partner
        if ii > 1
            prev = tandas(:,1:ii-1);
            for jj = 1:size(prefL,1)
                proposed(jj,:) = ismember(prefL(jj,:),prev(jj,:));
            end
        end

        while any(tanda == 0)
            [tanda,proposed] = findMatch(prefL,prefF,tanda,proposed);
            % stop if free proposers run out of acceptors to propose to
            if all(all(proposed(tanda == 0,:),2))
                break
            end
        end
        tandas(:,ii) = tanda;
    end
end

flexibleGS

function matched = flexibleGS(prefP,prefA)

    N = size(prefP,1);
    M = size(prefA,1);
    if N < M
        matched = zeros(M,1);
    else
        matched = zeros(N,1);
    end
    proposed = false(N,M);

    % while we still have free proposers ...
    while any(matched(1:N) == 0)
        % find them matches
        [matched,proposed] = findMatch(prefP,prefA,matched,proposed);
        % but stop if they run out of acceptors to propose to
        if all(all(proposed(matched(1:N) == 0,:),2))
            break
        end
    end
    if N < M
        matched(matched == 0) = setdiff(1:M,matched);
        matched = [[(1:N)';0],matched];
    else
        matched = [(1:size(prefP,1))',matched];
    end
end




Published with MATLAB® R2020a

|

Comments

To leave a comment, please click here to sign in to your MathWorks Account or create a new one.