{"id":2168,"date":"2017-01-31T10:22:25","date_gmt":"2017-01-31T15:22:25","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=2168"},"modified":"2017-07-28T05:58:56","modified_gmt":"2017-07-28T10:58:56","slug":"speed-dating-experiment","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2017\/01\/31\/speed-dating-experiment\/","title":{"rendered":"Speed Dating Experiment"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p>Valentine's Day is fast approaching and those who are in a relationship might start thinking about plans. For those who are not as lucky, read on! Today's guest blogger, <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/951521\">Toshi Takeuchi<\/a>, explores how you can be successful at speed dating events through data.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/wedding_cake_topper.jpg\" alt=\"\"> <\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#0d73c8df-b21e-410c-96a9-ca5c36aef577\">Speed Dating Dataset<\/a><\/li><li><a href=\"#a5b8804e-6510-45b7-8414-23db5a26f5a9\">What Participants Looked For In the Opposite Sex<\/a><\/li><li><a href=\"#5a656325-cd14-47d5-9784-1a7689e6d310\">Were They Able to Find Matches?<\/a><\/li><li><a href=\"#129b2f8e-caca-4b19-90d7-f2bfa83900fc\">Do You Get More Matches If You Make More Requests?<\/a><\/li><li><a href=\"#0ce64244-a2b0-4be7-94a8-c942518fde4a\">Decision to Say Yes for Second Date<\/a><\/li><li><a href=\"#7a666d1d-a547-4b72-a2e3-18c3ae8a0850\">Factors Influencing the Decision<\/a><\/li><li><a href=\"#c030790f-d40a-47a4-ac98-f2dcadc65bf9\">Validating the Model with the Holdout Set<\/a><\/li><li><a href=\"#b8b277d5-c2f4-4af7-b316-b0ea6f6004de\">Relative Attractiveness<\/a><\/li><li><a href=\"#c1e59d9f-5b2e-4df9-b690-38244d79d705\">Are We Good at Assessing Our Own Attractiveness?<\/a><\/li><li><a 
href=\"#ba792d47-8984-4318-85fb-f21c963663a8\">Attractiveness is in the Eye of the Beholder<\/a><\/li><li><a href=\"#94f843d2-ec0a-4f48-b711-745ca7d6df95\">Modesty is the Key to Success<\/a><\/li><li><a href=\"#df349717-e410-49c3-9efd-3336ceed2445\">Summary<\/a><\/li><\/ul><\/div><h4>Speed Dating Dataset<a name=\"0d73c8df-b21e-410c-96a9-ca5c36aef577\"><\/a><\/h4><p>I recently came across an interesting Kaggle dataset, <a href=\"https:\/\/www.kaggle.com\/annavictoria\/speed-dating-experiment\">Speed Dating Experiment - What attributes influence the selection of a romantic partner?<\/a>. I have never experienced speed dating, so I got curious.<\/p><p>The data comes from a series of heterosexual speed dating experiments at Columbia University from 2002 to 2004. In these experiments, you met each of the opposite-sex participants for four minutes. The number of first dates varied by the event - on average there were 15, but there could be as few as 5 or as many as 22. Then you were asked if you would like to meet any of them again. 
You also provided ratings on six attributes about your dates:<\/p><div><ul><li>Attractiveness<\/li><li>Sincerity<\/li><li>Intelligence<\/li><li>Fun<\/li><li>Ambition<\/li><li>Shared Interests<\/li><\/ul><\/div><p>The dataset also includes participants' preferences on those attributes at various points in the process, along with other demographic information.<\/p><pre class=\"codeinput\">opts = detectImportOptions(<span class=\"string\">'Speed Dating Data.csv'<\/span>);                <span class=\"comment\">% set import options<\/span>\r\nopts.VariableTypes([9,38,50:end]) = {<span class=\"string\">'double'<\/span>};                     <span class=\"comment\">% treat as double<\/span>\r\nopts.VariableTypes([35,49]) = {<span class=\"string\">'categorical'<\/span>};                      <span class=\"comment\">% treat as categorical<\/span>\r\ncsv = readtable(<span class=\"string\">'Speed Dating Data.csv'<\/span>, opts);                     <span class=\"comment\">% import data<\/span>\r\ncsv.tuition = str2double(csv.tuition);                              <span class=\"comment\">% convert to double<\/span>\r\ncsv.zipcode = str2double(csv.zipcode);                              <span class=\"comment\">% convert to double<\/span>\r\ncsv.income = str2double(csv.income);                                <span class=\"comment\">% convert to double<\/span>\r\n<\/pre><h4>What Participants Looked For In the Opposite Sex<a name=\"a5b8804e-6510-45b7-8414-23db5a26f5a9\"><\/a><\/h4><p>The participants filled out survey questions when they signed up, including what they were looking for in the opposite sex and what they thought others of their own gender looked for. If you take the mean ratings and then subtract the self-rating from the peer rating, you see that participants thought others were more into looks while they themselves also valued sincerity and intelligence. Sounds a bit biased. 
We will see how they actually made decisions.<\/p><pre class=\"codeinput\">vars = csv.Properties.VariableNames;                                <span class=\"comment\">% get var names<\/span>\r\n[G, res] = findgroups(csv(:,{<span class=\"string\">'iid'<\/span>,<span class=\"string\">'gender'<\/span>}));                     <span class=\"comment\">% group by id and gender<\/span>\r\n[~,idx,~] = unique(G);                                              <span class=\"comment\">% get unique indices<\/span>\r\npref = table2array(csv(idx,contains(vars,<span class=\"string\">'1_1'<\/span>)));                  <span class=\"comment\">% subset pref as array<\/span>\r\npref(isnan(pref)) = 0;                                              <span class=\"comment\">% replace NaN with 0<\/span>\r\npref1_1 = pref .\/ sum(pref,2) * 100;                                <span class=\"comment\">% convert to 100-pt alloc<\/span>\r\npref = table2array(csv(idx,contains(vars,<span class=\"string\">'4_1'<\/span>)));                  <span class=\"comment\">% subset pref as array<\/span>\r\npref(isnan(pref)) = 0;                                              <span class=\"comment\">% replace NaN with 0<\/span>\r\npref4_1 = pref .\/ sum(pref,2) * 100;                                <span class=\"comment\">% convert to 100-pt alloc<\/span>\r\nlabels = {<span class=\"string\">'attr'<\/span>,<span class=\"string\">'sinc'<\/span>,<span class=\"string\">'intel'<\/span>,<span class=\"string\">'fun'<\/span>,<span class=\"string\">'amb'<\/span>,<span class=\"string\">'shar'<\/span>};                <span class=\"comment\">% attributes<\/span>\r\nfigure                                                              <span class=\"comment\">% new figure<\/span>\r\nb = bar([mean(pref4_1(res.gender == 1,:),<span class=\"string\">'omitnan'<\/span>) - <span class=\"keyword\">...<\/span><span class=\"comment\">           % bar plot<\/span>\r\n    mean(pref1_1(res.gender == 1,:),<span 
class=\"string\">'omitnan'<\/span>); <span class=\"keyword\">...<\/span>\r\n    mean(pref4_1(res.gender == 0,:),<span class=\"string\">'omitnan'<\/span>) - <span class=\"keyword\">...<\/span>\r\n    mean(pref1_1(res.gender == 0,:),<span class=\"string\">'omitnan'<\/span>)]');\r\nb(1).FaceColor = [0 .45 .74]; b(2).FaceColor = [.85 .33 .1];        <span class=\"comment\">% change face color<\/span>\r\nb(1).FaceAlpha = 0.6; b(2).FaceAlpha = 0.6;                         <span class=\"comment\">% change face alpha<\/span>\r\ntitle(<span class=\"string\">'What Peers Look For More Than You Do in the Opposite Sex?'<\/span>)  <span class=\"comment\">% add title<\/span>\r\nxticklabels(labels)                                                 <span class=\"comment\">% label bars<\/span>\r\nylabel(<span class=\"string\">'Differences in Mean Ratings - Peers vs Self'<\/span>)               <span class=\"comment\">% y axis label<\/span>\r\nlegend(<span class=\"string\">'Men'<\/span>,<span class=\"string\">'Women'<\/span>)                                               <span class=\"comment\">% add legend<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/speeddating_01.png\" alt=\"\"> <h4>Were They Able to Find Matches?<a name=\"5a656325-cd14-47d5-9784-1a7689e6d310\"><\/a><\/h4><p>Let's first find out how successful those speed dating events were. If both you and your partner request another date after the first one, then you have a match. 
What percentage of initial dates resulted in matches?<\/p><div><ul><li>Most people found matches - the median match rate was around 13%<\/li><li>A few people were extremely successful, getting more than an 80% match rate<\/li><li>About 17-19% of men and women found no matches, unfortunately<\/li><li>All in all, it looks like these speed dating events delivered the promised results<\/li><\/ul><\/div><pre class=\"codeinput\">res.rounds = splitapply(@numel, G, G);                              <span class=\"comment\">% # initial dates<\/span>\r\nres.matched = splitapply(@sum, csv.match, G);                       <span class=\"comment\">% # matches<\/span>\r\nres.matched_r = res.matched .\/ res.rounds;                          <span class=\"comment\">% match rate<\/span>\r\nedges = [0 0.4 1:9].\/ 10;                                           <span class=\"comment\">% bin edges<\/span>\r\nfigure                                                              <span class=\"comment\">% new figure<\/span>\r\nhistogram(res.matched_r(res.gender == 1), edges,<span class=\"keyword\">...<\/span><span class=\"comment\">                 % histogram of<\/span>\r\n    <span class=\"string\">'Normalization'<\/span>,<span class=\"string\">'probability'<\/span>)                                  <span class=\"comment\">% male match rate<\/span>\r\nhold <span class=\"string\">on<\/span>                                                             <span class=\"comment\">% overlay another plot<\/span>\r\nhistogram(res.matched_r(res.gender == 0), edges,<span class=\"keyword\">...<\/span><span class=\"comment\">                 % histogram of<\/span>\r\n    <span class=\"string\">'Normalization'<\/span>,<span class=\"string\">'probability'<\/span>)                                  <span class=\"comment\">% female match rate<\/span>\r\nhold <span class=\"string\">off<\/span>                                                            <span class=\"comment\">% stop overlay<\/span>\r\ntitle(<span 
class=\"string\">'What Percentage of the First Dates Resulted in Matches?'<\/span>)    <span class=\"comment\">% add title<\/span>\r\nxlabel(sprintf(<span class=\"string\">'%% Matches (Median - Men %.1f%%, Women %.1f%%)'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">% x-axis label<\/span>\r\n    median(res.matched_r(res.gender == 1))*100, <span class=\"keyword\">...<\/span><span class=\"comment\">                 % median men<\/span>\r\n    median(res.matched_r(res.gender == 0))*100))                    <span class=\"comment\">% median women<\/span>\r\nxticklabels(string(0:10:90))                                        <span class=\"comment\">% use percentage<\/span>\r\nxlim([-0.05 0.95])                                                  <span class=\"comment\">% x-axis range<\/span>\r\nylabel(<span class=\"string\">'% Participants'<\/span>)                                            <span class=\"comment\">% y-axis label<\/span>\r\nyticklabels(string(0:5:30))                                         <span class=\"comment\">% use percentage<\/span>\r\nlegend(<span class=\"string\">'Men'<\/span>,<span class=\"string\">'Women'<\/span>)                                               <span class=\"comment\">% add legend<\/span>\r\ntext(-0.04,0.21,<span class=\"string\">'0 matches'<\/span>)                                        <span class=\"comment\">% annotate<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/speeddating_02.png\" alt=\"\"> <h4>Do You Get More Matches If You Make More Requests?<a name=\"129b2f8e-caca-4b19-90d7-f2bfa83900fc\"><\/a><\/h4><p>In order to make a match, you need to request another date and get that request accepted. This means some people who got a very high match rate must have requested a second date with almost everyone they met and had the favor returned. 
Does that mean people who made fewer matches were pickier and didn't request another date as often as those who were more successful? Let's plot the request rate vs. match rate - if they correlate, then we should see a diagonal line!<\/p><div><ul><li>You can see some correlation below a 50% request rate - particularly for women. The more requests they make, the more matches they seem to get, to a point<\/li><li>There is a clear gender gap in request rate - women tend to make fewer requests - the median is 44% for men vs. 37% for women<\/li><li>If you request everyone, your mileage varies - you may still get no matches. In the end, you only get matches if your requests are accepted<\/li><\/ul><\/div><pre class=\"codeinput\">res.requests = splitapply(@sum, csv.dec, G);                        <span class=\"comment\">% # requests<\/span>\r\nres.request_r = res.requests .\/ res.rounds;                         <span class=\"comment\">% request rate<\/span>\r\nfigure                                                              <span class=\"comment\">% new figure<\/span>\r\nsubplot(2,1,1)                                                      <span class=\"comment\">% add subplot<\/span>\r\nscatter(res.request_r(res.gender == 1), <span class=\"keyword\">...<\/span><span class=\"comment\">                         % scatter plot male<\/span>\r\n    res.matched_r(res.gender == 1),<span class=\"string\">'filled'<\/span>,<span class=\"string\">'MarkerFaceAlpha'<\/span>, 0.6)\r\nhold <span class=\"string\">on<\/span>                                                             <span class=\"comment\">% overlay another plot<\/span>\r\nscatter(res.request_r(res.gender == 0), <span class=\"keyword\">...<\/span><span class=\"comment\">                         % scatter plot female<\/span>\r\n    res.matched_r(res.gender == 0),<span class=\"string\">'filled'<\/span>,<span class=\"string\">'MarkerFaceAlpha'<\/span>, 0.6)\r\nr = refline(1,0); r.Color = <span 
class=\"string\">'r'<\/span>;r.LineStyle = <span class=\"string\">':'<\/span>;                  <span class=\"comment\">% reference line<\/span>\r\nhold <span class=\"string\">off<\/span>                                                            <span class=\"comment\">% stop overlay<\/span>\r\ntitle(<span class=\"string\">'Do You Get More Matches If You Ask More?'<\/span>)                   <span class=\"comment\">% add title<\/span>\r\nxlabel(<span class=\"string\">'% Second Date Requests'<\/span>)                                    <span class=\"comment\">% x-axis label<\/span>\r\nxticklabels(string(0:10:100))                                       <span class=\"comment\">% use percentage<\/span>\r\nylabel(<span class=\"string\">'% Matches'<\/span>)                                                 <span class=\"comment\">% y-axis label<\/span>\r\nyticklabels(string(0:50:100))                                       <span class=\"comment\">% use percentage<\/span>\r\nlegend(<span class=\"string\">'Men'<\/span>,<span class=\"string\">'Women'<\/span>,<span class=\"string\">'Location'<\/span>,<span class=\"string\">'NorthWest'<\/span>)                        <span class=\"comment\">% add legend<\/span>\r\nsubplot(2,1,2)                                                      <span class=\"comment\">% add subplot<\/span>\r\nhistogram(res.request_r(res.gender == 1),<span class=\"keyword\">...<\/span><span class=\"comment\">                        % histogram of<\/span>\r\n    <span class=\"string\">'Normalization'<\/span>,<span class=\"string\">'probability'<\/span>)                                  <span class=\"comment\">% male match rate<\/span>\r\nhold <span class=\"string\">on<\/span>                                                             <span class=\"comment\">% overlay another plot<\/span>\r\nhistogram(res.request_r(res.gender == 0),<span class=\"keyword\">...<\/span><span class=\"comment\">                        % histogram of<\/span>\r\n    <span 
class=\"string\">'Normalization'<\/span>,<span class=\"string\">'probability'<\/span>)                                  <span class=\"comment\">% female match rate<\/span>\r\nhold <span class=\"string\">off<\/span>                                                            <span class=\"comment\">% stop overlay<\/span>\r\ntitle(<span class=\"string\">'Do Women Make Fewer Requests Than Men?'<\/span>)                     <span class=\"comment\">% add title<\/span>\r\nxlabel(sprintf(<span class=\"string\">'%% Second Date Requests (Median - Men %.1f%%, Women %.1f%%)'<\/span>, <span class=\"keyword\">...<\/span>\r\n    median(res.request_r(res.gender == 1))*100, <span class=\"keyword\">...<\/span><span class=\"comment\">                 % median men<\/span>\r\n    median(res.request_r(res.gender == 0))*100))                    <span class=\"comment\">% median women<\/span>\r\nxticklabels(string(0:10:100))                                       <span class=\"comment\">% use percentage<\/span>\r\nylabel(<span class=\"string\">'% Participants'<\/span>)                                            <span class=\"comment\">% y-axis label<\/span>\r\nyticklabels(string(0:5:20))                                         <span class=\"comment\">% use percentage<\/span>\r\nlegend(<span class=\"string\">'Men'<\/span>,<span class=\"string\">'Women'<\/span>)                                               <span class=\"comment\">% add legend<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/speeddating_03.png\" alt=\"\"> <h4>Decision to Say Yes for Second Date<a name=\"0ce64244-a2b0-4be7-94a8-c942518fde4a\"><\/a><\/h4><p>Most people are more selective about who they go out with on a second date. The two most obvious factors in such a decision are how much they liked who they met and how likely they think they are to get a \"yes\" from them. 
There is no point in asking a person for a second date, no matter how much you like him or her, if you feel there's no chance of getting a yes. You can see fairly clear separation between Yes and No using those two factors.<\/p><p>Please note that I normalized the ratings for \"like\" and \"probability\" (of getting yes) because the actual rating scale differs from person to person. You subtract the mean from the data to center the data around zero and then divide by the standard deviation to normalize the value range.<\/p><pre class=\"codeinput\">func = @(x) {(x - mean(x)).\/std(x)};                                <span class=\"comment\">% normalize func<\/span>\r\nf = csv.like;                                                       <span class=\"comment\">% get like<\/span>\r\nf(isnan(f)) = 0;                                                    <span class=\"comment\">% replace NaN with 0<\/span>\r\nf = splitapply(func, f, G);                                         <span class=\"comment\">% normalize by group<\/span>\r\nlike = vertcat(f{:});                                               <span class=\"comment\">% add normalized like<\/span>\r\nf = csv.prob;                                                       <span class=\"comment\">% get prob<\/span>\r\nf(isnan(f)) = 0;                                                    <span class=\"comment\">% replace NaN with 0<\/span>\r\nf = splitapply(func, f, G);                                         <span class=\"comment\">% normalize by group<\/span>\r\nprob = vertcat(f{:});                                               <span class=\"comment\">% add normalized prob<\/span>\r\nfigure                                                              <span class=\"comment\">% new figure<\/span>\r\nsubplot(1,2,1)                                                      <span class=\"comment\">% add subplot<\/span>\r\ngscatter(like(csv.gender == 1), <span class=\"keyword\">...<\/span><span class=\"comment\">                         
        % scatter plot male<\/span>\r\n    prob(csv.gender == 1),csv.dec(csv.gender == 1),<span class=\"string\">'rb'<\/span>,<span class=\"string\">'o'<\/span>)\r\ntitle(<span class=\"string\">'Second Date Decision - Men'<\/span>)                                 <span class=\"comment\">% add title<\/span>\r\nxlabel(<span class=\"string\">'Like, normalized'<\/span>)                                          <span class=\"comment\">% x-axis label<\/span>\r\nylabel(<span class=\"string\">'Prob of Get \"Yes\", normalized'<\/span>)                             <span class=\"comment\">% y-axis label<\/span>\r\nylim([-5 5])                                                        <span class=\"comment\">% y-axis range<\/span>\r\nlegend(<span class=\"string\">'No'<\/span>,<span class=\"string\">'Yes'<\/span>)                                                  <span class=\"comment\">% add legend<\/span>\r\nsubplot(1,2,2)                                                      <span class=\"comment\">% add subplot<\/span>\r\ngscatter(like(csv.gender == 0), <span class=\"keyword\">...<\/span><span class=\"comment\">                                 % scatter plot female<\/span>\r\n    prob(csv.gender == 0),csv.dec(csv.gender == 0),<span class=\"string\">'rb'<\/span>,<span class=\"string\">'o'<\/span>)\r\ntitle(<span class=\"string\">'Second Date Decision - Women'<\/span>)                               <span class=\"comment\">% add title<\/span>\r\nxlabel(<span class=\"string\">'Like, normalized'<\/span>)                                          <span class=\"comment\">% x-axis label<\/span>\r\nylabel(<span class=\"string\">'Prob of Get \"Yes\", normalized'<\/span>)                             <span class=\"comment\">% y-axis label<\/span>\r\nylim([-5 5])                                                        <span class=\"comment\">% y-axis range<\/span>\r\nlegend(<span class=\"string\">'No'<\/span>,<span class=\"string\">'Yes'<\/span>)                                                 
 <span class=\"comment\">% add legend<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/speeddating_04.png\" alt=\"\"> <h4>Factors Influencing the Decision<a name=\"7a666d1d-a547-4b72-a2e3-18c3ae8a0850\"><\/a><\/h4><p>If you can correctly guess the probability of getting a yes, that should help a lot. Can we make such a prediction using observable and discoverable factors only?<\/p><p>Features were generated using normalization and other techniques - see <tt><a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/feature_eng.m\">feature_eng.m<\/a><\/tt> for more details. To determine which features matter more, the resulting feature set was then split into training and holdout sets and the training set was used to generate a Random Forest model <tt><a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/bt_all.mat\">bt_all.mat<\/a><\/tt> using the <a href=\"https:\/\/www.mathworks.com\/help\/stats\/classificationlearner-app.html\">Classification Learner app<\/a>.  What's nice about a <a href=\"https:\/\/www.mathworks.com\/help\/stats\/treebagger.html\">Random Forest<\/a> is that it can show you the <a href=\"https:\/\/www.mathworks.com\/help\/stats\/classificationbaggedensemble.oobpermutedpredictorimportance.html\">predictor importance estimates<\/a> based on how error increases if you randomly change the value of particular predictors. 
If they don't matter, it shouldn't increase the error rate much, right?<\/p><p>Based on those scores, the most important features are:<\/p><div><ul><li>attrgap - difference between partners' attractiveness rated by participants and their own self rating<\/li><li>attr - rating of attractiveness participants gave to their partners<\/li><li>shar - rating of shared interests participants gave to their partners<\/li><li>fun - rating of fun participants gave to their partners<\/li><li>field_cd - participants' field of study<\/li><\/ul><\/div><p>The least important features are:<\/p><div><ul><li>agegap - difference between the ages of participants and their partners<\/li><li>order - in which part of the event they first met - earlier or later<\/li><li>samerace - whether participants and their partners were the same race<\/li><li>reading - participant's rating of interest in reading<\/li><li>race_o - the race of the partners<\/li><\/ul><\/div><pre class=\"codeinput\">feature_eng                                                         <span class=\"comment\">% engineer features<\/span>\r\nload <span class=\"string\">bt_all<\/span>                                                         <span class=\"comment\">% load trained model<\/span>\r\nimp = oobPermutedPredictorImportance(bt_all.ClassificationEnsemble);<span class=\"comment\">% get predictor importance<\/span>\r\nvars = bt_all.ClassificationEnsemble.PredictorNames;                <span class=\"comment\">% predictor names<\/span>\r\nfigure                                                              <span class=\"comment\">% new figure<\/span>\r\nsubplot(2,1,1)                                                      <span class=\"comment\">% add subplot<\/span>\r\n[~,rank] = sort(imp,<span class=\"string\">'descend'<\/span>);                                     <span class=\"comment\">% get ranking<\/span>\r\nbar(imp(rank(1:10)));                                               <span class=\"comment\">% plot top 
10<\/span>\r\ntitle(<span class=\"string\">'Out-of-Bag Permuted Predictor Importance Estimates'<\/span>);        <span class=\"comment\">% add title<\/span>\r\nylabel(<span class=\"string\">'Estimates'<\/span>);                                                <span class=\"comment\">% y-axis label<\/span>\r\nxlabel(<span class=\"string\">'Top 10 Predictors'<\/span>);                                        <span class=\"comment\">% x-axis label<\/span>\r\nxticks(1:10);                                                       <span class=\"comment\">% set x-axis ticks<\/span>\r\nxticklabels(vars(rank(1:10)))                                       <span class=\"comment\">% label bars<\/span>\r\nxtickangle(45)                                                      <span class=\"comment\">% rotate labels<\/span>\r\nax = gca;                                                           <span class=\"comment\">% get current axes<\/span>\r\nax.TickLabelInterpreter = <span class=\"string\">'none'<\/span>;                                   <span class=\"comment\">% turn off latex<\/span>\r\nsubplot(2,1,2)                                                      <span class=\"comment\">% add subplot<\/span>\r\n[~,rank] = sort(imp);                                               <span class=\"comment\">% get ranking<\/span>\r\nbar(imp(rank(1:10)));                                               <span class=\"comment\">% plot bottom 10<\/span>\r\ntitle(<span class=\"string\">'Out-of-Bag Permuted Predictor Importance Estimates'<\/span>);        <span class=\"comment\">% add title<\/span>\r\nylabel(<span class=\"string\">'Estimates'<\/span>);                                                <span class=\"comment\">% y-axis label<\/span>\r\nxlabel(<span class=\"string\">'Bottom 10 Predictors'<\/span>);                                     <span class=\"comment\">% x-axis label<\/span>\r\nxticks(1:10);                                                       <span class=\"comment\">% set x-axis 
ticks<\/span>\r\nxticklabels(vars(rank(1:10)))                                       <span class=\"comment\">% label bars<\/span>\r\nxtickangle(45)                                                      <span class=\"comment\">% rotate labels<\/span>\r\nax = gca;                                                           <span class=\"comment\">% get current axes<\/span>\r\nax.TickLabelInterpreter = <span class=\"string\">'none'<\/span>;                                   <span class=\"comment\">% turn off latex<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/speeddating_05.png\" alt=\"\"> <h4>Validating the Model with the Holdout Set<a name=\"c030790f-d40a-47a4-ac98-f2dcadc65bf9\"><\/a><\/h4><p>To be confident about the predictor importance values, let's check the model's predictive performance. The model was retrained without the bottom 2 predictors - see <tt><a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/bt_45.mat\">bt_45.mat<\/a><\/tt> - and the resulting model can predict the decisions of participants with 79.6% accuracy. 
This looks a lot better than human participants did.<\/p><pre class=\"codeinput\">load <span class=\"string\">bt_45<\/span>                                                          <span class=\"comment\">% load trained model<\/span>\r\nY = holdout.dec;                                                    <span class=\"comment\">% ground truth<\/span>\r\nX = holdout(:,1:end-1);                                             <span class=\"comment\">% predictors<\/span>\r\nYpred = bt_45.predictFcn(X);                                        <span class=\"comment\">% prediction<\/span>\r\nc = confusionmat(Y,Ypred);                                          <span class=\"comment\">% get confusion matrix<\/span>\r\ndisp(array2table(c, <span class=\"keyword\">...<\/span><span class=\"comment\">                                             % show the matrix as table<\/span>\r\n    <span class=\"string\">'VariableNames'<\/span>,{<span class=\"string\">'Predicted_No'<\/span>,<span class=\"string\">'Predicted_Yes'<\/span>}, <span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'RowNames'<\/span>,{<span class=\"string\">'Actual_No'<\/span>,<span class=\"string\">'Actual_Yes'<\/span>}));\r\naccuracy = sum(c(logical(eye(2))))\/sum(sum(c))                      <span class=\"comment\">% classification accuracy<\/span>\r\n<\/pre><pre class=\"codeoutput\">                  Predicted_No    Predicted_Yes\r\n                  ____________    _____________\r\n    Actual_No     420              66          \r\n    Actual_Yes    105             246          \r\naccuracy =\r\n       0.7957\r\n<\/pre><h4>Relative Attractiveness<a name=\"b8b277d5-c2f4-4af7-b316-b0ea6f6004de\"><\/a><\/h4><p>We saw that relative attractiveness was a major factor in the decision to say yes. <tt>attrgap<\/tt> scores indicate how much more attractive the partners were relative to the participants. 
As you can see, people tend to say yes when their partners are more attractive than themselves, regardless of gender.<\/p><div><ul><li>This is a dilemma, because if you say yes to people who are more attractive than you are, they are more likely to say no because you are less attractive<\/li><li>But if you have some redeeming quality, such as having more shared interests or being fun, then you may be able to get a yes from more attractive partners<\/li><li>This applies to both genders. Is it just me, or does it look like men may be more willing to say yes to less attractive partners while women tend to be more receptive to fun partners? Loren isn't sure - she thinks it's just me.<\/li><\/ul><\/div><pre class=\"codeinput\">figure                                                              <span class=\"comment\">% new figure<\/span>\r\nsubplot(1,2,1)                                                      <span class=\"comment\">% add subplot<\/span>\r\ngscatter(train.fun(train.gender == <span class=\"string\">'1'<\/span>), <span class=\"keyword\">...<\/span><span class=\"comment\">                        % scatter male<\/span>\r\n    train.attrgap(train.gender == <span class=\"string\">'1'<\/span>),train.dec(train.gender == <span class=\"string\">'1'<\/span>),<span class=\"string\">'rb'<\/span>,<span class=\"string\">'o'<\/span>)\r\ntitle(<span class=\"string\">'Second Date Decision - Men'<\/span>)                                 <span class=\"comment\">% add title<\/span>\r\nxlabel(<span class=\"string\">'Partner''s Fun Rating'<\/span>)                                     <span class=\"comment\">% x-axis label<\/span>\r\nylabel(<span class=\"string\">'Partner''s Relative Attractiveness'<\/span>)                        <span class=\"comment\">% y-axis label<\/span>\r\nylim([-4 4])                                                        <span class=\"comment\">% y-axis range<\/span>\r\nlegend(<span class=\"string\">'No'<\/span>,<span class=\"string\">'Yes'<\/span>)   
                                               <span class=\"comment\">% add legend<\/span>\r\nsubplot(1,2,2)                                                      <span class=\"comment\">% add subplot<\/span>\r\ngscatter(train.fun(train.gender == <span class=\"string\">'0'<\/span>), <span class=\"keyword\">...<\/span><span class=\"comment\">                        % scatter female<\/span>\r\n    train.attrgap(train.gender == <span class=\"string\">'0'<\/span>),train.dec(train.gender == <span class=\"string\">'0'<\/span>),<span class=\"string\">'rb'<\/span>,<span class=\"string\">'o'<\/span>)\r\ntitle(<span class=\"string\">'Second Date Decision - Women'<\/span>)                               <span class=\"comment\">% add title<\/span>\r\nxlabel(<span class=\"string\">'Partner''s Fun Rating'<\/span>)                                     <span class=\"comment\">% x-axis label<\/span>\r\nylabel(<span class=\"string\">'Partner''s Relative Attractivenss'<\/span>)                         <span class=\"comment\">% y-axis label<\/span>\r\nylim([-4 4])                                                        <span class=\"comment\">% y-axis range<\/span>\r\nlegend(<span class=\"string\">'No'<\/span>,<span class=\"string\">'Yes'<\/span>)                                                  <span class=\"comment\">% add legend<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/speeddating_06.png\" alt=\"\"> <h4>Are We Good at Assessing Our Own Attractiveness?<a name=\"c1e59d9f-5b2e-4df9-b690-38244d79d705\"><\/a><\/h4><p>If relative attractiveness is one of the key factors in our decision to say \"yes\", how good are we at assessing our own attractiveness? Let's compare the self-rating for attractivess to the average ratings participants received. 
If you subtract the average ratings received from the self rating, you can see how much people overestimate their attractiveness.<\/p><div><ul><li>We are generally overestimating our own attractiveness - the median is almost 1 point higher on a 1-10 scale<\/li><li>Men tend to overestimate more than women<\/li><li>If you overestimate, then you are more likely to be overconfident about the probability you will get \"yes\" answers<\/li><\/ul><\/div><pre class=\"codeinput\">res.attr_mu = splitapply(@(x) mean(x,<span class=\"string\">'omitnan'<\/span>), csv.attr_o, G);    <span class=\"comment\">% mean attr ratings<\/span>\r\n[~,idx,~] = unique(G);                                              <span class=\"comment\">% get unique indices<\/span>\r\nres.attr3_1 = csv.attr3_1(idx);                                     <span class=\"comment\">% remove duplicates<\/span>\r\nres.atgap = res.attr3_1 - res.attr_mu;                              <span class=\"comment\">% add the rating gaps<\/span>\r\nfigure                                                              <span class=\"comment\">% new figure<\/span>\r\nhistogram(res.atgap(res.gender == 1),<span class=\"string\">'Normalization'<\/span>,<span class=\"string\">'probability'<\/span>) <span class=\"comment\">% histogram male<\/span>\r\nhold <span class=\"string\">on<\/span>                                                             <span class=\"comment\">% overlay another plot<\/span>\r\nhistogram(res.atgap(res.gender == 0),<span class=\"string\">'Normalization'<\/span>,<span class=\"string\">'probability'<\/span>) <span class=\"comment\">% histogram female<\/span>\r\nhold <span class=\"string\">off<\/span>                                                            <span class=\"comment\">% stop overlay<\/span>\r\ntitle(<span class=\"string\">'How Much People Overestimate Their Attractiveness'<\/span>)         <span class=\"comment\">% add title<\/span>\r\nxlabel([<span class=\"string\">'Rating Differences 
'<\/span> <span class=\"keyword\">...<\/span><span class=\"comment\">                                   % x-axis label<\/span>\r\n    sprintf(<span class=\"string\">'(Median - Men %.2f, Women %.2f)'<\/span>, <span class=\"keyword\">...<\/span>\r\n    median(res.atgap(res.gender == 1),<span class=\"string\">'omitnan'<\/span>), <span class=\"keyword\">...<\/span>\r\n    median(res.atgap(res.gender == 0),<span class=\"string\">'omitnan'<\/span>))])\r\nylabel(<span class=\"string\">'% Participants'<\/span>)                                            <span class=\"comment\">% y-axis label<\/span>\r\nlegend(<span class=\"string\">'Men'<\/span>,<span class=\"string\">'Women'<\/span>)                                               <span class=\"comment\">% add legend<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/speeddating_07.png\" alt=\"\"> <h4>Attractiveness is in the Eye of the Beholder<a name=\"ba792d47-8984-4318-85fb-f21c963663a8\"><\/a><\/h4><p>One possible reason we are not so good at judging our own attractiveness is that for majority of people it it in the eye of the beholder. 
If you plot the standard deviations of the ratings people received, the spread is pretty wide, especially for men.<\/p><pre class=\"codeinput\">figure\r\nres.attr_sigma = splitapply(@(x) std(x,<span class=\"string\">'omitnan'<\/span>), csv.attr_o, G);  <span class=\"comment\">% sigma of attr ratings<\/span>\r\nhistogram(res.attr_sigma(res.gender == 1), <span class=\"keyword\">...<\/span><span class=\"comment\">                      % histogram male<\/span>\r\n    <span class=\"string\">'Normalization'<\/span>,<span class=\"string\">'probability'<\/span>)\r\nhold <span class=\"string\">on<\/span>                                                             <span class=\"comment\">% overlay another plot<\/span>\r\nhistogram(res.attr_sigma(res.gender == 0), <span class=\"keyword\">...<\/span><span class=\"comment\">                      % histogram female<\/span>\r\n    <span class=\"string\">'Normalization'<\/span>,<span class=\"string\">'probability'<\/span>)\r\nhold <span class=\"string\">off<\/span>                                                            <span class=\"comment\">% stop overlay<\/span>\r\ntitle(<span class=\"string\">'Distribution of Attractiveness Ratings Received'<\/span>)            <span class=\"comment\">% add title<\/span>\r\nxlabel(<span class=\"string\">'Standard Deviations of Attractiveness Ratings Received'<\/span>)    <span class=\"comment\">% x-axis label<\/span>\r\nylabel(<span class=\"string\">'% Participants'<\/span>)                                            <span class=\"comment\">% y-axis label<\/span>\r\nyticklabels(string(0:5:30))                                         <span class=\"comment\">% use percentage<\/span>\r\nlegend(<span class=\"string\">'Men'<\/span>,<span class=\"string\">'Women'<\/span>)                                               <span class=\"comment\">% add legend<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/speeddating_08.png\" 
alt=\"\"> <h4>Modesty is the Key to Success<a name=\"94f843d2-ec0a-4f48-b711-745ca7d6df95\"><\/a><\/h4><p>Given that people are not always good at assessing their own attractiveness, how does it affect the ultimate goal - getting matches? Let's focus on average looking people (people who rate themselves 6 or 7) only to level the field and see how the self assessment affects the outcome.<\/p><div><ul><li>People who estimated their attractiveness realistically, with the gap from the mean received rating less than 0.9, did reasonably well<\/li><li>People who overestimated performed the worst<\/li><li>People who underestimated performed the best<\/li><\/ul><\/div><pre class=\"codeinput\">is_realistic = abs(res.atgap) &lt; 0.9;                                <span class=\"comment\">% lealitics estimate<\/span>\r\nis_over = res.atgap &gt;= 0.9;                                         <span class=\"comment\">% overestimate<\/span>\r\nis_under = res.atgap &lt;= -0.9;                                       <span class=\"comment\">% underestimate<\/span>\r\nis_avg = ismember(res.attr3_1,6:7);                                 <span class=\"comment\">% avg looking group<\/span>\r\nfigure                                                              <span class=\"comment\">% new figure<\/span>\r\nscatter(res.attr_mu(is_avg &amp; is_over), <span class=\"keyword\">...<\/span><span class=\"comment\">                          % plot overestimate<\/span>\r\n    res.matched_r(is_avg &amp; is_over))\r\nhold <span class=\"string\">on<\/span>                                                             <span class=\"comment\">% overlay another plot<\/span>\r\nscatter(res.attr_mu(is_avg &amp; is_under), <span class=\"keyword\">...<\/span><span class=\"comment\">                         % plot underestimate<\/span>\r\n    res.matched_r(is_avg &amp; is_under))\r\nscatter(res.attr_mu(is_avg &amp; is_realistic), <span class=\"keyword\">...<\/span><span class=\"comment\">                     % 
plot realistic<\/span>\r\n    res.matched_r(is_avg &amp; is_realistic))\r\nhold <span class=\"string\">off<\/span>                                                            <span class=\"comment\">% stop overlay<\/span>\r\ntitle(<span class=\"string\">'Match Rates among Self-Rated 6-7 in Attractiveness'<\/span>)         <span class=\"comment\">% add title<\/span>\r\nxlabel(<span class=\"string\">'Mean Attractiveness Ratings Received'<\/span>)                      <span class=\"comment\">% x-axis label<\/span>\r\nylabel(<span class=\"string\">'% Matches'<\/span>)                                                 <span class=\"comment\">% y-axis label<\/span>\r\nlegend(<span class=\"string\">'Overestimators'<\/span>,<span class=\"string\">'Underestimators'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">                      % add legend<\/span>\r\n    <span class=\"string\">'Realistic Estimators'<\/span>,<span class=\"string\">'Location'<\/span>,<span class=\"string\">'NorthWest'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/speeddating_09.png\" alt=\"\"> <h4>Summary<a name=\"df349717-e410-49c3-9efd-3336ceed2445\"><\/a><\/h4><p>It looks like you can get matches in speed dating as long as you set your expectations appropriately. 
Here are some of my suggestions.<\/p><div><ul><li>Relative attractiveness is more important than people admit, because you are not going to learn a lot about your partners in four minutes.<\/li><li>But who people find attractive varies a lot.<\/li><li>You can still do well if you have more shared interests or are more fun - so use your four minutes wisely.<\/li><li>Be modest about your own looks and look for those who are also modest about their looks - you will be more likely to get a match.<\/li><\/ul><\/div><p>We should also remember that the data comes from those who went to Columbia at that time - as indicated by such variables as the field of study - and therefore the findings may not generalize to other situations.<\/p><p>Of course I also totally lack practical experience in speed dating - if you do, please let us know what you think of this analysis compared to your own experience below.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_9776bb13b732402691b270a6c8e7ad5b() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='9776bb13b732402691b270a6c8e7ad5b ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 9776bb13b732402691b270a6c8e7ad5b';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2017 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_9776bb13b732402691b270a6c8e7ad5b()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2016b<br><\/p><\/div><!--\r\n9776bb13b732402691b270a6c8e7ad5b ##### SOURCE BEGIN #####\r\n%% Speed Dating Experiment\r\n% Valentine's day is fast approaching and those who are in a relationship\r\n% might start thinking about plans. For those who are not as lucky, read\r\n% on! Today's guest blogger,\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/951521 Toshi\r\n% Takeuchi>, explores how you can be successful at speed dating events\r\n% through data.\r\n%\r\n% <<wedding_cake_topper.jpg>>\r\n\r\n%% Speed Dating Dataset\r\n% I recently came across an interesting Kaggle dataset\r\n% <https:\/\/www.kaggle.com\/annavictoria\/speed-dating-experiment Speed Dating\r\n% Experiment - What attributes influence the selection of a romantic\r\n% partner?>. I never experienced speed dating, so I got curious. 
\r\n%\r\n% The data comes from a series of heterosexual speed dating experiments at\r\n% Columbia University from 2002-2004. In these experiments, you each met\r\n% all of the opposite-sex participants for four minutes. The number of the\r\n% first dates varied by the event - on average there were 15, but it could\r\n% be as few as 5 or as many as 22. Then you were asked if you would like to\r\n% meet any of them again. You also provided ratings on six attributes about\r\n% your dates: \r\n%\r\n% * Attractiveness\r\n% * Sincerity\r\n% * Intelligence\r\n% * Fun\r\n% * Ambition\r\n% * Shared Interests \r\n%\r\n% The dataset also includes participants' preferences on those attributes\r\n% at various points in the process, along with other demographic\r\n% information.\r\n\r\nopts = detectImportOptions('Speed Dating Data.csv');                % set import options\r\nopts.VariableTypes([9,38,50:end]) = {'double'};                     % treat as double\r\nopts.VariableTypes([35,49]) = {'categorical'};                      % treat as categorical\r\ncsv = readtable('Speed Dating Data.csv', opts);                     % import data\r\ncsv.tuition = str2double(csv.tuition);                              % convert to double\r\ncsv.zipcode = str2double(csv.zipcode);                              % convert to double\r\ncsv.income = str2double(csv.income);                                % convert to double\r\n\r\n%% What Participants Looked For In the Opposite Sex\r\n% The participants filled out survey questions when they signed up,\r\n% including what they were looking for in the opposite sex and what they\r\n% thought others of their own gender looked for. If you take the mean\r\n% ratings and then subtract the self-rating from the peer rating, you see\r\n% that participants thought others were more into looks while they were\r\n% also into sincerity and intelligence. Sounds a bit biased. 
We will see\r\n% how they actually made decisions.\r\n\r\nvars = csv.Properties.VariableNames;                                % get var names\r\n[G, res] = findgroups(csv(:,{'iid','gender'}));                     % group by id and gender\r\n[~,idx,~] = unique(G);                                              % get unique indices\r\npref = table2array(csv(idx,contains(vars,'1_1')));                  % subset pref as array\r\npref(isnan(pref)) = 0;                                              % replace NaN with 0\r\npref1_1 = pref .\/ sum(pref,2) * 100;                                % convert to 100-pt alloc\r\npref = table2array(csv(idx,contains(vars,'4_1')));                  % subset pref as array\r\npref(isnan(pref)) = 0;                                              % replace NaN with 0\r\npref4_1 = pref .\/ sum(pref,2) * 100;                                % convert to 100-pt alloc\r\nlabels = {'attr','sinc','intel','fun','amb','shar'};                % attributes\r\nfigure                                                              % new figure\r\nb = bar([mean(pref4_1(res.gender == 1,:),'omitnan') - ...           % bar plot\r\n    mean(pref1_1(res.gender == 1,:),'omitnan'); ...\r\n    mean(pref4_1(res.gender == 0,:),'omitnan') - ...\r\n    mean(pref1_1(res.gender == 0,:),'omitnan')]');\r\nb(1).FaceColor = [0 .45 .74]; b(2).FaceColor = [.85 .33 .1];        % change face color\r\nb(1).FaceAlpha = 0.6; b(2).FaceAlpha = 0.6;                         % change face alpha\r\ntitle('What Peers Look For More Than You Do in the Opposite Sex?')  % add title\r\nxticklabels(labels)                                                 % label bars\r\nylabel('Differences in Mean Ratings - Peers vs Self')               % y axis label\r\nlegend('Men','Women')                                               % add legend\r\n\r\n%% Were They Able to Find Matches?\r\n% Let's first find out how successful those speed dating events were. 
If\r\n% both you and your partner request another date after the first one, then\r\n% you have a match. What percentage of initial dates resulted in matches?\r\n% \r\n% * Most people found matches - the median match rate was around 13%\r\n% * A few people were extremely successful, getting more than an 80% match\r\n% rate\r\n% * About 17-19% of men and women found no matches, unfortunately\r\n% * All in all, it looks like these speed dating events delivered the\r\n% promised results\r\n\r\nres.rounds = splitapply(@numel, G, G);                              % # initial dates\r\nres.matched = splitapply(@sum, csv.match, G);                       % # matches\r\nres.matched_r = res.matched .\/ res.rounds;                          % match rate\r\nedges = [0 0.4 1:9].\/ 10;                                           % bin edges\r\nfigure                                                              % new figure\r\nhistogram(res.matched_r(res.gender == 1), edges,...                 % histogram of \r\n    'Normalization','probability')                                  % male match rate\r\nhold on                                                             % overlay another plot\r\nhistogram(res.matched_r(res.gender == 0), edges,...                 % histogram of \r\n    'Normalization','probability')                                  % female match rate\r\nhold off                                                            % stop overlay\r\ntitle('What Percentage of the First Dates Resulted in Matches?')    % add title\r\nxlabel(sprintf('%% Matches (Median - Men %.1f%%, Women %.1f%%)', ...% x-axis label\r\n    median(res.matched_r(res.gender == 1))*100, ...                 
% median men\r\n    median(res.matched_r(res.gender == 0))*100))                    % median women\r\nxticklabels(string(0:10:90))                                        % use percentage\r\nxlim([-0.05 0.95])                                                  % x-axis range\r\nylabel('% Participants')                                            % y-axis label\r\nyticklabels(string(0:5:30))                                         % use percentage\r\nlegend('Men','Women')                                               % add legend\r\ntext(-0.04,0.21,'0 matches')                                        % annotate\r\n\r\n%% Do You Get More Matches If You Make More Requests?\r\n% In order to make a match, you need to request another date and get that\r\n% request accepted. This means some people who got a very high match rate\r\n% must have requested a second date with almost everyone they met and had\r\n% the favor returned. Does that mean people who made fewer matches were\r\n% more picky and didn't request another date as often as those who were\r\n% more successful? Let's plot the request rate vs. match rate - if they\r\n% correlate, then we should see a diagonal line!\r\n% \r\n% * You can see some correlation below a 50% request rate - particularly\r\n% for women. The more requests they make, the more matches they seem to\r\n% get, to a point\r\n% * There is a clear gender gap in request rate - women tend to make fewer\r\n% requests - the median is 44% for men vs. 37% for women\r\n% * If you request everyone, your mileage varies - you may still get no\r\n% matches. 
In the end, you only get matches if your requests are accepted\r\n\r\nres.requests = splitapply(@sum, csv.dec, G);                        % # requests\r\nres.request_r = res.requests .\/ res.rounds;                         % request rate\r\nfigure                                                              % new figure\r\nsubplot(2,1,1)                                                      % add subplot\r\nscatter(res.request_r(res.gender == 1), ...                         % scatter plot male\r\n    res.matched_r(res.gender == 1),'filled','MarkerFaceAlpha', 0.6)\r\nhold on                                                             % overlay another plot\r\nscatter(res.request_r(res.gender == 0), ...                         % scatter plot female\r\n    res.matched_r(res.gender == 0),'filled','MarkerFaceAlpha', 0.6)\r\nr = refline(1,0); r.Color = 'r';r.LineStyle = ':';                  % reference line\r\nhold off                                                            % stop overlay\r\ntitle('Do You Get More Matches If You Ask More?')                   % add title\r\nxlabel('% Second Date Requests')                                    % x-axis label\r\nxticklabels(string(0:10:100))                                       % use percentage\r\nylabel('% Matches')                                                 % y-axis label\r\nyticklabels(string(0:50:100))                                       % use percentage\r\nlegend('Men','Women','Location','NorthWest')                        % add legend\r\nsubplot(2,1,2)                                                      % add subplot\r\nhistogram(res.request_r(res.gender == 1),...                        % histogram of \r\n    'Normalization','probability')                                  % male match rate\r\nhold on                                                             % overlay another plot\r\nhistogram(res.request_r(res.gender == 0),...                        
% histogram of\r\n    'Normalization','probability')                                  % female match rate\r\nhold off                                                            % stop overlay\r\ntitle('Do Women Make Fewer Requests Than Men?')                     % add title\r\nxlabel(sprintf('%% Second Date Requests (Median - Men %.1f%%, Women %.1f%%)', ...\r\n    median(res.request_r(res.gender == 1))*100, ...                 % median men\r\n    median(res.request_r(res.gender == 0))*100))                    % median women\r\nxticklabels(string(0:10:100))                                       % use percentage\r\nylabel('% Participants')                                            % y-axis label\r\nyticklabels(string(0:5:20))                                         % use percentage\r\nlegend('Men','Women')                                               % add legend\r\n\r\n%% Decision to Say Yes for Second Date\r\n% Most people are more selective about who they go out with for a second\r\n% date. The two most obvious factors in such a decision are how much they\r\n% liked who they met and how likely they think they will get a \"yes\" from\r\n% them. There is no point in asking a person for a second date no matter\r\n% how much you like him or her when you feel there's no chance of getting a\r\n% yes. You can see a fairly clean separation between Yes and No using those\r\n% two factors.\r\n% \r\n% Please note that I normalized the ratings for \"like\" and \"probability\"\r\n% (of getting yes) because the actual rating scale differs from person to\r\n% person. 
You subtract the mean from the data to center the data around\r\n% zero and then divide by the standard deviation to normalize the value\r\n% range.\r\n\r\nfunc = @(x) {(x - mean(x)).\/std(x)};                                % normalize func\r\nf = csv.like;                                                       % get like\r\nf(isnan(f)) = 0;                                                    % replace NaN with 0\r\nf = splitapply(func, f, G);                                         % normalize by group\r\nlike = vertcat(f{:});                                               % add normalized like\r\nf = csv.prob;                                                       % get prob\r\nf(isnan(f)) = 0;                                                    % replace NaN with 0\r\nf = splitapply(func, f, G);                                         % normalize by group\r\nprob = vertcat(f{:});                                               % add normalized prob\r\nfigure                                                              % new figure\r\nsubplot(1,2,1)                                                      % add subplot\r\ngscatter(like(csv.gender == 1), ...                                 % scatter plot male\r\n    prob(csv.gender == 1),csv.dec(csv.gender == 1),'rb','o')\r\ntitle('Second Date Decision - Men')                                 % add title\r\nxlabel('Like, normalized')                                          % x-axis label\r\nylabel('Prob of Getting \"Yes\", normalized')                         % y-axis label\r\nylim([-5 5])                                                        % y-axis range\r\nlegend('No','Yes')                                                  % add legend\r\nsubplot(1,2,2)                                                      % add subplot\r\ngscatter(like(csv.gender == 0), ...                                 
% scatter plot female\r\n    prob(csv.gender == 0),csv.dec(csv.gender == 0),'rb','o')\r\ntitle('Second Date Decision - Women')                               % add title\r\nxlabel('Like, normalized')                                          % x-axis label\r\nylabel('Prob of Getting \"Yes\", normalized')                         % y-axis label\r\nylim([-5 5])                                                        % y-axis range\r\nlegend('No','Yes')                                                  % add legend\r\n\r\n%% Factors Influencing the Decision\r\n% If you can correctly guess the probability of getting a yes, that should\r\n% help a lot. Can we make such a prediction using observable and\r\n% discoverable factors only?\r\n% \r\n% Features were generated using normalization and other techniques - see\r\n% |<https:\/\/blogs.mathworks.com\/images\/loren\/2017\/feature_eng.m\r\n% feature_eng.m>| for more details. To determine which features matter\r\n% more, the resulting feature set was then split into training and holdout\r\n% sets and the training set was used to generate a Random Forest model\r\n% |<https:\/\/blogs.mathworks.com\/images\/loren\/2017\/bt_all.mat bt_all.mat>|\r\n% using the\r\n% <https:\/\/www.mathworks.com\/help\/stats\/classificationlearner-app.html\r\n% Classification Learner app>.  What's nice about a\r\n% <https:\/\/www.mathworks.com\/help\/stats\/treebagger.html Random Forest> is\r\n% that it can show you the\r\n% <https:\/\/www.mathworks.com\/help\/stats\/classificationbaggedensemble.oobpermutedpredictorimportance.html\r\n% predictor importance estimates> based on how error increases if you\r\n% randomly change the value of particular predictors. 
If they don't matter,\r\n% it shouldn't increase the error rate much, right?\r\n% \r\n% Based on those scores, the most important features are: \r\n% \r\n% * attrgap - difference between partners' attractiveness rated by\r\n% participants and their own self rating\r\n% * attr - rating of attractiveness participants gave to their partners\r\n% * shar - rating of shared interests participants gave to their partners\r\n% * fun - rating of fun participants gave to their partners\r\n% * field_cd - participants' field of study\r\n% \r\n% The least important features are:\r\n% \r\n% * agegap - difference between the ages of participants and their\r\n% partners\r\n% * order - in which part of the event they first met - earlier or later\r\n% * samerace - whether participants and their partners were the same race\r\n% * reading - participant's rating of interest in reading\r\n% * race_o - the race of the partners\r\n\r\nfeature_eng                                                         % engineer features\r\nload bt_all                                                         % load trained model\r\nimp = oobPermutedPredictorImportance(bt_all.ClassificationEnsemble);% get predictor importance\r\nvars = bt_all.ClassificationEnsemble.PredictorNames;                % predictor names\r\nfigure                                                              % new figure\r\nsubplot(2,1,1)                                                      % add subplot\r\n[~,rank] = sort(imp,'descend');                                     % get ranking\r\nbar(imp(rank(1:10)));                                               % plot top 10\r\ntitle('Out-of-Bag Permuted Predictor Importance Estimates');        % add title\r\nylabel('Estimates');                                                % y-axis label\r\nxlabel('Top 10 Predictors');                                        % x-axis label\r\nxticks(1:10);                                                       % set x-axis 
ticks\r\nxticklabels(vars(rank(1:10)))                                       % label bars\r\nxtickangle(45)                                                      % rotate labels\r\nax = gca;                                                           % get current axes\r\nax.TickLabelInterpreter = 'none';                                   % turn off latex\r\nsubplot(2,1,2)                                                      % add subplot\r\n[~,rank] = sort(imp);                                               % get ranking\r\nbar(imp(rank(1:10)));                                               % plot bottom 10\r\ntitle('Out-of-Bag Permuted Predictor Importance Estimates');        % add title\r\nylabel('Estimates');                                                % y-axis label\r\nxlabel('Bottom 10 Predictors');                                     % x-axis label\r\nxticks(1:10);                                                       % set x-axis ticks\r\nxticklabels(vars(rank(1:10)))                                       % label bars\r\nxtickangle(45)                                                      % rotate labels\r\nax = gca;                                                           % get current axes\r\nax.TickLabelInterpreter = 'none';                                   % turn off latex\r\n\r\n%% Validating the Model with the Holdout Set\r\n% To be confident about the predictor importance values, let's check the\r\n% model's predictive performance. The model was retrained without the\r\n% bottom 2 predictors - see\r\n% |<https:\/\/blogs.mathworks.com\/images\/loren\/2017\/bt_45.mat bt_45.mat>| -\r\n% and the resulting model can predict the decision of participants with\r\n% 79.6% accuracy. This looks a lot better than human participants did. 
\r\n\r\nload bt_45                                                          % load trained model\r\nY = holdout.dec;                                                    % ground truth\r\nX = holdout(:,1:end-1);                                             % predictors\r\nYpred = bt_45.predictFcn(X);                                        % prediction\r\nc = confusionmat(Y,Ypred);                                          % get confusion matrix\r\ndisp(array2table(c, ...                                             % show the matrix as table\r\n    'VariableNames',{'Predicted_No','Predicted_Yes'}, ...\r\n    'RowNames',{'Actual_No','Actual_Yes'}));\r\naccuracy = sum(c(logical(eye(2))))\/sum(sum(c))                      % classification accuracy\r\n\r\n%% Relative Attractiveness\r\n% We saw that relative attractiveness was a major factor in the decision to\r\n% say yes. |attrgap| scores indicate how much more attractive the partners\r\n% were relative to the participants. As you can see, people tend to say yes\r\n% when their partners are more attractive than themselves regardless of\r\n% gender. \r\n% \r\n% * This is a dilemma, because if you say yes to people who are more\r\n% attractive than you are, they are more likely to say no because you are\r\n% less attractive.\r\n% * But if you have some redeeming quality, such as having more shared\r\n% interests or being fun, then you may be able to get a yes from more\r\n% attractive partners.\r\n% * This applies to both genders.  Is it just me, or does it look like men\r\n% may be more willing to say yes to less attractive partners while women\r\n% tend to be more receptive to fun partners? Loren isn't sure - she\r\n% thinks it's just me.\r\n\r\nfigure                                                              % new figure\r\nsubplot(1,2,1)                                                      % add subplot\r\ngscatter(train.fun(train.gender == '1'), ...                        
% scatter male\r\n    train.attrgap(train.gender == '1'),train.dec(train.gender == '1'),'rb','o')\r\ntitle('Second Date Decision - Men')                                 % add title\r\nxlabel('Partner''s Fun Rating')                                     % x-axis label\r\nylabel('Partner''s Relative Attractiveness')                        % y-axis label\r\nylim([-4 4])                                                        % y-axis range\r\nlegend('No','Yes')                                                  % add legend\r\nsubplot(1,2,2)                                                      % add subplot\r\ngscatter(train.fun(train.gender == '0'), ...                        % scatter female\r\n    train.attrgap(train.gender == '0'),train.dec(train.gender == '0'),'rb','o')\r\ntitle('Second Date Decision - Women')                               % add title\r\nxlabel('Partner''s Fun Rating')                                     % x-axis label\r\nylabel('Partner''s Relative Attractiveness')                        % y-axis label\r\nylim([-4 4])                                                        % y-axis range\r\nlegend('No','Yes')                                                  % add legend\r\n\r\n%% Are We Good at Assessing Our Own Attractiveness?\r\n% If relative attractiveness is one of the key factors in our decision to\r\n% say \"yes\", how good are we at assessing our own attractiveness? Let's\r\n% compare the self-rating for attractiveness to the average ratings\r\n% participants received. 
If you subtract the average ratings received from\r\n% the self rating, you can see how much people overestimate their\r\n% attractiveness.\r\n% \r\n% * We are generally overestimating our own attractiveness - the median is\r\n% almost as much as 1 point higher on a 1-10 scale\r\n% * Men tend to overestimate more than women\r\n% * If you overestimate, then you are more likely to be overconfident about \r\n% the probability you will get \"yes\" answers\r\n\r\nres.attr_mu = splitapply(@(x) mean(x,'omitnan'), csv.attr_o, G);    % mean attr ratings\r\n[~,idx,~] = unique(G);                                              % get unique indices\r\nres.attr3_1 = csv.attr3_1(idx);                                     % remove duplicates\r\nres.atgap = res.attr3_1 - res.attr_mu;                              % add the rating gaps\r\nfigure                                                              % new figure\r\nhistogram(res.atgap(res.gender == 1),'Normalization','probability') % histogram male\r\nhold on                                                             % overlay another plot\r\nhistogram(res.atgap(res.gender == 0),'Normalization','probability') % histogram female\r\nhold off                                                            % stop overlay\r\ntitle('How Much People Overestimate Their Attractiveness')          % add title\r\nxlabel(['Rating Differences ' ...                                   % x-axis label\r\n    sprintf('(Median - Men %.2f, Women %.2f)', ...   \r\n    median(res.atgap(res.gender == 1),'omitnan'), ...\r\n    median(res.atgap(res.gender == 0),'omitnan'))])\r\nylabel('% Participants')                                            % y-axis label\r\nlegend('Men','Women')                                               % add legend\r\n\r\n%% Attractiveness is in the Eye of the Beholder\r\n% One possible reason we are not so good at judging our own attractiveness\r\n% is that for the majority of people it is in the eye of the beholder. 
If you\r\n% plot the standard deviations of the ratings people received, the spread\r\n% is pretty wide, especially for men. \r\n\r\nfigure\r\nres.attr_sigma = splitapply(@(x) std(x,'omitnan'), csv.attr_o, G);  % sigma of attr ratings\r\nhistogram(res.attr_sigma(res.gender == 1), ...                      % histogram male\r\n    'Normalization','probability') \r\nhold on                                                             % overlay another plot\r\nhistogram(res.attr_sigma(res.gender == 0), ...                      % histogram female\r\n    'Normalization','probability') \r\nhold off                                                            % stop overlay\r\ntitle('Distribution of Attractiveness Ratings Received')            % add title\r\nxlabel('Standard Deviations of Attractiveness Ratings Received')    % x-axis label\r\nylabel('% Participants')                                            % y-axis label\r\nyticklabels(string(0:5:30))                                         % use percentage\r\nlegend('Men','Women')                                               % add legend\r\n\r\n%% Modesty is the Key to Success\r\n% Given that people are not always good at assessing their own\r\n% attractiveness, how does it affect the ultimate goal - getting matches?\r\n% Let's focus on average-looking people (people who rate themselves 6 or 7)\r\n% only to level the playing field and see how the self-assessment affects\r\n% the outcome.\r\n% \r\n% * People who estimated their attractiveness realistically, with the gap\r\n% from the mean received rating less than 0.9, did reasonably well\r\n% * People who overestimated performed the worst\r\n% * People who underestimated performed the best\r\n\r\nis_realistic = abs(res.atgap) < 0.9;                                % realistic estimate\r\nis_over = res.atgap >= 0.9;                                         % overestimate\r\nis_under = res.atgap <= -0.9;                                       % underestimate\r\nis_avg = 
ismember(res.attr3_1,6:7);                                 % avg looking group\r\nfigure                                                              % new figure\r\nscatter(res.attr_mu(is_avg & is_over), ...                          % plot overestimate\r\n    res.matched_r(is_avg & is_over))\r\nhold on                                                             % overlay another plot                                         \r\nscatter(res.attr_mu(is_avg & is_under), ...                         % plot underestimate\r\n    res.matched_r(is_avg & is_under))\r\nscatter(res.attr_mu(is_avg & is_realistic), ...                     % plot realistic\r\n    res.matched_r(is_avg & is_realistic))\r\nhold off                                                            % stop overlay\r\ntitle('Match Rates among Self-Rated 6-7 in Attractiveness')         % add title\r\nxlabel('Mean Attractiveness Ratings Received')                      % x-axis label\r\nylabel('% Matches')                                                 % y-axis label\r\nlegend('Overestimators','Underestimators', ...                      % add legend\r\n    'Realistic Estimators','Location','NorthWest')  \r\n\r\n%% Summary\r\n% It looks like you can get matches in speed dating as long as you set your\r\n% expectations appropriately. Here are some of my suggestions. \r\n% \r\n% * Relative attractiveness is more important than people admit, because\r\n% you are not going to learn a lot about your partners in four minutes.\r\n% * But who people find attractive varies a lot.\r\n% * You can still do well if you have more shared interests or more fun\r\n% - so use your four minutes wisely. 
\r\n% * Be modest about your own looks and look for those who are also modest\r\n% about their looks - you will be more likely to get a match.\r\n%\r\n% We should also remember that the data comes from those who went to\r\n% Columbia at that time - as indicated by such variables as the field of\r\n% study - and therefore the findings may not generalize to other\r\n% situations.\r\n%\r\n% Of course I also totally lack practical experience in speed dating - if\r\n% you do, please let us know what you think of this analysis compared to\r\n% your own experience below.\r\n\r\n##### SOURCE END ##### 9776bb13b732402691b270a6c8e7ad5b\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/speeddating_09.png\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction--><p>Valentine's day is fast approaching and those who are in a relationship might start thinking about plans. For those who are not as lucky, read on! Today's guest blogger, <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/951521\">Toshi Takeuchi<\/a>, explores how you can be successful at speed dating events through data.... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2017\/01\/31\/speed-dating-experiment\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[66,33,43,48],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/2168"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=2168"}],"version-history":[{"count":6,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/2168\/revisions"}],"predecessor-version":[{"id":2394,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/2168\/revisions\/2394"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=2168"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=2168"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=2168"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}