{"id":1687,"date":"2016-06-27T08:13:29","date_gmt":"2016-06-27T13:13:29","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=1687"},"modified":"2016-06-09T14:16:52","modified_gmt":"2016-06-09T19:16:52","slug":"survey-reveals-diversity-in-the-learn-to-code-movement","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2016\/06\/27\/survey-reveals-diversity-in-the-learn-to-code-movement\/","title":{"rendered":"Survey Reveals Diversity in the &#8220;Learn to Code&#8221; Movement"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p>Do you use any free \"learn to code\" website to teach yourself programming? You may already know how to program in MATLAB, but you may very well be learning other skills on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Massive_open_online_course\">MOOCs<\/a>.<\/p><p>Today's guest blogger, Toshi, analyzed publicly available survey data to understand the demographics of self-taught coders.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/matlab-academy.png\" alt=\"\"> <\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#d93d69a7-bbe1-4d71-9199-04667a401710\">Load Data<\/a><\/li><li><a href=\"#bfddaa56-0b3a-42c8-baff-9dab7527b7b0\">Higher Female Representation Than Expected<\/a><\/li><li><a href=\"#3640d862-a498-4e79-a43e-dedde1a3f715\">Mostly Studying in Countries of Citizenship<\/a><\/li><li><a href=\"#c105553c-1bb2-4250-aa57-76b7a685c7e1\">Ethnically Diverse English Speakers in US<\/a><\/li><li><a href=\"#b7785136-83a9-47ef-9dda-a7ad564dea88\">Many Are Highly Educated and Already Employed in the US<\/a><\/li><li><a href=\"#998ba97e-a120-43d0-baa2-0be34c02ebc6\">Many Already Work In Software Development and IT in US<\/a><\/li><li><a href=\"#af8d347c-f184-4f80-83b2-8fcd766c6f61\">Academic Background in Software Development and IT<\/a><\/li><li><a href=\"#76c5f36d-ef89-4bd4-8657-f6503ceecb58\">Wide Income Gap in 
Software Development and IT<\/a><\/li><li><a href=\"#bf0928fd-56a1-4459-ace9-1bae375a3d27\">What Affects Income in Software Development and IT?<\/a><\/li><li><a href=\"#00fb1938-9418-402a-8b9a-51b521886462\">Age Factor<\/a><\/li><li><a href=\"#9092c558-679e-46ac-8390-b6a59f69d33b\">Big Companies Not Preferred<\/a><\/li><li><a href=\"#d52a48e9-e8e9-4a37-8a76-64a56d0135b2\">Dream Jobs<\/a><\/li><li><a href=\"#418470eb-3592-4984-8457-93fff2abade7\">Student Loan Debt<\/a><\/li><li><a href=\"#315af204-2474-485b-b03b-20c27ca8464d\">Women Prefer More Welcoming Venues<\/a><\/li><li><a href=\"#4badc247-2ff1-4b5a-8e62-c9b79ef12adc\">Summary<\/a><\/li><\/ul><\/div><h4>Load Data<a name=\"d93d69a7-bbe1-4d71-9199-04667a401710\"><\/a><\/h4><p>I came across a TechCrunch article, <a href=\"http:\/\/techcrunch.com\/2016\/05\/04\/free-code-camp-survey-reveals-demographics-of-self-taught-coders\/\">Free Code Camp survey reveals demographics of self-taught coders<\/a>, and I got curious because a lot of people seem to be interested in learning how to code, and industry and government are also encouraging this trend. But programming is hard. Who exactly are the kind of people who have taken the plunge? 
Our own free interactive online programming classes on <a href=\"https:\/\/matlabacademy.mathworks.com\/\">MATLAB Academy<\/a> or gamified <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/about\/cody\/\">MATLAB Cody<\/a> are also gaining popularity and I would like to understand what motivates this interest.<\/p><p>The survey was conducted anonymously, published on the web, and promoted via social media from March 28 through May 2, 2016, targeting people who are relatively new to programming.<\/p><p>The following analysis shows significant diversity in gender and ethnic mix among self-taught coders and the possible impact of MOOCs in opening up access for populations under-served by traditional STEM education paths.<\/p><p>I first downloaded the <a href=\"https:\/\/github.com\/FreeCodeCamp\/2016-new-coder-survey\">2016 New Coder Survey result from GitHub<\/a>. I then unzipped the CSV files into my current folder. There are two files - part 1 and part 2 - and we will read them into separate tables. 
We could perhaps merge them using <tt><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/innerjoin.html\">innerjoin<\/a><\/tt>, but in this case I am primarily interested in part 2 only and we will be discarding at least 1000 responses from part 1, given the differences in number of responses.<\/p><pre class=\"codeinput\">warning(<span class=\"string\">'off'<\/span>,<span class=\"string\">'MATLAB:table:ModifiedVarnames'<\/span>)              <span class=\"comment\">% suppress warning<\/span>\r\ncsv = <span class=\"string\">'2016 New Coders Survey Part 1.csv'<\/span>;                  <span class=\"comment\">% filename<\/span>\r\npart1 = readtable(csv);                                     <span class=\"comment\">% read into table<\/span>\r\npart1.Properties.VariableNames = <span class=\"keyword\">...<\/span><span class=\"comment\">                        % format variable names<\/span>\r\n    regexprep(part1.Properties.VariableNames,<span class=\"string\">'_+$'<\/span>,<span class=\"string\">''<\/span>);     <span class=\"comment\">% by removing extra \"_\"<\/span>\r\ncsv = <span class=\"string\">'2016 New Coders Part 2.csv'<\/span>;                         <span class=\"comment\">% filename<\/span>\r\npart2 = readtable(csv);                                     <span class=\"comment\">% read into table<\/span>\r\npart2.Properties.VariableNames = <span class=\"keyword\">...<\/span><span class=\"comment\">                        % format variable names<\/span>\r\n    regexprep(part2.Properties.VariableNames,<span class=\"string\">'_+$'<\/span>,<span class=\"string\">''<\/span>);     <span class=\"comment\">% by removing extra \"_\"<\/span>\r\nwarning(<span class=\"string\">'on'<\/span>,<span class=\"string\">'MATLAB:table:ModifiedVarnames'<\/span>)               <span class=\"comment\">% enable warning<\/span>\r\npart1.SubmitDate_UTC = datetime(part1.SubmitDate_UTC);      <span class=\"comment\">% convert date strings to datetime<\/span>\r\npart2.SubmitDate_UTC 
= datetime(part2.SubmitDate_UTC);      <span class=\"comment\">% convert date strings to datetime<\/span>\r\ns = sprintf(<span class=\"string\">'part1 %d responses from %s thru %s\\n'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">     % summary of part1<\/span>\r\n    height(part1),datestr(min(part1.SubmitDate_UTC)), <span class=\"keyword\">...<\/span><span class=\"comment\">   % count of responses, start date<\/span>\r\n    datestr(max(part1.SubmitDate_UTC)));                    <span class=\"comment\">% and end date<\/span>\r\nfprintf(<span class=\"string\">'%spart2 %d responses from %s thru %s'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">         % summary of part2<\/span>\r\n    s,height(part2),datestr(min(part2.SubmitDate_UTC)), <span class=\"keyword\">...<\/span><span class=\"comment\"> % count of responses, start date<\/span>\r\n    datestr(max(part2.SubmitDate_UTC)));                    <span class=\"comment\">% and end date<\/span>\r\n<\/pre><pre class=\"codeoutput\">part1 15653 responses from 29-Mar-2016 21:23:43 thru 02-May-2016 19:12:44\r\npart2 14625 responses from 29-Mar-2016 21:25:36 thru 02-May-2016 18:35:59<\/pre><h4>Higher Female Representation Than Expected<a name=\"bfddaa56-0b3a-42c8-baff-9dab7527b7b0\"><\/a><\/h4><p>Let's start by plotting a histogram of age distribution. Loren pointed out we can use the <tt>omitnan<\/tt> flag in <tt><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/median.html\">median<\/a><\/tt> to deal with missing values instead of <tt><a href=\"https:\/\/www.mathworks.com\/help\/stats\/nanmedian.html\">nanmedian<\/a><\/tt>.<\/p><p>The histogram shows that a lot of people who responded to this survey fall into the so-called \"millennials\" category. It is interesting to see the number of women who responded to this survey, considering the often-cited gender gap in STEM fields. 
It is not clear whether this reflects the true population or whether women are over-represented via self-selection. Or is self-teaching programming somehow more appealing to women than traditional instruction?<\/p><pre class=\"codeinput\">age = part2.HowOldAreYou;                                   <span class=\"comment\">% get age from part2<\/span>\r\ngender = categorical(part2.What_sYourGender);               <span class=\"comment\">% get gender from part2 as categorical<\/span>\r\npart2.What_sYourGender = gender;                            <span class=\"comment\">% update table<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nx = age(age ~= 0 &amp; gender == <span class=\"string\">'male'<\/span>);                       <span class=\"comment\">% subset age by gender<\/span>\r\nhistogram(x)                                                <span class=\"comment\">% plot histogram<\/span>\r\ntext(50,550, sprintf(<span class=\"string\">'Median Age (Male)   : %d'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">        % annotate<\/span>\r\n    median(x,<span class=\"string\">'omitnan'<\/span>)))\r\ntext(50,470, sprintf(<span class=\"string\">'Mode Age (Male)     : %d'<\/span>,mode(x)))   <span class=\"comment\">% annotate<\/span>\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't overwrite<\/span>\r\nx = age(age ~= 0 &amp; gender == <span class=\"string\">'female'<\/span>);                     <span class=\"comment\">% subset age by gender<\/span>\r\nhistogram(x)                                                <span class=\"comment\">% plot histogram<\/span>\r\ntext(50,520, sprintf(<span class=\"string\">'Median Age (Female): %d'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">         % annotate<\/span>\r\n    median(x,<span class=\"string\">'omitnan'<\/span>)))\r\ntext(50,440, sprintf(<span 
class=\"string\">'Mode Age (Female)  : %d'<\/span>,mode(x)))    <span class=\"comment\">% annotate<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\ntitle(<span class=\"string\">'Age Distribution by Gender'<\/span>)                         <span class=\"comment\">% add title<\/span>\r\nxlabel(<span class=\"string\">'Age'<\/span>)                                               <span class=\"comment\">% add x axis label<\/span>\r\nylabel(<span class=\"string\">'Count'<\/span>)                                             <span class=\"comment\">% add y axis label<\/span>\r\nlegend(<span class=\"string\">'Male'<\/span>,<span class=\"string\">'Female'<\/span>)                                     <span class=\"comment\">% add legend<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/newcoderFinal_01.png\" alt=\"\"> <h4>Mostly Studying in Countries of Citizenship<a name=\"3640d862-a498-4e79-a43e-dedde1a3f715\"><\/a><\/h4><p>Since the survey was done online, anyone could participate. Let's check the geographic breakdown. As you would expect, the largest portion of the responses came from the US. You can also see that female responses were 40.59% of male responses in the US, confirming high female representation in the responses.<\/p><p>China is notably missing from the top 10 countries. 
Perhaps the \"learn to code\" buzz has not caught on there?<\/p><pre class=\"codeinput\">part2.WhichCountryDoYouCurrentlyLiveIn = <span class=\"keyword\">...<\/span><span class=\"comment\">                % convert to categorical<\/span>\r\n    categorical(part2.WhichCountryDoYouCurrentlyLiveIn);\r\ncountry = part2.WhichCountryDoYouCurrentlyLiveIn;           <span class=\"comment\">% get country of residence<\/span>\r\ncatcount = countcats(country);                              <span class=\"comment\">% get category count<\/span>\r\ncats = categories(country);                                 <span class=\"comment\">% get categories<\/span>\r\n[~, rank] = sort(catcount,<span class=\"string\">'descend'<\/span>);                       <span class=\"comment\">% rank category by count<\/span>\r\nbelow_top10 = setdiff(cats,cats(rank(1:10)));               <span class=\"comment\">% categories below top 10<\/span>\r\ncountry = mergecats(country, below_top10, <span class=\"string\">'Other'<\/span>);         <span class=\"comment\">% merge them into other<\/span>\r\ncountry = reordercats(country,[cats(rank(1:10));{<span class=\"string\">'Other'<\/span>}]);<span class=\"comment\">% reorder cats by ranking<\/span>\r\nratio = sum(country == <span class=\"string\">'United States of America'<\/span> &amp; <span class=\"keyword\">...<\/span><span class=\"comment\">     % ratio of female\/male in us<\/span>\r\n    gender == <span class=\"string\">'female'<\/span>)\/sum(country == <span class=\"string\">'United States of America'<\/span> &amp; gender == <span class=\"string\">'male'<\/span>);\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nhistogram(country(gender == <span class=\"string\">'male'<\/span>))                        <span class=\"comment\">% plot histogram<\/span>\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't 
overwrite<\/span>\r\nhistogram(country(gender == <span class=\"string\">'female'<\/span>))                      <span class=\"comment\">% plot histogram<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\nax = gca;                                                   <span class=\"comment\">% get current axes handle<\/span>\r\nax.XTickLabelRotation = 90;                                 <span class=\"comment\">% rotate x tick label<\/span>\r\ntitle(<span class=\"string\">'Country of Residence by Gender'<\/span>)                     <span class=\"comment\">% add title<\/span>\r\nylabel(<span class=\"string\">'Count'<\/span>)                                             <span class=\"comment\">% add y axis label<\/span>\r\nlegend(<span class=\"string\">'Male'<\/span>,<span class=\"string\">'Female'<\/span>)                                     <span class=\"comment\">% add legend<\/span>\r\ntext(1.5, 1900, sprintf(<span class=\"string\">'US Female\/Male %.2f%%'<\/span>,ratio*100)) <span class=\"comment\">% annotate<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/newcoderFinal_02.png\" alt=\"\"> <p>You can also visualize migration patterns by mapping countries of citizenship to countries of residence. The number of edges is just 467, meaning only 467 out of all 14,625 responses in part2 are from migrants, and most people live and study in their countries of citizenship. 
If you take the ratio of immigration over emigration, the US, United Kingdom, Canada, Australia, Germany and Russia enjoy net gains from any <a href=\"https:\/\/en.wikipedia.org\/wiki\/Human_capital_flight\">brain drain<\/a>.<\/p><pre class=\"codeinput\">part2.WhichCountryAreYouACitizenOf = <span class=\"keyword\">...<\/span><span class=\"comment\">                    % convert to categorical<\/span>\r\n    categorical(part2.WhichCountryAreYouACitizenOf);\r\ncitizenship = part2.WhichCountryAreYouACitizenOf;           <span class=\"comment\">% get country of citizenship<\/span>\r\ntbl = table(cellstr(citizenship),cellstr(country));         <span class=\"comment\">% create table of residence and citizenship<\/span>\r\ntbl(isundefined(citizenship) &amp; isundefined(country),:) = [];<span class=\"comment\">% drop empty rows<\/span>\r\ntbl.(1)(strcmp(tbl.(1),<span class=\"string\">'&lt;undefined&gt;'<\/span>)) = <span class=\"keyword\">...<\/span><span class=\"comment\">                % use residence if citizenship is empty<\/span>\r\n    tbl.(2)(strcmp(tbl.(1),<span class=\"string\">'&lt;undefined&gt;'<\/span>));\r\ntbl.(2)(strcmp(tbl.(2),<span class=\"string\">'&lt;undefined&gt;'<\/span>)) = <span class=\"keyword\">...<\/span><span class=\"comment\">                % use citizenship if residence is empty<\/span>\r\n    tbl.(1)(strcmp(tbl.(2),<span class=\"string\">'&lt;undefined&gt;'<\/span>));\r\n[tbl, ~, idx] = unique(tbl,<span class=\"string\">'rows'<\/span>);                         <span class=\"comment\">% eliminate duplicate rows<\/span>\r\nw = accumarray(idx, 1);                                     <span class=\"comment\">% use count of duplicates as weight<\/span>\r\nG = digraph(tbl.(1), tbl.(2), w);                           <span class=\"comment\">% create a directed graph<\/span>\r\nindeg = indegree(G);                                        <span class=\"comment\">% get in-degrees<\/span>\r\nratio = indeg.\/outdegree(G);                                <span 
class=\"comment\">% get ratio of in-degrees over out-degrees<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\ncolormap <span class=\"string\">cool<\/span>                                               <span class=\"comment\">% set colormap<\/span>\r\nw = G.Edges.Weight;                                         <span class=\"comment\">% get weights<\/span>\r\nh = plot(G,<span class=\"string\">'MarkerSize'<\/span>,log(indeg+2),<span class=\"string\">'NodeCData'<\/span>,ratio, <span class=\"keyword\">...<\/span><span class=\"comment\"> % plot directional graph<\/span>\r\n    <span class=\"string\">'EdgeColor'<\/span>,[.7 .7 .7],<span class=\"string\">'EdgeAlpha'<\/span>,0.3,<span class=\"string\">'LineWidth'<\/span>,10*w\/max(w));\r\ncaxis([0 3])                                                <span class=\"comment\">% set color axis scaling<\/span>\r\naxis([-2.8 3.3 -4.5 3.7])                                   <span class=\"comment\">% set axis limits<\/span>\r\ntitle({<span class=\"string\">'Migration Pattern'<\/span>; <span class=\"keyword\">...<\/span><span class=\"comment\">                             % add title<\/span>\r\n    <span class=\"string\">'467 cases out of 14,625 responses (3.2%)'<\/span>})\r\nlabelnode(h,cats(rank(1:10)),cats(rank(1:10)))              <span class=\"comment\">% label top 10 nodes<\/span>\r\nnlabels = {<span class=\"string\">'Argentina'<\/span>,<span class=\"string\">'Azerbaijan'<\/span>,<span class=\"string\">'Chile'<\/span>,<span class=\"string\">'Congo'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">    % additional nodes to label<\/span>\r\n    <span class=\"string\">'Cote D''Ivoire'<\/span>,<span class=\"string\">'Croatia'<\/span>,<span class=\"string\">'Greece'<\/span>,<span class=\"string\">'Guyana'<\/span>,<span class=\"string\">'Latvia'<\/span>,<span class=\"string\">'Lesotho'<\/span>, <span class=\"keyword\">...<\/span>\r\n    <span 
class=\"string\">'Malta'<\/span>,<span class=\"string\">'Other'<\/span>,<span class=\"string\">'Paraguay'<\/span>,<span class=\"string\">'Philippines'<\/span>,<span class=\"string\">'Republic of Serbia'<\/span>,<span class=\"string\">'Romania'<\/span>};\r\nlabelnode(h,nlabels,nlabels);                               <span class=\"comment\">% label additional nodes<\/span>\r\nh = colorbar;                                               <span class=\"comment\">% add colorbar<\/span>\r\nylabel(h, <span class=\"string\">'in-degrees over out-degree ratio'<\/span>)               <span class=\"comment\">% add metric<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/newcoderFinal_03.png\" alt=\"\"> <h4>Ethnically Diverse English Speakers in US<a name=\"c105553c-1bb2-4250-aa57-76b7a685c7e1\"><\/a><\/h4><p>Let's focus on the US. As noted earlier, most new self-taught coders who responded to this survey were US citizens, but their ethnic makeup is very diverse. More than half of the women are ethnic minorities, and so are 1\/3 of the men. They are also predominantly English speakers, given the low ratio of immigrants. 
However, we should note that the survey itself was in English and promoted via social media in English.<\/p><pre class=\"codeinput\">isminority = part2.AreYouAnEthnicMinorityInYourCountry;     <span class=\"comment\">% get minority status<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nsubplot(1,2,1)                                              <span class=\"comment\">% create a subplot<\/span>\r\nx = gender(country == <span class=\"string\">'United States of America'<\/span> <span class=\"keyword\">...<\/span><span class=\"comment\">        % subset<\/span>\r\n    &amp; isminority == 0);                                     <span class=\"comment\">% us non-minority<\/span>\r\nhistogram(x)                                                <span class=\"comment\">% plot histogram<\/span>\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't overwrite<\/span>\r\nx = gender(country == <span class=\"string\">'United States of America'<\/span> <span class=\"keyword\">...<\/span><span class=\"comment\">        % subset gender<\/span>\r\n    &amp; isminority == 1);                                     <span class=\"comment\">% by us minority<\/span>\r\nhistogram(x)                                                <span class=\"comment\">% plot histogram<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\ntitle(<span class=\"string\">'US Gender by Ethnic Category'<\/span>)                       <span class=\"comment\">% add title<\/span>\r\nylabel(<span class=\"string\">'Count'<\/span>)                                             <span class=\"comment\">% add y axis label<\/span>\r\nlegend(<span class=\"string\">'Majority'<\/span>,<span class=\"string\">'Minority'<\/span>, <span class=\"string\">'Location'<\/span>,<span 
class=\"string\">'northwest'<\/span>)       <span class=\"comment\">% add legend<\/span>\r\npart2.WhichLanguageDoYouYouSpeakAtHomeWithYourFamily = <span class=\"keyword\">...<\/span><span class=\"comment\">  % convert to categorical<\/span>\r\n    categorical(part2.WhichLanguageDoYouYouSpeakAtHomeWithYourFamily);\r\nlanuage = part2.WhichLanguageDoYouYouSpeakAtHomeWithYourFamily;<span class=\"comment\">% get language<\/span>\r\nusa = lanuage(country == <span class=\"string\">'United States of America'<\/span>);       <span class=\"comment\">% extract us data<\/span>\r\ncatcount = countcats(usa);                                  <span class=\"comment\">% get category count<\/span>\r\ncats = categories(usa);                                     <span class=\"comment\">% get categories<\/span>\r\n[~, rank] = sort(catcount,<span class=\"string\">'descend'<\/span>);                       <span class=\"comment\">% rank category by count<\/span>\r\nbelow_top10 = setdiff(cats,cats(rank(1:10)));               <span class=\"comment\">% categories below top 10<\/span>\r\nusa = mergecats(usa, below_top10, <span class=\"string\">'Other'<\/span>);                 <span class=\"comment\">% merge them into other<\/span>\r\nusa = reordercats(usa,[cats(rank(1:10)); {<span class=\"string\">'Other'<\/span>}]);       <span class=\"comment\">% reorder cats by ranking<\/span>\r\nax = gca;                                                   <span class=\"comment\">% get current axes handle<\/span>\r\nax.XTickLabelRotation = 90;                                 <span class=\"comment\">% rotate x tick label<\/span>\r\nsubplot(1,2,2)                                              <span class=\"comment\">% create a subplot<\/span>\r\nhistogram(usa(gender(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                         % plot histogram<\/span>\r\n    <span class=\"string\">'United States of America'<\/span>) == <span class=\"string\">'male'<\/span>))                 
<span class=\"comment\">% subset us language by gender<\/span>\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't overwrite<\/span>\r\nhistogram(usa(gender(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                         % plot histogram<\/span>\r\n    <span class=\"string\">'United States of America'<\/span>) == <span class=\"string\">'female'<\/span>))               <span class=\"comment\">% subset us language by gender<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\ntitle(<span class=\"string\">'US Languages by Gender'<\/span>)                             <span class=\"comment\">% add title<\/span>\r\nylabel(<span class=\"string\">'Count'<\/span>)                                             <span class=\"comment\">% add y axis label<\/span>\r\nlegend(<span class=\"string\">'Male'<\/span>,<span class=\"string\">'Female'<\/span>)                                     <span class=\"comment\">% add legend<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/newcoderFinal_04.png\" alt=\"\"> <h4>Many Are Highly Educated and Already Employed in the US<a name=\"b7785136-83a9-47ef-9dda-a7ad564dea88\"><\/a><\/h4><p>We already know that a lot of people who take MOOCs had already earned college degrees and have jobs. 
This survey also shows the same result.<\/p><pre class=\"codeinput\">part2.What_sTheHighestDegreeOrLevelOfSchoolYouHaveCompleted = <span class=\"keyword\">...<\/span><span class=\"comment\"> % convert to categorical<\/span>\r\n    categorical(part2.What_sTheHighestDegreeOrLevelOfSchoolYouHaveCompleted);\r\ndegree = part2.What_sTheHighestDegreeOrLevelOfSchoolYouHaveCompleted;<span class=\"comment\">% get degree<\/span>\r\nusa = degree(country == <span class=\"string\">'United States of America'<\/span>);        <span class=\"comment\">% extract us data<\/span>\r\ncatcount = countcats(usa);                                  <span class=\"comment\">% get category count<\/span>\r\ncats = categories(usa);                                     <span class=\"comment\">% get categories<\/span>\r\n[~, rank] = sort(catcount,<span class=\"string\">'descend'<\/span>);                       <span class=\"comment\">% rank category by count<\/span>\r\nusa = reordercats(usa,cats(rank));                          <span class=\"comment\">% reorder cats by ranking<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nsubplot(1,2,1)                                              <span class=\"comment\">% create a subplot<\/span>\r\nhistogram(usa(gender(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                         % plot histogram<\/span>\r\n    <span class=\"string\">'United States of America'<\/span>) == <span class=\"string\">'male'<\/span>))                 <span class=\"comment\">% subset us degree by gender<\/span>\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't overwrite<\/span>\r\nhistogram(usa(gender(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                         % plot histogram<\/span>\r\n    <span class=\"string\">'United States of America'<\/span>) == <span 
class=\"string\">'female'<\/span>))               <span class=\"comment\">% subset us degree by gender<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\ntitle(<span class=\"string\">'US Degrees by Gender'<\/span>)                               <span class=\"comment\">% add title<\/span>\r\nylabel(<span class=\"string\">'Count'<\/span>)                                             <span class=\"comment\">% add y axis label<\/span>\r\nlegend(<span class=\"string\">'Male'<\/span>,<span class=\"string\">'Female'<\/span>)                                     <span class=\"comment\">% add legend<\/span>\r\npart2.RegardingEmploymentStatus_AreYouCurrently = <span class=\"keyword\">...<\/span><span class=\"comment\">       % convert to categorical<\/span>\r\n    categorical(part2.RegardingEmploymentStatus_AreYouCurrently);\r\nemployment = part2.RegardingEmploymentStatus_AreYouCurrently;<span class=\"comment\">% get employment<\/span>\r\nother = part2.Other;                                        <span class=\"comment\">% get other in employment<\/span>\r\nisstudent = zeros(size(other));                             <span class=\"comment\">% set up an accumulator<\/span>\r\nfun = @(x,y) ~cellfun(@isempty,strfind(lower(x),y));        <span class=\"comment\">% anonymous function handle<\/span>\r\nisstudent(fun(other,<span class=\"string\">'student'<\/span>)) = 1;                        <span class=\"comment\">% flag if 'student' is found<\/span>\r\nisstudent(fun(other,<span class=\"string\">'studying'<\/span>)) = 1;                       <span class=\"comment\">% flag if 'studying' is found<\/span>\r\nisstudent(fun(other,<span class=\"string\">'school'<\/span>)) = 1;                         <span class=\"comment\">% flag if 'school' is found<\/span>\r\nisstudent(fun(other,<span class=\"string\">'university'<\/span>)) = 1;                     <span class=\"comment\">% flag if 
'university' is found<\/span>\r\nisstudent(fun(other,<span class=\"string\">'degree'<\/span>)) = 1;                         <span class=\"comment\">% flag if 'degree' is found<\/span>\r\nisstudent(fun(other,<span class=\"string\">'phd'<\/span>)) = 1;                            <span class=\"comment\">% flag if 'phd' is found<\/span>\r\nemployment(logical(isstudent)) = <span class=\"string\">'Student'<\/span>;                 <span class=\"comment\">% update employment<\/span>\r\nusa = employment(country == <span class=\"string\">'United States of America'<\/span>);    <span class=\"comment\">% extract us data<\/span>\r\ncatcount = countcats(usa);                                  <span class=\"comment\">% get category count<\/span>\r\ncats = categories(usa);                                     <span class=\"comment\">% get categories<\/span>\r\n[~, rank] = sort(catcount,<span class=\"string\">'descend'<\/span>);                       <span class=\"comment\">% rank category by count<\/span>\r\nusa = reordercats(usa,cats(rank));                          <span class=\"comment\">% reorder cats by ranking<\/span>\r\nsubplot(1,2,2)                                              <span class=\"comment\">% create a subplot<\/span>\r\nhistogram(usa(gender(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                         % plot histogram<\/span>\r\n    <span class=\"string\">'United States of America'<\/span>) == <span class=\"string\">'male'<\/span>))                 <span class=\"comment\">% subset us employment by gender<\/span>\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't overwrite<\/span>\r\nhistogram(usa(gender(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                         % plot histogram<\/span>\r\n    <span class=\"string\">'United States of America'<\/span>) == <span class=\"string\">'female'<\/span>))               <span 
class=\"comment\">% subset us employment by gender<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\ntitle(<span class=\"string\">'US Employment by Gender'<\/span>)                            <span class=\"comment\">% add title<\/span>\r\nylabel(<span class=\"string\">'Count'<\/span>)                                             <span class=\"comment\">% add y axis label<\/span>\r\nlegend(<span class=\"string\">'Male'<\/span>,<span class=\"string\">'Female'<\/span>)                                     <span class=\"comment\">% add legend<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/newcoderFinal_05.png\" alt=\"\"> <h4>Many Already Work In Software Development and IT in US<a name=\"998ba97e-a120-43d0-baa2-0be34c02ebc6\"><\/a><\/h4><p>It turns out that many respondents already work in software development and IT fields and come from very diverse academic backgrounds, including both STEM and non-STEM subjects. Since the proportion of women tends to be higher in non-STEM majors, this may explain why we see higher than expected female representation in this survey. It appears that female respondents who studied non-STEM majors as undergraduates are now pursuing a career in software development.<\/p><p>Curiously, we also see many computer science majors and they tend to be men. Why are people who already have a computer science background teaching themselves programming? 
Shouldn't they have learned it in school?<\/p><pre class=\"codeinput\">part2.WhichFieldDoYouWorkIn = <span class=\"keyword\">...<\/span><span class=\"comment\">                           % convert to categorical<\/span>\r\n    categorical(part2.WhichFieldDoYouWorkIn);\r\njob = part2.WhichFieldDoYouWorkIn;                          <span class=\"comment\">% get job<\/span>\r\njob = mergecats(job, {<span class=\"string\">'software development and IT'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">    % merge similar categories<\/span>\r\n    <span class=\"string\">'software development'<\/span>});\r\nus_job = job(country == <span class=\"string\">'United States of America'<\/span>);        <span class=\"comment\">% extract us data<\/span>\r\ncatcount = countcats(us_job );                              <span class=\"comment\">% get category count<\/span>\r\ncats = categories(us_job);                                  <span class=\"comment\">% get categories<\/span>\r\n[~, rank] = sort(catcount,<span class=\"string\">'descend'<\/span>);                       <span class=\"comment\">% rank category by count<\/span>\r\nbelow_top10 = setdiff(cats,cats(rank(1:10)));               <span class=\"comment\">% categories below top 10<\/span>\r\nus_job = mergecats(us_job, below_top10, <span class=\"string\">'Other'<\/span>);           <span class=\"comment\">% merge them into other<\/span>\r\nus_job = reordercats(us_job,[cats(rank(1:10)); {<span class=\"string\">'Other'<\/span>}]); <span class=\"comment\">% reorder cats by ranking<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nsubplot(1,2,1)                                              <span class=\"comment\">% create a subplot<\/span>\r\nhistogram(us_job(gender(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                      % plot histogram<\/span>\r\n    <span class=\"string\">'United States of 
America'<\/span>) == <span class=\"string\">'male'<\/span>))                 <span class=\"comment\">% subset us subject by gender<\/span>\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't overwrite<\/span>\r\nhistogram(us_job(gender(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                      % plot histogram<\/span>\r\n    <span class=\"string\">'United States of America'<\/span>) == <span class=\"string\">'female'<\/span>))               <span class=\"comment\">% subset us subject by gender<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\ntitle(<span class=\"string\">'US Job Field by Gender'<\/span>)                             <span class=\"comment\">% add title<\/span>\r\nylabel(<span class=\"string\">'Count'<\/span>)                                             <span class=\"comment\">% add y axis label<\/span>\r\nlegend(<span class=\"string\">'Male'<\/span>,<span class=\"string\">'Female'<\/span>)                                     <span class=\"comment\">% add legend<\/span>\r\npart2.WhatWasTheMainSubjectYouStudiedInUniversity = <span class=\"keyword\">...<\/span><span class=\"comment\">     % convert to categorical<\/span>\r\n    categorical(part2.WhatWasTheMainSubjectYouStudiedInUniversity);\r\nmajor = part2.WhatWasTheMainSubjectYouStudiedInUniversity;  <span class=\"comment\">% get academic major<\/span>\r\nus_maj = major(country == <span class=\"string\">'United States of America'<\/span>);      <span class=\"comment\">% extract us data<\/span>\r\ncatcount = countcats(us_maj);                               <span class=\"comment\">% get category count<\/span>\r\ncats = categories(us_maj);                                  <span class=\"comment\">% get categories<\/span>\r\n[~, rank] = sort(catcount,<span class=\"string\">'descend'<\/span>);       
                <span class=\"comment\">% rank category by count<\/span>\r\nbelow_top10 = setdiff(cats,cats(rank(1:10)));               <span class=\"comment\">% categories below top 10<\/span>\r\nus_maj = mergecats(us_maj, below_top10, <span class=\"string\">'Other'<\/span>);           <span class=\"comment\">% merge them into other<\/span>\r\nus_maj = reordercats(us_maj,[cats(rank(1:10)); {<span class=\"string\">'Other'<\/span>}]); <span class=\"comment\">% reorder cats by ranking<\/span>\r\nsubplot(1,2,2)                                              <span class=\"comment\">% create a subplot<\/span>\r\nhistogram(us_maj(gender(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                      % plot histogram<\/span>\r\n    <span class=\"string\">'United States of America'<\/span>) == <span class=\"string\">'male'<\/span>))                 <span class=\"comment\">% subset us subject by gender<\/span>\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't overwrite<\/span>\r\nhistogram(us_maj(gender(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                      % plot histogram<\/span>\r\n    <span class=\"string\">'United States of America'<\/span>) == <span class=\"string\">'female'<\/span>))               <span class=\"comment\">% subset us subject by gender<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\ntitle(<span class=\"string\">'US Academic Major by Gender'<\/span>)                        <span class=\"comment\">% add title<\/span>\r\nylabel(<span class=\"string\">'Count'<\/span>)                                             <span class=\"comment\">% add y axis label<\/span>\r\nlegend(<span class=\"string\">'Male'<\/span>,<span class=\"string\">'Female'<\/span>, <span class=\"string\">'Location'<\/span>, <span 
class=\"string\">'northwest'<\/span>)            <span class=\"comment\">% add legend<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/newcoderFinal_06.png\" alt=\"\"> <p>Here is a quick sanity check. Yes, those computer science majors do seem to work in software development and IT, as indicated by the line width of the edge between them.<\/p><pre class=\"codeinput\">tbl = table(cellstr(us_maj),cellstr(us_job));               <span class=\"comment\">% create table of us major and us job<\/span>\r\ntbl(isundefined(us_maj) | isundefined(us_job) | <span class=\"keyword\">...<\/span><span class=\"comment\">         % remove undefined<\/span>\r\n    us_maj == <span class=\"string\">'Other'<\/span>,:)= [];                               <span class=\"comment\">% remove 'Other' from us major<\/span>\r\n[tbl, ~, idx] = unique(tbl,<span class=\"string\">'rows'<\/span>);                         <span class=\"comment\">% eliminate duplicate rows<\/span>\r\nw = accumarray(idx, 1);                                     <span class=\"comment\">% use count of duplicates as weight<\/span>\r\nG = digraph(tbl.(1), tbl.(2), w);                           <span class=\"comment\">% create a directed graph<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nw = G.Edges.Weight;                                         <span class=\"comment\">% get weights<\/span>\r\nh = plot(G,<span class=\"string\">'Layout'<\/span>,<span class=\"string\">'layered'<\/span>,<span class=\"string\">'LineWidth'<\/span>,5*w\/max(w));      <span class=\"comment\">% plot the directed graph<\/span>\r\nxlim([-2.5 16])                                             <span class=\"comment\">% x-axis limits<\/span>\r\nhighlight(h, unique(tbl.(2)),<span class=\"string\">'NodeColor'<\/span>,[.85 .33 .1])      <span class=\"comment\">% highlight job nodes<\/span>\r\ntitle({<span 
class=\"string\">'US Majors vs. Job Fields'<\/span>; <span class=\"keyword\">...<\/span><span class=\"comment\">                      % add title<\/span>\r\n    <span class=\"string\">'Line Width Varies by Frequency'<\/span>})\r\ntext(-2, 2, <span class=\"string\">'Majors'<\/span>, <span class=\"string\">'FontWeight'<\/span>,<span class=\"string\">'Bold'<\/span>)                  <span class=\"comment\">% annotate<\/span>\r\ntext(-2, 1, <span class=\"string\">'Job Fields'<\/span>, <span class=\"string\">'FontWeight'<\/span>,<span class=\"string\">'Bold'<\/span>)              <span class=\"comment\">% annotate<\/span>\r\nannotation(<span class=\"string\">'arrow'<\/span>,[.2 .2],[.75 .25], <span class=\"string\">'Color'<\/span>,[0 .45 .75])  <span class=\"comment\">% annotate<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/newcoderFinal_07.png\" alt=\"\"> <h4>Academic Background in Software Development and IT<a name=\"af8d347c-f184-4f80-83b2-8fcd766c6f61\"><\/a><\/h4><p>Taking a deep dive into Software Development and IT, we see that not everyone is a computer science major: a diverse range of academic backgrounds is represented in the industry, and people are probably on different career tracks depending on their background. 
Women represent a higher proportion in English, Psychology and Other.<\/p><pre class=\"codeinput\">us_maj_it = major(country == <span class=\"string\">'United States of America'<\/span> <span class=\"keyword\">...<\/span><span class=\"comment\"> % subset major by country<\/span>\r\n    &amp; job == <span class=\"string\">'software development and IT'<\/span>);                <span class=\"comment\">% and job<\/span>\r\ncatcount = countcats(us_maj_it);                            <span class=\"comment\">% get category count<\/span>\r\ncats = categories(us_maj_it);                               <span class=\"comment\">% get categories<\/span>\r\n[~, rank] = sort(catcount,<span class=\"string\">'descend'<\/span>);                       <span class=\"comment\">% rank category by count<\/span>\r\nbelow_top10 = setdiff(cats,cats(rank(1:10)));               <span class=\"comment\">% categories below top 10<\/span>\r\nus_maj_it = mergecats(us_maj_it, below_top10, <span class=\"string\">'Other'<\/span>);     <span class=\"comment\">% merge them into other<\/span>\r\nus_maj_it = reordercats(us_maj_it,[cats(rank(1:10)); {<span class=\"string\">'Other'<\/span>}]);<span class=\"comment\">% reorder cats by ranking<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nhistogram(us_maj_it(gender(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                   % plot histogram<\/span>\r\n    <span class=\"string\">'United States of America'<\/span> &amp; <span class=\"keyword\">...<\/span><span class=\"comment\">                        % subset us subject by job<\/span>\r\n    job == <span class=\"string\">'software development and IT'<\/span>) == <span class=\"string\">'male'<\/span>))       <span class=\"comment\">% and gender<\/span>\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't 
overwrite<\/span>\r\nhistogram(us_maj_it(gender(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                   % plot histogram<\/span>\r\n    <span class=\"string\">'United States of America'<\/span> &amp; <span class=\"keyword\">...<\/span><span class=\"comment\">                        % subset us subject by job<\/span>\r\n    job == <span class=\"string\">'software development and IT'<\/span>) == <span class=\"string\">'female'<\/span>))     <span class=\"comment\">% and gender<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\nax = gca;                                                   <span class=\"comment\">% get current axes handle<\/span>\r\nax.XTickLabelRotation = 90;                                 <span class=\"comment\">% rotate x tick label<\/span>\r\ntitle(<span class=\"string\">'US Software Development and IT - Majors by Gender'<\/span>)  <span class=\"comment\">% add title<\/span>\r\nylabel(<span class=\"string\">'Count'<\/span>)                                             <span class=\"comment\">% add y axis label<\/span>\r\nlegend(<span class=\"string\">'Male'<\/span>,<span class=\"string\">'Female'<\/span>, <span class=\"string\">'Location'<\/span>, <span class=\"string\">'northwest'<\/span>)            <span class=\"comment\">% add legend<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/newcoderFinal_08.png\" alt=\"\"> <h4>Wide Income Gap in Software Development and IT<a name=\"76c5f36d-ef89-4bd4-8657-f6503ceecb58\"><\/a><\/h4><p>Let's compare the income range by Job Field using a <a href=\"https:\/\/www.mathworks.com\/help\/stats\/boxplot.html\">box plot<\/a>. The bottom and top of the box represent the first and third quartiles, the red line in the middle represents the median, and the whiskers extend to the most extreme data points not considered outliers (roughly +\/- 2.7 standard deviations if the data is normally distributed). 
Red \"+\"s show the outliers.<\/p><p>Compared to other job fields, the income range in software development and IT has a wide spread (as indicated by the elongated box shape and longer whiskers), meaning there is good upside potential.<\/p><pre class=\"codeinput\">income = part2.AboutHowMuchMoneyDidYouMakeLastYear_inUSDollars;<span class=\"comment\">% get income<\/span>\r\nincome = str2double(income);                                <span class=\"comment\">% convert to numeric<\/span>\r\nincome(income == 0) = NaN;                                  <span class=\"comment\">% don't count zero<\/span>\r\nus_income = income(country == <span class=\"string\">'United States of America'<\/span>);  <span class=\"comment\">% extract us data<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nboxplot(us_income,us_job)                                   <span class=\"comment\">% create a box plot<\/span>\r\nylim([0 2*10^5])                                            <span class=\"comment\">% set upper limit<\/span>\r\ntitle(<span class=\"string\">'US Income Distribution by Job Field'<\/span>)                <span class=\"comment\">% add title<\/span>\r\nax = gca;                                                   <span class=\"comment\">% get current axes handle<\/span>\r\nax.XTickLabelRotation = 90;                                 <span class=\"comment\">% rotate x tick label<\/span>\r\nax.YTickLabel = {<span class=\"string\">'$0'<\/span>,<span class=\"string\">'$50k'<\/span>,<span class=\"string\">'$100k'<\/span>,<span class=\"string\">'$150k'<\/span>,<span class=\"string\">'$200k'<\/span>};      <span class=\"comment\">% set y tick label<\/span>\r\nylabel(<span class=\"string\">'Annual Income'<\/span>)                                     <span class=\"comment\">% add y axis label<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" 
src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/newcoderFinal_09.png\" alt=\"\"> <h4>What Affects Income in Software Development and IT?<a name=\"bf0928fd-56a1-4459-ace9-1bae375a3d27\"><\/a><\/h4><p>The first factor that may affect income in software development and IT is academic background. The box plot shows that Computer Science and Electric Engineering give you the most advantage in getting a higher salary. This is probably the motivation behind the self-learning, how-to-code trend - people want to switch from their current path to a more lucrative career path, or to advance more quickly within the same industry.<\/p><pre class=\"codeinput\">us_income_it = income(country == <span class=\"string\">'United States of America'<\/span> <span class=\"keyword\">...<\/span><span class=\"comment\">% subset income by country<\/span>\r\n    &amp; job == <span class=\"string\">'software development and IT'<\/span>);                <span class=\"comment\">% and job<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nboxplot(us_income_it,us_maj_it)                             <span class=\"comment\">% create a box plot<\/span>\r\nylim([0 2*10^5])                                            <span class=\"comment\">% set upper limit<\/span>\r\ntitle({<span class=\"string\">'US Income Distribution by Major'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">.              
% add title<\/span>\r\n    <span class=\"string\">'in Software Development and IT'<\/span>})\r\nax = gca;                                                   <span class=\"comment\">% get current axes handle<\/span>\r\nax.XTickLabelRotation = 90;                                 <span class=\"comment\">% rotate x tick label<\/span>\r\nax.YTickLabel = {<span class=\"string\">'$0'<\/span>,<span class=\"string\">'$50k'<\/span>,<span class=\"string\">'$100k'<\/span>,<span class=\"string\">'$150k'<\/span>,<span class=\"string\">'$200k'<\/span>};      <span class=\"comment\">% set y tick label<\/span>\r\nylabel(<span class=\"string\">'Annual Income'<\/span>)                                     <span class=\"comment\">% add y axis label<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/newcoderFinal_10.png\" alt=\"\"> <h4>Age Factor<a name=\"00fb1938-9418-402a-8b9a-51b521886462\"><\/a><\/h4><p>Another important factor to consider is age. If you plot age against income in Software Development and IT, you see a wide income gap among younger Computer Science majors. Some 25-year-olds earn $0 while others earn $110,000 a year. Income also seems to plateau as you age. You can use <tt><a href=\"https:\/\/www.mathworks.com\/help\/curvefit\/fit.html\">fit<\/a><\/tt> with the <tt><a href=\"https:\/\/www.mathworks.com\/help\/curvefit\/list-of-library-models-for-curve-and-surface-fitting.html#btbcvnl\">exp2<\/a><\/tt> option to <a href=\"https:\/\/www.mathworks.com\/help\/curvefit\/exponential.html\">fit a two-term exponential curve<\/a> to the data so you can see the trend easily. 
Perhaps this provides motivation for CS majors to improve their skills and experience as quickly as possible?<\/p><pre class=\"codeinput\">us_age_it = age(country == <span class=\"string\">'United States of America'<\/span> <span class=\"keyword\">...<\/span><span class=\"comment\">   % subset age by country<\/span>\r\n    &amp; job == <span class=\"string\">'software development and IT'<\/span>);                <span class=\"comment\">% and job<\/span>\r\nX = us_age_it(us_maj_it == <span class=\"string\">'Computer Science'<\/span>);             <span class=\"comment\">% subset just CS<\/span>\r\nY = us_income_it(us_maj_it == <span class=\"string\">'Computer Science'<\/span>);          <span class=\"comment\">% subset just CS<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nplot(X, Y,<span class=\"string\">'o'<\/span>)                                              <span class=\"comment\">% plot data<\/span>\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't overwrite<\/span>\r\nmissingrows = isnan(X) | isnan(Y);                          <span class=\"comment\">% find NaNs<\/span>\r\nX(missingrows) = [];                                        <span class=\"comment\">% remove NaNs<\/span>\r\nY(missingrows) = [];                                        <span class=\"comment\">% remove NaNs<\/span>\r\nfitresult = fit(X,Y,<span class=\"string\">'exp2'<\/span>);                                <span class=\"comment\">% fit to exp2<\/span>\r\nplot(fitresult)                                             <span class=\"comment\">% plot curve<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\ntitle({<span class=\"string\">'Income by Age Among CS Majors'<\/span>; <span class=\"keyword\">...<\/span><span class=\"comment\">                
 % add title<\/span>\r\n    <span class=\"string\">'in Software Development and IT'<\/span>})\r\nxlim([10 60])                                               <span class=\"comment\">% set x axis limits<\/span>\r\nylim([0 2*10^5])                                            <span class=\"comment\">% set y axis limits<\/span>\r\nax = gca;                                                   <span class=\"comment\">% get current axes handle<\/span>\r\nax.YTick = 0:50000:200000;                                  <span class=\"comment\">% set y tick<\/span>\r\nax.YTickLabel = {<span class=\"string\">'$0'<\/span>,<span class=\"string\">'$50k'<\/span>,<span class=\"string\">'$100k'<\/span>,<span class=\"string\">'$150k'<\/span>,<span class=\"string\">'$200k'<\/span>};      <span class=\"comment\">% set y tick label<\/span>\r\nxlabel(<span class=\"string\">'Age'<\/span>)                                               <span class=\"comment\">% add x axis label<\/span>\r\nylabel(<span class=\"string\">'Annual Income'<\/span>)                                     <span class=\"comment\">% add y axis label<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/newcoderFinal_11.png\" alt=\"\"> <h4>Big Companies Not Preferred<a name=\"9092c558-679e-46ac-8390-b6a59f69d33b\"><\/a><\/h4><p>If you look at the kind of future employment respondents are looking for, big companies are the least preferred. Since they are not earning a formal degree, they probably expect more flexibility from other employment options.<\/p><p>The most popular option is mid-sized companies, but many people are interested in working for a startup, starting their own business, or freelancing. People in software development and IT tend to prefer working for startups or mid-sized companies. 
Men tend to prefer starting their own businesses or working for a startup, while women tend to prefer the freelance path.<\/p><pre class=\"codeinput\">part2.want_employment_type = <span class=\"keyword\">...<\/span><span class=\"comment\">                            % convert to categorical<\/span>\r\n    categorical(part2.want_employment_type);\r\ninterested_emp = part2.want_employment_type;                <span class=\"comment\">% get interested employment type<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nsubplot(1,2,1)                                              <span class=\"comment\">% create a subplot<\/span>\r\nus_int_emp_it = interested_emp(country == <span class=\"keyword\">...<\/span><span class=\"comment\">               % subset it by country and job<\/span>\r\n    <span class=\"string\">'United States of America'<\/span> &amp; job == <span class=\"string\">'software development and IT'<\/span>);\r\nus_int_emp_it(isundefined(us_int_emp_it)) = [];             <span class=\"comment\">% remove undefined<\/span>\r\nus_int_emp_it = removecats(us_int_emp_it);                  <span class=\"comment\">% remove unused categories<\/span>\r\nhistogram(us_int_emp_it,<span class=\"string\">'Normalization'<\/span>,<span class=\"string\">'probability'<\/span>)      <span class=\"comment\">% plot histogram<\/span>\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't overwrite<\/span>\r\nus_int_emp_non_it = interested_emp(country == <span class=\"keyword\">...<\/span><span class=\"comment\">           % subset it by country and job<\/span>\r\n    <span class=\"string\">'United States of America'<\/span> &amp; job ~= <span class=\"string\">'software development and IT'<\/span>);\r\nus_int_emp_non_it(isundefined(us_int_emp_non_it)) = [];     <span class=\"comment\">% remove undefined<\/span>\r\nus_int_emp_non_it = 
removecats(us_int_emp_non_it);          <span class=\"comment\">% remove unused categories<\/span>\r\nhistogram(us_int_emp_non_it,<span class=\"string\">'Normalization'<\/span>,<span class=\"string\">'probability'<\/span>)  <span class=\"comment\">% plot histogram<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\ntitle({<span class=\"string\">'US Desired Employment Type'<\/span>;<span class=\"string\">'by Job Field'<\/span>})        <span class=\"comment\">% add title<\/span>\r\nax = gca;                                                   <span class=\"comment\">% get current axes handle<\/span>\r\nax.XTickLabelRotation = 90;                                 <span class=\"comment\">% rotate x tick label<\/span>\r\nax.YTick = 0:0.1:0.6;                                       <span class=\"comment\">% set y tick<\/span>\r\nax.YTickLabel = {<span class=\"string\">'0%'<\/span>,<span class=\"string\">'10%'<\/span>,<span class=\"string\">'20%'<\/span>,<span class=\"string\">'30%'<\/span>,<span class=\"string\">'40%'<\/span>,<span class=\"string\">'50%'<\/span>,<span class=\"string\">'60%'<\/span>}; <span class=\"comment\">% set y tick label<\/span>\r\nlegend(<span class=\"string\">'Software Dev and IT'<\/span>, <span class=\"string\">'Others'<\/span>)                     <span class=\"comment\">% add legend<\/span>\r\nylim([0 0.6])                                               <span class=\"comment\">% set y axis limits<\/span>\r\nsubplot(1,2,2)                                              <span class=\"comment\">% create a subplot<\/span>\r\nus_int_emp_m = interested_emp(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                % subset it by country<\/span>\r\n    <span class=\"string\">'United States of America'<\/span> &amp; gender == <span class=\"string\">'male'<\/span>);         <span class=\"comment\">% 
gender<\/span>\r\nus_int_emp_m(isundefined(us_int_emp_m)) = [];               <span class=\"comment\">% remove undefined<\/span>\r\nus_int_emp_m = removecats(us_int_emp_m);                    <span class=\"comment\">% remove unused categories<\/span>\r\nhistogram(us_int_emp_m,<span class=\"string\">'Normalization'<\/span>,<span class=\"string\">'probability'<\/span>)       <span class=\"comment\">% plot histogram<\/span>\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't overwrite<\/span>\r\nus_int_emp_f = interested_emp(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                % subset it by country<\/span>\r\n    <span class=\"string\">'United States of America'<\/span> &amp; gender == <span class=\"string\">'female'<\/span>);       <span class=\"comment\">% gender<\/span>\r\nus_int_emp_f(isundefined(us_int_emp_f)) = [];               <span class=\"comment\">% remove undefined<\/span>\r\nus_int_emp_f = removecats(us_int_emp_f);                    <span class=\"comment\">% remove unused categories<\/span>\r\nhistogram(us_int_emp_f,<span class=\"string\">'Normalization'<\/span>,<span class=\"string\">'probability'<\/span>)       <span class=\"comment\">% plot histogram<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\ntitle({<span class=\"string\">'US Desired Employment Type'<\/span>;<span class=\"string\">'by Gender'<\/span>})           <span class=\"comment\">% add title<\/span>\r\nax = gca;                                                   <span class=\"comment\">% get current axes handle<\/span>\r\nax.XTickLabelRotation = 90;                                 <span class=\"comment\">% rotate x tick label<\/span>\r\nax.YTick = 0:0.1:0.6;                                       <span class=\"comment\">% set y tick<\/span>\r\nax.YTickLabel = {<span 
class=\"string\">'0%'<\/span>,<span class=\"string\">'10%'<\/span>,<span class=\"string\">'20%'<\/span>,<span class=\"string\">'30%'<\/span>,<span class=\"string\">'40%'<\/span>,<span class=\"string\">'50%'<\/span>,<span class=\"string\">'60%'<\/span>}; <span class=\"comment\">% set y tick label<\/span>\r\nlegend(<span class=\"string\">'Male'<\/span>, <span class=\"string\">'Female'<\/span>)                                    <span class=\"comment\">% add legend<\/span>\r\nylim([0 0.6])                                               <span class=\"comment\">% set y axis limits<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/newcoderFinal_12.png\" alt=\"\"> <h4>Dream Jobs<a name=\"d52a48e9-e8e9-4a37-8a76-64a56d0135b2\"><\/a><\/h4><p>When it comes to actual jobs people are interested in, they are mostly web development positions. People already in software development and IT tend to prefer roles with higher technical skills - <a href=\"https:\/\/en.wikipedia.org\/wiki\/Front_and_back_ends\">Back-End Web Development<\/a>, <a href=\"https:\/\/en.wikipedia.org\/wiki\/DevOps\">DevOps<\/a>, or <a href=\"https:\/\/en.wikipedia.org\/wiki\/System_administrator\">SysAdmin<\/a> rather than <a href=\"https:\/\/en.wikipedia.org\/wiki\/Front-end_web_development\">Front-End Web Development<\/a>, or other non-development roles such as Product Manager or QA Engineer. 
In terms of gender, women tend to prefer Front-End Web Development and <a href=\"https:\/\/en.wikipedia.org\/wiki\/User_experience_design\">User Experience Design<\/a>.<\/p><pre class=\"codeinput\">int_job = categorical(strtrim(part2.jobs_interested_in));   <span class=\"comment\">% get interested job<\/span>\r\ncatcount = countcats(int_job);                              <span class=\"comment\">% get category count<\/span>\r\ncats = categories(int_job);                                 <span class=\"comment\">% get categories<\/span>\r\n[~, rank] = sort(catcount,<span class=\"string\">'descend'<\/span>);                       <span class=\"comment\">% rank category by count<\/span>\r\nbelow_top10 = setdiff(cats,cats(rank(1:10)));               <span class=\"comment\">% categories below top 10<\/span>\r\nint_job = mergecats(int_job, below_top10,<span class=\"string\">'Other'<\/span>);          <span class=\"comment\">% merge them into other<\/span>\r\nint_job = reordercats(int_job,[cats(rank(1:10));{<span class=\"string\">'Other'<\/span>}]);<span class=\"comment\">% reorder cats by ranking<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nsubplot(1,2,1)                                              <span class=\"comment\">% create a subplot<\/span>\r\nus_int_job_non_it = int_job(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                  % subset int job by country and job<\/span>\r\n     <span class=\"string\">'United States of America'<\/span> &amp; job ~= <span class=\"string\">'software development and IT'<\/span>);\r\nhistogram(us_int_job_non_it, <span class=\"string\">'Normalization'<\/span>,<span class=\"string\">'probability'<\/span>) <span class=\"comment\">% plot histogram<\/span>\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't overwrite<\/span>\r\nus_int_job_it = int_job(country == 
<span class=\"keyword\">...<\/span><span class=\"comment\">                      % subset int job by country and job<\/span>\r\n     <span class=\"string\">'United States of America'<\/span> &amp; job == <span class=\"string\">'software development and IT'<\/span>);\r\nhistogram(us_int_job_it, <span class=\"string\">'Normalization'<\/span>,<span class=\"string\">'probability'<\/span>)     <span class=\"comment\">% plot histogram<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\ntitle({<span class=\"string\">'US Jobs Interested In'<\/span>;<span class=\"string\">'By Job Field'<\/span>})             <span class=\"comment\">% add title<\/span>\r\nlegend(<span class=\"string\">'All Others'<\/span>,<span class=\"string\">'Software Dev &amp; IT'<\/span>)                    <span class=\"comment\">% add legend<\/span>\r\nax = gca;                                                   <span class=\"comment\">% get current axes handle<\/span>\r\nax.YTick = 0:0.1:0.5;                                       <span class=\"comment\">% set y tick<\/span>\r\nax.YTickLabel = {<span class=\"string\">'0%'<\/span>,<span class=\"string\">'10%'<\/span>,<span class=\"string\">'20%'<\/span>,<span class=\"string\">'30%'<\/span>,<span class=\"string\">'40%'<\/span>,<span class=\"string\">'50%'<\/span>};       <span class=\"comment\">% set y tick label<\/span>\r\nylim([0 0.5])                                               <span class=\"comment\">% set y axis limits<\/span>\r\nsubplot(1,2,2)                                              <span class=\"comment\">% create a subplot<\/span>\r\nus_int_job_m = int_job(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                       % subset int job by country<\/span>\r\n     <span class=\"string\">'United States of America'<\/span> &amp; gender == <span class=\"string\">'male'<\/span>);        <span class=\"comment\">% and 
gender<\/span>\r\nhistogram(us_int_job_m, <span class=\"string\">'Normalization'<\/span>,<span class=\"string\">'probability'<\/span>)      <span class=\"comment\">% plot histogram<\/span>\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't overwrite<\/span>\r\nus_int_job_f = int_job(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                       % subset int job by country<\/span>\r\n     <span class=\"string\">'United States of America'<\/span> &amp; gender == <span class=\"string\">'female'<\/span>);      <span class=\"comment\">% and gender<\/span>\r\nhistogram(us_int_job_f, <span class=\"string\">'Normalization'<\/span>,<span class=\"string\">'probability'<\/span>)      <span class=\"comment\">% plot histogram<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\ntitle({<span class=\"string\">'US Jobs Interested In'<\/span>;<span class=\"string\">'By Gender'<\/span>})                <span class=\"comment\">% add title<\/span>\r\nlegend(<span class=\"string\">'Male'<\/span>,<span class=\"string\">'Female'<\/span>)                                     <span class=\"comment\">% add legend<\/span>\r\nax = gca;                                                   <span class=\"comment\">% get current axes handle<\/span>\r\nax.YTick = 0:0.1:0.5;                                       <span class=\"comment\">% set y tick<\/span>\r\nax.YTickLabel = {<span class=\"string\">'0%'<\/span>,<span class=\"string\">'10%'<\/span>,<span class=\"string\">'20%'<\/span>,<span class=\"string\">'30%'<\/span>,<span class=\"string\">'40%'<\/span>,<span class=\"string\">'50%'<\/span>};       <span class=\"comment\">% set y tick label<\/span>\r\nylim([0 0.5])                                               <span class=\"comment\">% set y axis limits<\/span>\r\n<\/pre><img decoding=\"async\" 
vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/newcoderFinal_13.png\" alt=\"\"> <h4>Student Loan Debt<a name=\"418470eb-3592-4984-8457-93fff2abade7\"><\/a><\/h4><p>The survey also asks how much student loan debt respondents carry and how much they have spent learning to code. Over 41% of the US respondents have student loan debt, and the median amount owed is $25,000. In addition, people do spend money while learning to code, and the median total spend is $300. Given that many respondents carry debt, they may not be able to afford to spend more and add to it, which may also be reflected in their more conservative choices of future employment.<\/p><pre class=\"codeinput\">debt = part2.AboutHowMuchDoYouOweInStudentLoans_inUSDollars;<span class=\"comment\">% get student loan debt<\/span>\r\ndebt = str2double(debt);                                    <span class=\"comment\">% convert to numeric<\/span>\r\ndebt(debt == 0) = NaN;                                      <span class=\"comment\">% don't count zero<\/span>\r\nus_debt = debt(country == <span class=\"string\">'United States of America'<\/span>);      <span class=\"comment\">% extract us data<\/span>\r\npct_in_debt = sum(~isnan(us_debt))\/length(us_debt)*100;     <span class=\"comment\">% percentage in debt<\/span>\r\nmedian_debt = nanmedian(us_debt)\/1000;                      <span class=\"comment\">% median debt<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nsubplot(1,2,1)                                              <span class=\"comment\">% create a subplot<\/span>\r\nhistogram(us_debt)                                          <span class=\"comment\">% plot histogram<\/span>\r\nxlim([0 2*10^5])                                            <span class=\"comment\">% set x axis limits<\/span>\r\nax = gca;                                                   <span class=\"comment\">% get current axes 
handle<\/span>\r\nax.XTick = 0:50000:200000;                                  <span class=\"comment\">% set x tick<\/span>\r\nax.XTickLabel = {<span class=\"string\">'$0'<\/span>,<span class=\"string\">'$50k'<\/span>,<span class=\"string\">'$100k'<\/span>,<span class=\"string\">'$150k'<\/span>,<span class=\"string\">'$200k'<\/span>};      <span class=\"comment\">% set x tick label<\/span>\r\nxlabel(<span class=\"string\">'Amount Owed'<\/span>)                                       <span class=\"comment\">% add x axis label<\/span>\r\nylabel(<span class=\"string\">'Count'<\/span>)                                             <span class=\"comment\">% add y axis label<\/span>\r\ntitle({<span class=\"string\">'US Student Loan Debt'<\/span>; <span class=\"keyword\">...<\/span><span class=\"comment\">                          % add title<\/span>\r\n    sprintf(<span class=\"string\">'%.2f%% in Debt (Median $%dk)'<\/span>,pct_in_debt,median_debt)})\r\nsubplot(1,2,2)                                              <span class=\"comment\">% create a subplot<\/span>\r\nspend = part2.total_spent_learning;                         <span class=\"comment\">% get total spend<\/span>\r\nspend = str2double(spend);                                  <span class=\"comment\">% convert to numeric<\/span>\r\nspend(spend == 0) = NaN;                                    <span class=\"comment\">% don't count zero<\/span>\r\nus_spend = spend(country == <span class=\"string\">'United States of America'<\/span>);    <span class=\"comment\">% extract us data<\/span>\r\nhistogram(us_spend)                                         <span class=\"comment\">% plot histogram<\/span>\r\nxlim([0 3*10^4])                                            <span class=\"comment\">% set x axis limits<\/span>\r\nax = gca;                                                   <span class=\"comment\">% get current axes handle<\/span>\r\nax.XTick = 0:10000:30000;                                   <span class=\"comment\">% set x 
tick<\/span>\r\nax.XTickLabel = {<span class=\"string\">'$0'<\/span>,<span class=\"string\">'$10k'<\/span>,<span class=\"string\">'$20k'<\/span>,<span class=\"string\">'$30k'<\/span>};                <span class=\"comment\">% set x tick label<\/span>\r\nxlabel(<span class=\"string\">'Total Spend'<\/span>)                                       <span class=\"comment\">% add x axis label<\/span>\r\nylabel(<span class=\"string\">'Count'<\/span>)                                             <span class=\"comment\">% add y axis label<\/span>\r\ntitle({<span class=\"string\">'US Spend Learning'<\/span>; sprintf(<span class=\"string\">'Median $%d'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">       % add title<\/span>\r\n    nanmedian(us_spend))})\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/newcoderFinal_14.png\" alt=\"\"> <h4>Women Prefer More Welcoming Venues<a name=\"315af204-2474-485b-b03b-20c27ca8464d\"><\/a><\/h4><p>This survey seems to show more female participation in the \"learn to code\" movement than in more traditional computer science education. When you look at the types of events women attend, they show a strong preference for gender-specific events like \"Girl Develop It\" and \"Women Who Code\". When you look at the online resources, you don't see much difference by gender. 
It appears that the physical presence of men may make women feel unwelcome.<\/p><pre class=\"codeinput\">events_attended = part2.attended_event_types;               <span class=\"comment\">% get events attended<\/span>\r\nevents_attended = cellfun(@(x) strsplit(x,<span class=\"string\">','<\/span>), <span class=\"keyword\">...<\/span><span class=\"comment\">         % split by comma<\/span>\r\n    events_attended,<span class=\"string\">'UniformOutput'<\/span>,false);\r\nevents_attended_flatten = strtrim([events_attended{:}]);    <span class=\"comment\">% un-nest and trim<\/span>\r\n[~,ia,ib] = unique(lower(events_attended_flatten));         <span class=\"comment\">% get indices of uniques<\/span>\r\nevents = events_attended_flatten(ia);                       <span class=\"comment\">% get unique values<\/span>\r\ncount = accumarray(ib,1);                                   <span class=\"comment\">% count unique values<\/span>\r\nevents(count &lt; 100) = [];                                   <span class=\"comment\">% drop unpopular events<\/span>\r\nevents(strcmpi(events,<span class=\"string\">'none'<\/span>)) = [];                        <span class=\"comment\">% drop 'none'<\/span>\r\nevents(cellfun(@isempty,events)) = [];                      <span class=\"comment\">% drop empty cell<\/span>\r\nattended = zeros(size(events_attended,1),length(events));   <span class=\"comment\">% set up accumulator<\/span>\r\n<span class=\"keyword\">for<\/span> i = 1:size(events_attended,1)                           <span class=\"comment\">% loop over events attended<\/span>\r\n    attended(i,:) = ismember(events,strtrim( <span class=\"keyword\">...<\/span><span class=\"comment\">            % find intersection between<\/span>\r\n        events_attended{i}));                               <span class=\"comment\">% events and attended events<\/span>\r\n<span class=\"keyword\">end<\/span>\r\nattended_m = sum(attended(country == <span class=\"keyword\">...<\/span><span class=\"comment\">    
                % subset attended by country<\/span>\r\n    <span class=\"string\">'United States of America'<\/span> &amp; gender == <span class=\"string\">'male'<\/span>,:));      <span class=\"comment\">% and gender<\/span>\r\nattended_f = sum(attended(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                    % subset attended by country<\/span>\r\n    <span class=\"string\">'United States of America'<\/span> &amp; gender == <span class=\"string\">'female'<\/span>,:));    <span class=\"comment\">% and gender<\/span>\r\ngender_ratio = attended_m .\/ sum(attended_m);               <span class=\"comment\">% get male ratio by event<\/span>\r\ngender_ratio = [gender_ratio; attended_f.\/sum(attended_f)]; <span class=\"comment\">% add female ratio<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nsubplot(1,2,1)                                              <span class=\"comment\">% create a subplot<\/span>\r\nb = bar(gender_ratio',<span class=\"string\">'FaceColor'<\/span>,[0 .45 .75], <span class=\"keyword\">...<\/span><span class=\"comment\">          % create a bar chart<\/span>\r\n    <span class=\"string\">'FaceAlpha'<\/span>,.6);                                        <span class=\"comment\">% with histogram colors<\/span>\r\nb(2).FaceColor = [.85 .33 .1];                              <span class=\"comment\">% with histogram colors<\/span>\r\nax = gca;                                                   <span class=\"comment\">% get current axes handle<\/span>\r\nax.XTickLabel = events;                                     <span class=\"comment\">% set x tick label<\/span>\r\nax.XTickLabelRotation = 90;                                 <span class=\"comment\">% rotate x tick label<\/span>\r\nax.YTick = 0:0.1:0.4;                                       <span class=\"comment\">% set y tick<\/span>\r\nax.YTickLabel = {<span class=\"string\">'0%'<\/span>,<span 
class=\"string\">'10%'<\/span>,<span class=\"string\">'20%'<\/span>,<span class=\"string\">'30%'<\/span>,<span class=\"string\">'40%'<\/span>};             <span class=\"comment\">% set y tick label<\/span>\r\ntitle(<span class=\"string\">'US Popular Events Attended'<\/span>)                         <span class=\"comment\">% add title<\/span>\r\nlegend(<span class=\"string\">'Male'<\/span>,<span class=\"string\">'Female'<\/span>)                                     <span class=\"comment\">% add legend<\/span>\r\nsubplot(1,2,2)                                              <span class=\"comment\">% create a subplot<\/span>\r\nresources_used = part2.learning_resources;                  <span class=\"comment\">% get resources used<\/span>\r\nresources_used = cellfun(@(x) strsplit(x,<span class=\"string\">','<\/span>), <span class=\"keyword\">...<\/span><span class=\"comment\">          % split by comma<\/span>\r\n    resources_used,<span class=\"string\">'UniformOutput'<\/span>,false);\r\nresources_used_flatten = strtrim([resources_used{:}]);      <span class=\"comment\">% un-nest and trim<\/span>\r\n[~,ia,ib] = unique(lower(resources_used_flatten));          <span class=\"comment\">% get indices of uniques<\/span>\r\nresources = resources_used_flatten(ia);                     <span class=\"comment\">% get unique values<\/span>\r\ncount = accumarray(ib,1);                                   <span class=\"comment\">% count unique values<\/span>\r\nresources(count &lt; 100) = [];                                <span class=\"comment\">% drop unpopular resources<\/span>\r\nresources(cellfun(@isempty,resources)) = [];                <span class=\"comment\">% drop empty cell<\/span>\r\nusage = zeros(size(resources_used,1),length(resources));    <span class=\"comment\">% set up accumulator<\/span>\r\n<span class=\"keyword\">for<\/span> i = 1:size(resources_used,1)                            <span class=\"comment\">% loop over resources used<\/span>\r\n    usage(i,:) = 
ismember(resources,strtrim( <span class=\"keyword\">...<\/span><span class=\"comment\">            % find intersection between<\/span>\r\n        resources_used{i}));                                <span class=\"comment\">% resources and resource used<\/span>\r\n<span class=\"keyword\">end<\/span>\r\nusage_m = sum(usage(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                          % subset usage by country<\/span>\r\n    <span class=\"string\">'United States of America'<\/span> &amp; gender == <span class=\"string\">'male'<\/span>,:));      <span class=\"comment\">% and gender<\/span>\r\nusage_f = sum(usage(country == <span class=\"keyword\">...<\/span><span class=\"comment\">                          % subset usage by country<\/span>\r\n    <span class=\"string\">'United States of America'<\/span> &amp; gender == <span class=\"string\">'female'<\/span>,:));    <span class=\"comment\">% and gender<\/span>\r\ngender_ratio = usage_m .\/ sum(usage_m);                     <span class=\"comment\">% get male ratio by resource<\/span>\r\ngender_ratio = [gender_ratio; usage_f .\/ sum(usage_f)];     <span class=\"comment\">% add female ratio<\/span>\r\nb = bar(gender_ratio',<span class=\"string\">'FaceColor'<\/span>,[0 .45 .75], <span class=\"keyword\">...<\/span><span class=\"comment\">          % create a bar chart<\/span>\r\n    <span class=\"string\">'FaceAlpha'<\/span>,.6);                                        <span class=\"comment\">% with histogram colors<\/span>\r\nb(2).FaceColor = [.85 .33 .1];                              <span class=\"comment\">% with histogram colors<\/span>\r\nax = gca;                                                   <span class=\"comment\">% get current axes handle<\/span>\r\nax.XTickLabel = resources;                                  <span class=\"comment\">% set x tick label<\/span>\r\nax.XTickLabelRotation = 90;                                 <span class=\"comment\">% rotate x tick 
label<\/span>\r\nax.YTick = 0:0.1:0.3;                                       <span class=\"comment\">% set y tick<\/span>\r\nax.YTickLabel = {<span class=\"string\">'0%'<\/span>,<span class=\"string\">'10%'<\/span>,<span class=\"string\">'20%'<\/span>,<span class=\"string\">'30%'<\/span>};                   <span class=\"comment\">% set y tick label<\/span>\r\ntitle(<span class=\"string\">'US Popular Resources Used'<\/span>)                          <span class=\"comment\">% add title<\/span>\r\nlegend(<span class=\"string\">'Male'<\/span>,<span class=\"string\">'Female'<\/span>)                                     <span class=\"comment\">% add legend<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/newcoderFinal_15.png\" alt=\"\"> <p>To give an example, I encouraged my daughter to join a robotics competition team at her high school. She talked to her friends because she didn't want to be the only girl on the team, and a bunch of girls joined. When she came home from the first team session, I asked her what she had worked on. She said, \"we worked on the team web page\". It turned out that the boys worked on building the robots and the girls were left out, so they worked on building the team web page. When the kits were delivered to the team, the boys just huddled together among themselves and didn't bother to include the girls. The girls were not consciously excluded, but they felt unwelcome anyway. 
I suspect similar dynamics may be at play in which coding events women go to.<\/p><p>I also wonder whether the female preference for Front-End Web Development and User Experience Design is driven by the same issue.<\/p><h4>Summary<a name=\"4badc247-2ff1-4b5a-8e62-c9b79ef12adc\"><\/a><\/h4><p>Perhaps the most intriguing result of this analysis is that the \"learn to code\" movement appears to be effective in closing the gender gap in software development and IT, and that it is embraced by minority communities under-served by traditional educational paths. It also underscores the precarious position those learners face due to the high student loan debt they carry. Ultimately, this survey does not tell us how many of them actually land their dream jobs, and hopefully there will be a follow-up to find out whether the \"learn to code\" movement really delivers on its promise.<\/p><p>Do you use any of those \"learn to code\" websites or other MOOCs? What are you learning and what motivates you to take those classes? 
Please share your experience <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=1687#respond\">here<\/a>!<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_21fb1ea1fbd948fe9a9cab539ed0f627() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='21fb1ea1fbd948fe9a9cab539ed0f627 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 21fb1ea1fbd948fe9a9cab539ed0f627';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2016 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_21fb1ea1fbd948fe9a9cab539ed0f627()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2016a<br><\/p><\/div><!--\r\n21fb1ea1fbd948fe9a9cab539ed0f627 ##### SOURCE BEGIN #####\r\n%% Survey Reveals Diversity in the \"Learn to Code\" Movement\r\n% Do you use any free \"learn to code\" website to teach yourself\r\n% programming? You may already know how to program in MATLAB, but you may\r\n% very well be learning other skills on\r\n% <https:\/\/en.wikipedia.org\/wiki\/Massive_open_online_course MOOCs>.\r\n% \r\n% Today's guest blogger, Toshi, analyzed a publicly available survey data \r\n% to understand the demographic of self-taught coders.  
\r\n% \r\n% <<matlab-academy.png>>\r\n%  \r\n%% Load Data\r\n% I came across a Techcrunch article, <http:\/\/techcrunch.com\/2016\/05\/04\/free-code-camp-survey-reveals-demographics-of-self-taught-coders\/ \r\n% Free Code Camp survey reveals demographics of self-taught coders>, and I got \r\n% curious because a lot of people seem to interested in learning how to code, \r\n% and industry and government are also encouraging this trend. But programming \r\n% is hard. Who exactly are the kind of people who have taken the plunge? Our own \r\n% free interactive online programming classes on <https:\/\/matlabacademy.mathworks.com\/ \r\n% MATLAB Academy> or gamified <https:\/\/www.mathworks.com\/matlabcentral\/about\/cody\/ \r\n% MATLAB Cody> are also gaining popularity and I would like to understand what \r\n% motivates this interest. \r\n% \r\n% The survey was conducted anonymously and published on the web and promoted \r\n% via social media from March 28 through May 2, 2016, targeting people who are \r\n% relatively new to programming.\r\n% \r\n% The following analysis shows significant diversity in gender and ethnic \r\n% mix among self-taught coders and the possible impact of MOOCs in opening up \r\n% access for under-served populations by traditional STEM education paths. \r\n% \r\n% I first downloaded the  <https:\/\/github.com\/FreeCodeCamp\/2016-new-coder-survey \r\n% 2016 New Coder Survey result from Github>. I then unzipped the CSV files into \r\n% my current folder. There are two files - part 1 and part 2 - and we will read \r\n% them into separate tables. We could perhaps merge them using |<https:\/\/www.mathworks.com\/help\/matlab\/ref\/innerjoin.html \r\n% innerjoin>|, but in this case I am primarily interested in part 2 only and we \r\n% will be discarding at least 1000 responses from part 1, given the differences \r\n% in number of responses. 
\r\n\r\nwarning('off','MATLAB:table:ModifiedVarnames')              % suppress warning\r\ncsv = '2016 New Coders Survey Part 1.csv';                  % filename\r\npart1 = readtable(csv);                                     % read into table\r\npart1.Properties.VariableNames = ...                        % format variable names\r\n    regexprep(part1.Properties.VariableNames,'_+$','');     % by removing extra \"_\"\r\ncsv = '2016 New Coders Part 2.csv';                         % filename\r\npart2 = readtable(csv);                                     % read into table \r\npart2.Properties.VariableNames = ...                        % format variable names\r\n    regexprep(part2.Properties.VariableNames,'_+$','');     % by removing extra \"_\"\r\nwarning('on','MATLAB:table:ModifiedVarnames')               % enable warning\r\npart1.SubmitDate_UTC = datetime(part1.SubmitDate_UTC);      % convert date strings to datetime\r\npart2.SubmitDate_UTC = datetime(part2.SubmitDate_UTC);      % convert date strings to datetime\r\ns = sprintf('part1 %d responses from %s thru %s\\n', ...     % summary of part1\r\n    height(part1),datestr(min(part1.SubmitDate_UTC)), ...   % count of responses, start date\r\n    datestr(max(part1.SubmitDate_UTC)));                    % and end date\r\nfprintf('%spart2 %d responses from %s thru %s', ...         % summary of part2\r\n    s,height(part2),datestr(min(part2.SubmitDate_UTC)), ... % count of responses, start date\r\n    datestr(max(part2.SubmitDate_UTC)));                    % and end date\r\n%% Higher Female Representation Than Expected\r\n% Let's start by plotting a histogram of age distrubution. Loren pointed out \r\n% we can use the |omitnan| flag in |<https:\/\/www.mathworks.com\/help\/matlab\/ref\/median.html \r\n% median>| to deal with missing values instead of |<https:\/\/www.mathworks.com\/help\/stats\/nanmedian.html \r\n% nanmedian>|. 
\r\n% \r\n% The histogram shows that a lot of people who responded to this survey fall \r\n% into the so-called \"millenials\" category. It is interesting to see the number \r\n% of women who responded to this survey, considering the often cited gender gap \r\n% in STEM fields. It is not clear if this reflects the true population or are \r\n% women are over-represented via self-selection? Or somehow self-teaching programming \r\n% more appealing to women than traditional instruction?\r\n\r\nage = part2.HowOldAreYou;                                   % get age from part2\r\ngender = categorical(part2.What_sYourGender);               % get gender from part2 as categorical\r\npart2.What_sYourGender = gender;                            % update table\r\nfigure                                                      % new figure\r\nx = age(age ~= 0 & gender == 'male');                       % subset age by gender\r\nhistogram(x)                                                % plot histogram\r\ntext(50,550, sprintf('Median Age (Male)   : %d', ...        % annotate\r\n    median(x,'omitnan'))) \r\ntext(50,470, sprintf('Mode Age (Male)     : %d',mode(x)))   % annotate\r\nhold on                                                     % don't overwrite\r\nx = age(age ~= 0 & gender == 'female');                     % subset age by gender\r\nhistogram(x)                                                % plot histogram\r\ntext(50,520, sprintf('Median Age (Female): %d', ...         
% annotate\r\n    median(x,'omitnan')))  \r\ntext(50,440, sprintf('Mode Age (Female)  : %d',mode(x)))    % annotate\r\nhold off                                                    % restore default\r\ntitle('Age Distribution by Gender')                         % add title\r\nxlabel('Age')                                               % add x axis label\r\nylabel('Count')                                             % add y axis label\r\nlegend('Male','Female')                                     % add legend\r\n%% Mostly Studying in Countries of Citizenship\r\n% Since the survey was done online, anyone could participate. Let's check the \r\n% geographic breakdown. As you would expect, the largest portion of the responses \r\n% came from the US. You can also see that female responses were 40.59% of male \r\n% responses in the US, confirming high female representation in the responses. \r\n% \r\n% China is notably missing from the top 10 countries. Perhaps the \"learn \r\n% to code\" buzz has not caught on there?\r\n\r\npart2.WhichCountryDoYouCurrentlyLiveIn = ...                % convert to categorical\r\n    categorical(part2.WhichCountryDoYouCurrentlyLiveIn);           \r\ncountry = part2.WhichCountryDoYouCurrentlyLiveIn;           % get country of residence\r\ncatcount = countcats(country);                              % get category count\r\ncats = categories(country);                                 % get categories\r\n[~, rank] = sort(catcount,'descend');                       % rank category by count\r\nbelow_top10 = setdiff(cats,cats(rank(1:10)));               % categories below top 10\r\ncountry = mergecats(country, below_top10, 'Other');         % merge them into other \r\ncountry = reordercats(country,[cats(rank(1:10));{'Other'}]);% reorder cats by ranking\r\nratio = sum(country == 'United States of America' & ...     
% ratio of female\/male in us\r\n    gender == 'female')\/sum(country == 'United States of America' & gender == 'male');\r\nfigure                                                      % new figure\r\nhistogram(country(gender == 'male'))                        % plot histogram\r\nhold on                                                     % don't overwrite\r\nhistogram(country(gender == 'female'))                      % plot histogram \r\nhold off                                                    % restore default\r\nax = gca;                                                   % get current axes handle\r\nax.XTickLabelRotation = 90;                                 % rotate x tick label\r\ntitle('Country of Residence by Gender')                     % add title\r\nylabel('Count')                                             % add y axis label\r\nlegend('Male','Female')                                     % add legend\r\ntext(1.5, 1900, sprintf('US Female\/Male %.2f%%',ratio*100)) % annotate\r\n%% \r\n% You can also visualize migration patterns by mapping countries of citizenship \r\n% to countries of residence. The number of edges are just 467 - meaning only 467 \r\n% out of all 14,625 responses in part2 are from migrants, and most people live \r\n% and study in their countries of citizenship. If you take the ratio of immigration \r\n% over emigration, US, United Kingdom, Canada, Australia Germany and Russia enjoy \r\n% net gains from any <https:\/\/en.wikipedia.org\/wiki\/Human_capital_flight brain \r\n% drain>.\r\n\r\npart2.WhichCountryAreYouACitizenOf = ...                    
% convert to categorical\r\n    categorical(part2.WhichCountryAreYouACitizenOf);           \r\ncitizenship = part2.WhichCountryAreYouACitizenOf;           % get country of citizenship\r\ntbl = table(cellstr(citizenship),cellstr(country));         % create table of residence and citizenship \r\ntbl(isundefined(citizenship) & isundefined(country),:) = [];% drop empty rows\r\ntbl.(1)(strcmp(tbl.(1),'<undefined>')) = ...                % use residence if citizenship is emtpy\r\n    tbl.(2)(strcmp(tbl.(1),'<undefined>'));\r\ntbl.(2)(strcmp(tbl.(2),'<undefined>')) = ...                % use citizenship if residence is emtpy\r\n    tbl.(1)(strcmp(tbl.(2),'<undefined>'));\r\n[tbl, ~, idx] = unique(tbl,'rows');                         % eliminate duplicate rows\r\nw = accumarray(idx, 1);                                     % use count of duplicates as weight\r\nG = digraph(tbl.(1), tbl.(2), w);                           % create a directed graph\r\nindeg = indegree(G);                                        % get in-degrees\r\nratio = indeg.\/outdegree(G);                                % get ratio of in-degrees over out-degrees\r\nfigure                                                      % new figure\r\ncolormap cool                                               % set colormap\r\nw = G.Edges.Weight;                                         % get weights\r\nh = plot(G,'MarkerSize',log(indeg+2),'NodeCData',ratio, ... % plot directional graph\r\n    'EdgeColor',[.7 .7 .7],'EdgeAlpha',0.3,'LineWidth',10*w\/max(w));\r\ncaxis([0 3])                                                % set color axis scaling\r\naxis([-2.8 3.3 -4.5 3.7])                                   % set axis limits\r\ntitle({'Migration Pattern'; ...                             % add title\r\n    '467 cases out of 14,625 responses (3.2%)'}) \r\nlabelnode(h,cats(rank(1:10)),cats(rank(1:10)))              % label top 10 nodes\r\nnlabels = {'Argentina','Azerbaijan','Chile','Congo', ...    
% additional nodes to label\r\n    'Cote D''Ivoire','Croatia','Greece','Guyana','Latvia','Lesotho', ...\r\n    'Malta','Other','Paraguay','Philippines','Republic of Serbia','Romania'};\r\nlabelnode(h,nlabels,nlabels);                               % label additional nodes\r\nh = colorbar;                                               % add colorbar\r\nylabel(h, 'in-degrees over out-degree ratio')               % add metric\r\n%% Ethnically Diverse English Speakers in US\r\n% Let's focus on the US. As noted earlier, most new self-taught coders who responded \r\n% to this survey were US citizens, but its ethnic makeup is very diverse. More \r\n% than half of the women are ethnic minorities, and so are 1\/3 of men. They are \r\n% also predominantly English speakers, given the low ratio of immigrants. However, \r\n% we should note that the survey itself was in English and promoted via social \r\n% media in English.\r\n\r\nisminority = part2.AreYouAnEthnicMinorityInYourCountry;     % get monority status\r\nfigure                                                      % new figure\r\nsubplot(1,2,1)                                              % create a subplot\r\nx = gender(country == 'United States of America' ...        % subset \r\n    & isminority == 0);                                     % us non-minority\r\nhistogram(x)                                                % plot histogram\r\nhold on                                                     % don't overwrite\r\nx = gender(country == 'United States of America' ...        
% subset gender \r\n    & isminority == 1);                                     % by us minority\r\nhistogram(x)                                                % plot histogram\r\nhold off                                                    % restore default\r\ntitle('US Gender by Ethnic Category')                       % add title\r\nylabel('Count')                                             % add y axis label\r\nlegend('Majority','Minority', 'Location','northwest')       % add legend\r\npart2.WhichLanguageDoYouYouSpeakAtHomeWithYourFamily = ...  % convert to categorical\r\n    categorical(part2.WhichLanguageDoYouYouSpeakAtHomeWithYourFamily);\r\nlanuage = part2.WhichLanguageDoYouYouSpeakAtHomeWithYourFamily;% get language\r\nusa = lanuage(country == 'United States of America');       % extract us data \r\ncatcount = countcats(usa);                                  % get category count\r\ncats = categories(usa);                                     % get categories\r\n[~, rank] = sort(catcount,'descend');                       % rank category by count\r\nbelow_top10 = setdiff(cats,cats(rank(1:10)));               % categories below top 10\r\nusa = mergecats(usa, below_top10, 'Other');                 % merge them into other\r\nusa = reordercats(usa,[cats(rank(1:10)); {'Other'}]);       % reorder cats by ranking\r\nax = gca;                                                   % get current axes handle\r\nax.XTickLabelRotation = 90;                                 % rotate x tick label\r\nsubplot(1,2,2)                                              % create a subplot\r\nhistogram(usa(gender(country == ...                         % plot histogram\r\n    'United States of America') == 'male'))                 % subset us language by gender\r\nhold on                                                     % don't overwrite\r\nhistogram(usa(gender(country == ...                         
% plot histogram\r\n    'United States of America') == 'female'))               % subset us language by gender\r\nhold off                                                    % restore default\r\ntitle('US Languages by Gender')                             % add title\r\nylabel('Count')                                             % add y axis label\r\nlegend('Male','Female')                                     % add legend\r\n%% Many Are Highly Educated and Already Employed in the US\r\n% We already know that a lot of people who take MOOCs have already earned college \r\n% degrees and have jobs. This survey shows the same result. \r\n\r\npart2.What_sTheHighestDegreeOrLevelOfSchoolYouHaveCompleted = ... % convert to categorical\r\n    categorical(part2.What_sTheHighestDegreeOrLevelOfSchoolYouHaveCompleted);\r\ndegree = part2.What_sTheHighestDegreeOrLevelOfSchoolYouHaveCompleted;% get degree\r\nusa = degree(country == 'United States of America');        % extract us data\r\ncatcount = countcats(usa);                                  % get category count\r\ncats = categories(usa);                                     % get categories\r\n[~, rank] = sort(catcount,'descend');                       % rank category by count\r\nusa = reordercats(usa,cats(rank));                          % reorder cats by ranking\r\nfigure                                                      % new figure\r\nsubplot(1,2,1)                                              % create a subplot\r\nhistogram(usa(gender(country == ...                         % plot histogram\r\n    'United States of America') == 'male'))                 % subset us degree by gender\r\nhold on                                                     % don't overwrite\r\nhistogram(usa(gender(country == ...                         
% plot histogram\r\n    'United States of America') == 'female'))               % subset us degree by gender\r\nhold off                                                    % restore default\r\ntitle('US Degrees by Gender')                               % add title\r\nylabel('Count')                                             % add y axis label\r\nlegend('Male','Female')                                     % add legend\r\npart2.RegardingEmploymentStatus_AreYouCurrently = ...       % convert to categorical\r\n    categorical(part2.RegardingEmploymentStatus_AreYouCurrently);\r\nemployment = part2.RegardingEmploymentStatus_AreYouCurrently;% get employment\r\nother = part2.Other;                                        % get other in employment\r\nisstudent = zeros(size(other));                             % set up an accumulator\r\nfun = @(x,y) ~cellfun(@isempty,strfind(lower(x),y));        % anonymous function handle\r\nisstudent(fun(other,'student')) = 1;                        % flag if 'student' is found\r\nisstudent(fun(other,'studying')) = 1;                       % flag if 'studying' is found\r\nisstudent(fun(other,'school')) = 1;                         % flag if 'school' is found\r\nisstudent(fun(other,'university')) = 1;                     % flag if 'university' is found\r\nisstudent(fun(other,'degree')) = 1;                         % flag if 'degree' is found\r\nisstudent(fun(other,'phd')) = 1;                            % flag if 'phd' is found\r\nemployment(logical(isstudent)) = 'Student';                 % update employment\r\nusa = employment(country == 'United States of America');    % extract us data\r\ncatcount = countcats(usa);                                  % get category count\r\ncats = categories(usa);                                     % get categories\r\n[~, rank] = sort(catcount,'descend');                       % rank category by count\r\nusa = reordercats(usa,cats(rank));                          % reorder cats by 
ranking\r\nsubplot(1,2,2)                                              % create a subplot\r\nhistogram(usa(gender(country == ...                         % plot histogram\r\n    'United States of America') == 'male'))                 % subset us employment by gender\r\nhold on                                                     % don't overwrite\r\nhistogram(usa(gender(country == ...                         % plot histogram\r\n    'United States of America') == 'female'))               % subset us employment by gender\r\nhold off                                                    % restore default\r\ntitle('US Employment by Gender')                            % add title\r\nylabel('Count')                                             % add y axis label\r\nlegend('Male','Female')                                     % add legend\r\n%% Many Already Work In Software Development and IT in US\r\n% It turns out that many respondents already work in software development and \r\n% IT fields and come from very diverse academic backgrounds, including both STEM \r\n% and non-STEM subjects. Since the proportion of women tends to be higher \r\n% in non-STEM majors, this may explain why we see higher than expected female \r\n% representation in this survey. It appears that female respondents who studied \r\n% non-STEM majors as undergraduates are now pursuing a career in software development. \r\n% \r\n% Curiously, we also see many computer science majors and they tend to be \r\n% men. Why are people who already have a computer science background \r\n% teaching themselves programming? Shouldn't they have learned it in school?\r\n\r\npart2.WhichFieldDoYouWorkIn = ...                           % convert to categorical\r\n    categorical(part2.WhichFieldDoYouWorkIn);               \r\njob = part2.WhichFieldDoYouWorkIn;                          % get job\r\njob = mergecats(job, {'software development and IT', ...    
% merge similar categories\r\n    'software development'});\r\nus_job = job(country == 'United States of America');        % extract us data\r\ncatcount = countcats(us_job );                              % get category count\r\ncats = categories(us_job);                                  % get categories\r\n[~, rank] = sort(catcount,'descend');                       % rank category by count\r\nbelow_top10 = setdiff(cats,cats(rank(1:10)));               % categories below top 10\r\nus_job = mergecats(us_job, below_top10, 'Other');           % merge them into other\r\nus_job = reordercats(us_job,[cats(rank(1:10)); {'Other'}]); % reorder cats by ranking\r\nfigure                                                      % new figure\r\nsubplot(1,2,1)                                              % create a subplot\r\nhistogram(us_job(gender(country == ...                      % plot histogram\r\n    'United States of America') == 'male'))                 % subset us subject by gender\r\nhold on                                                     % don't overwrite\r\nhistogram(us_job(gender(country == ...                      % plot histogram\r\n    'United States of America') == 'female'))               % subset us subject by gender\r\nhold off                                                    % restore default\r\ntitle('US Job Field by Gender')                             % add title\r\nylabel('Count')                                             % add y axis label\r\nlegend('Male','Female')                                     % add legend\r\npart2.WhatWasTheMainSubjectYouStudiedInUniversity = ...     
% convert to categorical\r\n    categorical(part2.WhatWasTheMainSubjectYouStudiedInUniversity);  \r\nmajor = part2.WhatWasTheMainSubjectYouStudiedInUniversity;  % get academic major\r\nus_maj = major(country == 'United States of America');      % extract us data\r\ncatcount = countcats(us_maj);                               % get category count\r\ncats = categories(us_maj);                                  % get categories\r\n[~, rank] = sort(catcount,'descend');                       % rank category by count\r\nbelow_top10 = setdiff(cats,cats(rank(1:10)));               % categories below top 10\r\nus_maj = mergecats(us_maj, below_top10, 'Other');           % merge them into other\r\nus_maj = reordercats(us_maj,[cats(rank(1:10)); {'Other'}]); % reorder cats by ranking\r\nsubplot(1,2,2)                                              % create a subplot\r\nhistogram(us_maj(gender(country == ...                      % plot histogram\r\n    'United States of America') == 'male'))                 % subset us subject by gender\r\nhold on                                                     % don't overwrite\r\nhistogram(us_maj(gender(country == ...                      % plot histogram\r\n    'United States of America') == 'female'))               % subset us subject by gender\r\nhold off                                                    % restore default\r\ntitle('US Academic Major by Gender')                        % add title\r\nylabel('Count')                                             % add y axis label\r\nlegend('Male','Female', 'Location', 'northwest')            % add legend\r\n%% \r\n% Here is a quick sanity check. Yes, those computer science majors do seem \r\n% to work in software development and IT, as indicated by the line width of the \r\n% edge between them. \r\n\r\ntbl = table(cellstr(us_maj),cellstr(us_job));               % create table of us major and us job\r\ntbl(isundefined(us_maj) | isundefined(us_job) | ...         
% remove undefined\r\n    us_maj == 'Other',:)= [];                               % remove 'Other' from us major\r\n[tbl, ~, idx] = unique(tbl,'rows');                         % eliminate duplicate rows\r\nw = accumarray(idx, 1);                                     % use count of duplicates as weight\r\nG = digraph(tbl.(1), tbl.(2), w);                           % create a directed graph\r\nfigure                                                      % new figure\r\nw = G.Edges.Weight;                                         % get weights\r\nh = plot(G,'Layout','layered','LineWidth',5*w\/max(w));      % plot the directed graph\r\nxlim([-2.5 16])                                             % x-axis limits\r\nhighlight(h, unique(tbl.(2)),'NodeColor',[.85 .33 .1])      % highlight job nodes\r\ntitle({'US Majors vs. Job Fields'; ...                      % add title\r\n    'Line Width Varies by Frequency'})                           \r\ntext(-2, 2, 'Majors', 'FontWeight','Bold')                  % annotate\r\ntext(-2, 1, 'Job Fields', 'FontWeight','Bold')              % annotate\r\nannotation('arrow',[.2 .2],[.75 .25], 'Color',[0 .45 .75])  % annotate\r\n%% Academic Background in Software Development and IT\r\n% Taking a deep dive into Software Development and IT, we see that not everyone \r\n% is a computer science major. A diverse range of academic backgrounds is represented \r\n% in the industry, and people are probably on different career tracks based on their \r\n% background. Women represent a higher proportion in English, Psychology and Other. \r\n\r\nus_maj_it = major(country == 'United States of America' ... 
% subset major by country\r\n    & job == 'software development and IT');                % and job\r\ncatcount = countcats(us_maj_it);                            % get category count\r\ncats = categories(us_maj_it);                               % get categories\r\n[~, rank] = sort(catcount,'descend');                       % rank category by count\r\nbelow_top10 = setdiff(cats,cats(rank(1:10)));               % categories below top 10\r\nus_maj_it = mergecats(us_maj_it, below_top10, 'Other');     % merge them into other\r\nus_maj_it = reordercats(us_maj_it,[cats(rank(1:10)); {'Other'}]);% reorder cats by ranking\r\nfigure                                                      % new figure\r\nhistogram(us_maj_it(gender(country == ...                   % plot histogram\r\n    'United States of America' & ...                        % subset us subject by job\r\n    job == 'software development and IT') == 'male'))       % and gender\r\nhold on                                                     % don't overwrite\r\nhistogram(us_maj_it(gender(country == ...                   % plot histogram\r\n    'United States of America' & ...                        % subset us subject by job\r\n    job == 'software development and IT') == 'female'))     % and gender\r\nhold off                                                    % restore default\r\nax = gca;                                                   % get current axes handle\r\nax.XTickLabelRotation = 90;                                 % rotate x tick label\r\ntitle('US Software Development and IT - Majors by Gender')  % add title\r\nylabel('Count')                                             % add y axis label\r\nlegend('Male','Female', 'Location', 'northwest')            % add legend\r\n%% Wide Income Gap in Software Development and IT\r\n% Let's compare the income range by Job Field using a <https:\/\/www.mathworks.com\/help\/stats\/boxplot.html \r\n% box plot>. 
The bottom and top of the box represent the first and third quartiles, \r\n% the middle red line represents the median, and the whiskers cover roughly +\/- 2.7 \r\n% standard deviations if the data are normally distributed. Red \"+\"s show the outliers.  \r\n% \r\n% Compared to other job fields, income range in software development and \r\n% IT has a wide spread (as indicated by the elongated box shape and longer whisker), \r\n% meaning there is good upside potential to do better. \r\n\r\nincome = part2.AboutHowMuchMoneyDidYouMakeLastYear_inUSDollars;% get income\r\nincome = str2double(income);                                % convert to numeric\r\nincome(income == 0) = NaN;                                  % don't count zero\r\nus_income = income(country == 'United States of America');  % extract us data\r\nfigure                                                      % new figure\r\nboxplot(us_income,us_job)                                   % create a box plot\r\nylim([0 2*10^5])                                            % set upper limit\r\ntitle('US Income Distribution by Job Field')                % add title\r\nax = gca;                                                   % get current axes handle\r\nax.XTickLabelRotation = 90;                                 % rotate x tick label\r\nax.YTickLabel = {'$0','$50k','$100k','$150k','$200k'};      % set y tick label\r\nylabel('Annual Income')                                     % add y axis label\r\n%% What Affects Income in Software Development and IT?\r\n% The first factor that may affect the income in software development and IT \r\n% is academic background. The box plot shows that Computer Science and Electrical \r\n% Engineering give you the most advantage in getting a higher salary. This is \r\n% probably the motivation behind the self-learning, how-to-code trend - people \r\n% want to switch to a more lucrative career path from their current path or advance \r\n% more quickly within the same industry. 
\r\n\r\nus_income_it = income(country == 'United States of America' ...% subset income by country\r\n    & job == 'software development and IT');                % and job\r\nfigure                                                      % new figure\r\nboxplot(us_income_it,us_maj_it)                             % create a box plot\r\nylim([0 2*10^5])                                            % set upper limit\r\ntitle({'US Income Distribution by Major', ....              % add title\r\n    'in Software Development and IT'})\r\nax = gca;                                                   % get current axes handle\r\nax.XTickLabelRotation = 90;                                 % rotate x tick label\r\nax.YTickLabel = {'$0','$50k','$100k','$150k','$200k'};      % set y tick label\r\nylabel('Annual Income')                                     % add y axis label\r\n%% Age Factor\r\n% Another important factor to consider is age. If you plot the age against\r\n% income in Software Development and IT, you see a wide income gap among\r\n% younger Computer Science majors. Some 25-year-olds can be earning $0\r\n% vs. $110,000 a year. Income also seems to plateau as you age. You can use\r\n% |<https:\/\/www.mathworks.com\/help\/curvefit\/fit.html fit>| with the\r\n% |<https:\/\/www.mathworks.com\/help\/curvefit\/list-of-library-models-for-curve-and-surface-fitting.html#btbcvnl\r\n% exp2>| option to <https:\/\/www.mathworks.com\/help\/curvefit\/exponential.html\r\n% apply a two-term exponential curve> to the data so you can see it easily.\r\n% Perhaps this provides motivation for CS majors to improve their skills\r\n% and experience as quickly as possible?\r\n\r\nus_age_it = age(country == 'United States of America' ...   
% subset age by country\r\n    & job == 'software development and IT');                % and job\r\nX = us_age_it(us_maj_it == 'Computer Science');             % subset just CS\r\nY = us_income_it(us_maj_it == 'Computer Science');          % subset just CS\r\nfigure                                                      % new figure\r\nplot(X, Y,'o')                                              % plot data\r\nhold on                                                     % don't overwrite\r\nmissingrows = isnan(X) | isnan(Y);                          % find NaNs\r\nX(missingrows) = [];                                        % remove NaNs\r\nY(missingrows) = [];                                        % remove NaNs\r\nfitresult = fit(X,Y,'exp2');                                % fit to exp2\r\nplot(fitresult)                                             % plot curve\r\nhold off                                                    % restore default\r\ntitle({'Income by Age Among CS Majors'; ...                 % add title\r\n    'in Software Development and IT'})\r\nxlim([10 60])                                               % set x axis limits\r\nylim([0 2*10^5])                                            % set y axis limits\r\nax = gca;                                                   % get current axes handle\r\nax.YTick = 0:50000:200000;                                  % set y tick\r\nax.YTickLabel = {'$0','$50k','$100k','$150k','$200k'};      % set y tick label\r\nxlabel('Age')                                               % add x axis label\r\nylabel('Annual Income')                                     % add y axis label\r\n%% Big Companies Not Preferred\r\n% If you look at the future employment those people are looking for, they prefer \r\n% big companies the least. Since people are not earning a formal degree, they \r\n% are probably expecting more flexibility in other employment options. 
\r\n% \r\n% The most popular option is mid-sized companies, but many people are interested \r\n% in working for a startup, or starting their own businesses, or freelancing. \r\n% People in software development and IT tend to prefer working for startup or \r\n% mid-sized companies more. Men tend to prefer doing their own businesses or work \r\n% for a startup, while women tend to prefer the freelance path. \r\n\r\npart2.want_employment_type = ...                            % convert to categorical\r\n    categorical(part2.want_employment_type);    \r\ninterested_emp = part2.want_employment_type;                % get interested employment type\r\nfigure                                                      % new figure\r\nsubplot(1,2,1)                                              % create a subplot\r\nus_int_emp_it = interested_emp(country == ...               % subset it by country and job\r\n    'United States of America' & job == 'software development and IT');\r\nus_int_emp_it(isundefined(us_int_emp_it)) = [];             % remove undefined\r\nus_int_emp_it = removecats(us_int_emp_it);                  % remove unused categories\r\nhistogram(us_int_emp_it,'Normalization','probability')      % plot histogram\r\nhold on                                                     % don't overwrite\r\nus_int_emp_non_it = interested_emp(country == ...           
% subset it by country and job\r\n    'United States of America' & job ~= 'software development and IT');\r\nus_int_emp_non_it(isundefined(us_int_emp_non_it)) = [];     % remove undefined\r\nus_int_emp_non_it = removecats(us_int_emp_non_it);          % remove unused categories\r\nhistogram(us_int_emp_non_it,'Normalization','probability')  % plot histogram\r\nhold off                                                    % restore default\r\ntitle({'US Desired Employment Type';'by Job Field'})        % add title\r\nax = gca;                                                   % get current axes handle\r\nax.XTickLabelRotation = 90;                                 % rotate x tick label\r\nax.YTick = 0:0.1:0.6;                                       % set y tick\r\nax.YTickLabel = {'0%','10%','20%','30%','40%','50%','60%'}; % set y tick label\r\nlegend('Software Dev and IT', 'Others')                     % add legend\r\nylim([0 0.6])                                               % set y axis limits\r\nsubplot(1,2,2)                                              % create a subplot\r\nus_int_emp_m = interested_emp(country == ...                % subset it by country\r\n    'United States of America' & gender == 'male');         % gender\r\nus_int_emp_m(isundefined(us_int_emp_m)) = [];               % remove undefined\r\nus_int_emp_m = removecats(us_int_emp_m);                    % remove unused categories\r\nhistogram(us_int_emp_m,'Normalization','probability')       % plot histogram\r\nhold on                                                     % don't overwrite\r\nus_int_emp_f = interested_emp(country == ...                
% subset it by country\r\n    'United States of America' & gender == 'female');       % gender\r\nus_int_emp_f(isundefined(us_int_emp_f)) = [];               % remove undefined\r\nus_int_emp_f = removecats(us_int_emp_f);                    % remove unused categories\r\nhistogram(us_int_emp_f,'Normalization','probability')       % plot histogram\r\nhold off                                                    % restore default\r\ntitle({'US Desired Employment Type';'by Gender'})           % add title\r\nax = gca;                                                   % get current axes handle\r\nax.XTickLabelRotation = 90;                                 % rotate x tick label\r\nax.YTick = 0:0.1:0.6;                                       % set y tick\r\nax.YTickLabel = {'0%','10%','20%','30%','40%','50%','60%'}; % set y tick label\r\nlegend('Male', 'Female')                                    % add legend\r\nylim([0 0.6])                                               % set y axis limits\r\n%% Dream Jobs\r\n% When it comes to actual jobs people are interested in, they are mostly web \r\n% development positions. People already in software development and IT tend to \r\n% prefer roles with higher technical skills - <https:\/\/en.wikipedia.org\/wiki\/Front_and_back_ends \r\n% Back-End Web Development>, <https:\/\/en.wikipedia.org\/wiki\/DevOps DevOps>, or \r\n% <https:\/\/en.wikipedia.org\/wiki\/System_administrator SysAdmin> rather than <https:\/\/en.wikipedia.org\/wiki\/Front-end_web_development \r\n% Front-End Web Development>, or other non-development roles such as Product Manager \r\n% or QA Engineer. 
In terms of gender, women tend to prefer Front-End Web Development \r\n% and <https:\/\/en.wikipedia.org\/wiki\/User_experience_design User Experience Design>.\r\n\r\nint_job = categorical(strtrim(part2.jobs_interested_in));   % get interested job\r\ncatcount = countcats(int_job);                              % get category count\r\ncats = categories(int_job);                                 % get categories\r\n[~, rank] = sort(catcount,'descend');                       % rank category by count\r\nbelow_top10 = setdiff(cats,cats(rank(1:10)));               % categories below top 10\r\nint_job = mergecats(int_job, below_top10,'Other');          % merge them into other\r\nint_job = reordercats(int_job,[cats(rank(1:10));{'Other'}]);% reorder cats by ranking\r\nfigure                                                      % new figure\r\nsubplot(1,2,1)                                              % create a subplot\r\nus_int_job_non_it = int_job(country == ...                  % subset int job by country and job\r\n     'United States of America' & job ~= 'software development and IT');\r\nhistogram(us_int_job_non_it, 'Normalization','probability') % plot histogram\r\nhold on                                                     % don't overwrite\r\nus_int_job_it = int_job(country == ...                      
% subset int job by country and job\r\n     'United States of America' & job == 'software development and IT');               \r\nhistogram(us_int_job_it, 'Normalization','probability')     % plot histogram\r\nhold off                                                    % restore default\r\ntitle({'US Jobs Interested In';'By Job Field'})             % add title\r\nlegend('All Others','Software Dev & IT')                    % add legend\r\nax = gca;                                                   % get current axes handle\r\nax.YTick = 0:0.1:0.5;                                       % set y tick\r\nax.YTickLabel = {'0%','10%','20%','30%','40%','50%'};       % set y tick label\r\nylim([0 0.5])                                               % set y axis limits\r\nsubplot(1,2,2)                                              % create a subplot\r\nus_int_job_m = int_job(country == ...                       % subset int job by country\r\n     'United States of America' & gender == 'male');        % and gender\r\nhistogram(us_int_job_m, 'Normalization','probability')      % plot histogram\r\nhold on                                                     % don't overwrite\r\nus_int_job_f = int_job(country == ...                       
% subset int job by country\r\n     'United States of America' & gender == 'female');      % and gender\r\nhistogram(us_int_job_f, 'Normalization','probability')      % plot histogram\r\nhold off                                                    % restore default\r\ntitle({'US Jobs Interested In';'By Gender'})                % add title\r\nlegend('Male','Female')                                     % add legend\r\nax = gca;                                                   % get current axes handle\r\nax.YTick = 0:0.1:0.5;                                       % set y tick\r\nax.YTickLabel = {'0%','10%','20%','30%','40%','50%'};       % set y tick label\r\nylim([0 0.5])                                               % set y axis limits\r\n%% Student Loan Debt\r\n% The survey also answers how much student loan debt respondents carry and how \r\n% much they spend learning to code. Over 41% of the respondents have student loan \r\n% debt and the median amount owed is $25,000.  In addition, people do spend during \r\n% the course of learning to code, and the median total spend is $300. Given that \r\n% a lot of people have debt, they cannot afford to spend more and add to their \r\n% deficit, and that also reflects on the more conservative choices in future employment. 
\r\n\r\ndebt = part2.AboutHowMuchDoYouOweInStudentLoans_inUSDollars;% get student loan debt\r\ndebt = str2double(debt);                                    % convert to numeric\r\ndebt(debt == 0) = NaN;                                      % don't count zero\r\nus_debt = debt(country == 'United States of America');      % extract us data\r\npct_in_debt = sum(~isnan(us_debt))\/length(us_debt)*100;     % percentage in debt\r\nmedian_debt = nanmedian(us_debt)\/1000;                      % median debt\r\nfigure                                                      % new figure\r\nsubplot(1,2,1)                                              % create a subplot\r\nhistogram(us_debt)                                          % plot histogram\r\nxlim([0 2*10^5])                                            % set x axis limits\r\nax = gca;                                                   % get current axes handle\r\nax.XTick = 0:50000:200000;                                  % set x tick\r\nax.XTickLabel = {'$0','$50k','$100k','$150k','$200k'};      % set x tick label\r\nxlabel('Amount Owed')                                       % add x axis label\r\nylabel('Count')                                             % add y axis label\r\ntitle({'US Student Loan Debt'; ...                          
% add title\r\n    sprintf('%.2f%% in Debt (Median $%dk)',pct_in_debt,median_debt)})                               \r\nsubplot(1,2,2)                                              % create a subplot\r\nspend = part2.total_spent_learning;                         % get total spend\r\nspend = str2double(spend);                                  % convert to numeric\r\nspend(spend == 0) = NaN;                                    % don't count zero\r\nus_spend = spend(country == 'United States of America');    % extract us data\r\nhistogram(us_spend)                                         % plot histogram\r\nxlim([0 3*10^4])                                            % set x axis limits\r\nax = gca;                                                   % get current axes handle\r\nax.XTick = 0:10000:30000;                                   % set x tick\r\nax.XTickLabel = {'$0','$10k','$20k','$30k'};                % set x tick label\r\nxlabel('Total Spend')                                       % add x axis label\r\nylabel('Count')                                             % add y axis label\r\ntitle({'US Spend Learning'; sprintf('Median $%d', ...       % add title\r\n    nanmedian(us_spend))})                                  \r\n%% Women Prefer More Welcoming Venues\r\n% This survey seems to show more female participation in the \"learn to code\" \r\n% movement compared to a more traditional computer science education. When you \r\n% look at the type of events women prefer, they show a strong preference for gender-specific \r\n% events like \"Girl Develop It\" and \"Women Who Code\". When you look at the online \r\n% resources, you don't see much difference by gender. It appears that the physical \r\n% presence of males makes women feel unwelcome.  \r\n\r\nevents_attended = part2.attended_event_types;               % get events attended\r\nevents_attended = cellfun(@(x) strsplit(x,','), ...         
% split by comma\r\n    events_attended,'UniformOutput',false);\r\nevents_attended_flatten = strtrim([events_attended{:}]);    % un-nest and trim\r\n[~,ia,ib] = unique(lower(events_attended_flatten));         % get indices of uniques\r\nevents = events_attended_flatten(ia);                       % get unique values\r\ncount = accumarray(ib,1);                                   % count unique values\r\nevents(count < 100) = [];                                   % drop unpopular events\r\nevents(strcmpi(events,'none')) = [];                        % drop 'none'\r\nevents(cellfun(@isempty,events)) = [];                      % drop empty cell\r\nattended = zeros(size(events_attended,1),length(events));   % set up accumulator\r\nfor i = 1:size(events_attended,1)                           % loop over events attended\r\n    attended(i,:) = ismember(events,strtrim( ...            % find intersection between\r\n        events_attended{i}));                               % events and attended events\r\nend\r\nattended_m = sum(attended(country == ...                    % subset attended by country\r\n    'United States of America' & gender == 'male',:));      % and gender\r\nattended_f = sum(attended(country == ...                    % subset attended by country\r\n    'United States of America' & gender == 'female',:));    % and gender\r\ngender_ratio = attended_m .\/ sum(attended_m);               % get male ratio by event\r\ngender_ratio = [gender_ratio; attended_f.\/sum(attended_f)]; % add female ratio\r\nfigure                                                      % new figure\r\nsubplot(1,2,1)                                              % create a subplot\r\nb = bar(gender_ratio','FaceColor',[0 .45 .75], ...          
% create a bar chart\r\n    'FaceAlpha',.6);                                        % with histogram colors\r\nb(2).FaceColor = [.85 .33 .1];                              % with histogram colors\r\nax = gca;                                                   % get current axes handle\r\nax.XTickLabel = events;                                     % set x tick label\r\nax.XTickLabelRotation = 90;                                 % rotate x tick label\r\nax.YTick = 0:0.1:0.4;                                       % set y tick\r\nax.YTickLabel = {'0%','10%','20%','30%','40%'};             % set y tick label\r\ntitle('US Popular Events Attended')                         % add title\r\nlegend('Male','Female')                                     % add legend\r\nsubplot(1,2,2)                                              % create a subplot\r\nresources_used = part2.learning_resources;                  % get resources used\r\nresources_used = cellfun(@(x) strsplit(x,','), ...          % split by comma\r\n    resources_used,'UniformOutput',false);\r\nresources_used_flatten = strtrim([resources_used{:}]);      % un-nest and trim\r\n[~,ia,ib] = unique(lower(resources_used_flatten));          % get indices of uniques\r\nresources = resources_used_flatten(ia);                     % get unique values\r\ncount = accumarray(ib,1);                                   % count unique values\r\nresources(count < 100) = [];                                % drop unpopular resources\r\nresources(cellfun(@isempty,resources)) = [];                % drop empty cell\r\nusage = zeros(size(resources_used,1),length(resources));    % set up accumulator\r\nfor i = 1:size(resources_used,1)                            % loop over resources used\r\n    usage(i,:) = ismember(resources,strtrim( ...            % find intersection between\r\n        resources_used{i}));                                % resources and resource used\r\nend\r\nusage_m = sum(usage(country == ...                          
% subset usage by country\r\n    'United States of America' & gender == 'male',:));      % and gender\r\nusage_f = sum(usage(country == ...                          % subset usage by country\r\n    'United States of America' & gender == 'female',:));    % and gender\r\ngender_ratio = usage_m .\/ sum(usage_m);                     % get male ratio by resource\r\ngender_ratio = [gender_ratio; usage_f .\/ sum(usage_f)];     % add female ratio\r\nb = bar(gender_ratio','FaceColor',[0 .45 .75], ...          % create a bar chart\r\n    'FaceAlpha',.6);                                        % with histogram colors\r\nb(2).FaceColor = [.85 .33 .1];                              % recolor the female bars\r\nax = gca;                                                   % get current axes handle\r\nax.XTickLabel = resources;                                  % set x tick label\r\nax.XTickLabelRotation = 90;                                 % rotate x tick label\r\nax.YTick = 0:0.1:0.3;                                       % set y tick\r\nax.YTickLabel = {'0%','10%','20%','30%'};                   % set y tick label\r\ntitle('US Popular Resources Used')                          % add title\r\nlegend('Male','Female')                                     % add legend\r\n%% \r\n% To give an example, I encouraged my daughter to join a robotics competition \r\n% team at her high school. She talked to her friends because she didn't want to \r\n% be the only girl on the team, and a bunch of girls joined. When she came \r\n% home from the first team session, I asked her what she had worked on. She said \"we \r\n% worked on the team web page\". It turned out the boys worked on building the robots and \r\n% the girls were left out, so they worked on building the team web page. When the \r\n% kits were delivered to the team, the boys just huddled together among themselves \r\n% and didn't bother to include the girls. The girls were not consciously excluded, but \r\n% they felt unwelcome anyway. 
I suspect similar dynamics may be at play in which \r\n% coding events women choose to attend. \r\n% \r\n% I also wonder whether the female preference for Front-End Web Development and User \r\n% Experience Design is driven by the same issue.\r\n%% Summary\r\n% Perhaps the most intriguing result of this analysis is that the \"learn to \r\n% code\" movement appears effective in closing the gender gap in software development \r\n% and IT, and that it is embraced by minority communities under-served by traditional \r\n% educational paths. It also underscores the precarious position those learners \r\n% face due to the high student loan debt they carry. Ultimately, this survey doesn't tell us \r\n% how many of them actually achieve employment in their dream job, \r\n% and hopefully there will be a follow-up to find out whether the \"learn to code\" movement \r\n% really delivers on its promise. \r\n% \r\n% Do you use any of those \"learn to code\" websites or other MOOCs? What are\r\n% you learning and what motivates you to take those classes? Please share\r\n% your experience <https:\/\/blogs.mathworks.com\/loren\/?p=1687#respond here>!\r\n% \r\n% \r\n% \r\n% \r\n% \r\n%\r\n##### SOURCE END ##### 21fb1ea1fbd948fe9a9cab539ed0f627\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/newcoderFinal_15.png\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction--><p>Do you use any free \"learn to code\" website to teach yourself programming? You may already know how to program in MATLAB, but you may very well be learning other skills on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Massive_open_online_course\">MOOCs<\/a>.... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2016\/06\/27\/survey-reveals-diversity-in-the-learn-to-code-movement\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[61,48],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1687"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=1687"}],"version-history":[{"count":2,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1687\/revisions"}],"predecessor-version":[{"id":1689,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1687\/revisions\/1689"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=1687"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=1687"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=1687"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}