{"id":1023,"date":"2014-10-16T07:45:31","date_gmt":"2014-10-16T12:45:31","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=1023"},"modified":"2017-01-06T11:14:45","modified_gmt":"2017-01-06T16:14:45","slug":"taking-the-pulse-of-moocs","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2014\/10\/16\/taking-the-pulse-of-moocs\/","title":{"rendered":"Taking the Pulse of MOOCs"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p><a href=\"https:\/\/www.coursera.org\/\">Coursera<\/a> is a technology platform that kickstarted the current <a href=\"http:\/\/en.wikipedia.org\/wiki\/Massive_open_online_course\">MOOCs<\/a> boom. Even though there are more MOOC players now, it remains one of the leading companies in this space. But how well is it doing these days at delivering higher education to the masses online?<\/p><p>Today's guest blogger, <a href=\"\">Toshi Takeuchi<\/a>, would like to share an analysis using Coursera's data.<\/p><p>I am a big fan of MOOCs and have benefited a lot from free online courses on Coursera, such as Stanford's <a href=\"https:\/\/www.coursera.org\/course\/ml\">Machine Learning<\/a> course. Like many websites these days, Coursera offers its data through <a href=\"http:\/\/en.wikipedia.org\/wiki\/Representational_state_transfer\">REST APIs<\/a>. Coursera offers a number of APIs, but the Catalog APIs are available without OAuth authentication. 
With these APIs, we can look up the details of the courses that Coursera offers.<\/p><p>We can then try to answer questions such as <i>\"how do STEM and non-STEM courses break down among universities?\"<\/i><\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#6ba741d7-c882-4c11-9473-9bdfb1dafb6c\">JSON support in R2014b<\/a><\/li><li><a href=\"#f136380a-9773-4271-b18d-1a44e73000c4\">Plotting courses vs sessions by university<\/a><\/li><li><a href=\"#594ba15d-021e-4695-8eb8-7a96d9f7014c\">Plotting STEM ratios by university<\/a><\/li><li><a href=\"#35ea18f6-c5bf-40a3-ad10-35e95cac7b9e\">Plotting ratio of courses per category<\/a><\/li><li><a href=\"#ad13358e-feb9-4e4e-87a9-5f3f44ce1c54\">Summary<\/a><\/li><\/ul><\/div><h4>JSON support in R2014b<a name=\"6ba741d7-c882-4c11-9473-9bdfb1dafb6c\"><\/a><\/h4><p><a href=\"http:\/\/en.wikipedia.org\/wiki\/JSON\">JSON<\/a> is a very common data format for REST APIs, and Coursera's APIs also return results in JSON format. MATLAB now supports JSON out of the box in <a href=\"https:\/\/www.mathworks.com\/products\/matlab\/whatsnew.html\">R2014b<\/a>. You could always use JSON from within MATLAB by taking advantage of user-contributed MATLAB programs on <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/\">File Exchange<\/a>, but built-in JSON support makes it easy for us to share scripts that use JSON, because we don't have to worry about dependencies.<\/p><p>Let's try the new feature with the Coursera APIs. Calling a REST API is very simple with <tt>webread<\/tt>.<\/p><pre class=\"codeinput\"><span class=\"comment\">% Catalog API endpoint for the course list<\/span>\r\nrestApi = <span class=\"string\">'https:\/\/api.coursera.org\/api\/catalog.v1\/courses'<\/span>;\r\n<span class=\"comment\">% ask the API to include the related sessions, universities, and categories<\/span>\r\nparams = <span class=\"string\">'sessions,universities,categories'<\/span>;\r\n<span class=\"comment\">% allow up to 60 seconds for the response<\/span>\r\nresp = webread(restApi,<span class=\"string\">'includes'<\/span>,params,weboptions(<span class=\"string\">'Timeout'<\/span>,60));\r\n<\/pre><p><tt>webread<\/tt> returns the JSON response as a structure array. 
The data is further processed in a separate script, <a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/processData.m\"><tt>processData.m<\/tt><\/a>; check out the details if you are interested.<\/p><p>We need to decide which categories represent STEM subjects. When a course is assigned to multiple categories, we treat it as a STEM course as long as at least one of them is a STEM category.<\/p><pre class=\"codeinput\"><span class=\"comment\">% run the helper script that prepares the data used below<\/span>\r\nprocessData\r\n<\/pre><pre class=\"codeoutput\">STEM categories\r\n    'Computer Science: Theory'\r\n    'Economics &amp; Finance'\r\n    'Medicine'\r\n    'Mathematics'\r\n    'Physical &amp; Earth Sciences'\r\n    'Biology &amp; Life Sciences'\r\n    'Computer Science: Systems &amp; Security'\r\n    'Computer Science: Software Engineering'\r\n    'Engineering'\r\n    'Statistics and Data Analysis'\r\n    'Computer Science: Artificial Intelligence'\r\n    'Physics'\r\n    'Chemistry'\r\n    'Energy &amp; Earth Sciences'\r\n<\/pre><h4>Plotting courses vs sessions by university<a name=\"f136380a-9773-4271-b18d-1a44e73000c4\"><\/a><\/h4><p>As a sanity check, let's plot the number of courses vs. the number of sessions by university. A single course can be offered repeatedly in multiple sessions, so the count of sessions gives a rough measure of how long a given course has been running.<\/p><p>A new course, or one that has not been repeated, has only one session. 
We can use one session per course as the baseline and check how universities have scaled up their courses relative to it.<\/p><p>R2014b comes with the new MATLAB graphics system, but you can still use the familiar plotting commands.<\/p><pre class=\"codeinput\"><span class=\"comment\">% group universities into tiers by number of courses<\/span>\r\n<span class=\"comment\">% (1 = more than 25, 2 = 11 to 25, 3 = 10 or fewer)<\/span>\r\ngrouping = ones(height(universities),1)*2;\r\ngrouping(universities.courses &gt; 25) = 1;\r\ngrouping(universities.courses &lt;= 10) = 3;\r\n\r\n<span class=\"comment\">% scatter plot of courses vs. sessions, colored by tier<\/span>\r\nfigure\r\ngscatter(universities.courses,universities.sessions,grouping)\r\n<span class=\"comment\">% dotted reference lines for 1, 2, 3, and 6 sessions per course<\/span>\r\nh = refline(1,0); set(h,<span class=\"string\">'Color'<\/span>,<span class=\"string\">'m'<\/span>,<span class=\"string\">'LineStyle'<\/span>,<span class=\"string\">':'<\/span>)\r\nh = refline(2,0); set(h,<span class=\"string\">'Color'<\/span>,<span class=\"string\">'m'<\/span>,<span class=\"string\">'LineStyle'<\/span>,<span class=\"string\">':'<\/span>)\r\nh = refline(3,0); set(h,<span class=\"string\">'Color'<\/span>,<span class=\"string\">'m'<\/span>,<span class=\"string\">'LineStyle'<\/span>,<span class=\"string\">':'<\/span>)\r\nh = refline(6,0); set(h,<span class=\"string\">'Color'<\/span>,<span class=\"string\">'m'<\/span>,<span class=\"string\">'LineStyle'<\/span>,<span class=\"string\">':'<\/span>)\r\nxlabel(<span class=\"string\">'Number of Courses'<\/span>); ylabel(<span class=\"string\">'Number of Sessions'<\/span>);\r\ntitle(<span class=\"string\">'\\fontsize{14} Courses by Sessions by University'<\/span>);\r\nlegend(<span class=\"string\">'Universities with 25+ courses'<\/span>,<span class=\"string\">'Universities with 10+ courses'<\/span>,<span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'Universities with 1-10 courses'<\/span>,<span class=\"string\">'Ref line: 1 session per course'<\/span>,<span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'Ref line: 2 sessions per course'<\/span>,<span class=\"string\">'Ref line: 3 sessions per course'<\/span>,<span 
class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'Ref line: 6 sessions per course'<\/span>,<span class=\"string\">'Location'<\/span>,<span class=\"string\">'NorthWest'<\/span>)\r\n<span class=\"comment\">% label universities with more than 10 courses and more than 20 sessions<\/span>\r\n<span class=\"keyword\">for<\/span> i = 1:height(universities)\r\n    <span class=\"keyword\">if<\/span> universities.courses(i) &gt; 10 &amp;&amp; universities.sessions(i) &gt; 20\r\n        text(universities.courses(i),universities.sessions(i),<span class=\"keyword\">...<\/span>\r\n            universities.shortName{i},<span class=\"string\">'FontSize'<\/span>,12)\r\n    <span class=\"keyword\">end<\/span>\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/coursera_01.png\" alt=\"\"> <p>You can see that Stanford, Penn (University of Pennsylvania), JHU (Johns Hopkins), and Duke are leading the pack. Judging by the number of sessions, they are the early adopters. It is interesting to see PKU (Peking University) leading the international institutions; it offers a number of courses in Chinese. Coursera didn't start international partnerships until recently, so it is quite remarkable that PKU has broadened its online content in a relatively short time. More recent entrants are on the left, with fewer courses and sessions.<\/p><p>Established players are trying to scale up by repeating sessions. JHU seems particularly aggressive, both in the number of courses it offers and in how often those courses are repeated as sessions.<\/p><h4>Plotting STEM ratios by university<a name=\"594ba15d-021e-4695-8eb8-7a96d9f7014c\"><\/a><\/h4><p>Let's plot each university's number of courses against its ratio of STEM courses. This tells us which schools are investing in online education content and whether they focus on STEM or non-STEM subjects. 
The marker size indicates the total number of sessions for each university, so it also suggests how long that university has been involved with Coursera. Notice the <tt>parula<\/tt> colormap in the colorbar; it is the new default colormap in R2014b.<\/p><pre class=\"codeinput\"><span class=\"comment\">% use the session count to set the marker sizes<\/span>\r\nmarkerSize = universities.sessions;\r\n<span class=\"comment\">% rescale the counts to [0, 1000]; scatter needs positive sizes, so 0 becomes 1<\/span>\r\nmarkerSize = (markerSize - min(markerSize))\/(max(markerSize)-min(markerSize));\r\nmarkerSize = markerSize * 1000; markerSize(markerSize == 0) = 1;\r\n<span class=\"comment\">% change the tick labels to reflect the original values<\/span>\r\nbarticks = num2cell(20:20:200);\r\n<span class=\"comment\">% scatter plot of courses vs. STEM ratio; marker size and color both encode sessions<\/span>\r\nfigure\r\nscatter(universities.courses,universities.stem_ratio,markerSize,markerSize,<span class=\"string\">'fill'<\/span>)\r\nxlim([0 40])\r\nh = colorbar(<span class=\"string\">'TickLabels'<\/span>,barticks);\r\nh.Label.String = <span class=\"string\">'\\fontsize{11}Number of Sessions'<\/span>;\r\ntitle(<span class=\"string\">'\\fontsize{14} Ratio of STEM courses by University on Coursera'<\/span>)\r\nxlabel(<span class=\"string\">'\\fontsize{11}Number of Courses'<\/span>); ylabel(<span class=\"string\">'\\fontsize{11}Ratio of STEM Courses'<\/span>);\r\n<span class=\"comment\">% label universities that have at least 5 courses and a mixed STEM ratio<\/span>\r\n<span class=\"keyword\">for<\/span> i = 1:height(universities)\r\n    <span class=\"keyword\">if<\/span> universities.stem_ratio(i) ~= 0 &amp;&amp; universities.stem_ratio(i) ~= 1 &amp;&amp; universities.courses(i) &gt;= 5\r\n        text(universities.courses(i),universities.stem_ratio(i),universities.shortName{i},<span class=\"string\">'FontSize'<\/span>,12)\r\n    <span class=\"keyword\">end<\/span>\r\n<span class=\"keyword\">end<\/span>\r\n<span class=\"comment\">% dotted reference lines at 10 and 25 courses and at a STEM ratio of 0.5<\/span>\r\nline([25 25],[0 1],<span class=\"string\">'LineStyle'<\/span>,<span 
class=\"string\">':'<\/span>)\r\nline([10 10],[0 1],<span class=\"string\">'LineStyle'<\/span>,<span class=\"string\">':'<\/span>)\r\nline([0 40],[0.5 0.5],<span class=\"string\">'LineStyle'<\/span>,<span class=\"string\">':'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/coursera_02.png\" alt=\"\"> <p>Stanford is very heavy on STEM subjects, while others are more balanced. More recent entrants on the left vary more widely in how STEM-heavy their course offerings are. Perhaps the rate of adoption differs across academic disciplines?<\/p><h4>Plotting ratio of courses per category<a name=\"35ea18f6-c5bf-40a3-ad10-35e95cac7b9e\"><\/a><\/h4><p>We can plot the ratio of courses per category in order to see the relative representation of academic disciplines on Coursera. A course can belong to multiple categories, and in such cases its count is split equally across those categories; for example, a course listed under two categories adds 0.5 to each. Note that you can now rotate axis tick labels in R2014b.<\/p><pre class=\"codeinput\"><span class=\"comment\">% count courses per category for each university; a course in k categories adds 1\/k to each<\/span>\r\ncatByUniv = zeros(height(universities),height(categories));\r\n<span class=\"keyword\">for<\/span> i = 1:length(T.categories)\r\n    row = ismember(universities.id,T.universities(i));\r\n    col = ismember(categories.id,T.categories{i});\r\n    catByUniv(row,col) = catByUniv(row,col) + 1\/length(T.categories{i});\r\n<span class=\"keyword\">end<\/span>\r\n\r\n<span class=\"comment\">% sum the counts within each tier (grouping from the first plot)<\/span>\r\ncatByTiers  = [sum(catByUniv(grouping == 1,:));<span class=\"keyword\">...<\/span>\r\n    sum(catByUniv(grouping == 2,:)); sum(catByUniv(grouping == 3,:))];\r\n<span class=\"comment\">% rank categories by course count among universities with more than 25 courses<\/span>\r\n[~,ranking] = sort(sum(catByUniv(universities.courses &gt; 25,:)),<span class=\"string\">'descend'<\/span>);\r\n<span class=\"comment\">% get the ratio of courses 
by category within each tier (each row sums to 1)<\/span>\r\ncatByTiers = bsxfun(@rdivide,catByTiers,sum(catByTiers,2));\r\n\r\n<span class=\"comment\">% grouped bar graph: one bar per tier in each category<\/span>\r\nfigure\r\n<span class=\"comment\">% blank labels at both ends; ticks 0:26 assume 25 categories<\/span>\r\ntickLabels = [{<span class=\"string\">''<\/span>};categories.name(ranking);{<span class=\"string\">''<\/span>}];\r\nh = bar(catByTiers(:,ranking)'); xlim([0 26]);\r\nax = gca; set(ax,<span class=\"string\">'XTick'<\/span>,0:26); set(ax,<span class=\"string\">'XTickLabel'<\/span>,tickLabels); set(ax,<span class=\"string\">'XTickLabelRotation'<\/span>,270);\r\ntitle(<span class=\"string\">'\\fontsize{14} Ratio of Courses Per Category'<\/span>)\r\nlegend(<span class=\"string\">'Universities with 25+ courses'<\/span>,<span class=\"string\">'Universities with 10+ courses'<\/span>,<span class=\"string\">'Universities with 1-10 courses'<\/span>,<span class=\"string\">'Location'<\/span>,<span class=\"string\">'Best'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/coursera_03.png\" alt=\"\"> <p>It looks like the early adopters (universities with many courses) had a stronger STEM bias, while new entrants (universities with fewer courses) tend to offer more non-STEM courses. Categories such as Social Sciences, Humanities, Business and Management, Education, Teacher Professional Development, and Music, Film and Audio are on the rise.<\/p><h4>Summary<a name=\"ad13358e-feb9-4e4e-87a9-5f3f44ce1c54\"><\/a><\/h4><p>Why do we see this non-STEM shift? There are a number of possible explanations.<\/p><div><ul><li>In the beginning, Coursera courses relied on autograders. They were well suited for quantitative STEM subjects, but not for non-STEM subjects.<\/li><li>Autograders were custom-built for each course, and they are in fact full <a href=\"http:\/\/en.wikipedia.org\/wiki\/Software_as_a_service\">SaaS<\/a> applications. 
It was difficult to scale up the number of courses when each one needed its own custom SaaS app that could withstand substantial peak traffic near the deadline (<i>this human behavior is pretty universal<\/i>).<\/li><li>Later, Coursera introduced a crowdsourced essay grading system that can be used across multiple courses. This freed universities from the burden of creating custom SaaS apps.<\/li><li>This led to a rapid expansion of course offerings and made non-STEM subjects viable. In fact, I took a number of STEM courses from JHU, and they tend to use the essay grading system rather than autograders.<\/li><\/ul><\/div><p>There are questions we cannot answer with the data at hand. For example, is the shift driven by convenience on the supply side (universities) or by public demand for non-STEM subjects?<\/p><p>There are no strict prerequisites for Coursera courses, but the bar is still high for STEM courses, so it is quite possible that the potential market is larger for non-STEM subjects.<\/p><p>You also saw how easy it is to call a REST API that returns JSON in R2014b, and got a quick look at some of the new features of the updated MATLAB graphics system. 
<a href=\"https:\/\/www.mathworks.com\/downloads\/web_downloads\/get_release?release=R2014b\">Download the new release<\/a>, try those new features yourself and share what you find <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=1023#respond\">here<\/a>!<\/p><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br>Published with MATLAB&reg; R2014b<br><\/p><\/div>","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/coursera_03.png\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction--><p><a href=\"https:\/\/www.coursera.org\/\">Coursera<\/a> is a technology platform that kickstarted the current <a href=\"http:\/\/en.wikipedia.org\/wiki\/Massive_open_online_course\">MOOCs<\/a> boom. Even though there are more MOOCs players now, it still remains one of the leading companies in this space. But how are they doing these days for delivering higher education to the masses online?... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2014\/10\/16\/taking-the-pulse-of-moocs\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[6,40],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1023"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=1023"}],"version-history":[{"count":8,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1023\/revisions"}],"predecessor-version":[{"id":2187,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1023\/revisions\/2187"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=1023"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=1023"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=1023"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}