{"id":1159,"date":"2015-04-22T08:10:27","date_gmt":"2015-04-22T13:10:27","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=1159"},"modified":"2021-10-04T08:31:23","modified_gmt":"2021-10-04T12:31:23","slug":"the-netflix-prize-and-production-machine-learning-systems-an-insider-look","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2015\/04\/22\/the-netflix-prize-and-production-machine-learning-systems-an-insider-look\/","title":{"rendered":"The Netflix Prize and Production Machine Learning Systems: An Insider Look"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p>Do you watch movies on Netflix? Binge-watch TV series? Do you use their movie recommendations? Today's guest blogger, Toshi Takeuchi, shares an interesting blog post he saw about how Netflix uses machine learning for movie recommendations.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2015\/movie_streaming.jpg\" alt=\"\"> <\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#15307a1e-375b-4d8f-af2b-c85b8dd490e7\">How Recommender Systems Work<\/a><\/li><li><a href=\"#390faa49-a84f-4399-bb8e-ce3402b96044\">What Netflix did with the winning solutions<\/a><\/li><li><a href=\"#65392b6c-976b-4fe0-a8eb-65812fbb619c\">So, Was It Worth $1M?<\/a><\/li><li><a href=\"#5f2973e3-32ab-4e07-a9a2-d7dba1b065e4\">Lessons Learned: New Metrics<\/a><\/li><li><a href=\"#81f69641-539c-4d43-a3e7-2ff463d3a3c4\">Lessons Learned: System Architecture<\/a><\/li><li><a href=\"#b5a98583-cfb2-434e-a7db-b21f47c6ade9\">MATLAB on Hadoop and MATLAB Production Server<\/a><\/li><li><a href=\"#873fc3a9-9d87-477e-af65-72d3d7353a47\">Closing<\/a><\/li><\/ul><\/div><p>Back in 2006 Netflix announced a famed machine learning and data mining competition \"Netflix Prize\" with a $1 million award, finally claimed in 2009. 
It was a turning point that led to Kaggle and other data science competitions we see today.<\/p><p>With all the publicity and media attention it got, was it really worth $1 million for Netflix? What did they do with the winning solutions in the end? I came across a very interesting <i>insider<\/i> blog post <a href=\"http:\/\/techblog.netflix.com\/2012\/04\/netflix-recommendations-beyond-5-stars.html\">\"Netflix Recommendations: Beyond the 5 stars\"<\/a> that reveals practical insights about what really matters not just for recommender systems but also for any real-world commercial machine learning application.<\/p><h4>How Recommender Systems Work<a name=\"15307a1e-375b-4d8f-af2b-c85b8dd490e7\"><\/a><\/h4><p>The goal of the Netflix Prize was to crowdsource a movie recommendation algorithm that delivers a 10%+ improvement in prediction accuracy over the existing system. If you use Netflix, you see movies listed under \"movies you may like\" or \"more movies like so-and-so\", etc. These days such recommendations are a huge part of internet retail businesses.<\/p><p>It is probably useful to study a very simple example recommendation system based on a well-known algorithm called <a href=\"http:\/\/en.wikipedia.org\/wiki\/Collaborative_filtering\">Collaborative Filtering<\/a>. 
Here is a toy dataset of movie ratings from 6 fictitious users (columns) for 6 movies released in 2014 (rows).<\/p><pre class=\"codeinput\">movies = {<span class=\"string\">'Big Hero 6'<\/span>,<span class=\"string\">'Birdman'<\/span>,<span class=\"string\">'Boyhood'<\/span>,<span class=\"string\">'Gone Girl'<\/span>,<span class=\"string\">'The LEGO Movie'<\/span>,<span class=\"string\">'Whiplash'<\/span>};\r\nusers = {<span class=\"string\">'Kevin'<\/span>,<span class=\"string\">'Jay'<\/span>,<span class=\"string\">'Ross'<\/span>,<span class=\"string\">'Spencer'<\/span>,<span class=\"string\">'Megan'<\/span>, <span class=\"string\">'Scott'<\/span>};\r\nratings = [1.0, 4.0, 3.0, 2.0, NaN, 1.0;\r\n           5.0, 1.0, 1.0, 4.0, 5.0, NaN;\r\n           NaN, 2.0, 2.0, 5.0, 4.0, 5.0;\r\n           5.0, NaN, 3.0, 5.0, 4.0, 4.0;\r\n           3.0, 2.0, NaN, 3.0, 3.0, 3.0;\r\n           4.0, 3.0, 3.0, NaN, 4.0, 4.0];\r\n<\/pre><p>The idea behind Collaborative Filtering is that you can use the ratings from users who share similar tastes to predict ratings for unrated items. To get an intuition, let's compare the ratings by pairs of users over movies they both rated. The plot of ratings represents their preference space. 
The best-fit line should go up to the right if the relationship is positive, and it should go down if not.<\/p><pre class=\"codeinput\">figure\r\nsubplot(2,1,1)\r\nscatter(ratings(:,1),ratings(:,2),<span class=\"string\">'filled'<\/span>)\r\nlsline\r\nxlim([0 6]); ylim([0 6])\r\ntitle(<span class=\"string\">'Movie Preference Space by Two Users'<\/span>)\r\nxlabel(<span class=\"string\">'Kevin''s ratings'<\/span>); ylabel(<span class=\"string\">'Jay''s ratings'<\/span>)\r\n<span class=\"keyword\">for<\/span> i = 1:size(ratings,1)\r\n    text(ratings(i,1)+0.05,ratings(i,2),movies{i})\r\n<span class=\"keyword\">end<\/span>\r\nsubplot(2,1,2)\r\nscatter(ratings(:,1),ratings(:,4),<span class=\"string\">'filled'<\/span>)\r\nlsline\r\nxlim([0 6]); ylim([0 6])\r\nxlabel(<span class=\"string\">'Kevin''s ratings'<\/span>); ylabel(<span class=\"string\">'Spencer''s ratings'<\/span>)\r\n<span class=\"keyword\">for<\/span> i = 1:size(ratings,1)\r\n    text(ratings(i,1)+0.05,ratings(i,4),movies{i})\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2015\/netflix_01.png\" alt=\"\"> <p>By looking at the slope of the best-fit lines, you can tell that Kevin and Jay don't share similar tastes because their ratings are negatively correlated. Kevin and Spencer, on the other hand, seem to like similar movies.<\/p><p>One popular measure of similarity in Collaborative Filtering is Pearson's correlation coefficient (<a href=\"https:\/\/blogs.mathworks.com\/loren\/2015\/04\/08\/can-you-find-love-through-text-analytics\/\">cosine similarity<\/a> is another one). It ranges from 1 to -1 where 1 is positive correlation, 0 is no correlation, and -1 is negative correlation. We compute the pairwise correlation of users using rows with no missing values. 
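<\/p><p>As a quick sanity check, here is a minimal manual sketch of what the <tt>'rows','pairwise'<\/tt> option computes for one pair of users: keep only the movies both users rated, mean-center each user's ratings over those movies, and take the normalized dot product (Pearson's correlation).<\/p><pre class="codeinput">x = ratings(:,1); y = ratings(:,2);     <span class="comment">% Kevin's and Jay's ratings<\/span>\r\nboth = ~isnan(x) &amp; ~isnan(y);           <span class="comment">% movies rated by both users<\/span>\r\nxc = x(both) - mean(x(both));           <span class="comment">% mean-center each user<\/span>\r\nyc = y(both) - mean(y(both));\r\nr_manual = (xc'*yc)\/(norm(xc)*norm(yc)) <span class="comment">% about -0.83<\/span>\r\n<\/pre><p>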
What are the similarity scores between Kevin and Jay, and between Kevin and Spencer?<\/p><pre class=\"codeinput\">sims = corr(ratings, <span class=\"string\">'rows'<\/span>, <span class=\"string\">'pairwise'<\/span>);\r\nfprintf(<span class=\"string\">'Similarity between Kevin and Jay:     %.2f\\n'<\/span>,sims(1,2))\r\nfprintf(<span class=\"string\">'Similarity between Kevin and Spencer:  %.2f\\n'<\/span>,sims(1,4))\r\n<\/pre><pre class=\"codeoutput\">Similarity between Kevin and Jay:     -0.83\r\nSimilarity between Kevin and Spencer:  0.94\r\n<\/pre><p>Because Kevin and Jay have very different tastes, their similarity is negative. Kevin and Spencer, on the other hand, share highly similar tastes. Users who share similar tastes are called neighbors, and we can predict ratings of unrated items by combining their existing ratings for other items. But we need to find those neighbors first. Let's find the neighbors for Kevin.<\/p><pre class=\"codeinput\">sims = sims - eye(length(users)); <span class=\"comment\">% set self-correlations to 0<\/span>\r\nkevin_corrs = sims(1,:);\r\n[ngh_corr, ngh_idx] = sort(kevin_corrs,<span class=\"string\">'descend'<\/span>);\r\nngh_corr\r\n<\/pre><pre class=\"codeoutput\">ngh_corr =\r\n    0.9661    0.9439    0.8528         0   -0.4402   -0.8315\r\n<\/pre><p>Kevin has three neighbors who have a high correlation with him. We can use their ratings and correlation scores to predict Kevin's ratings. The weighted average method is a basic approach to making predictions. Because the rating scale can be different among individuals, we need to use mean-centered ratings rather than raw ratings. Kevin hasn't rated 'Boyhood' yet. 
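<\/p><p>In equation form, the prediction adds the active user's mean rating (<tt>kevin_mu<\/tt> in the code below) to a similarity-weighted average of the neighbors' mean-centered ratings:<\/p><p>$$\\hat{r}_{u,i} = \\bar{r}_u + \\frac{\\sum_{v \\in N(u)} w_{uv} \\cdot (r_{v,i} - \\bar{r}_v)}{\\sum_{v \\in N(u)} w_{uv}}$$<\/p><p>Here the sum runs over the neighbors of user <tt>u<\/tt>, the weights <tt>w<\/tt> are their similarity scores, and the bars denote per-user mean ratings. 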
Would he like it?<\/p><pre class=\"codeinput\">kevin_mu = nanmean(ratings(:,1));           <span class=\"comment\">% Kevin's average rating<\/span>\r\nngh_corr(4:end) = [];                       <span class=\"comment\">% drop non-neighbors<\/span>\r\nngh_idx(4:end) = [];                        <span class=\"comment\">% drop non-neighbors<\/span>\r\nngh_mu = nanmean(ratings(:,ngh_idx),1);       <span class=\"comment\">% neighbor average ratings<\/span>\r\nPredicted = nan(length(movies),1);          <span class=\"comment\">% initialize an accumulator<\/span>\r\n\r\n<span class=\"keyword\">for<\/span> i = 1:length(movies)                    <span class=\"comment\">% loop over movies<\/span>\r\n    ngh_r = ratings(i,ngh_idx);             <span class=\"comment\">% neighbor ratings for the movie<\/span>\r\n    isRated = ~isnan(ngh_r);                <span class=\"comment\">% only use neighbors who rated<\/span>\r\n    meanCentered =<span class=\"keyword\">...<\/span><span class=\"comment\">                       % mean centered weighted average<\/span>\r\n        (ngh_r(isRated) - ngh_mu(isRated)) * ngh_corr(isRated)'<span class=\"keyword\">...<\/span>\r\n        \/ sum(ngh_corr(isRated));\r\n    Predicted(i) = kevin_mu + meanCentered; <span class=\"comment\">% add Kevin's average<\/span>\r\n<span class=\"keyword\">end<\/span>\r\n\r\nActual = ratings(:,1);                      <span class=\"comment\">% Kevin's actual ratings<\/span>\r\ntable(Actual, Predicted,<span class=\"string\">'RowNames'<\/span>,movies)  <span class=\"comment\">% compare them to predicted<\/span>\r\n\r\nfprintf(<span class=\"string\">'Predicted rating for \"%s\": %.d\\n'<\/span>,movies{3},round(Predicted(3)))\r\n<\/pre><pre class=\"codeoutput\">ans = \r\n                      Actual    Predicted\r\n                      ______    _________\r\n    Big Hero 6          1       1.4965   \r\n    Birdman             5       4.1797   \r\n    Boyhood           NaN       4.5695   \r\n    Gone Girl           
5       4.2198   \r\n    The LEGO Movie      3       2.8781   \r\n    Whiplash            4       3.9187   \r\nPredicted rating for \"Boyhood\": 5\r\n<\/pre><p>Looks like Kevin would rate 'Boyhood' as a 5-star movie. Now, how accurate was our prediction? The metric used in the Netflix Prize was Root Mean Square Error (RMSE). Let's apply it to this case.<\/p><pre class=\"codeinput\">RMSE = sqrt(nanmean((Predicted - Actual).^2))\r\n<\/pre><pre class=\"codeoutput\">RMSE =\r\n    0.5567\r\n<\/pre><p>Now you have seen how basic Collaborative Filtering works, but an actual commercial system is naturally much more complex. For example, you don't usually see raw predicted ratings on the web pages; recommendations are typically delivered as a top-N list. If so, is RMSE really a meaningful metric? For a competition, Netflix had to pick a single-number metric to determine the winner. But choosing this metric had its consequences.<\/p><h4>What Netflix did with the winning solutions<a name=\"390faa49-a84f-4399-bb8e-ce3402b96044\"><\/a><\/h4><p>The goal of the competition was to crowdsource improved algorithms. $1M for 3 years of R&amp;D? One of the teams spent more than 2,000 hours of work to deliver an 8.43% improvement. Combined, a lot of people spent enormous amounts of time for a slim chance at the prize. 
Did Netflix use the solutions in the end?<\/p><p>They did adopt one solution with an 8.43% improvement, but they had to overcome its limitations first.<\/p><div><ul><li>The number of ratings in the competition dataset was 100 million, but the actual production system had over 5 billion<\/li><li>The competition dataset was static, but the number of ratings in the production system keeps growing (4 million ratings per day when the blog post was written)<\/li><\/ul><\/div><p>They didn't adopt the grand prize solution that achieved the 10% improvement.<\/p><div><ul><li>The additional accuracy gains measured did not justify the engineering effort needed to bring them into a production environment<\/li><li>The Netflix business model changed from DVD rental to streaming, and that changed the way data is collected and recommendations are delivered<\/li><\/ul><\/div><p>Why might additional accuracy gains not be worth the effort? For example, you could improve the RMSE by closing the prediction gaps in lower ratings - movies people would hate. Does that help end users? Does that increase revenue for Netflix? Probably not. What you can see here is that, in a production system, <b>scalability<\/b> and <b>adaptability<\/b> to changing business needs are bigger challenges than RMSE.<\/p><p>Had Netflix chosen competition metrics more aligned with the needs of the production system, they might have had an easier time adopting the resulting solutions.<\/p><h4>So, Was It Worth $1M?<a name=\"65392b6c-976b-4fe0-a8eb-65812fbb619c\"><\/a><\/h4><p>You often learn more from what didn't go well than from what went well. Netflix was not able to take full advantage of the winning solutions, but they certainly appear to have learned good lessons, judging by how they operate now. I would say they got well more than $1M's worth from this bold experiment. 
Let's see how Netflix is taking advantage of the lessons learned.<\/p><h4>Lessons Learned: New Metrics<a name=\"5f2973e3-32ab-4e07-a9a2-d7dba1b065e4\"><\/a><\/h4><p>When Netflix talks about their current system, it is notable what they highlight.<\/p><div><ul><li>\"75% of what people watch is from some sort of recommendation\"<\/li><li>\"continuously optimizing the member experience and have measured significant gains in member satisfaction\"<\/li><\/ul><\/div><p>What they now care about is usage, user experience, user satisfaction, and user retention. Naturally, these are also well aligned with Netflix's bottom line. The second bullet point refers to the A\/B testing Netflix is conducting on the live production system. That means they are constantly changing the system...<\/p><h4>Lessons Learned: System Architecture<a name=\"81f69641-539c-4d43-a3e7-2ff463d3a3c4\"><\/a><\/h4><p>\"Coming up with a software architecture that handles large volumes of existing data, is responsive to user interactions, and makes it easy to experiment with new recommendation approaches is not a trivial task,\" writes the Netflix blogger in <a href=\"http:\/\/techblog.netflix.com\/2013\/03\/system-architectures-for.html\">\"System Architectures for Personalization and Recommendation\"<\/a>. You can also see more detail in <a href=\"http:\/\/www.slideshare.net\/justinbasilico\/lessons-learned-from-building-machine-learning-software-at-netflix\">this presentation<\/a>.<\/p><p>One of the techniques used in the winning solutions of the Netflix Prize was an ensemble method called <a href=\"http:\/\/arxiv.org\/abs\/0911.0460\">Feature-Weighted Linear Stacking<\/a>. Netflix apparently adopted a form of linear stacking to combine the predictions from multiple predictive models into final recommendations. 
The blog post gives an example: if <tt>u<\/tt> = user, <tt>v<\/tt> = video item, <tt>p<\/tt> = popularity, <tt>r<\/tt> = predicted rating, <tt>b<\/tt> = intercept, and <tt>w1<\/tt> and <tt>w2<\/tt> are the respective weights, then a simple combination of the popularity score and the predicted ratings from Collaborative Filtering can be expressed as:<\/p><p>$$f_{rank}(u,v) = w1 \\cdot p(v) + w2 \\cdot r(u,v) + b$$<\/p><p>And you can just add more terms to this linear equation as you develop more predictive models, each running on its own subsystem... This is a very flexible architecture and makes it easy to run an A\/B test - it is just a matter of changing the weights!<\/p><p>Netflix uses three layers of service - offline, nearline, and online - to meet this challenge.<\/p><div><ul><li>Offline to process data - pre-computes the time-consuming steps in a batch process<\/li><li>Online to process requests - responds to user actions instantaneously by taking advantage of the Offline and Nearline outputs<\/li><li>Nearline to process events - bridges the two sub-systems by pre-computing frequently performed operations and caching the results in advance of active user action<\/li><\/ul><\/div><p>As a simple example of an offline process, let's use the <a href=\"http:\/\/grouplens.org\/datasets\/movielens\/\">MovieLens<\/a> dataset with 100K ratings from 943 users on 1682 movies.<\/p><pre class=\"codeinput\">data_dir = <span class=\"string\">'ml-100k'<\/span>;\r\n\r\n<span class=\"keyword\">if<\/span> exist(data_dir,<span class=\"string\">'dir'<\/span>) ~= 7\r\n    unzip(<span class=\"string\">'http:\/\/files.grouplens.org\/datasets\/movielens\/ml-100k.zip'<\/span>)\r\n<span class=\"keyword\">end<\/span>\r\n\r\ndata = readtable(fullfile(data_dir,<span class=\"string\">'u.data'<\/span>),<span class=\"string\">'FileType'<\/span>,<span class=\"string\">'text'<\/span>,<span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'ReadVariableNames'<\/span>,false,<span 
class=\"string\">'Format'<\/span>,<span class=\"string\">'%f%f%f%f'<\/span>);\r\ndata.Properties.VariableNames = {<span class=\"string\">'user_id'<\/span>,<span class=\"string\">'movie_id'<\/span>,<span class=\"string\">'rating'<\/span>,<span class=\"string\">'timestamp'<\/span>};\r\n\r\nsp_mur = sparse(data.movie_id,data.user_id,data.rating);\r\n\r\n[m,n] = size(sp_mur);\r\nfigure\r\nspy(sp_mur)\r\ntitle(<span class=\"string\">'Movie-User Matrix Sparsity Pattern'<\/span>)\r\nxlabel(<span class=\"string\">'users'<\/span>)\r\nylabel(<span class=\"string\">'movies'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2015\/netflix_02.png\" alt=\"\"> <p>Older movies in the lower row indices have fairly dense ratings, but the newer movies with higher row indices are really sparse. You can also see that older users in the lower column indices don&#8217;t rate newer movies at all &#8211; it seems they have stopped using the service.<\/p><p>You can take advantage of sparsity by computing directly on the sparse matrix. For example, you can pre-compute the Pearson correlation scores and mean-centered ratings, and, once we have an active user, use them to compute the neighborhood and generate recommendations for that user.<\/p><p>It turns out you need to update the pre-computed neighborhoods and mean-centered ratings fairly frequently, because most users don't rate many movies and one new rating can shift the values a lot. For this reason, the item-based approach is used more commonly than the user-based approach we studied earlier. The only big difference is that you compute similarity based on movies rather than users. 
It is just a matter of transposing the mean-centered matrix, and you need to update this less frequently because item-based scores are more stable.<\/p><pre class=\"codeinput\">meanRatings = sum(sp_mur,2).\/sum(sp_mur~=0,2);\r\n[i,j,v] = find(sp_mur);\r\nmeanCentered = sparse(i,j,v - meanRatings(i),m,n);\r\nsims = corr(meanCentered', <span class=\"string\">'rows'<\/span>, <span class=\"string\">'pairwise'<\/span>);\r\n<\/pre><h4>MATLAB on Hadoop and MATLAB Production Server<a name=\"b5a98583-cfb2-434e-a7db-b21f47c6ade9\"><\/a><\/h4><p>We can still do this in-memory with a 100K dataset, but for larger datasets, MATLAB can run MapReduce jobs with MATLAB Distributed Computing Server on Hadoop clusters for offline jobs. MATLAB Production Server enables rapid deployment of MATLAB code in a production environment, and it may be a good choice for nearline or online uses where concurrent A\/B testing is performed, because you avoid dual implementation and enable rapid system updates based on the test results.<\/p><p>For an example of how you use MapReduce in MATLAB, check out <a href=\"https:\/\/blogs.mathworks.com\/loren\/2015\/02\/25\/scaling-market-basket-analysis-with-mapreduce\/\">\"Scaling Market Basket Analysis with MapReduce\"<\/a>. Incidentally, Collaborative Filtering and Association Rule Mining are related concepts. In Market Basket Analysis, we consider transactions as basic units. In Collaborative Filtering, we consider users as basic units. 
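<\/p><p>To connect this back to the top-N question raised earlier, here is a deliberately simplified sketch of the online step, reusing the item-item <tt>sims<\/tt> matrix pre-computed above: score each unrated movie by a similarity-weighted sum of the active user's existing ratings, then keep the 10 best. (A production system would at least normalize the scores and apply the mean-centering properly.)<\/p><pre class="codeinput">u = 1;                                       <span class="comment">% an example active user<\/span>\r\nS = sims; S(isnan(S)) = 0;                   <span class="comment">% ignore undefined similarities<\/span>\r\nrated = find(sp_mur(:,u));                   <span class="comment">% movies this user has rated<\/span>\r\ncand = find(~sp_mur(:,u));                   <span class="comment">% unrated candidate movies<\/span>\r\nscore = S(cand,rated)*nonzeros(sp_mur(:,u)); <span class="comment">% similarity-weighted sum<\/span>\r\n[~,ix] = sort(score,<span class="string">'descend'<\/span>);\r\ntopN = cand(ix(1:10));                       <span class="comment">% indices of 10 recommendations<\/span>\r\n<\/pre><p>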
You may be able to repurpose the MapReduce code from that post with appropriate modifications.<\/p><p>To learn more about MATLAB capabilities, please consult the following resources.<\/p><div><ul><li><a href=\"https:\/\/www.mathworks.com\/products\/matlab-production-server\/\">MATLAB Production Server<\/a><\/li><li><a href=\"https:\/\/www.mathworks.com\/solutions\/data-analytics.html\">Data Analytics<\/a><\/li><\/ul><\/div><h4>Closing<a name=\"873fc3a9-9d87-477e-af65-72d3d7353a47\"><\/a><\/h4><p>As noted earlier, Collaborative Filtering and <a href=\"https:\/\/blogs.mathworks.com\/loren\/2015\/01\/29\/introduction-to-market-basket-analysis\/\">Market Basket Analysis<\/a> are closely related applications of data mining and machine learning. Both aim to learn from customer behaviors, but Market Basket Analysis aims for high-frequency transactions while Collaborative Filtering enables personalized recommendations. Naturally, you cannot restock your store shelves for individual shoppers, but you can in e-commerce. You can also think of <a href=\"https:\/\/blogs.mathworks.com\/loren\/2015\/04\/08\/can-you-find-love-through-text-analytics\/\">Latent Semantic Analysis<\/a> as a related technique applied in the text analytics domain.<\/p><p>Just as Market Basket Analysis found its way into web usage data mining, you can probably use Collaborative Filtering for web data analysis. There may be applications in other fields as well. Any thoughts about your creative use of \"Collaborative Filtering\" with your data? 
Let us know <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=1159#respond\">here<\/a>.<\/p><p class=\"footer\"><br>\r\n      Published with MATLAB&reg; R2015a<br><\/p><\/div><!--\r\nf05c09e9b55644208fa22fbfed31a46f ##### SOURCE BEGIN #####\r\n
It was a turning post that led to Kaggle and other data science\r\n% compeititions we see today.\r\n%\r\n% With all the publicity and media attention it got, was it really worth $1\r\n% million for Netflix? What did they do with the winning solutions in the\r\n% end? I came across a very interesting _insider_ blog post\r\n% <http:\/\/techblog.netflix.com\/2012\/04\/netflix-recommendations-beyond-5-stars.html\r\n% \"Netflix Recommendations: Beyond the 5 stars\"> that reveals \r\n% practical insights about what really matters not just for recommender\r\n% systems but also generally for any real world commercial machine learning\r\n% applications.\r\n%\r\n%% How Recommender Systems Work\r\n% The goal of the NetFlix Prize was to crowdsource a movie recommendation\r\n% algorithm that delivers 10%+ improvement in prediction accuracy over the\r\n% existing system. If you use Netflix, you see movies listed under \"movies\r\n% you may like\" or \"more movies like so-and-so\", etc. These days such\r\n% recommendations are a huge part of internet retail businesses.\r\n% \r\n% It is probably useful to study a very simple example recommendation\r\n% system based on a well known algorithm called\r\n% <http:\/\/en.wikipedia.org\/wiki\/Collaborative_filtering Collaborative\r\n% Filtering>. 
Here is a toy dataset of movie ratings from 6 fictitious\r\n% users (columns) for 6 movies released in 2014 (rows).\r\n\r\nmovies = {'Big Hero 6','Birdman','Boyhood','Gone Girl','The LEGO Movie','Whiplash'};\r\nusers = {'Kevin','Jay','Ross','Spencer','Megan', 'Scott'};\r\nratings = [1.0, 4.0, 3.0, 2.0, NaN, 1.0;\r\n           5.0, 1.0, 1.0, 4.0, 5.0, NaN;\r\n           NaN, 2.0, 2.0, 5.0, 4.0, 5.0;\r\n           5.0, NaN, 3.0, 5.0, 4.0, 4.0;\r\n           3.0, 2.0, NaN, 3.0, 3.0, 3.0;\r\n           4.0, 3.0, 3.0, NaN, 4.0, 4.0];\r\n\r\n%%\r\n% The idea behind Collaborative Filtering is that you can use the ratings\r\n% from users who share similar tastes to predict ratings for unrated items.\r\n% To get an intuition, let's compare the ratings by pairs of users over\r\n% movies they both rated. The plot of ratings represents their preference\r\n% space. The best-fit line should go up to the right if the relationship is\r\n% positive, and it should go down if not.\r\n\r\nfigure\r\nsubplot(2,1,1)\r\nscatter(ratings(:,1),ratings(:,2),'filled')\r\nlsline\r\nxlim([0 6]); ylim([0 6])\r\ntitle('Movie Preference Space by Two Users')\r\nxlabel('Kevin''s ratings'); ylabel('Jay''s ratings')\r\nfor i = 1:size(ratings,1)\r\n    text(ratings(i,1)+0.05,ratings(i,2),movies{i})\r\nend\r\nsubplot(2,1,2)\r\nscatter(ratings(:,1),ratings(:,4),'filled')\r\nlsline\r\nxlim([0 6]); ylim([0 6])\r\nxlabel('Kevin''s ratings'); ylabel('Spencer''s ratings')\r\nfor i = 1:size(ratings,1)\r\n    text(ratings(i,1)+0.05,ratings(i,4),movies{i})\r\nend\r\n\r\n%%\r\n% By looking at the slope of the best-fit lines, you can tell that Kevin\r\n% and Jay don't share similar tastes because their ratings are negatively\r\n% correlated. Kevin and Spencer, on the other hand, seem to like similar\r\n% movies. 
\r\n%\r\n% One popular measure of similarity in Collaborative Filtering is Pearson's\r\n% correlation coefficient\r\n% (<https:\/\/blogs.mathworks.com\/loren\/2015\/04\/08\/can-you-find-love-through-text-analytics\/ cosine\r\n% similarity> is another one). It ranges from 1 to -1 where 1 is positive\r\n% correlation, 0 is no correlation, and -1 is negative correlation. We\r\n% compute the pairwise correlation of users using rows with no missing\r\n% values. What are the similarity scores between Kevin and Jay and Kevin\r\n% and Spencer?\r\n\r\nsims = corr(ratings, 'rows', 'pairwise');\r\nfprintf('Similarity between Kevin and Jay:     %.2f\\n',sims(1,2))\r\nfprintf('Similarity between Kevin and Spencer:  %.2f\\n',sims(1,4))\r\n\r\n%%\r\n% Because Kevin and Jay have very different tastes, their similarity is\r\n% negative. Kevin and Spencer, on the other hand, share highly similar\r\n% tastes. Users who share similar tastes are called neighbors and we can\r\n% predict ratings of unrated items by combining their existing ratings for\r\n% other items. But we need to find those neighbors first. Let's find the\r\n% neighbors for Kevin.\r\n\r\nsims = sims - eye(length(users)); % set self-correlations to 0\r\nkevin_corrs = sims(1,:);\r\n[ngh_corr, ngh_idx] = sort(kevin_corrs,'descend');\r\nngh_corr\r\n\r\n%%\r\n% Kevin has three neighbors who have a high correlation with him. We can\r\n% use their ratings and correlation scores to predict Kevin's ratings. The\r\n% weighted average method is a basic approach to make predictios. Because\r\n% the rating scale can be different among individuals, we need to use\r\n% mean-centered ratings rather than raw ratings. Kevin hasn't rated\r\n% 'Boyhood' yet. 
Would he like it?\r\n\r\nkevin_mu = nanmean(ratings(:,1));           % Kevin's average rating\r\nngh_corr(4:end) = [];                       % drop non-neighbors\r\nngh_idx(4:end) = [];                        % drop non-neighbors\r\nngh_mu = nanmean(ratings(:,ngh_idx),1);       % neighbor average ratings\r\nPredicted = nan(length(movies),1);          % initialize an accumulator\r\n\r\nfor i = 1:length(movies)                    % loop over movies\r\n    ngh_r = ratings(i,ngh_idx);             % neighbor ratings for the movie\r\n    isRated = ~isnan(ngh_r);                % only use neighbors who rated\r\n    meanCentered =...                       % mean centered weighted average\r\n        (ngh_r(isRated) - ngh_mu(isRated)) * ngh_corr(isRated)'...\r\n        \/ sum(ngh_corr(isRated));\r\n    Predicted(i) = kevin_mu + meanCentered; % add Kevin's average\r\nend\r\n\r\nActual = ratings(:,1);                      % Kevin's actual ratings\r\ntable(Actual, Predicted,'RowNames',movies)  % compare them to predicted\r\n\r\nfprintf('Predicted rating for \"%s\": %.d\\n',movies{3},round(Predicted(3)))\r\n\r\n%%\r\n% Looks like Kevin would rate 'Boyhood' as a 5-star movie. Now, how\r\n% accurate was our prediction? The metric used in the Netflix Prize was\r\n% Root Mean Square Error (RMSE). Let's apply it to this case.\r\n\r\nRMSE = sqrt(nanmean((Predicted - Actual).^2))\r\n\r\n%%\r\n% Now you saw how basic Collaborative Filtering worked, but an actual\r\n% commercial system is naturally much more complex. For example, you don't\r\n% usually see raw predicted ratings on the web pages. Recommendations are\r\n% typically delivered as a top-N list. If so, is RMSE really a meaningful\r\n% metric? For a competition, Netflix had to pick a single number metric to\r\n% determine the winner. But choosing this metric had its consequences.\r\n\r\n%% What Netflix did with the winning solutions\r\n% The goal of the competition was to crowdsource improved algorithms. 
$1M\r\n% for 3 years of R&D? One of the teams spent more than 2000 hours of work\r\n% to deliver 8.43% improvement. Combined, a lot of people spent enormous\r\n% amounts of time for a slim hope of the prize. Did Netflix use the\r\n% solutions in the end?\r\n%\r\n% They did adopt one solution with 8.43% improvement, but they had to\r\n% overcome its limitations first.\r\n% \r\n% * The number of ratings in the competition dataset was 100 million, but\r\n% the actual production system had over 5 billion\r\n% * The competition dataset was static, but the number of ratings in the\r\n% production system keeps growing (4 million ratings per day when the blog\r\n% post was written)\r\n%\r\n% They didn't adopt the grand prize solution that achieved 10% improvement.\r\n% \r\n% * Additional accuracy gains that we measured did not justify the\r\n% engineering effort needed to bring them into a production environment\r\n% * The Netflix business model changed from DVD rental to streaming and\r\n% that changed the way data is collected and recommendations are delivered.\r\n%\r\n% Why additional accuracy gains may not be worth the effort? For example,\r\n% you could improve the RMSE by closing the prediction gaps in lower\r\n% ratings - movies people would hate. Does that help end users? Does that\r\n% increase revenue for Netflix? Probably not. What you can see here is\r\n% that, in a production system, *scalability* and *adaptability* to\r\n% changing business needs are bigger challenges than RMSE.\r\n% \r\n% Had Netflix chosen for the competition some metrics more aligned with the\r\n% needs of the production system, they might have had an easier time\r\n% adopting the resulting solutions.\r\n% \r\n%% So, Was It Worth $1M?\r\n% You often learn more from what didn't go well than what went well.\r\n% Netflix was not able to take full advantage of the winning solutions, but\r\n% they certainly appear to have learned good lessons based on how they now\r\n% operate now. 
I would say they got well over $1M's worth from this\r\n% bold experiment. Let's see how Netflix is taking advantage of lessons\r\n% learned.\r\n%\r\n%% Lessons Learned: New Metrics\r\n% When Netflix talks about their current system, it is notable what they\r\n% highlight.\r\n% \r\n% * \"75% of what people watch is from some sort of recommendation\"\r\n% * \"continuously optimizing the member experience and have measured\r\n% significant gains in member satisfaction\"\r\n% \r\n% What they now care about is usage, user experience, user satisfaction,\r\n% and user retention. Naturally, these are also well aligned with Netflix's\r\n% bottom line. The second bullet point refers to A\/B\r\n% testing Netflix is conducting on the live production system. That means\r\n% they are constantly changing the system...\r\n%\r\n%% Lessons Learned: System Architecture\r\n% \"Coming up with a software architecture that handles large volumes of\r\n% existing data, is responsive to user interactions, and makes it easy to\r\n% experiment with new recommendation approaches is not a trivial task,\"\r\n% writes the Netflix blogger in\r\n% <http:\/\/techblog.netflix.com\/2013\/03\/system-architectures-for.html\r\n% \"System Architectures for Personalization and Recommendation\">. You can\r\n% also see more detail in\r\n% <http:\/\/www.slideshare.net\/justinbasilico\/lessons-learned-from-building-machine-learning-software-at-netflix\r\n% this presentation>.\r\n%\r\n% One of the techniques used in the winning solutions in the Netflix Prize\r\n% was an ensemble method called <http:\/\/arxiv.org\/abs\/0911.0460\r\n% Feature-Weighted Linear Stacking>. Netflix apparently adopted a form of\r\n% the linear stacking technique to combine the predictions from multiple\r\n% predictive models to produce final recommendations. 
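As a hedged numeric\r\n% sketch (all scores, weights, and the intercept here are made up for\r\n% illustration, not Netflix's actual values), such a blend of a popularity\r\n% score and a predicted rating might look like:\r\n\r\np_v  = 0.8;                    % popularity score for a video (made up)\r\nr_uv = 4.2;                    % predicted rating for this user-video pair (made up)\r\nw1 = 0.3; w2 = 0.7; b = 0.1;   % blend weights and intercept (made up)\r\nf_rank = w1*p_v + w2*r_uv + b  % final ranking score for this user-video pair\r\n\r\n%%\r\n% 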
The blog post gives\r\n% an example: if |u| = user, |v| = video item, |p| = popularity, |r| =\r\n% predicted rating, |b| = intercept, and |w1| and |w2| are the respective\r\n% weights, then a simple combination of a popularity score and predicted\r\n% ratings by Collaborative Filtering can be expressed as:\r\n% \r\n% $$f_{rank}(u,v) = w1 \cdot p(v) + w2 \cdot r(u,v) + b$$\r\n% \r\n% And you can just add more terms to this linear equation as you develop\r\n% more predictive models, each running on its own subsystem... This is a\r\n% very flexible architecture and makes it easy to run an A\/B test - it is\r\n% just a matter of changing weights!\r\n%\r\n% Netflix uses three layers of service - offline, nearline, and online - to\r\n% achieve this.\r\n%\r\n% * Offline to process data - pre-computes the time-consuming steps in a\r\n% batch process\r\n% * Online to process requests - responds to user action instantaneously\r\n% by taking advantage of the Offline and Nearline outputs\r\n% * Nearline to process events - bridges the two sub-systems by\r\n% pre-computing frequently performed operations and caching the result in\r\n% advance of active user action\r\n% \r\n% As a simple example of an offline process, let's use the\r\n% <http:\/\/grouplens.org\/datasets\/movielens\/ MovieLens> dataset with 100K\r\n% ratings from 943 users on 1682 movies.\r\n\r\ndata_dir = 'ml-100k';\r\n\r\nif exist(data_dir,'dir') ~= 7\r\n    unzip('http:\/\/files.grouplens.org\/datasets\/movielens\/ml-100k.zip')\r\nend\r\n\r\ndata = readtable(fullfile(data_dir,'u.data'),'FileType','text',...\r\n    'ReadVariableNames',false,'Format','%f%f%f%f');\r\ndata.Properties.VariableNames = {'user_id','movie_id','rating','timestamp'};\r\n\r\nsp_mur = sparse(data.movie_id,data.user_id,data.rating);\r\n\r\n[m,n] = size(sp_mur);\r\nfigure\r\nspy(sp_mur)\r\ntitle('Movie-User Matrix Sparsity Pattern')\r\nxlabel('users')\r\nylabel('movies')\r\n\r\n%%\r\n% Older movies in lower row indices 
have fairly dense ratings but the newer\r\n% movies with higher row indices are really sparse. You can also see that\r\n% older users in lower column indices don't rate newer movies at all - it\r\n% seems they have stopped using the service.\r\n% \r\n% You can take advantage of sparsity by computing directly on the sparse\r\n% matrix. For example, you can pre-compute the Pearson correlation scores\r\n% and mean-centered ratings, and, once we have an active user, they can be\r\n% used to compute the neighborhood and generate recommendations for that\r\n% user.\r\n%\r\n% It turns out you need to update the pre-computation of the neighborhood\r\n% and mean-centered ratings fairly frequently because most users don't rate\r\n% many movies and one new rating can shift values a lot. For this reason,\r\n% the item-based approach is used more commonly than the user-based\r\n% approach we studied earlier. The only big difference is that you compute\r\n% similarity based on movies rather than users. It is just a matter of\r\n% transposing the mean-centered matrix, but you need to update this less\r\n% frequently because item-based scores are more stable.\r\n\r\nmeanRatings = sum(sp_mur,2).\/sum(sp_mur~=0,2);\r\n[i,j,v] = find(sp_mur);\r\nmeanCentered = sparse(i,j,v - meanRatings(i),m,n);\r\nsims = corr(meanCentered', 'rows', 'pairwise');\r\n\r\n%% MATLAB on Hadoop and MATLAB Production Server\r\n% We can still do this in-memory with a 100K dataset, but for larger\r\n% datasets, MATLAB can run MapReduce jobs with MATLAB Distributed Computing\r\n% Server on Hadoop clusters for offline jobs. 
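To make the\r\n% offline\/online split concrete, here is a hedged sketch (not Netflix's\r\n% actual method) of an online step that consumes the pre-computed |sims|\r\n% matrix from above; the user index and list length are arbitrary choices,\r\n% and normalization by similarity mass is omitted for brevity.\r\n\r\nactive_user = 1;                            % pick one user (arbitrary choice)\r\nuser_r = full(sp_mur(:,active_user));       % that user's ratings (0 = unrated)\r\nrated = find(user_r);                       % movies this user has rated\r\ncandidates = find(~user_r);                 % unrated movies to score\r\nscores = sims(candidates,rated)*user_r(rated); % similarity-weighted scores\r\nscores(isnan(scores)) = -Inf;               % ignore items with no rating overlap\r\n[~,order] = sort(scores,'descend');         % rank the candidates\r\ntopN = candidates(order(1:10))              % ten highest-scoring movie ids\r\n\r\n%%\r\n% 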
MATLAB Production Server\r\n% enables rapid deployment of MATLAB code in a production environment, and\r\n% it may be a good choice for nearline or online uses where concurrent A\/B\r\n% testing is performed, because it avoids dual implementation and enables\r\n% rapid system updates based on test results.\r\n%\r\n% For an example of how to use MapReduce in MATLAB, check out\r\n% <https:\/\/blogs.mathworks.com\/loren\/2015\/02\/25\/scaling-market-basket-analysis-with-mapreduce\/\r\n% \"Scaling Market Basket Analysis with MapReduce\">. Incidentally,\r\n% Collaborative Filtering and Association Rule Mining are related concepts.\r\n% In Market Basket Analysis, we consider transactions as basic units. In\r\n% Collaborative Filtering, we consider users as basic units. You may be\r\n% able to repurpose the MapReduce code from that post with appropriate\r\n% modifications.\r\n%\r\n% To learn more about MATLAB capabilities, please consult the following\r\n% resources.\r\n%\r\n% * <https:\/\/www.mathworks.com\/discovery\/matlab-mapreduce-hadoop.html MATLAB\r\n% MapReduce and Hadoop>\r\n% * <https:\/\/www.mathworks.com\/products\/matlab-production-server\/ MATLAB\r\n% Production Server>\r\n% * <https:\/\/www.mathworks.com\/solutions\/data-analytics.html Data Analytics>\r\n%\r\n%% Closing\r\n% As noted earlier, Collaborative Filtering and\r\n% <https:\/\/blogs.mathworks.com\/loren\/2015\/01\/29\/introduction-to-market-basket-analysis\/\r\n% Market Basket Analysis> are closely related applications of data mining\r\n% and machine learning. Both aim to learn from customer behaviors, but\r\n% Market Basket Analysis looks for high-frequency patterns across\r\n% transactions while Collaborative Filtering enables personalized\r\n% recommendations. Naturally, you cannot restock your store shelves for\r\n% individual shoppers, but you can in ecommerce. 
You can also think of\r\n% <https:\/\/blogs.mathworks.com\/loren\/2015\/04\/08\/can-you-find-love-through-text-analytics\/ Latent Semantic\r\n% Analysis> as a related technique applied in the text analytics domain.\r\n% \r\n% Just as Market Basket Analysis found its way into web usage data mining,\r\n% you can probably use Collaborative Filtering for web data analysis.\r\n% There may be applications in other fields as well. Any thoughts about\r\n% your creative use of \"Collaborative Filtering\" with your data? Let us\r\n% know <https:\/\/blogs.mathworks.com\/loren\/?p=1159#respond here>.\r\n% \r\n\r\n\r\n\r\n##### SOURCE END ##### f05c09e9b55644208fa22fbfed31a46f\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2015\/netflix_02.png\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction--><p>Do you watch movies on Netflix? Binge-watch TV series? Do you use their movie recommendations? Today's guest blogger, Toshi Takeuchi, shares an interesting blog post he saw about how Netflix uses machine learning for movie recommendations.... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2015\/04\/22\/the-netflix-prize-and-production-machine-learning-systems-an-insider-look\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[63,62,66,43,36,48],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1159"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=1159"}],"version-history":[{"count":5,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1159\/revisions"}],"predecessor-version":[{"id":4731,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1159\/revisions\/4731"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=1159"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=1159"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=1159"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}