{"id":2043,"date":"2016-10-10T08:59:13","date_gmt":"2016-10-10T13:59:13","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=2043"},"modified":"2019-05-30T13:56:55","modified_gmt":"2019-05-30T18:56:55","slug":"multi-armed-bandit-problem-and-exploration-vs-exploitation-trade-off","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2016\/10\/10\/multi-armed-bandit-problem-and-exploration-vs-exploitation-trade-off\/","title":{"rendered":"Multi-Armed Bandit Problem and Exploration vs. Exploitation Trade-off"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p>Casino slot machines have a playful nickname - \"one-armed bandit\" - because of the single lever it has and our tendency to lose money when we play them. They also inspire the creativity of researchers. Today's guest blogger, <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/951521\">Toshi Takeuchi<\/a>, will introduce an interesting thought experiment known as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Multi-armed_bandit\">multi-armed bandits<\/a>.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/multiarmedbandit.jpg\" alt=\"\"> <\/p><p><a href=\"http:\/\/research.microsoft.com\/en-us\/projects\/bandits\/\">image source<\/a><\/p><p>Ordinary slot machines have only one lever. What if you had multiple levers to pull, each with different payout. This is a multi-armed bandit. You don't know which lever has the highest payout - you just have to try different levers to see which one works best, but for how long? 
If you keep pulling a low-payout lever, you forgo more rewards, but you won't know which lever is good until you try a sufficient number of times.<\/p><p>This interesting thought experiment, originally developed during World War II, was so intractable that Allied scientists wanted to drop the problem over Germany so that German scientists could also waste their time on it. Since then, a number of strategies have been developed to solve this problem, but apparently many people are still working on it <a title=\"https:\/\/blogs.mathworks.com\/loren\/2016\/08\/08\/text-mining-machine-learning-research-papers-with-matlab\/ (link no longer works)\">given the number of papers on this topic in NIPS 2015<\/a>.<\/p><p>This is because bandit algorithms are related to the area of <a href=\"https:\/\/www.mathworks.com\/solutions\/machine-learning.html\">machine learning<\/a> called <a href=\"https:\/\/en.wikipedia.org\/wiki\/Reinforcement_learning\">reinforcement learning<\/a>. Rather than <a href=\"https:\/\/www.mathworks.com\/discovery\/supervised-learning.html\">learning from explicit training data<\/a> or <a href=\"https:\/\/www.mathworks.com\/discovery\/unsupervised-learning.html\">discovering patterns in static data<\/a>, reinforcement learning discovers the best option through trial and error with live examples. Multi-armed bandits in particular focus on the exploration vs. exploitation trade-off - how many resources should be spent on trial and error vs. 
maximizing the benefit.<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#367d95e0-4941-4866-9d60-382a4f6751b2\">Example - Choosing the Optimal Color for Buy Now Button<\/a><\/li><li><a href=\"#c28ac03b-9356-4f5e-816e-138c2dd83065\">Epsilon-Greedy Strategy<\/a><\/li><li><a href=\"#812f012e-1a76-4863-aab3-30aa15da298f\">Testing Button Colors with a Three-Armed Bandit<\/a><\/li><li><a href=\"#2ab57c5a-b737-4f2d-aa43-ef54164d8d41\">The First Trial<\/a><\/li><li><a href=\"#214f70ae-2a6b-4c3c-97ae-9db4911db2d8\">Repeat Trials<\/a><\/li><li><a href=\"#1cc5e1a0-b6f7-4e5a-b028-cf4f62d399da\">Estimated vs. Actual Expected Rewards<\/a><\/li><li><a href=\"#ceb7b284-2f35-4426-8fae-8b0b0481c6b1\">Regret<\/a><\/li><li><a href=\"#7848d65e-3606-4906-b592-c7560c75ff3f\">Creating Live Demo<\/a><\/li><li><a href=\"#f7f0010d-a0df-4bb9-9368-8d438cbf10d7\">Epsilon-Decreasing Strategy<\/a><\/li><li><a href=\"#70af5a27-d0ac-4915-94aa-c9cf56f6b0e3\">Test the New Strategy<\/a><\/li><li><a href=\"#5ae3d660-cea2-4b29-933e-15c2cec5d1dd\">Updating the Live Demo<\/a><\/li><li><a href=\"#0767fa53-58a5-4429-a10d-b22ed357c61b\">Real World Uses<\/a><\/li><li><a href=\"#50f3f134-0447-4a03-b81a-da037e4f30c7\">Summary - Is life a bandit problem?<\/a><\/li><\/ul><\/div><h4>Example - Choosing the Optimal Color for Buy Now Button<a name=\"367d95e0-4941-4866-9d60-382a4f6751b2\"><\/a><\/h4><p>Let's take a look at a couple of the basic popular strategies here using an example from a blog post <a href=\"http:\/\/stevehanov.ca\/blog\/index.php?id=132\">20 lines of code that will beat A\/B testing every time<\/a>. 
It talks about how you can use bandit algorithms instead of plain vanilla A\/B testing in website optimization and provides concise pseudocode.<\/p><p>We will use an example similar to the one used in the blog post: <b>should a \"Buy Now\" button be colored blue, orange, or yellow, to maximize the clicks?<\/b> You can see <a href=\"http:\/\/jsfiddle.net\/Toshiaki\/u9ufo2z8\/show\/\">a live demo of this problem in action<\/a> powered by <a href=\"https:\/\/www.mathworks.com\/products\/matlab-production-server\/\">MATLAB Production Server<\/a> running on AWS.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/buy_now.png\" alt=\"\"> <\/p><h4>Epsilon-Greedy Strategy<a name=\"c28ac03b-9356-4f5e-816e-138c2dd83065\"><\/a><\/h4><p>This strategy lets you choose an arm at random with uniform probability for a fraction $\\epsilon$ of the trials (exploration), and the best arm is selected for the remaining $(1 - \\epsilon)$ of the trials (exploitation). This is implemented in the <tt><a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/eGreedy.m\">eGreedy<\/a><\/tt> class as the <tt>choose<\/tt> method. The usual value for $\\epsilon$ is 0.1, or 10% of the trials.<\/p><p>What is the \"best\" arm? At each trial, the arm we choose to pull may or may not give a reward. The best arm has the highest expectation of reward based on the rewards received up to that point. 
We keep track of the expected rewards for respective arms as scores.<\/p><pre class=\"codeinput\">dbtype <span class=\"string\">'eGreedy.m'<\/span> <span class=\"string\">31:38<\/span>\r\n<\/pre><pre class=\"codeoutput\">\r\n31            function choice = choose(self)          % choose an arm\r\n32                if rand &lt;= self.epsilon             % (epsilon) times\r\n33                    choice = randi(self.n_arms);    % explore all arms\r\n34                else                                % (1 - epsilon) times\r\n35                    [~, choice] = ...               % exploit the best arm\r\n36                        max(self.scores(end,:));\r\n37                end       \r\n38            end\r\n<\/pre><h4>Testing Button Colors with a Three-Armed Bandit<a name=\"812f012e-1a76-4863-aab3-30aa15da298f\"><\/a><\/h4><p>In our example, we can think of those buttons as choices of levers we can pull, and we get a reward of click or no click with an unknown probability with each color. This is known as a binary reward.  The <tt><a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/bandit.m\">bandit<\/a><\/tt> class creates an object with three arms with unknown probabilities, which in this case represent respective \"Buy Now\" buttons. Let's call that object <tt>buttons<\/tt>. The object provides the <tt>test<\/tt> method that returns a reward of 1 or 0 for a chosen button we show, which represents a click (1) or no click (0).<\/p><p>We will estimate the unknown click probabilities of those buttons through our trials using the <tt>eGreedy<\/tt> class, which we will initialize as the <tt>myGreedy<\/tt> object with $\\epsilon$ set to 0.1 by default. 
Initial expected rewards are also set uniformly to 1 for all arms, which means we expect buttons to get clicked 100% of the time.<\/p><pre class=\"codeinput\">cols = {<span class=\"string\">'Blue'<\/span>,<span class=\"string\">'Orange'<\/span>,<span class=\"string\">'Yellow'<\/span>};          <span class=\"comment\">% button colors as arms<\/span>\r\nbuttons = bandit();                         <span class=\"comment\">% initialize bandit with 3 arms<\/span>\r\nmyGreedy = eGreedy(buttons.N);              <span class=\"comment\">% epsilon-greedy with epsilon = 0.1<\/span>\r\ns = array2table(myGreedy.scores);           <span class=\"comment\">% initial expected reward<\/span>\r\ns.Properties.VariableNames = cols           <span class=\"comment\">% name columns by color<\/span>\r\n<\/pre><pre class=\"codeoutput\">s = \r\n    Blue    Orange    Yellow\r\n    ____    ______    ______\r\n    1       1         1     \r\n<\/pre><h4>The First Trial<a name=\"2ab57c5a-b737-4f2d-aa43-ef54164d8d41\"><\/a><\/h4><p>In trials, we will choose a button based on the epsilon-greedy strategy using the <tt>choose<\/tt> method in the <tt>myGreedy<\/tt> object. Because the initial expected rewards are uniformly set to 1, one of the three colors will be selected randomly when a web visitor comes to the page.<\/p><pre class=\"codeinput\">rng(1);                                     <span class=\"comment\">% for reproducibility<\/span>\r\ncolor = myGreedy.choose()                   <span class=\"comment\">% choose a button color<\/span>\r\n<\/pre><pre class=\"codeoutput\">color =\r\n     1\r\n<\/pre><p>We will then display the blue button to the visitor on the page, and this is implemented using the <tt>test<\/tt> method, which is equivalent to pulling the lever on a slot machine. 
The reward (click) is given with the hidden probability of the chosen button.<\/p><pre class=\"codeinput\">click = buttons.test(color)                 <span class=\"comment\">% display the button<\/span>\r\n<\/pre><pre class=\"codeoutput\">click =\r\n  logical\r\n   0\r\n<\/pre><p>Based on the actual result, we will update our expected reward for the chosen button. Since we didn't get the reward with the blue button, the expected reward for that button is now reduced:  $Blue = 1\/2 = 0.5$<\/p><pre class=\"codeinput\">myGreedy.update(color, click);              <span class=\"comment\">% update expected reward<\/span>\r\ns = array2table(myGreedy.scores);           <span class=\"comment\">% new expected reward<\/span>\r\ns.Properties.VariableNames = cols;          <span class=\"comment\">% name columns<\/span>\r\ns.Properties.RowNames={<span class=\"string\">'Initial'<\/span>,<span class=\"string\">'Trial_1'<\/span>} <span class=\"comment\">% name rows<\/span>\r\n<\/pre><pre class=\"codeoutput\">s = \r\n               Blue    Orange    Yellow\r\n               ____    ______    ______\r\n    Initial      1     1         1     \r\n    Trial_1    0.5     1         1     \r\n<\/pre><h4>Repeat Trials<a name=\"214f70ae-2a6b-4c3c-97ae-9db4911db2d8\"><\/a><\/h4><p>Let's continue for more trials and plot the changes in expected rewards. Initially the values fluctuate a lot, but eventually they settle into certain ranges. 
It looks like orange has the highest expected reward and blue has a slightly lower one.<\/p><pre class=\"codeinput\">n_trials = 500;                             <span class=\"comment\">% number of trials<\/span>\r\n<span class=\"keyword\">for<\/span> i = 1:n_trials - 1                      <span class=\"comment\">% subtract completed trial<\/span>\r\n    color = myGreedy.choose();              <span class=\"comment\">% choose a color<\/span>\r\n    click = buttons.test(color);            <span class=\"comment\">% test the button<\/span>\r\n    myGreedy.update(color, click);          <span class=\"comment\">% update expected reward<\/span>\r\n<span class=\"keyword\">end<\/span>\r\nname = <span class=\"string\">'Epsilon-Greedy Strategy'<\/span>;           <span class=\"comment\">% title<\/span>\r\nmyGreedy.plot(name, cols)                   <span class=\"comment\">% plot expected rewards<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/multi_armed_bandits_01.png\" alt=\"\"> <h4>Estimated vs. Actual Expected Rewards<a name=\"1cc5e1a0-b6f7-4e5a-b028-cf4f62d399da\"><\/a><\/h4><p>We will never know the actual probabilities in real world situations, but this is just a simulation, so we know the actual parameters we used to generate random rewards. Let's see how close we came to estimating them using the epsilon-greedy strategy. 
The estimates are off, but it still managed to identify the best button correctly.<\/p><pre class=\"codeinput\">array2table([myGreedy.scores(end,:); buttons.probs], <span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'VariableNames'<\/span>,cols, <span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'RowNames'<\/span>,{<span class=\"string\">'Estimated'<\/span>,<span class=\"string\">'Actual'<\/span>})\r\n<\/pre><pre class=\"codeoutput\">ans = \r\n                  Blue      Orange     Yellow \r\n                 _______    _______    _______\r\n    Estimated    0.32353    0.32667    0.21053\r\n    Actual          0.28       0.32       0.26\r\n<\/pre><h4>Regret<a name=\"ceb7b284-2f35-4426-8fae-8b0b0481c6b1\"><\/a><\/h4><p>Let's consider alternative scenarios. Had we done this test under a standard A\/B testing protocol, we would have shown three buttons with equal proportions first to pursue pure exploration and then switch to pure exploitation by only showing the best button. This is also known as <b>the epsilon-first strategy<\/b>. 
Let's assume we did the A\/B testing for the first 300 trials and only showed the orange button thereafter.<\/p><p>Had we known the best button to use ahead of time, we would have shown the orange button all the time as the optimal strategy.<\/p><pre class=\"codeinput\">strategies =  {<span class=\"string\">'Epsilon-First'<\/span>,<span class=\"string\">'Epsilon-Greedy'<\/span>,<span class=\"string\">'Optimal'<\/span>};\r\ntrials = array2table([100 300 100; <span class=\"keyword\">...<\/span><span class=\"comment\">      % number of trials<\/span>\r\n    myGreedy.trials; [0 n_trials 0]]);\r\ntrials.Properties.VariableNames = cols;     <span class=\"comment\">% name cols<\/span>\r\ntrials.Properties.RowNames = strategies     <span class=\"comment\">% name rows<\/span>\r\n<\/pre><pre class=\"codeoutput\">trials = \r\n                      Blue    Orange    Yellow\r\n                      ____    ______    ______\r\n    Epsilon-First     100     300       100   \r\n    Epsilon-Greedy     33     449        18   \r\n    Optimal             0     500         0   \r\n<\/pre><p>We can calculate the expected clicks (rewards) from each strategy and see how many fewer clicks each strategy gets compared to the optimal one. These differences are known as \"regrets\". 
You can see that the epsilon-greedy came pretty close to the optimal.<\/p><pre class=\"codeinput\">clicks = sum(bsxfun(@times, <span class=\"keyword\">...<\/span><span class=\"comment\">             % get expected number<\/span>\r\n    table2array(trials), buttons.probs), 2);<span class=\"comment\">% of clicks<\/span>\r\nRegret = clicks - clicks(end);              <span class=\"comment\">% subtract optimal clicks<\/span>\r\nRegret =  table(Regret, <span class=\"string\">'RowNames'<\/span>, strategies)\r\n<\/pre><pre class=\"codeoutput\">Regret = \r\n                      Regret\r\n                      ______\r\n    Epsilon-First      -10  \r\n    Epsilon-Greedy    -2.4  \r\n    Optimal              0  \r\n<\/pre><h4>Creating Live Demo<a name=\"7848d65e-3606-4906-b592-c7560c75ff3f\"><\/a><\/h4><p>My colleague Arvind Hosagrahara in <a href=\"https:\/\/www.mathworks.com\/services\/consulting\/\">consulting services<\/a> helped me to create the live demo linked earlier. Please read Arvind's excellent blog post <a href=\"https:\/\/blogs.mathworks.com\/developer\/2016\/07\/13\/build-a-product-build-a-service\/\">Build a product, build a service<\/a> for more details about how to create a web app with MATLAB. 
Here is a quick summary of what I did.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/deploy.png\" alt=\"\"> <\/p><div><ol><li>Create two wrapper functions <tt><a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/pickColor.m\">pickColor<\/a><\/tt> and <tt><a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/scoreButton.m\">scoreButton<\/a><\/tt> to call my <tt>eGreedy<\/tt> class<\/li><li>Compile the wrapper functions with <tt>eGreedy<\/tt> and other dependencies into a CTF file with <a href=\"https:\/\/www.mathworks.com\/products\/matlab-compiler-sdk\/\">MATLAB Compiler SDK<\/a><\/li><li>Deploy the CTF file by uploading it to the MATLAB Production Server Arvind is running on AWS<\/li><li>Create the web front end on jsFiddle (See <a href=\"http:\/\/jsfiddle.net\/Toshiaki\/u9ufo2z8\/\">the source code<\/a>, particularly the JAVASCRIPT section)<\/li><\/ol><\/div><p>Don't worry if you don't understand JavaScript. In a real-world project, you would work with a team of web developers and IT people who will take care of the web front end and IT back end, so you can just focus on your MATLAB code. The beauty of this approach is that you can use your MATLAB code directly, rather than re-doing it in C\/C++, .NET, or Java.<\/p><h4>Epsilon-Decreasing Strategy<a name=\"f7f0010d-a0df-4bb9-9368-8d438cbf10d7\"><\/a><\/h4><p>Now back to the algorithm. One issue with the epsilon-greedy strategy is how to set the value of $\\epsilon$, and that it continues exploring suboptimal arms at that rate even after the algorithm identifies the optimal choice. Rather than using a fixed value for $\\epsilon$, we can start with a high value that decreases over time. This way, we can favor exploration initially, and then favor exploitation later on. 
We can easily create a new class <tt>eDecrease<\/tt> from <tt>eGreedy<\/tt> to accomplish that.<\/p><p>The tricky part is how we should define the function that decreases the $\\epsilon$ value. Here is one way of doing it. It starts with pure exploration, but we allocate more trials for exploitation as the experiment progresses, with gradually decreasing values of $\\epsilon$. I tried different decreasing functions, and it seems the rate of decrease shouldn't be too quick to get the best result.<\/p><pre class=\"codeinput\">f = @(x) 1 .\/ (x .^ .5);                    <span class=\"comment\">% decreasing function<\/span>\r\nx = 1:n_trials;                             <span class=\"comment\">% number of trials<\/span>\r\nfigure                                      <span class=\"comment\">% new figure<\/span>\r\nplot(x,f(x))                                <span class=\"comment\">% plot epsilon value<\/span>\r\nxlabel(<span class=\"string\">'Trials'<\/span>)                            <span class=\"comment\">% x-axis label<\/span>\r\nylabel(<span class=\"string\">'Epsilon'<\/span>)                           <span class=\"comment\">% y-axis label<\/span>\r\ntitle(<span class=\"string\">'Epsilon Decreasing Function'<\/span>)        <span class=\"comment\">% title<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/multi_armed_bandits_02.png\" alt=\"\"> <h4>Test the New Strategy<a name=\"70af5a27-d0ac-4915-94aa-c9cf56f6b0e3\"><\/a><\/h4><p>Let's now run this for the same number of trials. 
It looks like it also identified the orange button as the best option, but it spent more time in exploration initially.<\/p><pre class=\"codeinput\">rng(1);                                     <span class=\"comment\">% for reproducibility<\/span>\r\nmyDec = eDecrease(buttons.N, f);            <span class=\"comment\">% initialize object<\/span>\r\n<span class=\"keyword\">for<\/span> i = 1:n_trials                          <span class=\"comment\">% repeat trials<\/span>\r\n    color = myDec.choose();                 <span class=\"comment\">% choose a color<\/span>\r\n    click = buttons.test(color);            <span class=\"comment\">% test the button<\/span>\r\n    myDec.update(color, click);             <span class=\"comment\">% update expected reward<\/span>\r\n<span class=\"keyword\">end<\/span>\r\nname = <span class=\"string\">'Epsilon-Decreasing Strategy'<\/span>;       <span class=\"comment\">% title<\/span>\r\nmyDec.plot(name, cols)                      <span class=\"comment\">% plot expected rewards<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/multi_armed_bandits_03.png\" alt=\"\"> <p>The estimated expected rewards are also closer to the actual values.<\/p><pre class=\"codeinput\">array2table([myDec.scores(end,:);buttons.probs], <span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'VariableNames'<\/span>,cols, <span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'RowNames'<\/span>,{<span class=\"string\">'Estimated'<\/span>,<span class=\"string\">'Actual'<\/span>})\r\n<\/pre><pre class=\"codeoutput\">ans = \r\n                  Blue      Orange     Yellow \r\n                 _______    _______    _______\r\n    Estimated    0.27586    0.33148    0.25217\r\n    Actual          0.28       0.32       0.26\r\n<\/pre><p>Unfortunately, the result shows that this strategy is not doing as well as the epsilon-greedy one.<\/p><pre class=\"codeinput\">strategies 
=  {<span class=\"string\">'Epsilon-First'<\/span>,<span class=\"string\">'Epsilon-Greedy'<\/span>,<span class=\"string\">'Epsilon-Decreasing'<\/span>,<span class=\"string\">'Optimal'<\/span>};\r\ntrials = array2table([[100 300 100]; <span class=\"keyword\">...<\/span><span class=\"comment\">    % number of trials<\/span>\r\n    myGreedy.trials; myDec.trials; [0 n_trials 0]]);\r\nclicks = sum(bsxfun(@times, <span class=\"keyword\">...<\/span><span class=\"comment\">             % get expected number<\/span>\r\n    table2array(trials), buttons.probs), 2);<span class=\"comment\">% of clicks<\/span>\r\nRegret = clicks - clicks(end);              <span class=\"comment\">% subtract optimal clicks<\/span>\r\nRegret =  table(Regret, <span class=\"string\">'RowNames'<\/span>, strategies)\r\n<\/pre><pre class=\"codeoutput\">Regret = \r\n                          Regret\r\n                          ______\r\n    Epsilon-First           -10 \r\n    Epsilon-Greedy         -2.4 \r\n    Epsilon-Decreasing    -7.96 \r\n    Optimal                   0 \r\n<\/pre><h4>Updating the Live Demo<a name=\"5ae3d660-cea2-4b29-933e-15c2cec5d1dd\"><\/a><\/h4><p>If the epsilon-decreasing strategy had worked, I might have wanted to update my live demo. Had I implemented the epsilon-greedy algorithm in another language, I would have had to re-code it all over again. Luckily, I am using my original MATLAB code, so this is what I would do:<\/p><div><ol><li>Change the wrapper functions to call <tt>eDecrease<\/tt> rather than <tt>eGreedy<\/tt> and re-deploy them to MATLAB Production Server.<\/li><li>No change to the web front end<\/li><li>No change to the IT back end<\/li><\/ol><\/div><p>If you are working in a team, this means you can update your code freely without asking your front end or back end people to help you out. Sweet! 
<b>This is a powerful workflow advantage you gain from using MATLAB Production Server<\/b>.<\/p><h4>Real World Uses<a name=\"0767fa53-58a5-4429-a10d-b22ed357c61b\"><\/a><\/h4><p>In the examples we used here, we consider the case of choosing the best color for the \"Buy Now\" button on a web page but you can see this would be useful, for example, to choose which ads a news aggregator website should show on the search page - <a href=\"https:\/\/www.inverse.com\/article\/13762-how-the-multi-armed-bandit-determines-what-ads-and-stories-you-see-online\">the Washington Post<\/a> apparently uses a bandit algorithm for that.<\/p><p>Recommender systems typically face the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Cold_start\">cold start problem<\/a> - for example, <a href=\"https:\/\/blogs.mathworks.com\/loren\/2015\/04\/22\/the-netflix-prize-and-production-machine-learning-systems-an-insider-look\/\">Netflix<\/a> has no data for how people would rate newly released movies. It can still recommend such movies through a trial and error process as well.<\/p><p>We also talked about the advantage of MATLAB Production Server in your workflow. One of the pain points Netflix experienced is the problem of dual implementations - you prototype your algorithm in one language such as MATLAB, and you deploy it to production systems in another - such as C\/C++, .NET, or Java. This means you have to re-code every time you change your algorithm. 
You can prototype your algorithm in MATLAB, and that code can be deployed to production systems quite easily with MATLAB Production Server.<\/p><h4>Summary - Is life a bandit problem?<a name=\"50f3f134-0447-4a03-b81a-da037e4f30c7\"><\/a><\/h4><p><a href=\"https:\/\/blogs.mathworks.com\/community\/\">MATLAB Community blog<\/a> host Ned saw an early draft of this post and sent me <a href=\"http:\/\/longnow.org\/seminars\/02016\/jun\/20\/algorithms-live\/\">this video<\/a> about using algorithms to make life decisions like when to stop looking for an ideal apartment or spouse. It is a different class of algorithm called <a href=\"https:\/\/en.wikipedia.org\/wiki\/Optimal_stopping\">optimal stopping<\/a>, but the video proposes that you can apply it in life. Maybe you can also find a way to apply bandit algorithms in life. Let us know how you would use this algorithm <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=2043#respond\">here<\/a>!<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_5a5fa2bf9d9b434fa7e32bc3c02e6cdb() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='5a5fa2bf9d9b434fa7e32bc3c02e6cdb ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 5a5fa2bf9d9b434fa7e32bc3c02e6cdb';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ 
doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2016 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_5a5fa2bf9d9b434fa7e32bc3c02e6cdb()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2016b<br><\/p><\/div><!--\r\n5a5fa2bf9d9b434fa7e32bc3c02e6cdb ##### SOURCE BEGIN #####\r\n%% Multi-Armed Bandit Problem and Exploration vs. Exploitation Trade-off\r\n% Casino slot machines have a playful nickname - \"one-armed bandit\" -\r\n% because of the single lever it has and our tendency to lose money when we\r\n% play them. They also inspire the creativity of researchers. Today's guest\r\n% blogger, <https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/951521\r\n% Toshi Takeuchi>, will introduce an interesting thought experiment known\r\n% as <https:\/\/en.wikipedia.org\/wiki\/Multi-armed_bandit multi-armed\r\n% bandits>.\r\n% \r\n% <<multiarmedbandit.jpg>>\r\n% \r\n% <http:\/\/research.microsoft.com\/en-us\/projects\/bandits\/ image source>\r\n% \r\n% Ordinary slot machines have only one lever. 
What if you had multiple\r\n% levers to pull, each with different payout. This is a multi-armed bandit.\r\n% You don't know which lever has the highest payout - you just have to try\r\n% different levers to see which one works best, but for how long? If you\r\n% keep pulling the low payout lever, you forego more rewards, but you won't\r\n% know which lever is good until you try a sufficient number of times.\r\n% \r\n% This interesting thought experiment, originally developed during World\r\n% War II, was so intractable that allied scientists wanted to drop this\r\n% problem over Germany so that German scientists could also waste their\r\n% time on it. Since then, a number of strategies have been developed to solve\r\n% this problem, but apparently many people are still working on it\r\n% <https:\/\/blogs.mathworks.com\/loren\/2016\/08\/08\/text-mining-machine-learning-research-papers-with-matlab\/\r\n% given the number of papers on this topic in NIPS 2015>.\r\n% \r\n% This is because bandit algorithms are related to the field of\r\n% <https:\/\/www.mathworks.com\/solutions\/machine-learning.html machine learning>\r\n% called <https:\/\/en.wikipedia.org\/wiki\/Reinforcement_learning\r\n% reinforcement learning>. Rather than\r\n% <https:\/\/www.mathworks.com\/discovery\/supervised-learning.html learning\r\n% from explicit training data>, or\r\n% <https:\/\/www.mathworks.com\/discovery\/unsupervised-learning.html\r\n% discovering patterns in static data>, reinforcement learning discovers\r\n% the best option from trial and error with live examples. The multi-armed\r\n% bandits in particular focus on the question of exploration vs.\r\n% exploitation trade-off - how many resources should be spent in trial and\r\n% error vs. 
maximizing the benefit.\r\n%\r\n%% Example - Choosing the Optimal Color for Buy Now Button\r\n% Let's take a look at a couple of the basic popular strategies here using\r\n% an example from a blog post <http:\/\/stevehanov.ca\/blog\/index.php?id=132\r\n% 20 lines of code that will beat A\/B testing every time>. It talks about\r\n% how you can use bandit algorithms instead of plain vanilla A\/B testing in\r\n% website optimization and provides a concise pseudo code.\r\n% \r\n% We will use an example similar to the one used in the blog post: *should\r\n% a \"Buy Now\" button be colored blue, orange, or yellow, to maximize the\r\n% clicks?* You can see <http:\/\/jsfiddle.net\/Toshiaki\/u9ufo2z8\/show\/ a live\r\n% demo of this problem in action> powered by\r\n% <https:\/\/www.mathworks.com\/products\/matlab-production-server\/ MATLAB\r\n% Production Server> running on AWS.\r\n% \r\n% <<buy_now.png>>\r\n% \r\n%% Epsilon-Greedy Strategy\r\n% This strategy lets you choose an arm at random with uniform probability\r\n% for a fraction $\\epsilon$ of the trials (exploration), and the best arm\r\n% is selected $(1 - \\epsilon)$ of the trials (exploitation). This is\r\n% implemented in the\r\n% |<https:\/\/blogs.mathworks.com\/images\/loren\/2016\/eGreedy.m eGreedy>| class\r\n% as the |choose| method. The usual value for $\\epsilon$ is 0.1 or 10% of\r\n% the trials.\r\n% \r\n% What is the \"best\" arm? At each trial, the arm we choose to pull may or\r\n% may not give a reward. The best arm has the highest expectation of reward\r\n% based on the rewards received up to that point. We keep track of the\r\n% expected rewards for respective arms as scores.\r\n\r\ndbtype 'eGreedy.m' 31:38\r\n\r\n%% Testing Button Colors with a Three-Armed Bandit\r\n% In our example, we can think of those buttons as choices of levers we can\r\n% pull, and we get a reward of click or no click with an unknown\r\n% probability with each color. This is known as a binary reward.  
The\r\n% |<https:\/\/blogs.mathworks.com\/images\/loren\/2016\/bandit.m bandit>| class\r\n% creates an object with three arms with unknown probabilities, which in\r\n% this case represent the respective \"Buy Now\" buttons. Let's call that object\r\n% |buttons|. The object provides the |test| method that returns a reward of\r\n% 1 or 0 for a chosen button we show, which represents a click (1) or no\r\n% click (0).\r\n% \r\n% We will estimate the unknown click probabilities of those buttons through\r\n% our trials using the |eGreedy| class, which we will initialize as the\r\n% |myGreedy| object with $\\epsilon$ set to 0.1 by default. Initial expected\r\n% rewards are also set uniformly to 1 for all arms, which means we expect\r\n% buttons to get clicked 100% of the time.\r\n\r\ncols = {'Blue','Orange','Yellow'};          % button colors as arms\r\nbuttons = bandit();                         % initialize bandit with 3 arms\r\nmyGreedy = eGreedy(buttons.N);              % epsilon-greedy with epsilon = 0.1\r\ns = array2table(myGreedy.scores);           % initial expected reward\r\ns.Properties.VariableNames = cols           % name columns by color\r\n\r\n%% The First Trial\r\n% In each trial, we will choose a button based on the epsilon-greedy strategy\r\n% using the |choose| method in the |myGreedy| object. Because the initial expected\r\n% rewards are uniformly set to 1, one of the three colors will be selected\r\n% randomly when a web visitor comes to a page.\r\n\r\nrng(1);                                     % for reproducibility\r\ncolor = myGreedy.choose()                   % choose a button color\r\n\r\n%% \r\n% We will then display the blue button to the visitor on the page. This is\r\n% implemented using the |test| method, which is equivalent to pulling the\r\n% lever on a slot machine. 
The reward (click) is given with the hidden\r\n% probability of the chosen button.\r\n\r\nclick = buttons.test(color)                 % display the button\r\n\r\n%% \r\n% Based on the actual result, we will update our expected reward for the\r\n% chosen button. Since we didn't get the reward with the blue button, the\r\n% expected reward for that button is now reduced:  $Blue = 1\/2 = 0.5$\r\n\r\nmyGreedy.update(color, click);              % update expected reward\r\ns = array2table(myGreedy.scores);           % new expected reward\r\ns.Properties.VariableNames = cols;          % name columns\r\ns.Properties.RowNames={'Initial','Trial_1'} % name rows\r\n\r\n%% Repeat Trials\r\n% Let's continue for more trials and plot the changes in expected rewards.\r\n% Initially the values fluctuate a lot, but eventually they settle into\r\n% certain ranges. It looks like orange has the highest expected reward and\r\n% blue has a slightly lower one.\r\n\r\nn_trials = 500;                             % number of trials\r\nfor i = 1:n_trials - 1                      % subtract completed trial\r\n    color = myGreedy.choose();              % choose a color\r\n    click = buttons.test(color);            % test the button\r\n    myGreedy.update(color, click);          % update expected reward\r\nend\r\nname = 'Epsilon-Greedy Strategy';           % title\r\nmyGreedy.plot(name, cols)                   % plot expected rewards\r\n\r\n%% Estimated vs. Actual Expected Rewards\r\n% We will never know the actual probabilities in real world situations,\r\n% but this is just a simulation, so we know the actual parameters we used\r\n% to generate random rewards. Let's see how close we came to estimating them\r\n% using the epsilon-greedy strategy. 
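The |update| method itself is not listed in the post, but the arithmetic it describes, a score of 1 dropping to 0.5 after one miss, matches a running average in which the optimistic initial score counts as one pseudo-observation. A sketch under that assumption (names hypothetical, not the |eGreedy| class's actual code):

```python
def update(scores, counts, arm, reward):
    """Running-average update of one arm's expected reward. counts[arm]
    starts at 1, so the optimistic initial score of 1 acts as one
    pseudo-observation: a first miss gives (1 + 0) / 2 = 0.5."""
    counts[arm] += 1
    scores[arm] += (reward - scores[arm]) / counts[arm]
```

Starting from scores of [1.0, 1.0, 1.0] and counts of [1, 1, 1], a no-click on the blue arm drops its score to 0.5, reproducing the first trial shown above.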
The estimates are off, but it still\r\n% managed to identify the best button correctly.\r\n\r\narray2table([myGreedy.scores(end,:); buttons.probs], ...\r\n    'VariableNames',cols, ...\r\n    'RowNames',{'Estimated','Actual'})\r\n\r\n%% Regret\r\n% Let's consider alternative scenarios. Had we done this test under a\r\n% standard A\/B testing protocol, we would have shown the three buttons in equal\r\n% proportions first to pursue pure exploration and then switched to pure\r\n% exploitation by only showing the best button. This is also known as *the\r\n% epsilon-first strategy*. Let's assume we did the A\/B testing for the\r\n% first 300 trials and only showed the orange button thereafter.\r\n% \r\n% Had we known the best button to use ahead of time, we would have shown\r\n% the orange button all the time as the optimal strategy.\r\n\r\nstrategies =  {'Epsilon-First','Epsilon-Greedy','Optimal'};\r\ntrials = array2table([100 300 100; ...      % number of trials\r\n    myGreedy.trials; [0 n_trials 0]]);\r\ntrials.Properties.VariableNames = cols;     % name cols\r\ntrials.Properties.RowNames = strategies     % name rows\r\n\r\n%% \r\n% We can calculate the expected clicks (rewards) from each strategy and\r\n% see how many fewer clicks we got using the epsilon-greedy vs. the\r\n% epsilon-first (A\/B testing). These differences are known as \"regrets\".\r\n% You can see that the epsilon-greedy came pretty close to the optimal.\r\n\r\nclicks = sum(bsxfun(@times, ...             % get expected number\r\n    table2array(trials), buttons.probs), 2);% of clicks\r\nRegret = clicks - clicks(end);              % subtract optimal clicks\r\nRegret =  table(Regret, 'RowNames', strategies)\r\n\r\n%% Creating Live Demo\r\n% My colleague Arvind Hosagrahara in\r\n% <https:\/\/www.mathworks.com\/services\/consulting\/ consulting services> helped\r\n% me create the live demo linked earlier. 
Please read Arvind's excellent\r\n% blog post\r\n% <https:\/\/blogs.mathworks.com\/developer\/2016\/07\/13\/build-a-product-build-a-service\/\r\n% Build a product, build a service> for more details about how to create a\r\n% web app with MATLAB. Here is a quick summary of what I did.\r\n%\r\n% <<deploy.png>>\r\n% \r\n% # Create two wrapper functions\r\n% |<https:\/\/blogs.mathworks.com\/images\/loren\/2016\/pickColor.m pickColor>|\r\n% and |<https:\/\/blogs.mathworks.com\/images\/loren\/2016\/scoreButton.m\r\n% scoreButton>| to call my |eGreedy| class\r\n% # Compile the wrapper functions with |eGreedy| and other dependencies\r\n% into a CTF file with\r\n% <https:\/\/www.mathworks.com\/products\/matlab-compiler-sdk\/ MATLAB Compiler\r\n% SDK>\r\n% # Deploy the CTF file to the MATLAB Production Server Arvind is running\r\n% on AWS by uploading the file\r\n% # Create the web front end on jsFiddle (see\r\n% <http:\/\/jsfiddle.net\/Toshiaki\/u9ufo2z8\/ the source code>, particularly\r\n% the JAVASCRIPT section)\r\n%\r\n% Don't worry if you don't understand JavaScript. In a real-world project,\r\n% you would work with a team of web developers and IT people who will\r\n% take care of the web front end and IT back end, and you can just focus on\r\n% your MATLAB code. The beauty of this approach is that you can use your\r\n% MATLAB code directly, rather than re-doing it in C\/C++, .NET, or Java.\r\n%\r\n%% Epsilon-Decreasing Strategy\r\n% Now back to the algorithm. One issue with the epsilon-greedy strategy is\r\n% how to set the value of $\\epsilon$, and that we continue exploring\r\n% suboptimal arms at that rate even after the algorithm identifies the\r\n% optimal choice. Rather than using a fixed value for $\\epsilon$, we can\r\n% start with a high value that decreases over time. This way, we can favor\r\n% exploration initially, and then favor exploitation later on. 
We can\r\n% easily create a new class\r\n% |<https:\/\/blogs.mathworks.com\/images\/loren\/2016\/eDecrease.m eDecrease>| from\r\n% |eGreedy| to accomplish that.\r\n% \r\n% The tricky part is how we should define the function that decreases the\r\n% $\\epsilon$ value. Here is one way of doing it. It starts with pure\r\n% exploration, but we allocate more trials to exploitation as the\r\n% experiment progresses, with gradually decreasing values of $\\epsilon$. I\r\n% tried different decreasing functions, and it seems the rate of\r\n% decrease shouldn't be too quick to get the best result.\r\n\r\nf = @(x) 1 .\/ (x .^ .5);                    % decreasing function\r\nx = 1:n_trials;                             % number of trials\r\nfigure                                      % new figure\r\nplot(x,f(x))                                % plot epsilon value\r\nxlabel('Trials')                            % x-axis label\r\nylabel('Epsilon')                           % y-axis label\r\ntitle('Epsilon Decreasing Function')        % title\r\n\r\n%% Test the New Strategy\r\n% Let's now run this for the same number of trials. 
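Under the decreasing schedule just plotted, where epsilon falls off with the square root of the trial number, the selection rule can be sketched in Python as follows. The names are illustrative assumptions, not the |eDecrease| class's actual code.

```python
import random

def epsilon_at(trial):
    """The decreasing schedule plotted above (inverse square root):
    epsilon equals 1 at trial 1, 0.5 at trial 4, 0.1 at trial 100."""
    return 1.0 / (trial ** 0.5)

def choose_decreasing(scores, trial):
    """Epsilon-decreasing selection: the epsilon-greedy rule with an
    exploration probability that decays as the experiment progresses."""
    if random.random() < epsilon_at(trial):
        return random.randrange(len(scores))  # explore (likely early on)
    return scores.index(max(scores))          # exploit (likely later on)
```

Early trials are almost pure exploration, because epsilon starts at 1; by trial 100 only about one visitor in ten sees a random button.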
It looks like it also\r\n% identified the orange button as the best option, but it spent more time\r\n% in exploration initially.\r\n\r\nrng(1);                                     % for reproducibility\r\nmyDec = eDecrease(buttons.N, f);            % initialize object\r\nfor i = 1:n_trials                          % repeat trials\r\n    color = myDec.choose();                 % choose a color\r\n    click = buttons.test(color);            % test the button\r\n    myDec.update(color, click);             % update expected reward\r\nend\r\nname = 'Epsilon-Decreasing Strategy';       % title       \r\nmyDec.plot(name, cols)                      % plot expected rewards\r\n\r\n%% \r\n% The estimated expected rewards are also closer to the actual values.\r\n\r\narray2table([myDec.scores(end,:);buttons.probs], ...\r\n    'VariableNames',cols, ...\r\n    'RowNames',{'Estimated','Actual'}) \r\n\r\n%% \r\n% Unfortunately, the result shows that this strategy is not doing as well\r\n% as the epsilon-greedy one.\r\n\r\nstrategies =  {'Epsilon-First','Epsilon-Greedy','Epsilon-Decreasing','Optimal'};\r\ntrials = array2table([[100 300 100]; ...    % number of trials\r\n    myGreedy.trials; myDec.trials; [0 n_trials 0]]);\r\nclicks = sum(bsxfun(@times, ...             % get expected number\r\n    table2array(trials), buttons.probs), 2);% of clicks\r\nRegret = clicks - clicks(end);              % subtract optimal clicks\r\nRegret =  table(Regret, 'RowNames', strategies)\r\n\r\n%% Updating the Live Demo\r\n% If the epsilon-decreasing strategy had worked, I might have wanted to\r\n% update my live demo. Had I implemented the epsilon-greedy algorithm in\r\n% other languages, I would have had to re-code all over again. 
Luckily, I\r\n% am using my original MATLAB code, so this is what I would do:\r\n% \r\n% # Change the wrapper functions to call |eDecrease| rather than |eGreedy|\r\n% and re-deploy them to MATLAB Production Server.\r\n% # No change to the web front end\r\n% # No change to the IT back end\r\n% \r\n% If you are working in a team, this means you can update your code freely\r\n% without asking your front end or back end people to help you out. Sweet!\r\n% *This is a powerful workflow advantage you gain from using MATLAB\r\n% Production Server*.\r\n% \r\n%% Real World Uses\r\n% In the examples we used here, we consider the case of choosing the best\r\n% color for the \"Buy Now\" button on a web page but you can see this would\r\n% be useful, for example, to choose which ads a news aggregator website\r\n% should show on the search page -\r\n% <https:\/\/www.inverse.com\/article\/13762-how-the-multi-armed-bandit-determines-what-ads-and-stories-you-see-online\r\n% the Washington Post> apparently uses a bandit algorithm for that.\r\n% \r\n% Recommender systems typically face the\r\n% <https:\/\/en.wikipedia.org\/wiki\/Cold_start cold start problem> - for\r\n% example,\r\n% <https:\/\/blogs.mathworks.com\/loren\/2015\/04\/22\/the-netflix-prize-and-production-machine-learning-systems-an-insider-look\/\r\n% Netflix> has no data for how people would rate newly released movies. It\r\n% can still recommend such movies through a trial and error process as\r\n% well.\r\n% \r\n% We also talked about the advantage of MATLAB Production Server in your\r\n% workflow. One of the pain points Netflix experienced is the problem of\r\n% dual implementations - you prototype your algorithm in one language such\r\n% as MATLAB, and you deploy it to production systems in another - such as\r\n% C\/C++, .NET, or Java. This means you have to re-code every time you\r\n% change your algorithm. 
You can prototype your algorithm in MATLAB, and\r\n% that code can be deployed to a production system quite easily with MATLAB\r\n% Production Server.\r\n% \r\n%% Summary - Is life a bandit problem?\r\n% <https:\/\/blogs.mathworks.com\/community\/ MATLAB Community blog> host Ned\r\n% saw an early draft of this post and sent me\r\n% <http:\/\/longnow.org\/seminars\/02016\/jun\/20\/algorithms-live\/ this video>\r\n% about using algorithms to make life decisions like when to stop looking\r\n% for an ideal apartment or spouse. It is a different genre of algorithm\r\n% called <https:\/\/en.wikipedia.org\/wiki\/Optimal_stopping optimal stopping>,\r\n% but the video proposes that you can apply it in life. Maybe you can also\r\n% find a way to apply bandit algorithms in life. Let us know how you would\r\n% use this algorithm <https:\/\/blogs.mathworks.com\/loren\/?p=2043#respond\r\n% here>!\r\n##### SOURCE END ##### 5a5fa2bf9d9b434fa7e32bc3c02e6cdb\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/multi_armed_bandits_03.png\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction--><p>Casino slot machines have a playful nickname - \"one-armed bandit\" - because of the single lever it has and our tendency to lose money when we play them. They also inspire the creativity of researchers. Today's guest blogger, <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/951521\">Toshi Takeuchi<\/a>, will introduce an interesting thought experiment known as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Multi-armed_bandit\">multi-armed bandits<\/a>.... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2016\/10\/10\/multi-armed-bandit-problem-and-exploration-vs-exploitation-trade-off\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[24,48],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/2043"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=2043"}],"version-history":[{"count":4,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/2043\/revisions"}],"predecessor-version":[{"id":3355,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/2043\/revisions\/3355"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=2043"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=2043"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=2043"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}