{"id":5178,"date":"2016-04-01T01:05:59","date_gmt":"2016-04-01T06:05:59","guid":{"rendered":"https:\/\/blogs.mathworks.com\/simulink\/?p=5178"},"modified":"2016-04-01T14:22:22","modified_gmt":"2016-04-01T19:22:22","slug":"simulating-the-2016-baseball-season","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/simulink\/2016\/04\/01\/simulating-the-2016-baseball-season\/","title":{"rendered":"Simulating The 2016 Baseball Season"},"content":{"rendered":"<p>A few weeks ago, <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/4370178\">Matt<\/a>, <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/4758131-corey-lagunowich\">Corey<\/a> and I had a conversation about the rise of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Sabermetrics\">sabermetrics<\/a> and sports analytics. With the baseball season opening on April 3rd, we decided to apply the power of MATLAB and Simulink to predict how things will turn out.<\/p>\r\n\r\n<p>Here is what we came up with:<\/p>\r\n\r\n<ul>\r\n\t<li>Obtain past baseball statistics and extrapolate for this season.<\/li>\r\n\t<li>Implement a simulation of a baseball game in Simulink and SimEvents.<\/li>\r\n\t<li>Run a Monte Carlo simulation of all games in the 2016 season.<\/li>\r\n<\/ul>\r\n\r\n<p><strong>Predictive Analytics<\/strong><\/p>\r\n\r\n<p>We loaded the past three seasons from <a href=\"http:\/\/www.seanlahman.com\/baseball-archive\/statistics\/\">Sean Lahman's archive of baseball statistics<\/a> into MATLAB using the <a href=\"https:\/\/www.mathworks.com\/products\/database\/\">Database Toolbox<\/a>.<\/p>\r\n \r\n<p>Using this data, we constructed probabilities for player performance in 2016, taking the following into account:<\/p>\r\n \r\n<ul>\r\n\t<li><strong>Player age:<\/strong> We adjusted player performance expectations based on age <a href=\"\">using analysis from Baseball Prospectus<\/a>. 
We allowed random deviation from this aging curve based on something we called the Ortiz factor.<\/li>\r\n\r\n\t<li><strong>Durability:<\/strong> We also calculated a durability index based on players\u2019 games played in previous seasons. This was used to project injuries. We called this the Buchholz factor.<\/li>\r\n\r\n\t<li><strong>Contract status:<\/strong> Players often perform at their peak when contracts are approaching renewal and see a drop in production after receiving a new, lucrative contract. We called this the Sandoval factor.<\/li>\r\n<\/ul>\r\n\r\n<p><strong>Simulating a Baseball Game in SimEvents<\/strong><\/p>\r\n\r\n<p>When thinking about all the products in the Simulink family, one that seems well suited to simulating a baseball game is <a href=\"https:\/\/www.mathworks.com\/products\/simevents\/\">SimEvents<\/a>. In R2016a, the entire SimEvents block library is new, so I thought it would be a good opportunity for me to get familiar with it and to highlight what it can do.<\/p>\r\n\r\n<p>Here is what the top model looks like:<\/p>\r\n \r\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/simulink\/files\/baseballTop.png\" alt=\"Top Model\" \/><\/p>\r\n\r\n<p>To begin, we use the <a title=\"https:\/\/www.mathworks.com\/help\/releases\/R2016a\/simevents\/ref\/entitygenerator.html (link no longer works)\">Entity Generator<\/a> block to generate the <i>player<\/i> entity. In the Event Actions tab of the block, we call MATLAB code and Simulink Functions to set the attributes of the player entity. 
In the MATLAB function <tt>initPlayer<\/tt>, we manage the roster of each team, including some variability in player replacements during the game.<\/p>\r\n\r\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/simulink\/files\/generateEntity.png\" alt=\"Player Entity Generation\" \/><\/p>\r\n\r\n<p>The player then goes through a <a title=\"https:\/\/www.mathworks.com\/help\/releases\/R2016a\/simevents\/ref\/discreteeventchart.html (link no longer works)\">Discrete Event Stateflow Chart<\/a>. In the chart, we use the attributes of the player entity and a random number to determine the outcome of the at-bat. Using the <tt>forward<\/tt> function, the <i>player<\/i> entity is forwarded either to first base or back to the bench.<\/p>\r\n\r\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/simulink\/files\/plateSF.png\" alt=\"Plate logic\" \/><\/p>\r\n\r\n<p>Each base is modeled using an <a title=\"https:\/\/www.mathworks.com\/help\/releases\/R2016a\/simevents\/ref\/entityserver.html (link no longer works)\">Entity Server<\/a> block. At the output of the server, we place an <a title=\"https:\/\/www.mathworks.com\/help\/releases\/R2016a\/simevents\/ref\/entitygate.html (link no longer works)\">Entity Gate<\/a> block. We implemented some logic, once again based on player statistics and a random number, to determine if and when the player tries to move toward the next base. Using an <a title=\"https:\/\/www.mathworks.com\/help\/releases\/R2016a\/simevents\/ref\/entityoutputswitch.html (link no longer works)\">Output Switch<\/a> block and some additional randomness based on player stats, we either advance the player to the next base or collect them as an out.<\/p>\r\n\r\n<p>Notice the {...} on the Server block. This means that the block has Event Actions defined. In this case, when the <i>player<\/i> entity is serviced, a Simulink function is executed to decide whether the player stays on base, moves to the next base, or is out. 
Triggering the logic computation in the Server event actions ensures it gets executed at the proper time, in sync with the rest of the SimEvents network.<\/p>\r\n\r\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/simulink\/files\/baseLogic.png\" alt=\"bases logic\" \/><\/p>\r\n\r\n<p>Downstream, we collect all the retired players in a server with a capacity of 3. When the server is full, we flush it, empty the bases, and move to the next half-inning.<\/p>\r\n\r\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/simulink\/files\/manageInnings.png\" alt=\"Inning management\" \/><\/p>\r\n\r\n<p>With that done, we only need to use the Number of Entities arrived statistic of the Entity Terminator blocks to count each player who makes it to home plate as a run.<\/p>\r\n\r\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/simulink\/files\/countPoints.png\" alt=\"Count points\" \/><\/p>\r\n\r\n<p><strong>The Season<\/strong><\/p>\r\n\r\n<p>In baseball\u2019s regular season, each of the 30 teams plays 162 games, for 2,430 games in total. We wanted to run a Monte Carlo simulation with 1000 iterations to get a probability distribution of the outcome, so we had to virtually play roughly 2.4 million baseball games. 
We applied parallel computing to this challenge and were able to complete those games in less time than a typical rendition of \u201cTake Me Out to the Ball Game\u201d.<\/p>\r\n\r\n<p>To simulate one game, I created a function that looks like this:<\/p>\r\n\r\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/simulink\/files\/runOneGame.png\" alt=\"Simulating one game\" \/><\/p>\r\n\r\n<p>It then becomes very easy to call this function inside a <tt>parfor<\/tt> loop to simulate the entire season using as much processing power as is available:<\/p>\r\n\r\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/simulink\/files\/runSeason.png\" alt=\"Simulating the season\" \/><\/p>\r\n\r\n<p>By taking advantage of the <a href=\"https:\/\/www.mathworks.com\/help\/simulink\/ug\/fast-restart-workflow.html\">Fast Restart<\/a> feature, we can simulate an entire season in only a few seconds.<\/p>\r\n\r\n<p><strong>The Results<\/strong><\/p>\r\n\r\n<p>Our projected results for the National League champion compared well to the Vegas odds:<\/p>\r\n\r\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/simulink\/files\/season1.png\" alt=\"Season stats 1\" \/><\/p>\r\n\r\n<p>For the American League, we started to depart a bit from the oddsmakers\u2019 projections:<\/p>\r\n\r\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/simulink\/files\/season2.png\" alt=\"Season stats 2\" \/><\/p>\r\n\r\n<p>In what was surely a strange statistical anomaly, the Cubs did not win the World Series in any of the 1000 simulations, despite winning the National League 34% of the time.<\/p>\r\n\r\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/simulink\/files\/season3.png\" alt=\"Season stats 3\" \/><\/p>\r\n\r\n<p>At this point, I asked Matt and Corey to try one more thing. As we all know, in 1994, the Montreal Expos were playing extremely well, and many predicted they were the team to beat in the playoffs. 
But tragically, the World Series was cancelled by labor strife. So, we gave the Expos another chance in our virtual season by replacing the Washington Nationals with the Expos\u2019 \u201994 roster. The results were astounding.<\/p>\r\n\r\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/simulink\/files\/season4.png\" alt=\"season stats 4\" \/><\/p>\r\n\r\n<p><strong>Now it\u2019s your turn<\/strong><\/p>\r\n\r\n<p>You might look at today\u2019s date and think this was all a joke. But one thing is sure: you\u2019d be foolish to underestimate Felipe Alou\u2019s squad from 1994... and the power of the new SimEvents in R2016a!<\/p>","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/simulink\/files\/season4.png\" onError=\"this.style.display ='none';\" \/><\/div><p>A few weeks ago, Matt, Corey and I had a conversation about the rise of sabermetrics and sports analytics. With the baseball season opening on April 3rd, we decided to apply the power of MATLAB and... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/simulink\/2016\/04\/01\/simulating-the-2016-baseball-season\/\">read more >><\/a><\/p>","protected":false},"author":41,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[71,30,10,16],"tags":[467,416],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/posts\/5178"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/users\/41"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/comments?post=5178"}],"version-history":[{"count":49,"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/posts\/5178\/revisions"}],"predecessor-version":[{"id":5305,"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/posts\/5178\/revisions\/5305"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/media?parent=5178"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/categories?post=5178"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/tags?post=5178"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}