Simulating The 2016 Baseball Season
A few weeks ago, Matt, Corey and I had a conversation about the rise of sabermetrics and sports analytics. With the baseball season opening on April 3rd, we decided to apply the power of MATLAB and Simulink to predict how things would turn out.
Here is what we came up with:
- Obtain past baseball statistics and extrapolate for this season.
- Implement the simulation of a baseball game in Simulink and SimEvents.
- Run a Monte Carlo simulation of all games in the 2016 season.
Predictive Analytics
We loaded the past 3 seasons from Sean Lahman's archive of baseball statistics into MATLAB using the Database Toolbox.
Using this data, we constructed probabilities for player performance in 2016 taking the following into account:
- Player age: We adjusted player performance expectations based on age using analysis from Baseball Prospectus. We allowed random deviation from this arc based on something we called the Ortiz factor.
- Durability: We also calculated a durability index based on players’ games played in previous seasons. This was used to project injuries. We called this the Buchholz factor.
- Contract status: Players often perform at their peak when contracts are approaching renewal and see a drop in production after receiving a new, lucrative contract. We called this the Sandoval factor.
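The three adjustments above can be sketched in a few lines. This is a minimal illustration in Python rather than our MATLAB code, and everything in it is an assumption made for the example: the linear aging arc, the peak age of 27, the 3% contract-year boost, and all function and field names are hypothetical and are not taken from Baseball Prospectus or the Lahman data.

```python
import random
from dataclasses import dataclass

@dataclass
class PlayerHistory:
    age: int               # age in the season being projected
    on_base_rates: list    # on-base rate in each past season, oldest first
    games_played: list     # games played in each past season
    contract_year: bool    # True if the contract is up for renewal

def age_multiplier(age, peak_age=27, decline_per_year=0.01):
    """Hypothetical aging arc: performance falls off linearly away from a peak age."""
    return 1.0 - decline_per_year * abs(age - peak_age)

def durability_index(games_played, season_length=162):
    """Fraction of possible games the player appeared in over the past seasons."""
    return sum(games_played) / (season_length * len(games_played))

def project_on_base_rate(player, rng, age_noise=0.02, contract_boost=0.03):
    """Scale the historical average by the age and contract adjustments."""
    baseline = sum(player.on_base_rates) / len(player.on_base_rates)
    # Random deviation from the aging arc (the "Ortiz factor" idea).
    aging = age_multiplier(player.age) + rng.gauss(0.0, age_noise)
    # Contract-year bump (the "Sandoval factor" idea).
    contract = 1.0 + contract_boost if player.contract_year else 1.0
    # Keep the result a valid probability.
    return min(max(baseline * aging * contract, 0.0), 1.0)

player = PlayerHistory(age=30, on_base_rates=[0.30, 0.34, 0.32],
                       games_played=[150, 162, 81], contract_year=True)
rng = random.Random(2016)
projected = project_on_base_rate(player, rng)
availability = durability_index(player.games_played)
```

The durability index here is simply the share of possible games played; a projection of injuries could then treat it as the probability that the player is available for a given game.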
Simulating a Baseball Game in SimEvents
When thinking about all the products in the Simulink family, one that seems appropriate to simulate a baseball game is SimEvents. In R2016a, the entire SimEvents block library is new, so I thought it would be a good opportunity for me to get familiar with it, and to highlight what it can do.
Here is what the top model looks like:
To begin, we use the Entity Generator block to generate the player entity. In the Event Actions tab of the block, we call MATLAB code and Simulink Functions to set the attributes of the player entity. In the MATLAB function initPlayer, we manage the roster of each team, including some variability in player replacements during the game.
The player then goes through a Discrete Event Stateflow Chart. In the chart, we use the attributes of the player entity and a random number to determine the outcome of the at-bat. Using the forward function, the player entity is either forwarded to first base, or back to the bench.
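To make the at-bat decision concrete, here is a small sketch of the idea in Python (not the Stateflow chart itself). It assumes each player entity carries a set of per-outcome probabilities; the outcome names and the numbers are made up for the example.

```python
import random

def at_bat_outcome(probabilities, draw):
    """Map a uniform draw in [0, 1) to an outcome by walking cumulative probabilities."""
    cumulative = 0.0
    for outcome, p in probabilities.items():
        cumulative += p
        if draw < cumulative:
            return outcome
    # Whatever probability mass is left over counts as an out.
    return "out"

# Hypothetical attributes for one batter; they sum to less than 1.
batter = {"walk": 0.08, "single": 0.15, "double": 0.05,
          "triple": 0.005, "home_run": 0.03}

rng = random.Random(2016)
outcome = at_bat_outcome(batter, rng.random())
```

In the model, an "out" corresponds to forwarding the entity back to the bench, and any other outcome to forwarding it onto the bases.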
Each base is modeled using an Entity Server block. At the output of the server, we place an Entity Gate block. We implemented some logic, once again based on player statistics and a random number, to determine if and when the player tries to move toward the next base. Using an Output Switch block and some additional randomness based on player stats, we either advance the player to the next base or collect them as an out.
Notice the {...} on the Server block. This means that the block has Event Actions defined. In this case, when the player entity is serviced, the Simulink function is executed to decide if the player stays on base, moves to the next base, or is out. Triggering the logic computation in the Server event actions ensures it gets executed at the proper time, in sync with the rest of the SimEvents network.
Downstream, we collect all the retired players in a server with a capacity of 3. When the server is full, we flush it, empty the bases, and move to the next half-inning.
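The three-out rule can be illustrated outside of SimEvents with a short loop. The Python sketch below is a deliberate simplification and not the model's logic: it assumes a single probability of reaching base and moves every runner up exactly one base on each successful at-bat.

```python
import random

def play_half_inning(reach_base_prob, rng):
    """Return the runs scored before three outs are collected."""
    bases = [False, False, False]   # occupancy of first, second, third
    outs = 0
    runs = 0
    while outs < 3:
        if rng.random() < reach_base_prob:
            # Simplification: every runner moves up one base,
            # so a runner on third scores.
            if bases[2]:
                runs += 1
            bases = [True, bases[0], bases[1]]
        else:
            outs += 1
    # Returning here plays the role of flushing the server and
    # emptying the bases before the next half-inning.
    return runs

rng = random.Random(2016)
runs = play_half_inning(0.32, rng)
```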
With that done, we only need to use the Number of Entities arrived statistics of Entity Terminators to count the players who have been able to make it to home plate as a run.
The Season
Baseball’s regular season has each of the 30 teams play 162 games, or 2,430 games in total. We wanted to run a Monte Carlo simulation with 1000 iterations to get a probability distribution of the outcome, so we had to virtually play more than 2.4 million baseball games. We applied parallel computing to this challenge and were able to complete those games in less time than a typical rendition of “Take Me Out to the Ball Game”.
To simulate one game, I created a function that looks like:
It then becomes very easy to call this function inside a parfor loop to simulate the entire season using as much processing power as available:
By taking advantage of the Fast Restart feature, an entire season can be simulated in only a few seconds.
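For readers who want to see the shape of such a Monte Carlo loop without MATLAB, here is a rough Python analogue. It is only a sketch under stated assumptions: the four-team league and the coin-flip game model are placeholders, the function names are hypothetical, and a thread pool stands in for parfor workers (for CPU-bound work in Python, a process pool would be the closer equivalent).

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

TEAMS = ["A", "B", "C", "D"]   # hypothetical four-team league

def simulate_season(seed):
    """Play every pairing once with a coin-flip game model; return the champion."""
    rng = random.Random(seed)   # one independent random stream per iteration
    wins = Counter({team: 0 for team in TEAMS})
    for i, home in enumerate(TEAMS):
        for away in TEAMS[i + 1:]:
            winner = home if rng.random() < 0.5 else away
            wins[winner] += 1
    best = max(wins.values())
    # Break ties for the most wins at random.
    return rng.choice(sorted(t for t in TEAMS if wins[t] == best))

def run_monte_carlo(iterations, workers=4):
    """Run independent seasons concurrently and return each team's title frequency."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        champions = list(pool.map(simulate_season, range(iterations)))
    counts = Counter(champions)
    return {team: counts[team] / iterations for team in TEAMS}

title_odds = run_monte_carlo(1000)
```

Seeding each iteration from its index keeps the results reproducible no matter how the iterations are scheduled across workers, which is the same property that makes the loop safe to parallelize.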
The Results
Our projected results for the National League champion compared well to the Vegas odds:
For the American League, we started to depart a bit from the oddsmakers’ projections:
In what was surely a strange statistical anomaly, the Cubs did not win the World Series in any of the 1000 simulations, despite winning the National League 34% of the time.
At this point, I asked Matt and Corey to try one more thing. As we all know, in 1994, the Montreal Expos were playing extremely well, and many predicted they were the team to beat in the playoffs. But tragically, the World Series was cancelled because of labor strife. So, we gave the Expos another chance in our virtual season by replacing the Washington Nationals with the 1994 Expos roster. The results were astounding.
Now it’s your turn
You might look at today’s date and think this was all a joke. But one thing is sure, you’d be foolish to underestimate Felipe Alou’s squad from 1994... and the power of the new SimEvents in R2016a!