# MLB Opening Day: MATLAB, Data and Baseball!

It’s that time of year again. Baseball season gets underway this weekend. Ever since the book Moneyball: The Art of Winning an Unfair Game was published in 2003, there has been an increased focus on data analysis in the sport.

Engineers, engineering students, and mathematically inclined baseball enthusiasts have used MATLAB in a number of baseball-related studies, from finding the sweet spot on a bat to determining the best spot in a lineup for the top hitters. MATLAB has also been used to predict the success of a team, similar to the approach seen in Moneyball.

## Combining MATLAB with Baseball Data

With the increased availability of sensor technology for sports measurements, it’s no wonder we see mathematicians and engineers combining MATLAB with baseball. Sensors generate data… lots of data. There are a number of ways MATLAB has been used to analyze the resulting baseball data.

• Bryan Cole, featured writer for Beyond the Box Score, used MATLAB to quantify a relationship between consistency and hitter quality, measuring over 1,500 individual swings from 25 hitters. His goal was to help young hitters better measure their progress and to give scouts and coaches a tool for judging prospective players.

“Technological developments, including inertial bat sensors and camera-based ball tracking systems, should make it possible to develop a quantitative measure of consistency readily available to a wider range of players, with a wider range of abilities,” according to Bryan Cole.
• A project at MIT was designed to answer a specific baseball-related question: When is stealing second base a beneficial move for the offense? For this project, MATLAB was used to simulate a large set of possible outcomes.

“I constructed a baseball game simulator in MATLAB. After verifying its accuracy to real-life MLB statistics, I simulated millions of baseball games to test the effects of different stolen base strategy to answer the question,” said David Hesslink, MIT student investigator.
• Neural networks can be employed to analyze large sets of baseball data. A neural network can be trained to find solutions, recognize patterns, classify data, and forecast future events. Bryan Cole used MATLAB to build two artificial neural networks to determine whether umpires call strikes differently for different pitch types. The first used only the pitch’s location when it crossed the plate. The second included parameters related to break and movement as well as end speed. He notes that an advantage of neural networks is that weights can be used to determine the relative importance of each feature. The conclusion? Pitch type had minimal impact on the strike zone.
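
To give a feel for the simulation approach in the MIT stolen-base project above, here is a minimal Monte Carlo sketch. It is written in Python rather than MATLAB and is not the simulator from that project. Every name and number in it is an assumption made for illustration: the functions `simulate_inning` and `mean_runs`, the `on_base` probability, and the toy rule that every hit is a single that moves each runner up exactly one base.

```python
import random


def simulate_inning(rng, attempt_steal, steal_success, on_base=0.32):
    """Simulate the rest of an inning that starts with a runner on first, no outs.

    Toy model (assumption): each batter either singles with probability
    `on_base` or makes an out. A single advances every runner one base.
    """
    outs = 0
    runs = 0
    bases = [True, False, False]  # occupancy of first, second, third

    if attempt_steal:
        if rng.random() < steal_success:
            bases = [False, True, False]   # runner reaches second safely
        else:
            bases = [False, False, False]  # runner is thrown out
            outs += 1

    while outs < 3:
        if rng.random() < on_base:
            if bases[2]:
                runs += 1                  # runner on third scores
            bases = [True, bases[0], bases[1]]
        else:
            outs += 1
    return runs


def mean_runs(attempt_steal, steal_success, trials=50_000, seed=0):
    """Average runs scored per simulated inning over `trials` innings."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    total = sum(
        simulate_inning(rng, attempt_steal, steal_success)
        for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    baseline = mean_runs(attempt_steal=False, steal_success=0.0)
    print(f"no steal attempt: {baseline:.3f} runs per inning")
    # Hypothetical success rates, chosen only to show the comparison.
    for p in (0.5, 0.7, 0.9):
        with_steal = mean_runs(attempt_steal=True, steal_success=p)
        print(f"steal at {p:.0%} success: {with_steal:.3f} runs per inning")
```

Comparing each steal line with the no-steal baseline shows the basic idea: an attempt pays off only when the success rate is high enough to offset the cost of losing the runner and an out. A realistic study would replace the toy hitting model with outcome frequencies taken from actual MLB statistics.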

Last month, the Washington Post ran an article on the 2015 home run totals, calling it the “biggest home run surge since the steroids era”. The number of MLB home runs jumped by over 17% in the past season, the largest spike since 1996.

The Washington Post turned to Robert Vanderbei, a math professor at Princeton, to examine the odds of the offensive surge. Vanderbei used MATLAB to determine the odds of a 17% increase after downward trends in 2013 and 2014. What odds did his calculations give for such a spike?

“It said zero,” Vanderbei said. “Something definitely changed. I don’t know what, but something definitely, significantly changed.”

In that Washington Post article, an MLB executive credited the spike to data analysis:

“Teams are smarter, more information is available and there are philosophical shifts happening all over baseball. We have the tools to analyze everything and we are valuing things differently.”

## Tools to analyze = MATLAB

Could MATLAB-based research and data analysis be responsible for the home run spike? It is possible! Leave a comment if you’ve worked on a baseball-related project with MATLAB.
