This began as my final project for CS50 and has evolved into a research project that I am currently writing a paper on. V1.0 reached a maximum accuracy of 57.2% and V2.0 reached 58.05%, compared to student averages of 55-56% with a logistic regression model. My target accuracy is in the 59-60% range, bringing in data from the past decade of seasons.
The main difference between V1.0-2.0 and V3.0 is that V3.0 introduces the pybaseball package to improve data accuracy. While I built a fairly extensive play-by-play interpreter in prior versions, there were still some data accuracy issues. The hope is that a third-party package will improve accuracy while providing more specific and detailed information, such as that provided by Statcast (Baseball Savant).
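As a rough illustration of what pulling Statcast data through pybaseball might look like, here is a minimal sketch. It is not the project's actual code. The helper names `season_window` and `fetch_season` are hypothetical, and the March 1 to November 30 bounds are an assumption chosen to be wide enough to cover a season. Only `statcast(start_dt, end_dt)` comes from pybaseball itself. Running `fetch_season` requires pybaseball to be installed and a network connection.

```python
from datetime import date


def season_window(year: int) -> tuple[str, str]:
    """Return (start, end) ISO dates that bracket one season.

    Assumption: March 1 to November 30 is wide enough to cover the
    season; the range may also include non-regular-season games.
    """
    return date(year, 3, 1).isoformat(), date(year, 11, 30).isoformat()


def fetch_season(year: int):
    """Download pitch-level Statcast rows for one season as a DataFrame."""
    # Imported lazily so season_window() works without pybaseball installed.
    from pybaseball import statcast

    start, end = season_window(year)
    return statcast(start_dt=start, end_dt=end)
```

Calling `fetch_season(2019)` would return one row per pitch for that window. A full season is a large download, so fetching one season at a time and saving each result to disk is probably more practical than requesting a decade in one call.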
TBD - STILL IN DEVELOPMENT
Disclaimer: The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".