The uncertainty of sports continues to fascinate me, so I decided to try some statistical methods and experimentation and take my best educated guess at which team would come out on top in 2022, based on prior game history and team performance metrics.
Data: https://www.hockey-reference.com/
The away-team and home-team models each predict the outcome of the games on the 2022 Stanley Cup Final schedule from team stats and my own metrics, without historical win/loss outcomes. According to feature importance and permutation importance tests, the most important variables were home goals, away goals, and goal differential.
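For anyone curious what a permutation importance test looks like, here's a minimal sketch, assuming scikit-learn. The data is synthetic and the column names (`home_goals_avg`, `away_goals_avg`, `goal_diff`, `noise`) are made up for illustration; this is not the hockey-reference data or my actual feature set.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_games = 400
feature_names = ["home_goals_avg", "away_goals_avg", "goal_diff", "noise"]

# Synthetic stand-in features (assumption: NOT real NHL data).
home_goals = rng.normal(3.0, 0.6, n_games)
away_goals = rng.normal(2.8, 0.6, n_games)
goal_diff = home_goals - away_goals
noise = rng.normal(0.0, 1.0, n_games)
X = np.column_stack([home_goals, away_goals, goal_diff, noise])

# Synthetic label: 1 = home win, driven mostly by goal differential.
y = (goal_diff + rng.normal(0.0, 0.3, n_games) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

# Shuffle one column at a time on held-out data and measure the accuracy drop.
result = permutation_importance(
    model, X_test, y_test, n_repeats=20, random_state=0
)
ranking = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A bigger accuracy drop after shuffling a column means the model leans on that column more. Correlated columns (goal differential is derived from the two goal columns here) share the credit, so the ranking is worth reading alongside the model's own feature importances.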
The first iteration of the model was an ensemble of:
- Logistic Regression
- Non-Linear SVM
- Decision Trees
Stacking Model: Linear SVM
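A stack like this first iteration can be sketched with scikit-learn's `StackingClassifier`. This is an illustration under assumptions, not my exact pipeline: the data comes from `make_classification`, and the scaling steps, tree depth, and fold count are placeholder choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for per-game team stats (assumption).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Base learners: logistic regression, a non-linear (RBF) SVM, a decision tree.
base_models = [
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("rbf_svm", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
    ("tree", DecisionTreeClassifier(max_depth=4, random_state=42)),
]

# The linear SVM is trained on the base learners' cross-validated outputs.
stack_v1 = StackingClassifier(
    estimators=base_models, final_estimator=LinearSVC(), cv=5
)
stack_v1.fit(X_train, y_train)
accuracy_v1 = stack_v1.score(X_test, y_test)
print(f"hold-out accuracy: {accuracy_v1:.3f}")
```

The `cv=5` setting matters: the stacking model only sees out-of-fold predictions from the base learners, which keeps it from simply memorizing their training-set fit.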
The final iteration of the model is a stack of:
- Multinomial Naive Bayes
- Logistic Regression
Stacking Model: Extreme Gradient Boosting Classifier (XGBoost)
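Here's a rough sketch of the final stack's shape, again on synthetic data. Two assumptions to flag loudly: the `MinMaxScaler` step is my illustration of one way to keep inputs non-negative (Multinomial Naive Bayes rejects negative features, and something like goal differential can be negative), and scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost so the snippet runs without the extra dependency. With the `xgboost` package installed, `xgboost.XGBClassifier` could be passed as `final_estimator` instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic binary data standing in for per-game team stats (assumption).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=7
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7, stratify=y
)

base_models = [
    # Rescale to [0, 1] (clipping unseen values) so MultinomialNB gets
    # non-negative inputs.
    ("mnb", make_pipeline(MinMaxScaler(clip=True), MultinomialNB())),
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
]

# Stand-in for the XGBoost stacking model (swap noted in the text above).
meta_model = GradientBoostingClassifier(random_state=7)

stack_final = StackingClassifier(
    estimators=base_models, final_estimator=meta_model, cv=5
)
stack_final.fit(X_train, y_train)
accuracy_final = stack_final.score(X_test, y_test)

# Probability of the positive class (e.g. "home team wins") per test row.
win_probabilities = stack_final.predict_proba(X_test)[:, 1]
print(f"hold-out accuracy: {accuracy_final:.3f}")
```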
Predictions were made on unknown data for the 2022 Stanley Cup Final series: in place of null values, I used averages by team, the average duration of an NHL game, and the average outcome.
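The idea of filling unknowns with team averages can be sketched in plain Python. The rows, the stat names (`goals`, `shots`), and the helper functions `team_averages` and `fill_unknowns` are all hypothetical, made up for this example; they are not taken from my dataset or code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical past-game rows (illustrative numbers, not real stats).
history = [
    {"team": "Colorado Avalanche", "goals": 4, "shots": 35},
    {"team": "Colorado Avalanche", "goals": 3, "shots": 31},
    {"team": "Tampa Bay Lightning", "goals": 3, "shots": 29},
    {"team": "Tampa Bay Lightning", "goals": 2, "shots": 33},
]


def team_averages(rows, stats):
    """Return {team: {stat: mean of the known values for that stat}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for stat in stats:
            if row[stat] is not None:
                grouped[row["team"]][stat].append(row[stat])
    return {
        team: {stat: mean(values) for stat, values in by_stat.items()}
        for team, by_stat in grouped.items()
    }


def fill_unknowns(game, averages, stats):
    """Copy a game row, replacing None stats with that team's average."""
    filled = dict(game)
    for stat in stats:
        if filled.get(stat) is None:
            filled[stat] = averages[filled["team"]][stat]
    return filled


stats = ["goals", "shots"]
averages = team_averages(history, stats)

# A future game whose stats are not known yet.
upcoming = {"team": "Colorado Avalanche", "goals": None, "shots": None}
filled_game = fill_unknowns(upcoming, averages, stats)
print(filled_game)
```

A league-wide value such as average game duration would be filled the same way, just with one overall mean instead of a per-team lookup.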
ACTUAL STANLEY CUP RECORD
date | away_team | home_team | winner |
2022-06-15 | Tampa Bay Lightning | Colorado Avalanche | Colorado Avalanche |
2022-06-18 | Tampa Bay Lightning | Colorado Avalanche | Colorado Avalanche |
2022-06-20 | Colorado Avalanche | Tampa Bay Lightning | Tampa Bay Lightning |
2022-06-22 | Colorado Avalanche | Tampa Bay Lightning | Colorado Avalanche |
2022-06-24 | Tampa Bay Lightning | Colorado Avalanche | Tampa Bay Lightning |
2022-06-26 | Colorado Avalanche | Tampa Bay Lightning | Colorado Avalanche |
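To turn the record above into a series score, the rows can be tallied with a few lines of Python. The tuples below are copied from the table; the variable names are just for this sketch.

```python
from collections import Counter

# (date, away_team, home_team, winner), copied from the table above.
games = [
    ("2022-06-15", "Tampa Bay Lightning", "Colorado Avalanche", "Colorado Avalanche"),
    ("2022-06-18", "Tampa Bay Lightning", "Colorado Avalanche", "Colorado Avalanche"),
    ("2022-06-20", "Colorado Avalanche", "Tampa Bay Lightning", "Tampa Bay Lightning"),
    ("2022-06-22", "Colorado Avalanche", "Tampa Bay Lightning", "Colorado Avalanche"),
    ("2022-06-24", "Tampa Bay Lightning", "Colorado Avalanche", "Tampa Bay Lightning"),
    ("2022-06-26", "Colorado Avalanche", "Tampa Bay Lightning", "Colorado Avalanche"),
]

# Wins per team across the series.
wins = Counter(winner for _, _, _, winner in games)

# How often the home team won.
home_wins = sum(1 for _, _, home, winner in games if home == winner)

for team, count in wins.most_common():
    print(f"{team}: {count}")
print(f"home team won {home_wins} of {len(games)} games")
```

The same `games` list is a convenient target for scoring a model: compare its predicted winner for each date against the fourth element of each tuple.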
See the project & code here 🏒