This repository contains a machine learning model based on the Random Forest algorithm for forecasting English Premier League match outcomes. It uses engineered features such as rolling averages, together with dynamic retraining, to improve precision when predicting both home and away outcomes.
The project is organized into the following components:

- Data Collection and Preprocessing: The `data` folder contains scripts for collecting and preprocessing the EPL match data. To scrape the data, run the `scrape.py` script. Otherwise, use the `matches.csv` file directly to train and test the model by running the `prediction_model.ipynb` notebook.
- Feature Engineering: The `features` folder includes tools for creating relevant features for the machine learning model. This involves calculating rolling averages and other dynamic features that capture the teams' recent performances.
- Model Training: The `models` folder contains the main machine learning model implemented using the Random Forest algorithm.
- Prediction: The `predict` folder provides utilities for making predictions on new match data using the trained model.
- Evaluation: The `evaluation` folder includes scripts and notebooks for evaluating the model's performance on historical data. The evaluation process helps fine-tune the model and understand its strengths and weaknesses.
- Dynamic Retraining: The `dynamic_retraining` folder contains scripts for implementing dynamic retraining. This involves updating the model periodically with new data to ensure that it stays relevant and accurate over time.
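
The rolling-average features mentioned above can be sketched as follows. This is a minimal illustration in pandas, not the repository's actual code: the column names `team`, `date`, and `gf` (goals for), the sample values, and the helper `add_rolling_averages` are all assumptions made for the example. Each average is computed from a team's previous matches only, so a match's own result never leaks into its features.

```python
import pandas as pd


def add_rolling_averages(matches, cols, window=3):
    """Add per-team rolling means of `cols` over the previous `window` matches.

    Hypothetical helper: assumes `matches` has `team` and `date` columns.
    """
    matches = matches.sort_values(["team", "date"]).copy()
    new_cols = [f"{c}_rolling" for c in cols]
    for col, new_col in zip(cols, new_cols):
        # shift(1) drops the current match from its own window,
        # so only earlier matches contribute to the average.
        matches[new_col] = matches.groupby("team")[col].transform(
            lambda s: s.shift(1).rolling(window, min_periods=window).mean()
        )
    # Rows without enough history have no rolling value and are dropped.
    return matches.dropna(subset=new_cols)


# Made-up sample data: four matches each for two placeholder teams.
matches = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2023-08-12", "2023-08-19", "2023-08-26", "2023-09-02"] * 2
        ),
        "team": ["A"] * 4 + ["B"] * 4,
        "gf": [1, 2, 3, 4, 0, 3, 6, 9],
    }
)

features = add_rolling_averages(matches, ["gf"], window=3)
print(features[["team", "date", "gf_rolling"]])
```

With a window of three, only each team's fourth match has enough history, so two rows remain. A larger window smooths out single-match noise at the cost of discarding more early-season rows.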
Requirements:

- Python 3.x
- Required Python packages are listed in the `requirements.txt` file. Install them using `pip install -r requirements.txt`.

Usage:

- Data Collection and Preprocessing:
  - To scrape EPL match data, run the `scrape.py` script in the `data` folder.
  - Alternatively, use the pre-existing `matches.csv` file to train and test the model.
- Feature Engineering:
  - Utilize tools in the `features` folder to generate relevant features for the machine learning model.
- Model Training:
  - Train the Random Forest model using the implementation in the `models` folder.
- Prediction:
  - Use utilities in the `predict` folder for making predictions for a specific set of matches.
- Evaluation:
  - Utilize scripts and notebooks in the `evaluation` folder to assess the model's performance on historical data.
- Dynamic Retraining:
  - Periodically implement dynamic retraining strategies from the `dynamic_retraining` folder to update the model with new data.
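
The training, evaluation, and retraining steps above can be combined in a walk-forward loop: retrain on all matches before a cutoff date, score on the matches that follow, then move the cutoff forward. The sketch below is one possible reading of "dynamic retraining", not the repository's actual scripts. The data is synthetic, and the column names (`gf_rolling`, `ga_rolling`, `win`), the cutoff dates, the hyperparameters, and the helper `walk_forward_precision` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score


def walk_forward_precision(df, predictors, target, cutoffs):
    """Retrain before each cutoff and score on matches up to the next cutoff.

    Hypothetical helper: assumes `df` has a datetime `date` column.
    """
    scores = []
    for start, end in zip(cutoffs[:-1], cutoffs[1:]):
        train = df[df["date"] < start]
        test = df[(df["date"] >= start) & (df["date"] < end)]
        # A fresh model is fitted each round, so it always sees the latest data.
        model = RandomForestClassifier(
            n_estimators=50, min_samples_split=10, random_state=1
        )
        model.fit(train[predictors], train[target])
        preds = model.predict(test[predictors])
        scores.append(precision_score(test[target], preds, zero_division=0))
    return scores


# Synthetic stand-in for engineered match features (not real EPL data).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame(
    {
        "date": pd.date_range("2022-08-01", periods=n, freq="D"),
        "gf_rolling": rng.normal(1.5, 0.5, n),
        "ga_rolling": rng.normal(1.2, 0.5, n),
    }
)
# Synthetic label: a win is more likely when recent goals scored exceed goals conceded.
noise = rng.normal(0, 0.3, n)
df["win"] = ((df["gf_rolling"] - df["ga_rolling"] + noise) > 0).astype(int)

cutoffs = list(pd.to_datetime(["2023-02-01", "2023-03-01", "2023-04-01"]))
scores = walk_forward_precision(df, ["gf_rolling", "ga_rolling"], "win", cutoffs)
print(scores)
```

Splitting by date rather than at random keeps every test match strictly later than its training data, which mirrors how the model would be used on upcoming fixtures.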
If you would like to contribute to this project, please follow the standard GitHub flow: fork the repository, create a branch, make your changes, and submit a pull request.