Skip to content

A Random Forest-based machine learning model forecasting English Premier League match outcomes, enhancing precision through rolling averages and dynamic retraining for combined home and away predictions.

Notifications You must be signed in to change notification settings

soham2002/Premier-League-Predictor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 

Repository files navigation

English Premier League Match Outcome Forecasting with Random Forest

Overview

This repository contains a machine learning model based on the Random Forest algorithm for forecasting English Premier League match outcomes. The model utilizes various features, including rolling averages and dynamic retraining, to enhance precision in predicting both home and away outcomes.

Project Structure

The project is organized into different components:

  1. Data Collection and Preprocessing: The data folder contains scripts for collecting and preprocessing the EPL match data. To scrape the data, run the scrape.py script. Otherwise, use the matches.csv file directly to train and test the model by running the prediction_model.ipynb file.

  2. Feature Engineering: The features folder includes tools for creating relevant features for the machine learning model. This involves calculating rolling averages and other dynamic features that capture the teams' recent performances.

  3. Model Training: The models folder contains the main machine learning model implemented using the Random Forest algorithm.

  4. Prediction: The predict folder provides utilities for making predictions on new match data using the trained model.

  5. Evaluation: The evaluation folder includes scripts and notebooks for evaluating the model's performance on historical data. The evaluation process helps fine-tune the model and understand its strengths and weaknesses.

  6. Dynamic Retraining: The dynamic_retraining folder contains scripts for implementing dynamic retraining. This involves updating the model periodically with new data to ensure that it stays relevant and accurate over time.

Requirements

  • Python 3.x
  • Required Python packages are listed in the requirements.txt file. Install them using:
pip install -r requirements.txt

Usage

  1. Data Collection and Preprocessing:

    • To scrape EPL match data, run the scrape.py script in the data folder.
    • Alternatively, use the pre-existing matches.csv file to train and test the model.
  2. Feature Engineering:

    • Utilize tools in the features folder to generate relevant features for the machine learning model.
  3. Model Training:

    • Train the Random Forest model using the implementation in the models folder.
  4. Prediction:

    • Use utilities in the predict folder for making predictions for a specific set of matches.
  5. Evaluation:

    • Utilize scripts and notebooks in the evaluation folder to assess the model's performance on historical data.
  6. Dynamic Retraining:

    • Periodically implement dynamic retraining strategies from the dynamic_retraining folder to update the model with new data.

Contributing

If you would like to contribute to this project, please follow the standard GitHub flow: fork the repository, create a branch, make your changes, and submit a pull request.

About

A Random Forest-based machine learning model forecasting English Premier League match outcomes, enhancing precision through rolling averages and dynamic retraining for combined home and away predictions.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published