Tennis is a hugely popular sport followed by fans all over the world. Its four major tournaments, known as the Grand Slams, are the Australian Open, the French Open, Wimbledon and the US Open. Matches are played on three types of surface: clay, hard and grass. Tennis is also highly unpredictable: each player has a unique style and technique, which makes the game more interesting and the winner harder to predict. Today, machine learning is used in many sports, including soccer, cricket, baseball and tennis. Tennis generates a large amount of data, and machine learning techniques are already making waves in the sport, not only for professional players but also for coaches, fans and bettors. Statistical analysis has helped reshape the game by uncovering deeper insights into it and by predicting match results more accurately. This has not only increased the efficiency of betting markets but also helped players and coaches understand the game better.
The objective of this project is to analyze the last 20 years of data and answer the following questions:
- Who are the top 10 players across all Grand Slam tournaments, based on their average rankings, and which countries do they come from?
- How many right-handed and left-handed players are in the dataset?
- What is the longest match played across the four Grand Slams in the last 20 years?
- How many aces, double faults, and break points faced and saved were recorded across tournaments in the last 20 years?
- How do the various variables correlate with winning?
- What is the career trajectory of the best player over the years, Roger Federer?
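
Two of these questions (handedness counts and the longest match) can be illustrated with a minimal sketch. The records below are made-up placeholders, and the field names (`winner_hand`, `loser_hand`, `minutes`) are assumptions for illustration, not the project's actual column names.

```python
from collections import Counter

# Hypothetical match records; names and values are placeholders, not real data.
matches = [
    {"winner": "Player A", "winner_hand": "R",
     "loser": "Player B", "loser_hand": "L", "minutes": 150},
    {"winner": "Player C", "winner_hand": "R",
     "loser": "Player A", "loser_hand": "R", "minutes": 310},
    {"winner": "Player B", "winner_hand": "L",
     "loser": "Player D", "loser_hand": "R", "minutes": None},  # duration missing
]


def count_hands(records):
    """Count each distinct player once, grouped by playing hand."""
    hand_by_player = {}
    for rec in records:
        hand_by_player[rec["winner"]] = rec["winner_hand"]
        hand_by_player[rec["loser"]] = rec["loser_hand"]
    return Counter(hand_by_player.values())


def longest_match(records):
    """Return the record with the largest duration, ignoring missing values."""
    timed = [rec for rec in records if rec["minutes"] is not None]
    return max(timed, key=lambda rec: rec["minutes"])


hands = count_hands(matches)
longest = longest_match(matches)
print(dict(hands))
print(longest["winner"], "beat", longest["loser"], "in", longest["minutes"], "minutes")
```

Counting distinct players rather than match rows avoids inflating the totals for players who appear in many matches, and rows with a missing duration are skipped rather than treated as zero.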
In addition to this analysis, the project aims to develop predictive models and to identify the key factors that determine whether a tennis player wins or loses, based on past performance. The following classification algorithms will be used to develop the models:
- Logistic Regression
- KNN
- Naive Bayes
- Decision Tree Classifier
- Random Forest Classifier
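
As a rough sketch of how these five algorithms could be compared, the example below fits each one with scikit-learn and reports held-out accuracy. It rests on several assumptions: the data is synthetic (generated by `make_classification`, not real tennis data), Gaussian Naive Bayes stands in for the unspecified Naive Bayes variant, and the hyperparameters are mostly library defaults rather than the project's settings. The helper name `compare_classifiers` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, seed=0):
    """Fit the five classifiers and return held-out accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "Naive Bayes": GaussianNB(),  # assumed variant
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)  # accuracy on the test split
    return scores


# Synthetic stand-in for match features and a win/loss label.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
scores = compare_classifiers(X, y)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Using the same train/test split for every model keeps the comparison fair. On real match data, features would typically be scaled before fitting KNN and logistic regression.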