Udacity Data Analyst Nanodegree Project 1 - Investigate the European Soccer Dataset and communicate findings

OsyTheDataGuy/Investigate-the-European-Soccer-Dataset-DAND-Project-1

Soccer Database Analysis

by Nonso Udechukwu

Introduction to the Soccer Dataset

The soccer database contains data for over 25,000 football matches played from 2008 to 2016 by 299 European football clubs.

The dataset contains seven tables, namely:

  • Country (id, name)
  • League (id, country_id, name)
  • Match (id, country_id, league_id, home_team_goal, away_team_goal, and 100 other columns)
  • Player
  • Team
  • Player Attributes
  • Team Attributes
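The key columns listed above suggest how the tables relate: `country_id` and `league_id` in Match point at the `id` columns of Country and League. As a minimal sketch with made-up rows (the real tables are far larger, and this is not necessarily how the notebook joins them), the names can be attached to each match with pandas merges:

```python
import pandas as pd

# Toy rows only; the values are invented for illustration.
country = pd.DataFrame({"id": [1, 2], "name": ["England", "Spain"]})
league = pd.DataFrame({
    "id": [10, 20],
    "country_id": [1, 2],
    "name": ["Premier League", "La Liga"],
})
match = pd.DataFrame({
    "id": [100, 101, 102],
    "country_id": [1, 1, 2],
    "league_id": [10, 10, 20],
    "home_team_goal": [2, 0, 3],
    "away_team_goal": [1, 0, 3],
})

# Rename the lookup columns so they line up with the foreign keys in Match.
league_names = league[["id", "name"]].rename(columns={"id": "league_id", "name": "league"})
country_names = country[["id", "name"]].rename(columns={"id": "country_id", "name": "country"})

# Left joins keep every match, even if a lookup row were missing.
matches_named = (
    match
    .merge(league_names, on="league_id", how="left")
    .merge(country_names, on="country_id", how="left")
)

# Total goals per match, then the average per league.
matches_named["total_goals"] = (
    matches_named["home_team_goal"] + matches_named["away_team_goal"]
)
goals_per_league = matches_named.groupby("league")["total_goals"].mean()
print(goals_per_league)
```

Left joins are used here so that a match with an unknown league or country would still appear, with a missing name, rather than silently dropping out.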

Libraries Used

  • Pandas: For storing and manipulating structured data. Pandas is built on NumPy (upgrade pandas to version 0.25.1)
  • NumPy: For multi-dimensional array and matrix data structures, and for performing mathematical operations
  • Matplotlib: For all visualizations (including maps and graphs)
  • PandaSQL: For querying pandas DataFrames using SQL syntax
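To illustrate what querying a DataFrame with SQL looks like, here is a small sketch. It deliberately swaps PandaSQL for the standard library's `sqlite3` module plus `DataFrame.to_sql` and `pandas.read_sql_query`, so that it runs with pandas alone; the project itself uses PandaSQL. The table name `matches` and the rows are assumptions made for the example.

```python
import sqlite3

import pandas as pd

# Toy match rows; values are invented for illustration.
match = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "league_id": [10, 10, 20, 20],
    "home_team_goal": [2, 1, 0, 4],
    "away_team_goal": [1, 1, 3, 0],
})

# Copy the DataFrame into an in-memory SQLite database so SQL can be run on it.
conn = sqlite3.connect(":memory:")
match.to_sql("matches", conn, index=False)

query = """
    SELECT league_id,
           COUNT(*) AS matches,
           SUM(home_team_goal + away_team_goal) AS goals
    FROM matches
    GROUP BY league_id
    ORDER BY league_id
"""
summary = pd.read_sql_query(query, conn)
conn.close()

print(summary)
```

The result comes back as an ordinary DataFrame, so it can be passed straight to Matplotlib or further pandas operations.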

Project Methodology

The main steps for this project are as follows:

  • Data Wrangling:
    • Data Gathering
    • Data Assessment
    • Data Cleaning
  • Exploratory Analysis
  • Conclusions/Results
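The assessment and cleaning steps above might look roughly like the sketch below. The specific checks (missing values, duplicate rows, integer goal columns) are assumptions about what a typical pass involves, not a record of what the notebook does, and the `season` column is a hypothetical name.

```python
import pandas as pd

# Toy data with one duplicated row and one missing goal value.
raw = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "season": ["2008/2009", "2008/2009", "2008/2009", "2009/2010"],
    "home_team_goal": [2.0, 1.0, 1.0, None],
    "away_team_goal": [0.0, 1.0, 1.0, 2.0],
})

# Assessment: count missing values per column and fully duplicated rows.
missing_per_column = raw.isna().sum()
duplicate_rows = int(raw.duplicated().sum())

# Cleaning: drop duplicates, drop matches without goal data,
# and store goals as integers.
goal_cols = ["home_team_goal", "away_team_goal"]
clean = (
    raw.drop_duplicates()
       .dropna(subset=goal_cols)
       .astype({col: "int64" for col in goal_cols})
       .reset_index(drop=True)
)

print(missing_per_column)
print(duplicate_rows)
print(clean)
```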

Key Insights

Based on the data and analysis carried out, I found that:

  1. The best teams in each of the top 5 leagues, from 2008 to 2015, were:

    • Based on matches won: Man Utd (EPL), Real Madrid and Barcelona (La Liga), Bayern Munich (Bundesliga), Juventus (Serie A), PSG (Ligue 1)
    • Based on goals scored: Chelsea (EPL), Real Madrid (La Liga), Bayern Munich (Bundesliga), Juventus (Serie A), PSG (Ligue 1)
  2. The teams that improved their goalscoring the most in each of the top 5 leagues (across the period) were Tottenham Hotspur, Real Madrid CF, Borussia Monchengladbach, Napoli, and Paris Saint-Germain.

  3. I found little to no correlation between the team attributes and goals scored or matches won.
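The rankings and the correlation check behind these findings can be sketched as follows. This is an illustrative reconstruction under assumptions, not the notebook's code: `home_team`, `away_team`, and `attribute_score` are hypothetical names, and the toy values are invented, so the output does not reproduce the findings above.

```python
import pandas as pd

# Toy matches between three hypothetical teams.
matches = pd.DataFrame({
    "home_team": ["A", "B", "A", "C"],
    "away_team": ["B", "C", "C", "A"],
    "home_team_goal": [2, 1, 0, 1],
    "away_team_goal": [0, 1, 3, 2],
})

# Reshape to one row per team per match, from that team's point of view.
cols = ["team", "scored", "conceded"]
home = matches.rename(columns={
    "home_team": "team", "home_team_goal": "scored", "away_team_goal": "conceded",
})[cols]
away = matches.rename(columns={
    "away_team": "team", "away_team_goal": "scored", "home_team_goal": "conceded",
})[cols]
per_team = pd.concat([home, away], ignore_index=True)
per_team["won"] = per_team["scored"] > per_team["conceded"]

# Matches won and goals scored per team.
summary = per_team.groupby("team").agg(
    matches_won=("won", "sum"),
    goals_scored=("scored", "sum"),
)
best_by_wins = summary["matches_won"].idxmax()
best_by_goals = summary["goals_scored"].idxmax()

# Pearson correlation between a hypothetical numeric team attribute
# and goals scored; the two Series are aligned on the team index.
attribute = pd.Series({"A": 50, "B": 65, "C": 40}, name="attribute_score")
pearson_r = summary["goals_scored"].corr(attribute)

print(summary)
print(best_by_wins, best_by_goals, pearson_r)
```

Even in this toy data the two criteria pick different teams, which is why the first finding reports matches won and goals scored separately.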

Limitations

Certain teams didn't have goal data in certain seasons.
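One way to surface such gaps is to pivot goals into a team-by-season table and list the empty cells. The sketch below assumes a long-format table with hypothetical `team`, `season`, and `goals` columns and uses invented values.

```python
import pandas as pd

# Toy per-season goal totals; some team/season combinations are absent.
team_goals = pd.DataFrame({
    "team": ["A", "A", "A", "B", "B", "C"],
    "season": ["2008/2009", "2009/2010", "2010/2011",
               "2008/2009", "2010/2011", "2009/2010"],
    "goals": [60, 55, 70, 40, 45, 30],
})

# One row per team and one column per season; absent combinations become NaN.
season_table = team_goals.pivot(index="team", columns="season", values="goals")

# Collect every (team, season) pair that has no goal data.
missing_pairs = [
    (team, season)
    for team, row in season_table.isna().iterrows()
    for season, is_missing in row.items()
    if is_missing
]

print(season_table)
print(missing_pairs)
```

Teams with gaps can then be excluded from season-over-season comparisons, or compared only across the seasons they have in common.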
