This project uses a Jupyter notebook hosted on Google Colab to manipulate and analyze a World Bank dataset of economic and environmental indicators. The notebook covers data loading, data validation, and initial exploration, which makes it suitable for data scientists and analysts who want to work with global development data.
This project demonstrates fundamental data manipulation techniques with a dataset from the World Bank. Key tasks include:
- Importing the necessary libraries and loading the dataset.
- Performing initial data validation and exploration.
- Using assertions to confirm dataset integrity and structure.
The dataset used in this project is hosted on GitHub and accessed via the following URL:
The dataset contains the following columns:
- Country - Name of the country
- Year - Year of observation
- GDP (USD) - Gross Domestic Product in USD
- Population - Population count
- Life Expectancy - Average life expectancy at birth
- Unemployment Rate (%) - Unemployment rate as a percentage
- CO2 Emissions (metric tons per capita) - CO2 emissions per capita
- Access to Electricity (%) - Percentage of the population with access to electricity
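Several of these column names contain spaces and symbols, so in pandas they have to be accessed with bracket indexing such as df["GDP (USD)"] rather than attribute syntax. The following minimal sketch records the expected names and coerces the indicator columns to numbers. It assumes that every column except Country holds numeric values (the dtypes are not stated here), and EXPECTED_COLUMNS, NUMERIC_COLUMNS, and coerce_numeric are illustrative names, not part of the notebook.

```python
import pandas as pd

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "Country",
    "Year",
    "GDP (USD)",
    "Population",
    "Life Expectancy",
    "Unemployment Rate (%)",
    "CO2 Emissions (metric tons per capita)",
    "Access to Electricity (%)",
]

# Assumption: every column except Country is numeric.
NUMERIC_COLUMNS = [c for c in EXPECTED_COLUMNS if c != "Country"]


def coerce_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the indicator columns converted to numbers.

    Values that cannot be parsed become NaN instead of raising an error.
    """
    out = df.copy()
    for col in NUMERIC_COLUMNS:
        # Bracket indexing is required because names contain spaces and "(%)".
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out
```

Coercing with errors="coerce" turns malformed entries into NaN, so they show up in a later missing-value count instead of stopping the notebook.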
Ensure you have the following Python packages installed:
- numpy
- pandas
- seaborn
- matplotlib
You can install these packages using:
make install
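When running the notebook outside Colab, a quick check can confirm that the packages are importable before any cell runs. This is a minimal sketch built on the standard library's importlib.util.find_spec; missing_packages is a hypothetical helper and is not something make install provides.

```python
import importlib.util

# The packages listed above as requirements.
REQUIRED_PACKAGES = ["numpy", "pandas", "seaborn", "matplotlib"]


def missing_packages(names=REQUIRED_PACKAGES) -> list:
    """Return the names that cannot be located in the current environment.

    find_spec returns None for a top-level module that is not installed,
    so nothing is actually imported by this check.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling missing_packages() returns an empty list when everything is installed; any names it returns are the packages still to install.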
- Open the notebook in Google Colab for cloud-based execution.
- Run each cell sequentially to load data, validate its structure, and perform initial analysis.
Testing
Tests run on the notebook cells tagged test_cell. To include another cell in the tests, add the test_cell tag to that cell.
make test_file
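The exact mechanism behind make test_file is not described here. As a hypothetical illustration of tag-based selection, the sketch reads the notebook's JSON (the nbformat 4 layout stores cell tags under each cell's metadata["tags"]) and returns the source of every code cell carrying the test_cell tag. load_notebook and tagged_code_cells are illustrative names, not the project's test tooling.

```python
import json
from pathlib import Path


def load_notebook(path: str) -> dict:
    """Parse an .ipynb file, which is plain JSON, into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def tagged_code_cells(nb: dict, tag: str = "test_cell") -> list:
    """Return the source of every code cell whose metadata tags include `tag`."""
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are never executed as tests
        if tag not in cell.get("metadata", {}).get("tags", []):
            continue
        src = cell.get("source", "")
        # nbformat allows source as a single string or as a list of lines.
        sources.append("".join(src) if isinstance(src, list) else src)
    return sources
```

For example, tagged_code_cells(load_notebook("notebook.ipynb")) would collect the tagged cells of a notebook file; the file name is a placeholder.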
Linting
make lint
Format
make format
- Library Imports - Imports the required libraries, such as numpy, pandas, seaborn, and matplotlib.
- Data Load and Overview - Loads the dataset from GitHub and gives an initial overview.
- Data Validation - Ensures that the dataset structure matches the expected format and that it contains the necessary columns and rows.
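A minimal sketch of the load-and-overview step, assuming the dataset is a CSV file that pandas.read_csv can read directly from its URL. load_and_overview is a hypothetical helper, and the GitHub URL itself is not reproduced here.

```python
import pandas as pd


def load_and_overview(source, n_rows: int = 5) -> pd.DataFrame:
    """Load a CSV from a URL, path, or file-like object and print a quick overview."""
    df = pd.read_csv(source)
    print(f"Shape: {df.shape[0]} rows x {df.shape[1]} columns")
    print(df.head(n_rows))
    # Missing values per column, to spot gaps before any analysis.
    print(df.isna().sum())
    return df
```

In the notebook this would be called as df = load_and_overview(DATA_URL), with DATA_URL standing in for the dataset's URL.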
The notebook includes several assertions to confirm:
- Presence of required columns.
- Dataset dimensions (200 rows and 8 columns).
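The checks listed above could be written as plain assert statements. This sketch hard-codes the eight column names from the dataset description and the 200 x 8 shape; validate_dataset is an illustrative name, and the notebook's own assertions may be phrased differently.

```python
import pandas as pd

# Column names and dimensions stated in this README.
REQUIRED_COLUMNS = [
    "Country",
    "Year",
    "GDP (USD)",
    "Population",
    "Life Expectancy",
    "Unemployment Rate (%)",
    "CO2 Emissions (metric tons per capita)",
    "Access to Electricity (%)",
]
EXPECTED_SHAPE = (200, 8)


def validate_dataset(df: pd.DataFrame) -> None:
    """Raise AssertionError if columns are missing or the shape is unexpected."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    assert not missing, f"Missing required columns: {missing}"
    assert df.shape == EXPECTED_SHAPE, f"Expected shape {EXPECTED_SHAPE}, got {df.shape}"
```

Python skips assert statements when run with the -O flag, so code meant to run outside a notebook would usually raise explicit exceptions instead.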