Project title: Cloud-Hosted Notebook: Data Manipulation with World Bank Dataset

This project uses a Jupyter notebook hosted on Google Colab to manipulate and analyze a World Bank dataset of economic and environmental indicators. The notebook loads the data, validates its structure, and performs initial exploration, so it suits data scientists and analysts who work with global development data.

Colab Link

Project Overview

This project demonstrates fundamental data manipulation techniques with a dataset from the World Bank. Key tasks include:

Importing necessary libraries and the dataset.
Performing initial data validation and exploration.
Using assertions to confirm dataset integrity and structure.

Dataset

The dataset used in this project is hosted on GitHub and accessed via the following URL:

World Bank Dataset

Dataset Columns

The dataset contains the following columns:

Country - Name of the country
Year - Year of observation
GDP (USD) - Gross Domestic Product in USD
Population - Population count
Life Expectancy - Average life expectancy at birth
Unemployment Rate (%) - Unemployment rate as a percentage
CO2 Emissions (metric tons per capita) - CO2 emissions per capita
Access to Electricity (%) - Percentage of population with access to electricity

Getting Started

Prerequisites

Ensure you have the following Python packages installed:

numpy
pandas
seaborn
matplotlib

You can install these packages using:

make install

Running the Notebook

Open the notebook in Google Colab for cloud-based execution.
Run each cell sequentially to load data, validate its structure, and perform initial analysis.
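
The load-and-overview step might look like the sketch below. It is an illustration, not the notebook's actual code. The dataset URL is not reproduced here, so `load_dataset` (a hypothetical helper name) accepts any path, URL, or file-like object that `pandas.read_csv` understands. The two sample rows are made-up placeholders, not real World Bank figures, and the column headers are assumed to match the names listed under Dataset Columns.

```python
import io

import pandas as pd


def load_dataset(source):
    """Read the CSV from a path, URL, or file-like object into a DataFrame."""
    return pd.read_csv(source)


# Placeholder rows for illustration only; these are NOT real World Bank values.
SAMPLE_CSV = """Country,Year,GDP (USD),Population,Life Expectancy,Unemployment Rate (%),CO2 Emissions (metric tons per capita),Access to Electricity (%)
Country A,2000,1000.0,100,70.0,5.0,1.0,90.0
Country B,2001,2000.0,200,75.0,6.0,2.0,95.0
"""

df = load_dataset(io.StringIO(SAMPLE_CSV))

# Initial overview: dimensions, column types, and the first rows.
print(df.shape)
print(df.dtypes)
print(df.head())

# Column names contain spaces and symbols, so use bracket access
# rather than attribute access (df.GDP would not work).
print(df["GDP (USD)"].mean())
```

In the notebook itself, the argument to `load_dataset` would be the GitHub URL of the dataset instead of the in-memory sample.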

Testing, Linting, Format

Tests run only on notebook cells tagged test_cell. To include another cell in the tests, add the tag 'test_cell' to it.

make test_file
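
To show what tag-based selection means, here is a minimal sketch of how cells tagged test_cell could be picked out of a notebook file. It is not necessarily how the `make test_file` target works. It relies only on the notebook JSON layout, where each cell keeps its tags under `metadata.tags`. The function name `tagged_code_cells` and the in-memory example notebook are assumptions made for illustration.

```python
import json

TEST_TAG = "test_cell"


def tagged_code_cells(notebook, tag=TEST_TAG):
    """Return the source of every code cell whose metadata carries `tag`."""
    selected = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are never executed
        if tag in cell.get("metadata", {}).get("tags", []):
            source = cell.get("source", "")
            # The notebook format stores source as a string or a list of lines.
            selected.append("".join(source) if isinstance(source, list) else source)
    return selected


# A tiny in-memory notebook standing in for the real .ipynb file.
example_notebook = json.loads("""
{
  "cells": [
    {"cell_type": "code", "metadata": {"tags": ["test_cell"]},
     "source": ["assert 1 + 1 == 2\\n"]},
    {"cell_type": "code", "metadata": {}, "source": ["print('not tested')\\n"]},
    {"cell_type": "markdown", "metadata": {"tags": ["test_cell"]},
     "source": ["# Notes"]}
  ],
  "metadata": {}, "nbformat": 4, "nbformat_minor": 5
}
""")

print(tagged_code_cells(example_notebook))
```

For a notebook on disk, the same function can be applied to the result of `json.load` on the opened .ipynb file.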

Linting

make lint

Format

make format

Notebook Structure

Library Imports - Imports required libraries such as numpy, pandas, seaborn, and matplotlib.
Data Load and Overview - Loads the dataset from GitHub and gives an initial overview.
Data Validation - Ensures that the dataset structure matches the expected format and that it contains the necessary columns and rows.

Data Validation

The notebook includes several assertions to confirm:

Presence of required columns.
Dataset dimensions (200 rows and 8 columns).
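
The two checks above can be sketched with plain assertions, as follows. This is an assumed form, not the notebook's exact code. The names `EXPECTED_COLUMNS`, `EXPECTED_SHAPE`, and `validate` are hypothetical, and the column headers are assumed to match the names listed under Dataset Columns character for character. The stand-in frame holds zeros only, so the example runs without the real data.

```python
import pandas as pd

# Column names as listed in the Dataset Columns section (assumed exact headers).
EXPECTED_COLUMNS = [
    "Country",
    "Year",
    "GDP (USD)",
    "Population",
    "Life Expectancy",
    "Unemployment Rate (%)",
    "CO2 Emissions (metric tons per capita)",
    "Access to Electricity (%)",
]
EXPECTED_SHAPE = (200, 8)  # 200 rows and 8 columns


def validate(df, expected_shape=EXPECTED_SHAPE):
    """Assert that the required columns exist and the dimensions match."""
    missing = [name for name in EXPECTED_COLUMNS if name not in df.columns]
    assert not missing, f"Missing columns: {missing}"
    assert df.shape == expected_shape, (
        f"Expected shape {expected_shape}, got {df.shape}"
    )


# Stand-in frame filled with zeros; it has the right structure but no real data.
placeholder = pd.DataFrame(0, index=range(200), columns=EXPECTED_COLUMNS)
validate(placeholder)  # passes silently when both checks hold
```

Reporting the missing column names in the assertion message makes a failure easier to diagnose than a bare `assert`.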
