This project involves analyzing greenhouse gas emissions data to understand emissions trends across different countries and years. The analysis includes data cleaning, visualization of emissions by country and year, and forecasting future emissions using a Random Forest model.
- Pandas: For data manipulation and cleaning.
- Matplotlib: For creating visualizations.
- Scikit-learn: For building and evaluating the Random Forest model.
- NumPy: For numerical operations.
- Janitor: For data cleaning utilities.
.
├── data/
│ └── data.csv # Input data file with greenhouse gas emissions data
├── figures/
│ ├── country.jpg # Bar plot of emissions by country
│ ├── yearly.jpg # Line plot of yearly emissions
│ ├── country-yearly.jpg # Bar plot of emissions by countries over years
│ └── forecast.jpg # Line plot of forecasted emissions
├── scripts/
│ └── analysis.py # Main script for data analysis
├── requirements.txt # Required Python packages
├── README.md # This readme file
├── data-description.md # Description of the dataset
└── methodology.md # Methodology used in the analysis
└── results.md # Results and findings from the analysis
To set up the environment for this project, install the required Python packages using:
pip install requirements.txt
-
Prepare the Data: Ensure that the dataset is placed in the
data
folder and nameddata.csv
. -
Run the Analysis: Execute the main script to perform the data analysis:
python scripts/analysis.py
The script performs the following tasks:
-
Data Loading and Cleaning: Loads the dataset and performs initial cleaning operations, including renaming columns and converting units.
-
Data Visualization:
- Emissions by Country: Generates a bar plot to show total emissions for each country.
- Yearly Emissions: Creates a line plot to visualize the trend of emissions over the years.
- Yearly Emissions by Country: Produces a bar plot showing emissions from the top countries over time.
-
Predictive Modeling: Uses a Random Forest model to predict future greenhouse gas emissions and visualize the forecast.
-
The dataset is loaded and cleaned by renaming columns and adjusting units. The year is converted to a period index to handle time spans effectively.
-
Emissions by Country: A bar plot is generated to illustrate the total greenhouse gas emissions by country, providing insight into which countries have the highest contributions.
-
Yearly Emissions: A line plot shows the trend of total emissions over the years, helping identify patterns or changes in emission levels.
-
Yearly Emissions by Country: A bar plot visualizes emissions by the top countries across different years, highlighting significant contributors and trends.
A Random Forest model is used to predict future emissions. The model is trained on historical data, and the Mean Absolute Percentage Error (MAPE) is calculated to evaluate its performance. Forecasts for future years are visualized to provide insight into potential future emission levels.
The analysis yields several key insights, including:
- The top countries contributing to greenhouse gas emissions.
- Trends in emissions over the years.
- Forecasts of future emissions based on historical data.
Contributions are welcome! Please fork the repository and submit pull requests with improvements or additional features.
This project is licensed under the MIT License - see the LICENSE.md file for details.
The conversation continues on Kaggle