image source : https://eelawcentre.org.za/wp-content/uploads/000_9ew8rr.jpeg
- Project Overview
- Dataset Description
- Objective
- Project Steps
- Data Preprocessing
- Modeling
- How to Run the Project
- Real-World Applications
- Visualizations and Model Comparison
- Conclusion
- Future Work
- Contributing
- License
- Contact Information
This project focuses on analyzing and predicting systemic crises across 13 African countries between 1860 and 2014. The aim is to build a machine learning model that can predict the emergence of a systemic crisis based on various economic indicators such as inflation rates, exchange rates, debt defaults, and more.
The dataset includes information on banking, financial, inflation, and systemic crises from 1860 to 2014 in the following African countries: Algeria, Angola, Central African Republic, Ivory Coast, Egypt, Kenya, Mauritius, Morocco, Nigeria, South Africa, Tunisia, Zambia, and Zimbabwe.
- country_number: Numeric country identifier
- country_code: ISO code of the country
- country: Name of the country
- year: Year of observation
- systemic_crisis: Indicates whether a systemic crisis occurred (1: Yes, 0: No)
- exch_usd: Exchange rate against USD
- domestic_debt_in_default: Domestic debt in default
- sovereign_external_debt_default: External debt default
- gdp_weighted_default: GDP-weighted default rate
- inflation_annual_cpi: Annual inflation rate
- independence: Whether the country was independent in that year
- currency_crises: Occurrence of a currency crisis (1: Yes, 0: No)
- inflation_crises: Occurrence of an inflation crisis (1: Yes, 0: No)
- banking_crisis: Whether a banking crisis occurred (1: Yes, 0: No)
Kaggle - African Economic Crises Data Analysis (https://www.kaggle.com/code/ezzaldin6/african-economic-crises-data-analysis)
This dataset covers the banking, debt, financial, inflation, and systemic crises that occurred from 1860 to 2014 in 13 African countries: Algeria, Angola, Central African Republic, Ivory Coast, Egypt, Kenya, Mauritius, Morocco, Nigeria, South Africa, Tunisia, Zambia, and Zimbabwe.
https://drive.google.com/file/d/1fTQ9R29kgAhInFO0HMqvkcAfSZWg6fCx/view
The main goal of this project is to predict the likelihood of a systemic crisis in any of the 13 African countries using various economic features from the dataset.
- Data Importation & Exploration: Load the data and understand its structure using Pandas Profiling.
- Data Preprocessing: Handle missing values, duplicates, outliers, and encode categorical variables.
- Model Selection & Training: Train machine learning models using the preprocessed data.
- Model Evaluation: Evaluate models based on accuracy, precision, recall, and F1-score.
- Model Improvement Techniques: Use feature selection, hyperparameter tuning, and cross-validation to improve model performance.
No missing values were detected in the dataset.
Duplicates were checked and removed using df.drop_duplicates().
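The missing-value and duplicate checks described above might look like the following minimal sketch. The toy DataFrame and its values are assumptions made for illustration; they are not rows from the actual dataset.

```python
import pandas as pd

# Toy frame standing in for the crisis dataset (illustrative values only).
df = pd.DataFrame({
    "country": ["Egypt", "Egypt", "Kenya", "Kenya"],
    "year": [1990, 1990, 1991, 1992],
    "systemic_crisis": [0, 0, 1, 1],
})

n_missing = int(df.isna().sum().sum())  # total number of missing cells
n_dupes = int(df.duplicated().sum())    # rows that repeat an earlier row exactly

# Keep the first occurrence of each row and rebuild a clean index.
df = df.drop_duplicates().reset_index(drop=True)
```

Counting duplicates before dropping them makes it easy to log how many rows were removed.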
Outliers in exch_usd (exchange rate) and gdp_weighted_default were detected using box plots and scatter plots. Winsorization and Z-score techniques were used to handle extreme values while preserving data integrity.
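As a rough sketch of the two techniques named above, the snippet below clips a series to chosen quantiles (winsorization) and flags values by Z-score. The helper names, the 5%/95% quantiles, the threshold of 3, and the sample values are all assumptions for illustration, not the project's actual settings.

```python
import numpy as np
import pandas as pd

def winsorize(series: pd.Series, lower: float = 0.05, upper: float = 0.95) -> pd.Series:
    """Clip values outside the given quantiles to the quantile boundaries."""
    lo, hi = series.quantile(lower), series.quantile(upper)
    return series.clip(lower=lo, upper=hi)

def zscore_outliers(series: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Return a boolean mask for values more than `threshold` std devs from the mean."""
    z = (series - series.mean()) / series.std(ddof=0)
    return z.abs() > threshold

# Illustrative values only: 19 ordinary rates plus one extreme rate.
exch_usd = pd.Series(list(np.linspace(1.0, 2.0, 19)) + [500.0], name="exch_usd")

outlier_mask = zscore_outliers(exch_usd)     # True only for the extreme value
exch_usd_winsorized = winsorize(exch_usd)    # extreme value pulled toward the bulk
```

Winsorization keeps every row and only caps the extremes, which is one way to limit their influence without discarding observations.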
Label Encoding was applied to categorical columns such as country_code, country, and banking_crisis using LabelEncoder.
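A minimal sketch of this encoding step follows. The sample rows, and the assumption that banking_crisis is stored as the strings "crisis" and "no_crisis" before encoding, are illustrative guesses rather than confirmed details of the dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative rows; the raw string values are assumed, not taken from the data.
df = pd.DataFrame({
    "country_code": ["DZA", "EGY", "DZA"],
    "country": ["Algeria", "Egypt", "Algeria"],
    "banking_crisis": ["no_crisis", "crisis", "no_crisis"],
})

encoders = {}
for col in ["country_code", "country", "banking_crisis"]:
    enc = LabelEncoder()
    df[col] = enc.fit_transform(df[col])  # classes are numbered in sorted order
    encoders[col] = enc                   # keep the encoder to decode later
```

LabelEncoder numbers classes in sorted order, so under these assumed strings "crisis" becomes 0 and "no_crisis" becomes 1. It is worth checking enc.classes_ to confirm the mapping matches the intended meaning of each column.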
The data was split into training and testing sets:
- Features: All columns except systemic_crisis.
- Target: systemic_crisis.
The split was 80% for training and 20% for testing.
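The split could be sketched as below. The synthetic data, the random_state value, and the use of stratify=y are assumptions; the source only states the 80/20 ratio and the choice of target column.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows with 10% positive (crisis) labels.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "exch_usd": rng.random(100),
    "inflation_annual_cpi": rng.random(100),
    "systemic_crisis": [1] * 10 + [0] * 90,
})

X = df.drop(columns="systemic_crisis")  # features: everything except the target
y = df["systemic_crisis"]               # target

# stratify=y (an assumption) keeps the crisis ratio equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```

With a rare positive class, stratifying helps ensure the test set contains some crisis years to evaluate recall on.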
We selected RandomForestClassifier as the initial model because it handles mixed feature types robustly and tends to cope reasonably well with class imbalance.
- Accuracy: 96.34%
- Precision, Recall, and F1-scores were evaluated for both positive and negative classes, with good performance noted across the board, especially in detecting systemic crises.
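Training and evaluation along these lines can be sketched as follows. The data comes from scikit-learn's make_classification rather than the crisis dataset, and the hyperparameters are assumptions, so the scores it produces are unrelated to the 96.34% reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data (about 10% positives) as a stand-in.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# Per-class precision, recall and F1 as a nested dict keyed by class label.
report = classification_report(y_test, y_pred, output_dict=True, zero_division=0)
```

On imbalanced data, the per-class recall in report["1"] is usually more informative than overall accuracy, since a model that never predicts a crisis can still score high accuracy.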
RandomForest-based feature importance was used to select the most relevant features, improving model interpretability.
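One way to do importance-based selection is sketched below. The toy data is built so that the target mostly follows banking_crisis; that relationship, the choice of keeping the top two features, and the variable names are assumptions for illustration, not findings from the project.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "banking_crisis": rng.integers(0, 2, n),
    "exch_usd": rng.normal(size=n),
    "inflation_annual_cpi": rng.normal(size=n),
})
# Toy target: copies banking_crisis, with roughly 5% of labels flipped.
flip = rng.random(n) < 0.05
y = np.where(flip, 1 - X["banking_crisis"], X["banking_crisis"])

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances, sorted from most to least important.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)

top_features = importances.head(2).index.tolist()
X_selected = X[top_features]
```

Impurity-based importances can favor high-cardinality features, so a cross-check with permutation importance may be worthwhile before dropping columns.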
We used GridSearchCV to fine-tune hyperparameters of the Random Forest model to find the best parameters.
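A minimal grid search might look like this. The parameter grid, the F1 scoring choice, and the synthetic data are assumptions; the source does not list the grid that was actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in data.
X, y = make_classification(
    n_samples=200, n_features=6, weights=[0.85, 0.15], random_state=0
)

# Hypothetical grid: 2 x 2 = 4 candidate settings.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",  # assumed metric, chosen because the positive class is rare
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
best_model = search.best_estimator_  # refit on all of X, y with best_params
```

Scoring on F1 rather than accuracy steers the search toward settings that detect the minority class.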
StratifiedKFold cross-validation was used to maintain class balance during training and validation.
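The effect of stratification can be seen in a small sketch. The label vector here is made up (10 positives out of 100), and the fold count and seed are assumptions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Made-up labels: 10 crisis years among 100 observations.
y = np.array([1] * 10 + [0] * 90)
X = np.arange(100).reshape(-1, 1)  # placeholder feature matrix

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Count how many positive labels land in each validation fold.
positives_per_fold = [int(y[val_idx].sum()) for _, val_idx in skf.split(X, y)]
```

Each validation fold receives the same share of positives, whereas a plain KFold on rare labels can produce folds with very few or no crisis cases.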
SMOTE (Synthetic Minority Oversampling Technique) was applied to address class imbalance by generating synthetic samples for the under-represented class.
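To show the idea behind SMOTE, the sketch below implements its core interpolation step in plain NumPy. It is a simplified illustration under assumed parameters (k=3, a fixed seed, a hypothetical function name), not the implementation used in the project; a library such as imbalanced-learn provides a full SMOTE class.

```python
import numpy as np

def smote_sketch(X_min: np.ndarray, n_new: int, k: int = 3, seed: int = 0) -> np.ndarray:
    """Create n_new synthetic minority samples by interpolating between
    a minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances among minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour

    neighbours = np.argsort(dist, axis=1)[:, :k]  # indices of k nearest neighbours

    base = rng.integers(0, len(X_min), size=n_new)          # random base samples
    picked = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                            # interpolation factor in [0, 1)

    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Illustrative minority-class points in two dimensions.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
synthetic = smote_sketch(X_min, n_new=10, k=3)
```

Because each synthetic point lies on a segment between two real minority points, it stays within the range of the observed minority data. Oversampling is normally applied to the training portion only, so that synthetic points do not leak into validation or test sets.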
We also experimented with other models:
- Logistic Regression
- Support Vector Machines (SVM)
RandomForest outperformed both models in terms of accuracy and handling class imbalance.
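A comparison of the three model families could be set up as in the sketch below. The synthetic data, the scaling pipelines, the default hyperparameters, and the F1 metric are assumptions, so the resulting scores say nothing about the project's actual ranking.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data.
X, y = make_classification(
    n_samples=300, n_features=6, weights=[0.85, 0.15], random_state=0
)

models = {
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
    # Linear and kernel models are scale-sensitive, so they get a scaler.
    "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

# Mean 5-fold cross-validated F1 score per model.
scores = {
    name: float(cross_val_score(model, X, y, cv=5, scoring="f1").mean())
    for name, model in models.items()
}
```

Evaluating every model with the same folds and the same metric keeps the comparison fair.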
- Clone the repository and install the required dependencies using requirements.txt.
- Download the dataset from Kaggle.
- Run the main notebook or script to execute the data analysis and model training.
- Review the output for performance metrics, visualizations, and predictions.
- Early Warning Systems: Governments can anticipate crises using these models, adjusting policies proactively.
- Policy Decision-Making: Central banks may adjust inflation targeting or currency controls based on model insights.
- Investment Risk Assessment: Investors can make informed decisions based on the predicted likelihood of financial crises.
- Credit Rating Agencies: Credit assessments can be improved by factoring in these predictions.
- International Aid Allocation: Organizations like the IMF can allocate resources proactively to countries at risk.
Screenshot of summary statistics.
Screenshot showing the most relevant features for prediction.
Visualize classification performance using confusion matrices.
Summary of model accuracy for RandomForest, Logistic Regression, and SVM.
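The numbers behind a confusion-matrix plot can be computed as in this small sketch. The labels and predictions are made up for illustration and do not come from the project's models.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels: 1 = systemic crisis, 0 = no crisis.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes, in the order [0, 1].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # share of predicted crises that were real
recall = tp / (tp + fn)     # share of real crises that were detected
```

For an early warning use case, the false negatives (missed crises) in the bottom-left cell are often the costliest errors to watch.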
- The RandomForest model achieved a high accuracy of 96.34% and performed well across precision, recall, and F1-score metrics.
- Class imbalance remains a challenge, but techniques like SMOTE helped improve the model’s ability to detect minority-class instances.
- The predictive model offers valuable insights for early warning systems and economic policy decisions in African countries, with real-world implications for policy-making, risk assessment, and economic stability.
- Further optimize hyperparameters to improve model performance.
- Experiment with advanced techniques for handling class imbalance, such as undersampling or custom loss functions.
- Explore neural network models for potentially better predictions.
We welcome contributions to improve this project. To contribute:
- Fork the repository.
- Create a feature branch (git checkout -b feature-name).
- Commit your changes (git commit -m 'Add some feature').
- Push to the branch (git push origin feature-name).
- Open a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
For any inquiries or support related to this project, please contact:
Clifford Nwanna
Email: nwannachumaclifford@gmail.com