Overview

This project uses clustering algorithms to categorise countries by key socio-economic and health metrics. The goal is to identify and prioritise the countries most in need of funding, so that aid has the greatest impact. The dataset comes from Kaggle.

Dependencies

MATLAB R2020a: Main software for data processing and clustering.
Plot2LaTeX: Script used for generating figures and visual representations (requires Inkscape).
Clustering Algorithms: Sourced from the textbook Introduction to Pattern Recognition.

Methodology

During preprocessing, features were first grouped by their inherent relationships into three categories: health (child mortality, health, life expectancy, total fertility rate), trade (imports, exports), and finance (income, inflation, GDP). Features were then combined into these broader categories, which makes the relationships between features easier to interpret.

Following this grouping, the dataset was normalised using min-max scaling so that all features share the same range and contribute equally to the distance calculations. A correlation matrix was then constructed and visualised as a heatmap.
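The normalisation and correlation steps can be sketched as follows. This is a minimal illustration in plain Python, not the project's MATLAB code: the function names (min_max_normalise, pearson, correlation_matrix) and the toy values are assumptions made for this example, the [0, 1] target range is the usual min-max convention rather than something the report states, and the numbers are invented rather than taken from the dataset.

```python
import math

def min_max_normalise(rows):
    """Scale every column of `rows` (a list of equal-length lists) to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [
            (x - lo) / (hi - lo) if hi > lo else 0.0  # constant column maps to 0
            for x, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / (sd_x * sd_y)

def correlation_matrix(rows):
    """Pairwise Pearson correlations between the columns of `rows`."""
    cols = list(zip(*rows))
    return [[pearson(a, b) for b in cols] for a in cols]

# Invented illustrative values (NOT from the dataset).
# Columns: child mortality, income, life expectancy.
toy = [
    [90.0, 1500.0, 56.0],
    [17.0, 9900.0, 76.0],
    [5.0, 41000.0, 82.0],
    [60.0, 3000.0, 62.0],
]

scaled = min_max_normalise(toy)
corr = correlation_matrix(scaled)

for row in corr:
    print(["%.2f" % v for v in row])
```

Because min-max scaling is a positive linear transformation of each column, the Pearson correlations are the same whether they are computed before or after normalisation; the scaling matters for distance-based steps such as k-means.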

For clustering, the k-means algorithm was applied. The centroids were initialised using the rand_data_init.m function, and the clusters were then iteratively refined using the k_means.m function until convergence. The result is a partition of the dataset into three distinct clusters, each listed with its associated countries.
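The initialise-then-refine loop described above can be sketched as follows. This is a plain-Python illustration of standard k-means (Lloyd's algorithm), not a port of rand_data_init.m or k_means.m, whose exact behaviour and signatures are not shown here. In particular, drawing the initial centroids from the data points is an assumption, and the function names and toy points are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centroids(points, k, seed=0):
    """Assumed initialisation: pick k distinct data points at random."""
    rng = random.Random(seed)
    return [list(p) for p in rng.sample(points, k)]

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the labels stop changing."""
    centroids = [list(c) for c in centroids]  # work on a copy
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels

# Invented, already-normalised 2-D points forming three loose groups.
points = [
    (0.10, 0.10), (0.15, 0.20), (0.20, 0.10),
    (0.80, 0.90), (0.90, 0.80), (0.85, 0.85),
    (0.10, 0.90), (0.20, 0.85), (0.15, 0.80),
]

centroids, labels = k_means(points, init_centroids(points, 3, seed=0))
print(labels)
```

k-means converges to a local optimum that depends on the initial centroids, so in practice it is common to run it from several random initialisations and keep the partition with the lowest within-cluster distance.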

Key Findings

A strong correlation exists between low income and high child mortality, a well-documented relationship in economics and public health.
Countries exhibiting this correlation typically have underdeveloped economies and weaker healthcare infrastructure.
Clustering identifies specific groups of countries that may be in greater need of development aid. For instance, the report suggests that countries in Cluster 1 are in greater need.
These initial results suggest that a more in-depth clustering analysis would be worthwhile. However, unsupervised analysis has its limitations, so definitive conclusions are hard to draw.

For the full analysis and its conclusions, see the report.

Sample Output

Clustering visualisation:

Countries within clusters:

           Cluster 1	           Cluster 2	           Cluster 3
          ----------	          ----------	          ----------
             Albania	 Antigua and Barbuda	             Belarus
              Belize	              Bhutan	             Algeria
              Angola	             Armenia	          Azerbaijan
             Bahrain	         Afghanistan	           Argentina
           Australia	             Austria	             Bahamas
                   .	                   .	                   .
                   .	                   .	                   .
                   .	                   .	                   .
               Tonga	             Tunisia	             Ukraine
             Vanuatu	             Vietnam	United Arab Emirates
          Uzbekistan	           Venezuela	               Yemen
              Zambia	         Timor-Leste	              Uganda
      United Kingdom	       United States	             Uruguay
