Diabetes in Pima Indian Women: understanding the problem and searching for answers through Data Analysis
Data analysis project on Pima Indian women for the Data Analysis and Mining course 2018/2019. The purpose of the project is to try to discover new information, through data exploration and data analysis, about the high rate of diabetes that occurred 50 years ago in the population of Pima Indian women. The study focuses on three main data analysis techniques: linear regression, Principal Component Analysis (PCA), and fuzzy clustering with anomalous patterns.
Data set source: https://www.kaggle.com/uciml/pima-indians-diabetes-database
The code requires the 'scikit-fuzzy' package to run, available at: https://scikit-fuzzy.readthedocs.io/en/latest/install.html
All of the code is provided as Jupyter Notebooks and can be run in Jupyter. There is also an auxiliary file, 'anomalous_cluster.py', that contains the implementation of the Anomalous Pattern algorithm.
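For readers unfamiliar with anomalous clustering, the general idea can be sketched as below. This is only an illustrative outline of one common formulation of the anomalous pattern procedure, and it is not the contents of 'anomalous_cluster.py'. The function name `anomalous_pattern`, its parameters, and the choice of the grand mean as the reference point are assumptions made for this example; only NumPy is needed.

```python
import numpy as np


def anomalous_pattern(X, max_iter=100):
    """Extract one anomalous cluster from the rows of X (hypothetical helper).

    Assumption: the reference point is the grand mean of the data and
    distances are squared Euclidean.
    """
    X = np.asarray(X, dtype=float)
    reference = X.mean(axis=0)

    # Squared distance of every point to the fixed reference point.
    dist_ref = ((X - reference) ** 2).sum(axis=1)

    # Seed the centroid with the point farthest from the reference.
    centroid = X[np.argmax(dist_ref)]
    members = np.zeros(len(X), dtype=bool)

    for _ in range(max_iter):
        # A point joins the cluster if it is closer to the centroid
        # than to the reference point.
        dist_cen = ((X - centroid) ** 2).sum(axis=1)
        new_members = dist_cen < dist_ref

        # Stop when membership is stable or the cluster would be empty.
        if not new_members.any() or np.array_equal(new_members, members):
            break

        members = new_members
        centroid = X[members].mean(axis=0)

    return centroid, members
```

Calling `anomalous_pattern(X)` on a feature matrix returns the centroid of the cluster farthest from the reference point and a boolean mask of its members. In approaches of this kind, the procedure is typically repeated on the remaining points, and the resulting centroids can be used to choose the number of clusters and the initial centers for fuzzy clustering.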