Code and report files related to project work
-
Classifying Diabetes Cases in a Health Dataset | R and Python
Classified diabetes cases with machine learning techniques (Logistic Regression, Decision Tree, Random Forest, KNN) on a dataset with 21 features. Performed undersampling to handle the imbalanced classes and achieved 74% test accuracy and 83% precision with Random Forest. Extracted insights from the health-related data and identified the key features associated with diabetes.

The joint work was done in R. Because running SMOTE in R was taking too long, I implemented it in Python myself and redid the whole analysis in Python. The main difference between the R and Python code is how the imbalanced data is handled: undersampling in R, SMOTE in Python. Both approaches give similar results. I suggest looking at the R code, as it is more compact and detailed; the report is also based on the R code.
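The two ways of handling the class imbalance can be sketched in plain Python. This is a minimal, hypothetical sketch, not the repository's R or Python code: `random_undersample` and `smote_oversample` are illustrative names, the features are assumed to be numeric lists, and the target is assumed to be binary. Random undersampling drops majority rows at random until both classes are the same size; SMOTE instead adds synthetic minority rows on the line segment between a minority row and one of its k nearest minority neighbours. In practice a library implementation, such as `RandomUnderSampler` or `SMOTE` from the imbalanced-learn package, would normally be used.

```python
import math
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Hypothetical helper: randomly keep as many rows of each class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):  # sample without replacement
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote_oversample(X, y, minority, k=5, seed=0):
    """Hypothetical helper: add synthetic minority rows (SMOTE) until the minority class matches the largest class."""
    rng = random.Random(seed)
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    if len(minority_rows) < 2:
        raise ValueError("SMOTE needs at least two minority rows")
    n_needed = Counter(y).most_common(1)[0][1] - len(minority_rows)
    k = min(k, len(minority_rows) - 1)  # cannot use more neighbours than exist
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        i = rng.randrange(len(minority_rows))
        base = minority_rows[i]
        # k nearest minority neighbours of `base` (Euclidean distance), excluding base itself
        others = [r for j, r in enumerate(minority_rows) if j != i]
        neighbours = sorted(others, key=lambda r: math.dist(base, r))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # new point lies on the segment from base towards the chosen neighbour
        X_out.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
        y_out.append(minority)
    return X_out, y_out


# Toy data (made up for illustration): class 1 is the minority, 2 rows vs 5
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.4, 0.2], [5.0, 5.0], [5.2, 4.9]]
y = [0, 0, 0, 0, 0, 1, 1]

X_under, y_under = random_undersample(X, y)
print(sorted(Counter(y_under).items()))  # [(0, 2), (1, 2)]

X_smote, y_smote = smote_oversample(X, y, minority=1, k=5)
print(sorted(Counter(y_smote).items()))  # [(0, 5), (1, 5)]
```

Undersampling discards real majority rows, while SMOTE keeps every original row and appends interpolated minority rows, so the SMOTE training set ends up larger.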
Link for the dataset: https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset
GitHub folder name: files