According to the World Health Organization (WHO), diabetes affects around 422 million people globally and is directly responsible for 1.5 million deaths per year. Detecting diabetes at an early stage can lower a person's risk of developing devastating complications.
This project aims to predict the likelihood that a patient has diabetes from a set of reported symptoms.
The data was collected at the Sylhet Diabetes Hospital in Sylhet, Bangladesh. Direct questionnaires were given to patients who had recently become diabetic or who were still non-diabetic but showed some symptoms (Islam et al., 2020).
- The dataset contains 520 records and 17 columns including the target variable
- There are no missing values in the dataset
- Age is the only numerical column in the data
- The average age of the respondents is about 48 years
- The youngest respondent is 16 years old while the oldest respondent is 90 years of age
- More than 60 percent of the respondents were diabetic
- Over 60 percent of the respondents were male patients
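The summary figures above could be reproduced along the following lines. This is a minimal sketch using only the standard library: the four records are invented for illustration (the real dataset has 520 rows), and the keys `Age`, `Gender`, and `class`, as well as the labels `Positive` and `Male`, are assumptions about how the data is stored.

```python
from statistics import mean

# Toy records, invented for illustration; the real dataset has 520 rows.
# The keys "Age", "Gender" and "class" are assumed column names.
records = [
    {"Age": 40, "Gender": "Male", "class": "Positive"},
    {"Age": 58, "Gender": "Male", "class": "Negative"},
    {"Age": 41, "Gender": "Female", "class": "Positive"},
    {"Age": 45, "Gender": "Male", "class": "Positive"},
]

def summarize(rows):
    """Return the kind of summary figures listed above for a list of row dicts."""
    ages = [row["Age"] for row in rows]
    n = len(rows)
    return {
        "n_records": n,
        # Count empty or None cells as missing values.
        "n_missing": sum(v is None or v == "" for row in rows for v in row.values()),
        "mean_age": mean(ages),
        "min_age": min(ages),
        "max_age": max(ages),
        "pct_diabetic": 100 * sum(row["class"] == "Positive" for row in rows) / n,
        "pct_male": 100 * sum(row["Gender"] == "Male" for row in rows) / n,
    }

summary = summarize(records)
print(summary)
```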
The categorical features were binary encoded (0 and 1).
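A binary encoding of this kind might look like the sketch below. The raw labels (`Yes`/`No`, `Male`/`Female`, `Positive`/`Negative`), the `class` column name, and the choice of which label maps to 1 are assumptions made for illustration; only the 0/1 target encoding comes from the project description.

```python
# Assumed raw labels; the source only states that categories became 0 and 1.
BINARY_MAP = {
    "Yes": 1, "No": 0,
    "Male": 1, "Female": 0,
    "Positive": 1, "Negative": 0,
}

def encode_row(row):
    """Map categorical strings to 0/1 and leave numeric values (such as Age) unchanged."""
    encoded = {}
    for column, value in row.items():
        if isinstance(value, str):
            if value not in BINARY_MAP:
                raise ValueError(f"unexpected category {value!r} in column {column!r}")
            encoded[column] = BINARY_MAP[value]
        else:
            encoded[column] = value
    return encoded

# A hypothetical respondent, not taken from the dataset.
example = {"Age": 40, "Gender": "Male", "Polyuria": "No", "Polydipsia": "Yes", "class": "Positive"}
encoded_example = encode_row(example)
print(encoded_example)
```

Raising on an unknown label, rather than silently mapping it to 0, keeps typos in the raw data from being read as a "No".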
The priority is a model that detects as many diabetic cases as possible, so recall was used to select the preferred model.
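Recall is the share of actual diabetic cases that a model flags: true positives divided by the sum of true positives and false negatives. The sketch below computes it by hand and picks the candidate with the highest recall. The labels, predictions, and model names are made up for illustration and are not the project's results.

```python
def recall(y_true, y_pred):
    """Recall = TP / (TP + FN): the share of actual positive cases that were detected."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("recall is undefined when there are no positive cases")
    return tp / (tp + fn)

# Made-up labels and predictions for two hypothetical candidate models.
y_true = [1, 1, 1, 1, 0, 0]
predictions = {
    "model_a": [1, 1, 0, 0, 0, 0],  # misses two diabetic cases
    "model_b": [1, 1, 1, 0, 1, 0],  # misses one, with one false alarm
}

scores = {name: recall(y_true, y_pred) for name, y_pred in predictions.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Here `model_b` is preferred even though it raises one false alarm, because recall only penalizes missed diabetic cases.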
Linear and tree-based models were trained on the preprocessed data. The ensemble models produced encouraging results, with ExtraTreesRegressor correctly identifying 98.75 percent of the diabetic cases.
Polyuria (excessive urination), polydipsia (extreme thirst), and gender are the most important early indicators of diabetes.
The chosen model was used to build a tool that predicts a person's risk level of diabetes as a percentage.
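One way such a tool could turn a model output into a percentage is sketched below. It assumes the model returns a score where 0 means non-diabetic and 1 means diabetic; the function name `risk_percentage` and the clamping step are illustrative choices, not the tool's actual implementation.

```python
def risk_percentage(score):
    """Convert a model score to a percentage, clamping it to the 0-100 range."""
    # A regressor can in principle return values slightly outside [0, 1],
    # so clamp before scaling.
    clamped = min(max(score, 0.0), 1.0)
    return round(100 * clamped, 2)

print(risk_percentage(0.8731))
```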
Click Here to run the prediction tool
- Islam, M. M. Faniqul, et al. 'Likelihood Prediction of Diabetes at Early Stage Using Data Mining Techniques.' Computer Vision and Machine Intelligence in Medical Image Analysis. Springer, Singapore, 2020, pp. 113-125.
- WHO (World Health Organization). Diabetes [online]. Available at: https://www.who.int/health-topics/diabetes [Accessed 2 February 2022].