Machine Learning Approach to the Discovery of Disease Risk Factors on Individuals Assessed by the National Health and Nutrition Examination Surveys
Predicting chronic disease risk factors is one of the most important and challenging problems in healthcare analytics. Accurate risk factor prediction helps develop strategies that avoid unnecessary interventions for patients and reduce costs for insurance companies and healthcare providers.
In this project, we train a gradient boosting classifier on publicly available data to identify risk predictors for diabetes and cardiovascular disease in individuals residing in the United States. Our results suggest that researchers and clinicians could use machine learning analyses to obtain valuable health assessments of patients at much lower cost and in less time.
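As a rough illustration of the idea, the sketch below fits a gradient boosting classifier and ranks predictors by importance. It is not the code from the project notebook: it assumes scikit-learn and NumPy are installed, uses synthetic data rather than the survey data, and the column names (age, bmi, systolic_bp, noise) and the label rule are hypothetical choices made only for this example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for survey data; these columns are hypothetical.
rng = np.random.default_rng(0)
n = 2000
feature_names = ["age", "bmi", "systolic_bp", "noise"]
X = np.column_stack([
    rng.uniform(20, 80, n),    # age in years
    rng.normal(28, 5, n),      # body mass index
    rng.normal(125, 15, n),    # systolic blood pressure
    rng.normal(0, 1, n),       # pure noise, unrelated to the label
])

# Assumed label rule: risk depends strongly on bmi and weakly on age.
logit = 0.4 * (X[:, 1] - 28) + 0.02 * (X[:, 0] - 50)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Hold out a test set so accuracy is measured on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# Rank predictors by impurity-based importance, highest first.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(f"held-out accuracy: {accuracy:.3f}")
for name, importance in ranked:
    print(f"{name:12s} {importance:.3f}")
```

Impurity-based importances are only one way to rank predictors, and they describe what the model relies on rather than causal risk factors.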
This project is original intellectual work licensed under the GNU GPLv3. You are free to read and adapt the application to your needs, provided you do not make exact copies of excerpts of this work as part of your reporting or product development. Inquiries and suggestions for improvement are warmly welcomed.
To access the project report, use an IDE capable of editing and running IPython notebooks. If Jupyter is installed in your Python distribution, type:
$ jupyter notebook disease-risk-factors.ipynb