Applied Machine Learning

Disease Prediction Project

Overview:

This machine learning project comes from the Applied Machine Learning course I took in Fall 2020.

Project Goal:

The goal is to predict whether or not a patient has a certain unspecified disease. This is a binary classification problem.

Dataset:

Provided by the course professor, the training dataset has 49,000 rows and 12 columns.

Methodology:

The analysis and report span two Jupyter notebooks, both of which follow the steps below.

Data Preparation

I discussed the data quality issues I identified in the dataset and the preprocessing techniques I applied to address them, and I performed Exploratory Data Analysis (EDA). Where appropriate, I supported the EDA with data visualizations.
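As an illustration of the kind of data quality check described above, here is a minimal sketch using pandas. The column names (`age`, `weight`, `disease`) and the toy values are hypothetical, and median imputation is only one possible choice, not necessarily the technique used in the notebooks.

```python
import numpy as np
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, missing-value count, and number of distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(dropna=True),
    })


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then fill numeric gaps with the column median."""
    out = df.drop_duplicates().copy()
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    return out


# Toy frame with hypothetical column names (not the real dataset schema).
toy = pd.DataFrame({
    "age": [50.0, 61.0, np.nan, 50.0],
    "weight": [70.0, 82.5, 64.0, 70.0],
    "disease": [0, 1, 0, 0],
})

report = summarize_quality(toy)   # one row per column of `toy`
cleaned = basic_clean(toy)        # duplicates removed, missing age imputed
print(report)
print(cleaned)
```

Running the summary before and after cleaning makes it easy to confirm that no missing values remain and to see how many rows were dropped as duplicates.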

Build, Tune and Evaluate Various Machine Learning Algorithms

I applied the machine learning algorithms covered in the course to the training data and constructed disease diagnosis models. I also performed extensive model experiments with hyperparameter tuning.

The first Jupyter notebook covers the Naive Bayes classifier (NBC), k-nearest neighbors (KNN), linear SVM, non-linear SVM, Random Forest, and Gradient Boosting Machine. The second Jupyter notebook covers Logistic Regression, Artificial Neural Network/Deep Learning, and Decision Tree.
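The build, tune, and evaluate loop described above can be sketched with scikit-learn's `GridSearchCV`. This is a minimal sketch on synthetic data from `make_classification`. The parameter grids, the 75/25 split, and the choice of accuracy as the metric are assumptions for illustration, not the settings used in the notebooks. Only two of the listed algorithms are shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training data (the real dataset is not used here).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Each candidate pairs an estimator with a small, illustrative parameter grid.
candidates = {
    "knn": (
        # KNN is distance-based, so features are standardized first.
        Pipeline([("scale", StandardScaler()), ("clf", KNeighborsClassifier())]),
        {"clf__n_neighbors": [3, 7, 15]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, None]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validation over the grid, then refit on all training data.
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        "val_accuracy": search.score(X_val, y_val),
    }

for name, info in results.items():
    print(name, info)
```

Keeping a held-out validation split separate from the cross-validation folds gives a check on whether the tuned settings generalize beyond the data used to select them.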

Prediction and Interpretation

After building the classification models, I applied them to the provided test dataset (Disease Prediction Testing.csv) to predict whether each person in it has the disease.
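A minimal sketch of this prediction step, assuming a fitted scikit-learn pipeline. To keep the example runnable on its own, it uses small synthetic frames in place of the real training and testing files. The feature names (`feature_a`, `feature_b`), the output column names, and the choice of Logistic Regression are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
feature_cols = ["feature_a", "feature_b"]  # hypothetical feature names

# Synthetic stand-ins for the training and testing data.
train_df = pd.DataFrame(rng.normal(size=(200, 2)), columns=feature_cols)
train_df["disease"] = (
    train_df["feature_a"] + train_df["feature_b"] > 0
).astype(int)
test_df = pd.DataFrame(rng.normal(size=(50, 2)), columns=feature_cols)

# Fit on the labeled training data only.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(train_df[feature_cols], train_df["disease"])

# Attach a hard 0/1 prediction and the estimated probability of disease.
predictions = test_df.copy()
predictions["predicted_disease"] = model.predict(test_df[feature_cols])
predictions["probability"] = model.predict_proba(test_df[feature_cols])[:, 1]

# Coefficients on the standardized features give a rough view of each
# feature's direction and relative weight in this linear model.
coefficients = pd.Series(
    model.named_steps["logisticregression"].coef_[0], index=feature_cols
)
print(predictions.head())
print(coefficients)
```

With the real files, the synthetic frames would be replaced by frames loaded with `pd.read_csv`, and the same preprocessing applied to the training data would need to be applied to the test data before predicting.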