Machine Learning Homework

Credit Risk Resampling

Of the three approaches, the oversampling models had the highest balanced accuracy scores, with naive oversampling the highest at 0.63.
Oversampling also had the highest recall, with SMOTE scoring highest at 0.67.
Oversampling likewise had the highest geometric mean, with naive oversampling scoring 0.63.
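
For reference, the three metrics compared above can all be derived from the per-class recalls of a binary classifier. The following is a minimal sketch in plain Python, not the notebook's code. The confusion-matrix counts are hypothetical, and treating high-risk loans as the positive class is an assumption. In practice, `sklearn.metrics.balanced_accuracy_score` and `imblearn.metrics.geometric_mean_score` compute the same quantities from label arrays.

```python
from math import sqrt


def recall_per_class(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall of the positive class (assumed: high-risk)
    specificity = tn / (tn + fp)  # recall of the negative class (assumed: low-risk)
    return sensitivity, specificity


def balanced_accuracy(tp, fn, tn, fp):
    """Arithmetic mean of the two per-class recalls."""
    sensitivity, specificity = recall_per_class(tp, fn, tn, fp)
    return (sensitivity + specificity) / 2


def geometric_mean(tp, fn, tn, fp):
    """Square root of the product of the two per-class recalls."""
    sensitivity, specificity = recall_per_class(tp, fn, tn, fp)
    return sqrt(sensitivity * specificity)


# Hypothetical counts for illustration only; these are not results from the notebooks.
counts = dict(tp=60, fn=40, tn=6500, fp=3500)
print(balanced_accuracy(**counts))
print(geometric_mean(**counts))
```

When the two per-class recalls are close to each other, the arithmetic and geometric means nearly coincide, which is consistent with both metrics rounding to the same value for a given model.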

For the ensemble models, I scaled the data before fitting to achieve better results.
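
The text does not say which scaler was used, so the sketch below assumes standardization (zero mean, unit variance), which is what scikit-learn's `StandardScaler` does. It is written in plain Python for illustration; the function names and the loan amounts are hypothetical. The point it shows is that the mean and standard deviation are learned from the training data only and then reused on the test data.

```python
from statistics import mean, pstdev


def fit_scaler(column):
    """Learn the mean and population standard deviation of one feature."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        sigma = 1.0  # constant column: avoid dividing by zero
    return mu, sigma


def transform(column, mu, sigma):
    """Apply previously learned statistics to a column of values."""
    return [(x - mu) / sigma for x in column]


# Hypothetical loan amounts, for illustration only.
train_amounts = [5000.0, 10000.0, 15000.0]
test_amounts = [20000.0]

mu, sigma = fit_scaler(train_amounts)  # statistics come from the training split only
train_scaled = transform(train_amounts, mu, sigma)
test_scaled = transform(test_amounts, mu, sigma)
print(train_scaled, test_scaled)
```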

Of the two models, the Easy Ensemble AdaBoost Classifier had the higher balanced accuracy score of 0.945.
The Easy Ensemble Classifier also had the better recall, with a weighted average score of 0.94.
Geometric mean was not reported in the classification report, but the Easy Ensemble Classifier had the higher F1 score of 0.97 (weighted average).
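
The weighted averages quoted above are support-weighted means of per-class scores, as reported in a classification report. A minimal sketch of that calculation follows; the per-class precisions, recalls, and supports are hypothetical values chosen to resemble an imbalanced dataset, not figures from the notebooks.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_average(scores, supports):
    """Mean of per-class scores, weighted by each class's number of true samples."""
    total = sum(supports)
    return sum(s * n for s, n in zip(scores, supports)) / total


# Hypothetical per-class values, for illustration only.
# Order: minority class (assumed high-risk), majority class (assumed low-risk).
recalls = [0.90, 0.94]
precisions = [0.10, 1.00]
supports = [100, 17000]  # number of true samples in each class

f1_scores = [f1(p, r) for p, r in zip(precisions, recalls)]
print(weighted_average(recalls, supports))
print(weighted_average(f1_scores, supports))
```

Because the weights are class sizes, the majority class dominates a weighted average on imbalanced data, so it is worth reading it alongside balanced accuracy, which weights both classes equally.
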
For the balanced random forest model, the top three features were:
