Credit_Risk_Analysis

Purpose

The purpose of this analysis is to use machine learning to assess the credit risk of applicants in a data set from LendingClub. The project is a supervised learning problem because the data includes labeled outcomes. Because the risk classes in the data set are heavily imbalanced, several resampling and ensemble techniques were applied to balance them, with the goal of more accurate predictions.

Tools/Machine Learning Algorithms

RandomOverSampler
SMOTE
ClusterCentroids
SMOTEENN
BalancedRandomForestClassifier
EasyEnsembleClassifier

Results

Originally, the data set had over 100,000 loan applicants from Q1 2019. The loan status was used to label each application as "high risk" or "low risk": applicants classified as "current" or "loan status" were labeled "low risk", and all remaining applicants were labeled "high risk". Cleaning the data reduced the data set to 68,470 records, nearly all of which (99%) were classified as "low risk".
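With a split this lopsided, plain accuracy says very little, which is why the results below report balanced accuracy (the mean of the per-class recalls). The following is a minimal sketch of the difference, not code from the project notebooks. The 990/10 sample counts are hypothetical and only mimic the roughly 99% / 1% split described above.

```python
# Hypothetical test set mimicking the ~99% low risk / ~1% high risk split.
y_true = ["low_risk"] * 990 + ["high_risk"] * 10

# A "model" that ignores its input and always predicts the majority class.
y_pred = ["low_risk"] * len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of samples with this true label that were predicted correctly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total


accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced_accuracy = (
    recall(y_true, y_pred, "low_risk") + recall(y_true, y_pred, "high_risk")
) / 2

print(f"accuracy: {accuracy:.2f}")                    # 0.99
print(f"balanced accuracy: {balanced_accuracy:.2f}")  # 0.50
```

The always-low-risk model looks excellent on accuracy but never finds a single high risk applicant, and balanced accuracy exposes that.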

OverSampling

RandomOverSampler Model

The RandomOverSampler model produced a balanced accuracy score of 64%.

The high risk precision was 1% with a recall of 66%, which resulted in an F1 score of 2%.
The low risk precision was 100% and the recall was 62%.
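To illustrate what random oversampling does, here is a minimal standard-library sketch. It is not the RandomOverSampler implementation and not the project's code; the function name `random_oversample` and the toy data are assumptions made for illustration. It duplicates randomly chosen minority rows until every class is as large as the largest one.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=1):
    """Duplicate randomly chosen rows of smaller classes until all classes
    have as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))  # sampling with replacement
            y_out.append(label)
    return X_out, y_out


# Toy data (hypothetical): 99 low risk rows and 1 high risk row.
X = [[i] for i in range(100)]
y = ["low_risk"] * 99 + ["high_risk"]

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # both classes now have 99 rows
```

Resampling of this kind is normally applied to the training data only, so that the test set keeps the original class distribution.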

SMOTE (Synthetic Minority Oversampling Technique)

The SMOTE algorithm had a balanced accuracy score of 65.8%, which is slightly better than the previous model.

The high risk precision was again only 1%, and the recall dropped slightly to 62%.
The low risk precision was still 100%, and the recall improved to 69%.
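Unlike random oversampling, SMOTE creates new minority samples instead of copying existing ones. The core step can be sketched as an interpolation between a minority sample and one of its minority-class neighbors. This is a simplified illustration: the helper name `smote_point` and the two sample points are hypothetical, and the nearest-neighbor search that SMOTE uses to pick the neighbor is left out.

```python
import random


def smote_point(sample, neighbor, rng):
    """Return a synthetic point on the line segment between two minority samples."""
    gap = rng.random()  # random fraction in [0, 1)
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]


rng = random.Random(1)
a = [1.0, 10.0]  # a minority-class sample (hypothetical values)
b = [3.0, 20.0]  # one of its minority-class neighbors (hypothetical values)

synthetic = smote_point(a, b, rng)
print(synthetic)  # lies between a and b in every feature
```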

UnderSampling

Cluster Centroids algorithm

This algorithm's balanced accuracy score of 54.4% was lower than the scores of the oversampling models.

The high risk precision was 1% and the recall was 69%. The F1 score was 1%.
The low risk precision was 100%, with a recall of 40%, which is low compared to the oversampling models.
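The very low high risk precision follows directly from the class imbalance. As a rough check, precision can be derived from the two recalls and the share of high risk applicants. The sketch below assumes the test set has the same 1% high risk share as the cleaned data set, which is an assumption and not a figure from the notebooks.

```python
def precision_from_recalls(recall_pos, recall_neg, prevalence):
    """Precision of the positive class, given both recalls and the
    positive-class share of the data."""
    true_pos = recall_pos * prevalence               # high risk caught
    false_pos = (1 - recall_neg) * (1 - prevalence)  # low risk flagged as high risk
    return true_pos / (true_pos + false_pos)


# Cluster Centroids: high risk recall 69%, low risk recall 40%.
# prevalence=0.01 is an assumed high risk share.
p = precision_from_recalls(recall_pos=0.69, recall_neg=0.40, prevalence=0.01)
print(f"{p:.3f}")  # about 0.011, i.e. roughly 1%
```

Because 60% of the much larger low risk class is flagged as high risk, the false positives swamp the true positives even though most high risk applicants are caught.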

Combination Sampling

SMOTEENN

SMOTEENN (Synthetic Minority Oversampling Technique + Edited Nearest Neighbors) had a balanced accuracy score of 64.8%.

The high risk precision was 1% and the recall was 72%, which brought the F1 score to 2%.
The low risk precision was still 100%, with a recall of 57%.
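SMOTEENN first oversamples with SMOTE and then cleans the result with Edited Nearest Neighbors. The cleaning step can be sketched as follows: drop every point whose label disagrees with the majority of its nearest neighbors. This is a simplified one-dimensional illustration with made-up points, not the library implementation.

```python
def edited_nearest_neighbors(points, labels, k=3):
    """Keep only points whose label matches the majority of their k nearest neighbors."""
    keep = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Distance from point i to every other point, paired with that point's label.
        others = [(abs(p - q), labels[j]) for j, q in enumerate(points) if j != i]
        others.sort(key=lambda pair: pair[0])
        nearest = [lbl for _, lbl in others[:k]]
        agree = sum(1 for lbl in nearest if lbl == label)
        if agree > k / 2:
            keep.append(i)
    return [points[i] for i in keep], [labels[i] for i in keep]


# Hypothetical data: the "high" point at 0.25 sits inside the "low" cluster.
points = [0.0, 0.1, 0.2, 0.25, 5.0, 5.1, 5.2]
labels = ["low", "low", "low", "high", "high", "high", "high"]

clean_points, clean_labels = edited_nearest_neighbors(points, labels)
print(clean_points)  # the point at 0.25 has been removed
```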

Ensemble Classifiers to Predict Credit Risk

BalancedRandomForestClassifier

This algorithm brought the balanced accuracy score to 78.8%.

The high risk precision increased to 3% with a recall of 70%, giving an F1 score of 6%.
The low risk precision was still 100%, with a higher recall of 87%.
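The F1 score is the harmonic mean of precision and recall, so it stays close to the smaller of the two values. A short check with the high risk figures reported above for this model:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# BalancedRandomForestClassifier, high risk class: precision 3%, recall 70%.
score = f1(0.03, 0.70)
print(round(score, 2))  # 0.06
```

This is why a recall of 70% still yields an F1 score of only 6%: the 3% precision dominates.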

EasyEnsembleClassifier Model

This algorithm had the best balanced accuracy score, 93.1%.

The high risk precision increased to 9% and the recall increased to 92%, giving the highest F1 score, 16%.
The low risk precision was still 100%, and the recall jumped to 94%.

Summary

After reviewing the results, it was clear that the EasyEnsembleClassifier model performed best, with a balanced accuracy score of 93.1% and a precision of 9% when predicting high risk applicants. Its recall was also the highest of all models, at 92% for high risk applicants and 94% for low risk applicants. Of the six models tested, it is the best choice for assessing credit risk when lending to applicants.
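For a side-by-side view, the balanced accuracy scores reported in the Results section can be ranked with a few lines of Python. The numbers are copied from the sections above; the dictionary and variable names are only illustrative.

```python
# Balanced accuracy scores as reported in the Results section.
balanced_accuracy = {
    "RandomOverSampler": 0.64,
    "SMOTE": 0.658,
    "ClusterCentroids": 0.544,
    "SMOTEENN": 0.648,
    "BalancedRandomForestClassifier": 0.788,
    "EasyEnsembleClassifier": 0.931,
}

# Sort from best to worst.
ranked = sorted(balanced_accuracy.items(), key=lambda kv: kv[1], reverse=True)

for name, score in ranked:
    print(f"{name:<32}{score:.1%}")

best_model = ranked[0][0]
```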
