Fedesgh/Building_Credit_Risk_Classifier_Using_Bagging_Kneighbors

Motivation

The motivation for this repository is the difficulty the dataset presents when we define the target and the features. One of the problems involves several sources of data leakage.

There are several attempts on Kaggle with low metrics, particularly when the training set is restricted to features whose information was available before the loan was granted. We want to try to improve on them:

https://www.kaggle.com/datasets/devanshi23/loan-data-2007-2014/data

We use various data preprocessing techniques, such as SelectKBest with information value, binning, up-sampling with imbalanced-learn (imblearn), one-hot encoding, and imputers.
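
As an illustration of the information value criterion mentioned above, here is a minimal sketch in plain Python. It uses the usual definition, the sum over bins of (share of good loans minus share of bad loans) times the log of their ratio. The function name `information_value` and the `smoothing` parameter are assumptions made for this example and are not taken from the repository code.

```python
import math


def information_value(bins, targets, smoothing=0.5):
    """Information value of one binned feature against a binary target.

    bins      -- bin label of each loan (for example "low", "mid", "high")
    targets   -- 1 for a good loan, 0 for a bad loan
    smoothing -- pseudo-count added to every cell so that an empty cell
                 does not produce log(0); this is an assumption of the sketch
    """
    # Count good and bad loans per bin.
    counts = {}
    for b, t in zip(bins, targets):
        good, bad = counts.get(b, (0, 0))
        counts[b] = (good + 1, bad) if t == 1 else (good, bad + 1)

    n_bins = len(counts)
    total_good = sum(g for g, _ in counts.values()) + smoothing * n_bins
    total_bad = sum(b for _, b in counts.values()) + smoothing * n_bins

    iv = 0.0
    for good, bad in counts.values():
        pct_good = (good + smoothing) / total_good
        pct_bad = (bad + smoothing) / total_bad
        # Weight of evidence of the bin, weighted by the share difference.
        iv += (pct_good - pct_bad) * math.log(pct_good / pct_bad)
    return iv


# A feature whose bins split good and bad loans evenly carries no information,
# while a feature whose bins separate them scores higher.
uninformative = information_value(["a", "a", "b", "b"], [1, 0, 1, 0])
informative = information_value(["a", "a", "b", "b"], [0, 0, 1, 1])
```

A score like this can then be used to rank features and keep the top k, which is the role a scoring function plays in a SelectKBest-style selection.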

Problems at defining the target

loan_status (our target) has the following values:

  1. Current
  2. Fully Paid
  3. Charged Off
  4. Late (31-120 days)
  5. In Grace Period
  6. Does not meet the credit policy. Status:Fully Paid
  7. Late (16-30 days)
  8. Default
  9. Does not meet the credit policy. Status:Charged Off

The main point to consider is that these values belong to different moments in the life span of a loan.

Those that belong to the end of a loan:

  1. Fully Paid
  2. Charged Off
  3. Does not meet the credit policy. Status:Fully Paid
  4. Default
  5. Does not meet the credit policy. Status:Charged Off

Those that belong to the middle of a loan:

  1. Current
  2. Late (31-120 days)
  3. Late (16-30 days)

In Grace Period, by contrast, belongs to the beginning of a loan.

On top of this, we should consider:

Every loan, regardless of how it ended, was at some earlier point In Grace Period.

Every loan, regardless of how it ended, was at some earlier point Current and/or Late.

Our target

"Good loans" (1):

  1. Fully Paid

"Bad loans" (0):

  1. Charged Off
  2. Does not meet the credit policy. Status:Fully Paid
  3. Default
  4. Does not meet the credit policy. Status:Charged Off
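
The grouping above can be sketched as a small mapping from loan_status to a binary target. This is a minimal illustration in plain Python, not the repository's own code: the helper names `encode_target` and `build_target` are hypothetical, and each loan is assumed to be a dict with a `loan_status` key.

```python
GOOD_STATUSES = {"Fully Paid"}
BAD_STATUSES = {
    "Charged Off",
    "Default",
    "Does not meet the credit policy. Status:Fully Paid",
    "Does not meet the credit policy. Status:Charged Off",
}


def encode_target(loan_status):
    """Return 1 for a good loan, 0 for a bad loan, None if the loan has not ended."""
    if loan_status in GOOD_STATUSES:
        return 1
    if loan_status in BAD_STATUSES:
        return 0
    # Current, Late (...) and In Grace Period are not end states.
    return None


def build_target(rows):
    """Keep only loans that reached an end state and attach the binary target."""
    labeled = []
    for row in rows:
        target = encode_target(row["loan_status"])
        if target is not None:
            labeled.append({**row, "target": target})
    return labeled


loans = [
    {"id": 1, "loan_status": "Fully Paid"},
    {"id": 2, "loan_status": "Current"},
    {"id": 3, "loan_status": "Charged Off"},
    {"id": 4, "loan_status": "In Grace Period"},
]
labeled_loans = build_target(loans)  # loans 2 and 4 are dropped
```

Dropping the loans that are still running, rather than assigning them to either class, keeps the target limited to outcomes that are actually known.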

We consider only end-of-loan categories in the target, and X_train should contain only features whose information was available before the loan was granted.
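
To illustrate the restriction on the features, the following sketch drops columns that are only known after the loan was granted. The column names in `POST_ORIGINATION_COLUMNS` are assumptions chosen for the example (payment and recovery fields); the list actually used for this dataset is not given here and has to be decided column by column. The helper `drop_leaky_features` is likewise hypothetical.

```python
# Assumed examples of columns filled in during or after the life of the loan.
# They would leak the outcome into the model, so they must not reach X_train.
POST_ORIGINATION_COLUMNS = {
    "total_pymnt",
    "recoveries",
    "last_pymnt_d",
    "out_prncp",
}


def drop_leaky_features(row, target_column="loan_status"):
    """Return a copy of one loan record without post-origination columns or the target."""
    return {
        name: value
        for name, value in row.items()
        if name not in POST_ORIGINATION_COLUMNS and name != target_column
    }


loan = {
    "loan_amnt": 10000,
    "annual_inc": 52000,
    "total_pymnt": 11250.5,
    "recoveries": 0.0,
    "loan_status": "Fully Paid",
}
features = drop_leaky_features(loan)
```

Only the fields known at application time (`loan_amnt` and `annual_inc` in this toy record) remain as candidate features.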

Result metrics.

result.jpg

About

Problem statement about modeling the target vector and an attempt to improve metrics
