Predict loan eligibility using IBM Watson Studio

Lending is the core business of loan companies, and most of their profit comes directly from loan interest. A loan is granted only after an intensive verification and validation process, yet the company still has no assurance that the applicant will be able to repay it without difficulty.

In this tutorial, we’ll build a model that predicts whether an applicant will be able to repay the lending company. We will prepare the data using a Jupyter Notebook and then build the model using SPSS Modeler.

Learning objectives

After completing this tutorial, you’ll understand how to:

Add and prepare your data
Build a machine learning model
Save the model

Prerequisites

In order to complete this tutorial, you will need the following:

IBM Cloud account.
Object Storage Service.
Watson Studio Service.
Machine Learning Service.

Estimated time

Reading and completing this tutorial takes approximately one hour.

Architecture Diagram

Steps

Dataset

The dataset is from Analytics Vidhya.

The format of the data:

Variable | Description
--- | ---
Loan_ID | Unique loan ID
Gender | Male/Female
Married | Applicant married (Y/N)
Dependents | Number of dependents
Education | Applicant education (Graduate/Under Graduate)
Self_Employed | Self-employed (Y/N)
ApplicantIncome | Applicant income
CoapplicantIncome | Coapplicant income
LoanAmount | Loan amount in thousands
Loan_Amount_Term | Term of loan in months
Credit_History | Credit history meets guidelines
Property_Area | Urban/Semi Urban/Rural
Loan_Status | Loan approved (Y/N)
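
As a minimal sketch of the format above, the following plain-Python snippet checks that a CSV header contains the thirteen documented columns before any modeling starts. The two sample rows are made up for illustration and are not taken from the real dataset; `load_rows` and `EXPECTED_COLUMNS` are hypothetical names, not part of Watson Studio.

```python
import csv
import io

# Column names taken from the variable table above.
EXPECTED_COLUMNS = [
    "Loan_ID", "Gender", "Married", "Dependents", "Education",
    "Self_Employed", "ApplicantIncome", "CoapplicantIncome", "LoanAmount",
    "Loan_Amount_Term", "Credit_History", "Property_Area", "Loan_Status",
]

# Made-up rows for illustration only; in practice, pass open("train.csv").
SAMPLE_CSV = """Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
X001,Male,No,0,Graduate,No,5000,0,120,360,1,Urban,Y
X002,Female,Yes,1,Graduate,,3000,1500,,360,1,Rural,N
"""


def load_rows(handle):
    """Read CSV rows as dicts and verify that every documented column exists."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), "rows,", len(rows[0]), "columns")
```

Empty cells, such as `Self_Employed` in the second sample row, are read as empty strings, which is how missing values typically appear in a raw CSV file.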

Step 1. Create a project in Watson Studio

From the Watson Studio main page, click New project. Choose Complete to get the full functionality. After you enter a project name, click Create.

Step 2. Upload the dataset to Watson Studio

Open Find and add data in the right-side panel, then drag and drop the dataset (.csv file) from your computer into that area.

Step 3. Create an SPSS Modeler flow

On the same Assets page, scroll down to Modeler flows.
Click the (+) New flow icon.
Under the ‘New’ tab, name your flow ‘Loan Eligibility Predictive model’.
Click Create.

Step 4. Add and prepare data

Add data to the canvas using the Data Asset node.
Double-click the node and click Change Data Asset to open the Asset Browser. Select train.csv, then click OK and Save.

Let’s look into the summary statistics of our data using the Data Audit node.

Drag and drop the Data Audit node onto the canvas and connect it to the Data Asset node. After running the node, you can see the audit report in the right-side panel.
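
To make the idea of the audit concrete, here is a small plain-Python sketch that counts empty cells per column, which is similar in spirit to the missing-value part of the audit report. It is not the Data Audit node's implementation; the sample rows are made up, and `missing_counts` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration only; a few cells are deliberately empty.
SAMPLE_CSV = """Loan_ID,Gender,LoanAmount,Credit_History
X001,Male,120,1
X002,,,1
X003,Female,,
"""


def missing_counts(rows):
    """Count empty cells per column across all rows."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                counts[column] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
counts = missing_counts(rows)
for column in rows[0]:
    print(f"{column}: {counts[column]} missing of {len(rows)}")
```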

We can see that some columns have missing values. Let’s remove the rows that have null values using the Select node.

Drag and drop the Select node onto the canvas, connect it to the Data Asset node, then right-click the Select node and open it.
Select the Discard mode and enter the condition below to remove rows with null values.

(@NULL(Gender) or @NULL(Married) or @NULL(Dependents) or @NULL(Self_Employed) or @NULL(LoanAmount) or @NULL(Loan_Amount_Term) or @NULL(Credit_History))
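
For readers more familiar with Python, the discard condition can be sketched as a simple filter. This is an illustrative equivalent rather than what the Select node executes: it assumes that a null shows up as `None` or an empty string, and `is_null` and `discard_incomplete` are hypothetical helpers. The field list matches the condition above.

```python
# Fields checked by the Select node condition above.
NULL_CHECK_FIELDS = [
    "Gender", "Married", "Dependents", "Self_Employed",
    "LoanAmount", "Loan_Amount_Term", "Credit_History",
]


def is_null(value):
    """Treat None and blank strings as null (an assumption of this sketch)."""
    return value is None or str(value).strip() == ""


def discard_incomplete(rows):
    """Keep only rows where none of the checked fields is null."""
    return [
        row for row in rows
        if not any(is_null(row.get(field)) for field in NULL_CHECK_FIELDS)
    ]


# Made-up records for illustration only.
complete = {
    "Loan_ID": "X001", "Gender": "Male", "Married": "No", "Dependents": "0",
    "Self_Employed": "No", "LoanAmount": "120", "Loan_Amount_Term": "360",
    "Credit_History": "1",
}
rows = [
    complete,
    {**complete, "Loan_ID": "X002", "Self_Employed": ""},
    {**complete, "Loan_ID": "X003", "Credit_History": None},
]

clean = discard_incomplete(rows)
print([row["Loan_ID"] for row in clean])
```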

Now our data is clean, and we can proceed with building the model.

Step 5. Configure variable types

Drag and drop the Type node from the Field Operations palette to configure the variable types.
Double-click the node, or right-click it, to open it.
Choose Configure Types to read the metadata.
Change the Role of [Loan_Status] from Input to Output using the drop-down menu.
Change the Role of [Loan_ID] from None to Record ID using the drop-down menu.
Click Save.
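
The effect of these role assignments can be sketched in plain Python: one field identifies the record, one is the prediction target, and everything else is an input. The names `FIELD_ROLES` and `split_by_role` are hypothetical and only illustrate the idea; the sample record is made up.

```python
# Roles mirror the Type node settings above; unlisted fields default to "input".
FIELD_ROLES = {"Loan_ID": "record_id", "Loan_Status": "target"}


def split_by_role(row, roles=FIELD_ROLES):
    """Split one record into (record_id, inputs, target) according to roles."""
    record_id = target = None
    inputs = {}
    for field, value in row.items():
        role = roles.get(field, "input")
        if role == "record_id":
            record_id = value
        elif role == "target":
            target = value
        else:
            inputs[field] = value
    return record_id, inputs, target


# Made-up record for illustration only.
row = {"Loan_ID": "X001", "Gender": "Male", "Credit_History": "1", "Loan_Status": "Y"}
record_id, inputs, target = split_by_role(row)
print(record_id, target, sorted(inputs))
```

Keeping the record ID out of the inputs matters because an identifier that is unique per row carries no general signal and would only encourage overfitting.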

Step 6. Build a machine learning model

The model predicts loan eligibility as one of two classes (Y: yes or N: no), which makes this a binary classification problem. We chose a Bayesian network because it is known to give good results on classification problems.

Split the data into training and testing sets using the Partition node from the Field Operations palette.
Double-click the Partition node to set an 80:20 split: change Training Partition to 80 and Testing Partition to 20.
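
The 80:20 split can be sketched in plain Python as a seeded shuffle followed by a cut. This is an illustration only: the `partition` helper and its `seed` parameter are assumptions of this sketch, and the Partition node's own sampling may not produce exactly these sizes.

```python
import random


def partition(rows, train_percent=80, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


# Stand-in records; any list of rows works here.
records = list(range(100))
train, test = partition(records)
print(len(train), "training rows,", len(test), "testing rows")
```

Fixing the seed keeps the split stable between runs, so accuracy figures from different models remain comparable.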

Drag and drop the Bayes Net node from the Modeling palette.
Double-click the node to change the settings. Check Use custom field roles to assign Loan_Status as the target and all the remaining attributes as inputs, except Partition and Loan_ID. When you finish, click Save.

Run the Bayes Net node. When it finishes, the trained model appears on the canvas as an orange node.
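
To give a feel for what a Bayesian classifier computes, the sketch below implements naive Bayes for categorical fields in plain Python. Naive Bayes is the simplest Bayesian network, where the target is the only parent of every input; it is a swapped-in simplification, and the Bayes Net node may learn a richer structure. The `NaiveBayes` class, the smoothing parameter `alpha`, and the training records are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (field, label) -> value counts
        self.values = defaultdict(set)            # field -> distinct values seen
        for row, label in zip(rows, labels):
            for field, value in row.items():
                self.value_counts[(field, label)][value] += 1
                self.values[field].add(value)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(label) + sum of log P(value | label) over the inputs
            score = math.log(count / total)
            for field, value in row.items():
                seen = self.value_counts[(field, label)][value]
                k = len(self.values[field]) or 1
                score += math.log((seen + self.alpha) / (count + self.alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training records for illustration only.
train_rows = [
    {"Credit_History": "1", "Property_Area": "Urban"},
    {"Credit_History": "1", "Property_Area": "Rural"},
    {"Credit_History": "1", "Property_Area": "Semiurban"},
    {"Credit_History": "0", "Property_Area": "Urban"},
    {"Credit_History": "0", "Property_Area": "Rural"},
]
train_labels = ["Y", "Y", "Y", "N", "N"]

model = NaiveBayes().fit(train_rows, train_labels)
print(model.predict({"Credit_History": "1", "Property_Area": "Rural"}))
print(model.predict({"Credit_History": "0", "Property_Area": "Urban"}))
```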

Step 7. View the model

Right-click the orange model node, then click View.
You can now see the Network Graph and other model information.

Step 8. Evaluate the performance of the model

Drag and drop the Analysis node from the Output section, and connect it to the model node.
After running the node, you can see the analysis report in the right-side panel.

The analysis report shows that this model achieved 82.3% accuracy on our test data set. You can build additional models on the same canvas and compare them until you get the result you want.
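
Accuracy here is the share of test records whose predicted Loan_Status matches the actual value. The plain-Python sketch below shows that calculation along with a small table of actual versus predicted counts. The `evaluate` helper and the label lists are made up for illustration and do not reproduce the 82.3% figure.

```python
from collections import Counter


def evaluate(actual, predicted):
    """Return (accuracy, counts) where counts maps (actual, predicted) pairs."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    matrix = Counter(zip(actual, predicted))
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / len(actual), matrix


# Made-up labels for illustration only.
actual = ["Y", "Y", "N", "Y", "N"]
predicted = ["Y", "N", "N", "Y", "Y"]

accuracy, matrix = evaluate(actual, predicted)
print(f"accuracy: {accuracy:.1%}")
for (a, p), n in sorted(matrix.items()):
    print(f"actual={a} predicted={p}: {n}")
```

Looking at the per-pair counts as well as the single accuracy number shows which kind of mistake the model makes more often, for example approving applicants who should have been rejected.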

Step 9. Save the Model

Right-click the Bayes Net node and select Save branch as a model. Enter a name for the model. If you have already created a Machine Learning service, it should be added automatically. Click Save.

On the Assets page, under Watson Machine Learning models, you can access your saved model and deploy it later.

Notebook Details

A Jupyter Notebook version of this tutorial, loan-eligibility.ipynb, is available in this repository.

Summary

In this tutorial, you learned how to create a complete predictive model, from importing and preparing the data to training and saving the model. You also learned how to use SPSS Modeler and export the model to Watson Machine Learning models.
