Comparison of Machine Learning Methods for Estimating the Probability of Default on a Sample of Bank Customers
This project applies machine learning methods to a data set of bank customers. The analysis target is predicting whether a client will enter default status with a financial institution. The data set's attributes were studied in detail using univariate, bivariate and multivariate analysis. Machine learning models were then built on the selected variables. The models included are Logistic Regression, Decision Trees, Random Forest and Naive Bayes.
Each model yields a reference value derived from its ROC (Receiver Operating Characteristic) curve; the area under the curve (AUC) is the usual choice for such a value. These values are then compared across models, and the model with the largest value is taken to be the most accurate predictor.
All data and results are visualised using R functions.
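The model comparison described above can be illustrated with a minimal sketch. It assumes that the reference value of the ROC curve is the area under the curve (AUC), which the text does not state explicitly, and it is written in Python for illustration only, not in the R tooling used by the project. The function names `roc_auc` and `best_model`, the model labels and the toy scores are hypothetical and are not taken from the project's data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the probability that a randomly chosen defaulter (label 1)
    receives a higher score than a randomly chosen non-defaulter (label 0);
    tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one non-defaulter")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_model(labels, scores_by_model):
    """Return the name of the model with the largest AUC, plus all AUCs."""
    aucs = {name: roc_auc(labels, s) for name, s in scores_by_model.items()}
    return max(aucs, key=aucs.get), aucs


if __name__ == "__main__":
    # Hypothetical toy data: 1 = default, 0 = no default.
    y = [0, 0, 1, 1]
    predicted = {
        "logistic_regression": [0.1, 0.4, 0.35, 0.8],
        "random_forest": [0.2, 0.3, 0.6, 0.9],
    }
    winner, aucs = best_model(y, predicted)
    print(winner, aucs)
```

The pairwise form is quadratic in the number of customers, so it suits small illustrations; on a full data set a rank-based or library implementation would be the more practical choice.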