Customer churn is the percentage of customers who stop using a company's product or service during a given time frame. For a bank, predicting churn supports customer retention, which helps prevent loss of market share and sustain revenue and profitability. The problem is framed as follows: given the attributes of a bank customer, predict whether the customer will leave the bank (churn) or not. This is a binary classification problem.
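To make the definition concrete, churn over a period can be computed as the number of customers lost divided by the number of customers at the start of that period. The snippet below is only a minimal sketch: the function name `churn_rate` and the sample figures are hypothetical and are not taken from the dataset.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return churn over a period as a percentage of the starting customer base."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return 100.0 * customers_lost / customers_at_start


# Hypothetical figures, for illustration only.
rate = churn_rate(customers_at_start=2000, customers_lost=300)
print(f"{rate:.1f}%")  # 15.0%
```

In the classification setting, the same idea appears at the level of a single customer: each customer carries a binary label (churned or not), and the churn rate of a group is the share of positive labels in it.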
The dataset is Churn Modelling from Kaggle, an open and free dataset uploaded by Shruti_lyyer. It was previously published by the SuperDataScience Team as a template dataset for training artificial neural networks (ANNs). It can be downloaded from https://www.kaggle.com/datasets/shrutimechlearn/churn-modelling.
The project was implemented in the following steps:
- Data preparation: Perform exploratory data analysis, clean and filter the data, and prepare training, validation and test sets
- Comparing machine learning algorithms: Compare five ML classifiers: K-Nearest Neighbours (KNN), Decision Tree, Naïve Bayes, Logistic Regression and Linear Support Vector Machine (SVM)
- Selecting features: Explore various feature selection and dimensionality reduction methods
- Ensemble learning: Examine whether ensemble learning methods can improve model performance
- Varying training sample size: Observe how performance changes as the training sample size varies
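
The data preparation step mentions separate training, validation and test sets. One possible way to build them is a stratified split, which keeps the share of churned customers roughly equal across the three sets. The sketch below uses only the Python standard library. The split fractions, the seed, the function name `stratified_split` and the made-up labels are all assumptions for illustration; the project description does not state which ratios or method were actually used.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_frac=0.15, test_frac=0.15, seed=42):
    """Return (train, val, test) index lists with similar class shares in each."""
    rng = random.Random(seed)

    # Group row indices by class label so each class is split separately.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_frac)
        n_test = round(len(indices) * test_frac)
        val.extend(indices[:n_val])
        test.extend(indices[n_val:n_val + n_test])
        train.extend(indices[n_val + n_test:])
    return train, val, test


# Hypothetical labels: 1 = churned, 0 = stayed (the 20% share is made up).
labels = [1] * 200 + [0] * 800
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 700 150 150
```

In a real pipeline, a library routine such as scikit-learn's `train_test_split` with its `stratify` argument would likely be the more common choice; the hand-written version is shown only to make the idea explicit.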