The code performs data preprocessing, machine learning model training, evaluation, and model saving for a binary classification problem on the divorce dataset.
Here's the step-by-step explanation of the code:
-
IMPORT LIBRARIES: Imports the necessary libraries. In this code, it imports the
pandas
andnumpy
libraries. -
Read Data: Reads the dataset. It reads the file
../dataset/divorce_data.csv
and assigns it to a variable namedunprocessed_data
. -
Info & head & describe: Displays the first few rows of the dataset and provides basic statistics.
-
EDA (Exploratory Data Analysis): Visualizes and explores the dataset. This includes creating count plots, distribution plots, and correlation analysis.
-
TRAIN-TEST-SPLIT & PREPROCESSING: Splits the dataset into training and testing sets. It separates the target variable (
Divorce
) from the features and performs any necessary preprocessing steps. -
EVALUATE FUNCTION: Defines a function to evaluate the performance of machine learning models. It computes metrics such as accuracy and F1-score.
-
MODELING: Fits various machine learning models to the training data and evaluates their performance on the test data. The models used include Logistic Regression, K-Nearest Neighbors, Random Forests, Naive Bayes, and an Artificial Neural Network (ANN).
-
SAVE MODELS: Saves the trained models to disk using the
joblib
library for Logistic Regression, KNN, Random Forests, and Naive Bayes, and theh5py
library for the ANN model. -
VISUALIZATION: Visualizes the performance of the models using bar plots.
Overall, this code performs data exploration, model training, evaluation, and model saving for a binary classification problem on the divorce dataset.