NAME: Michael H Bitar
DATE: Sep 5, 2022 / r0.3
PROJECT: Develop a machine learning solution that helps marketing executives at a client bank understand which characteristics of potential customers best predict the purchase of one of the bank’s products. The bank wants a predictive model that scores each potential customer’s propensity to purchase and identifies the customer characteristics that most strongly drive purchasing behavior, in order to inform future marketing segmentation and personalization.
METHODOLOGY: The provided dataset is imbalanced. I will apply several undersampling and oversampling methods and compare them by F1 score.
RESULTS: Oversampling the minority class provided the highest F1 Score for both classes in target y.
REFERENCES: This guide draws on the work of the authors listed in the references section at the end of this notebook. Thanks to all of them for their valuable educational contributions.
# Import needed packages
import numpy as np
import pandas as pd
import tensorflow as tf
import sklearn as sk
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns
from pandas.plotting import scatter_matrix
from pandas_profiling import ProfileReport
# set random seeds for reproducibility
seed = 15
np.random.seed(seed)
tf.random.set_seed(seed)
# disable warning messages when possible
import warnings
warnings.filterwarnings('ignore')
# display all dataframe columns
pd.set_option('display.max_columns', None)
# suppress scientific notation so numpy values are easier to read
np.set_printoptions(precision=3, suppress=True)
from tensorflow.keras import layers
from tensorflow import keras
from tensorflow_addons import losses
# load dataset over the web
url = ""
# on_bad_lines replaces error_bad_lines, which was deprecated in pandas 1.3 and removed in 2.0
df = pd.read_csv(url, sep=";", on_bad_lines='skip', engine='python')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45211 entries, 0 to 45210
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 45211 non-null int64
1 job 45211 non-null object
2 marital 45211 non-null object
3 education 45211 non-null object
4 default 45211 non-null object
5 balance 45211 non-null int64
6 housing 45211 non-null object
7 loan 45211 non-null object
8 contact 45211 non-null object
9 day 45211 non-null int64
10 month 45211 non-null object
11 duration 45211 non-null int64
12 campaign 45211 non-null int64
13 pdays 45211 non-null int64
14 previous 45211 non-null int64
15 poutcome 45211 non-null object
16 y 45211 non-null object
dtypes: int64(7), object(10)
memory usage: 5.9+ MB
# # Profile the dataset
# profile = ProfileReport(df, title="Pandas Profiling Report")
# profile
# Generate scatter diagrams
scatter = scatter_matrix(df[['age','balance','duration','campaign']],figsize=(10,10))
plt.suptitle('Scatter Matrix of Age, Balance, Duration and Campaign')
# show any duplicate rows in dataframe
# show columns and their unique values
def print_unique_col_values(df):
    for column in df:
        if df[column].dtypes == 'object':
            print(f'{column}: {df[column].unique()}')
job: ['management' 'technician' 'entrepreneur' 'blue-collar' 'unknown'
'retired' 'admin.' 'services' 'self-employed' 'unemployed' 'housemaid'
'student']
marital: ['married' 'single' 'divorced']
education: ['tertiary' 'secondary' 'unknown' 'primary']
default: ['no' 'yes']
housing: ['yes' 'no']
loan: ['no' 'yes']
contact: ['unknown' 'cellular' 'telephone']
month: ['may' 'jun' 'jul' 'aug' 'oct' 'nov' 'dec' 'jan' 'feb' 'mar' 'apr' 'sep']
poutcome: ['unknown' 'failure' 'other' 'success']
y: ['no' 'yes']
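The comment about duplicate rows has no accompanying code in this export. A minimal sketch of one way to count and inspect them with pandas follows; the `toy` frame and its values are hypothetical and are not taken from the bank dataset.

```python
import pandas as pd

# Hypothetical toy frame; the second and third rows are identical.
toy = pd.DataFrame({
    "age": [30, 41, 41, 52],
    "job": ["services", "admin.", "admin.", "retired"],
})

# duplicated() marks every repeat of an earlier row as True.
dup_mask = toy.duplicated()
dup_count = int(dup_mask.sum())
print(f"duplicate rows: {dup_count}")

# keep=False flags all members of each duplicate group for inspection.
print(toy[toy.duplicated(keep=False)])
```

On the real data the same two calls could be applied to `df` right after loading, before any encoding changes the columns.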
# check the unique values of the target label
array(['no', 'yes'], dtype=object)
# count each unique value of the target label.
# We can see an imbalance in favor of "no" values for the target label
no 39922
yes 5289
Name: y, dtype: int64
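To make the size of the imbalance concrete, the short sketch below recomputes the minority share and the class ratio from the two counts printed above. The variable names are illustrative only.

```python
# Class counts copied from the value_counts() output above.
counts = {"no": 39922, "yes": 5289}

total = sum(counts.values())
minority_share = counts["yes"] / total
imbalance_ratio = counts["no"] / counts["yes"]

print(f"total rows: {total}")
print(f"minority share: {minority_share:.3f}")
print(f"no:yes ratio: {imbalance_ratio:.2f}")
```

A model that always predicts "no" would already reach roughly 88% accuracy on this data, which is why the notebook compares methods by F1 score rather than accuracy.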
# apply nominal one-hot encoding to select columns
df = pd.get_dummies(data=df, columns=['job','education','marital','month','poutcome','contact'])
Index(['age', 'default', 'balance', 'housing', 'loan', 'day', 'duration',
'campaign', 'pdays', 'previous', 'y', 'job_admin.', 'job_blue-collar',
'job_entrepreneur', 'job_housemaid', 'job_management', 'job_retired',
'job_self-employed', 'job_services', 'job_student', 'job_technician',
'job_unemployed', 'job_unknown', 'education_primary',
'education_secondary', 'education_tertiary', 'education_unknown',
'marital_divorced', 'marital_married', 'marital_single', 'month_apr',
'month_aug', 'month_dec', 'month_feb', 'month_jan', 'month_jul',
'month_jun', 'month_mar', 'month_may', 'month_nov', 'month_oct',
'month_sep', 'poutcome_failure', 'poutcome_other', 'poutcome_success',
'poutcome_unknown', 'contact_cellular', 'contact_telephone',
'contact_unknown'],
dtype='object')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45211 entries, 0 to 45210
Data columns (total 49 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 45211 non-null int64
1 default 45211 non-null object
2 balance 45211 non-null int64
3 housing 45211 non-null object
4 loan 45211 non-null object
5 day 45211 non-null int64
6 duration 45211 non-null int64
7 campaign 45211 non-null int64
8 pdays 45211 non-null int64
9 previous 45211 non-null int64
10 y 45211 non-null object
11 job_admin. 45211 non-null uint8
12 job_blue-collar 45211 non-null uint8
13 job_entrepreneur 45211 non-null uint8
14 job_housemaid 45211 non-null uint8
15 job_management 45211 non-null uint8
16 job_retired 45211 non-null uint8
17 job_self-employed 45211 non-null uint8
18 job_services 45211 non-null uint8
19 job_student 45211 non-null uint8
20 job_technician 45211 non-null uint8
21 job_unemployed 45211 non-null uint8
22 job_unknown 45211 non-null uint8
23 education_primary 45211 non-null uint8
24 education_secondary 45211 non-null uint8
25 education_tertiary 45211 non-null uint8
26 education_unknown 45211 non-null uint8
27 marital_divorced 45211 non-null uint8
28 marital_married 45211 non-null uint8
29 marital_single 45211 non-null uint8
30 month_apr 45211 non-null uint8
31 month_aug 45211 non-null uint8
32 month_dec 45211 non-null uint8
33 month_feb 45211 non-null uint8
34 month_jan 45211 non-null uint8
35 month_jul 45211 non-null uint8
36 month_jun 45211 non-null uint8
37 month_mar 45211 non-null uint8
38 month_may 45211 non-null uint8
39 month_nov 45211 non-null uint8
40 month_oct 45211 non-null uint8
41 month_sep 45211 non-null uint8
42 poutcome_failure 45211 non-null uint8
43 poutcome_other 45211 non-null uint8
44 poutcome_success 45211 non-null uint8
45 poutcome_unknown 45211 non-null uint8
46 contact_cellular 45211 non-null uint8
47 contact_telephone 45211 non-null uint8
48 contact_unknown 45211 non-null uint8
dtypes: int64(7), object(4), uint8(38)
memory usage: 5.4+ MB
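As a small illustration of what the one-hot step does, the sketch below applies `pd.get_dummies` to a three-row frame. The `toy` frame is hypothetical, and `dtype=int` is an assumption added here so the result looks the same across pandas versions (the notebook output above shows `uint8`).

```python
import pandas as pd

# Hypothetical toy frame with one nominal column.
toy = pd.DataFrame({
    "age": [30, 40, 50],
    "marital": ["married", "single", "divorced"],
})

# Each category becomes its own 0/1 column named <column>_<category>;
# the original 'marital' column is dropped.
encoded = pd.get_dummies(data=toy, columns=["marital"], dtype=int)
print(encoded)
```

Exactly one of the `marital_*` columns is 1 in each row, which is the property the model relies on when it reads these columns as features.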
# apply binary encoding to select columns
df['housing'] = df['housing'].map({'yes': 1, 'no': 0})
df['loan'] = df['loan'].map({'yes': 1, 'no': 0})
df['default'] = df['default'].map({'yes': 1, 'no': 0})
df['y'] = df['y'].map({'yes': 1, 'no': 0})
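One property of `Series.map` with a dictionary is worth keeping in mind: any value missing from the mapping silently becomes NaN. The columns mapped above only contain 'yes' and 'no', so the mapping is complete here; the sketch below uses a hypothetical column with an unexpected value to show how such a case could be detected.

```python
import pandas as pd

# Hypothetical column with one unexpected value ("unknown").
raw = pd.Series(["yes", "no", "unknown", "yes"], name="housing")

encoded = raw.map({"yes": 1, "no": 0})

# Values missing from the mapping become NaN rather than raising an error.
unmapped = int(encoded.isna().sum())
print(f"unmapped values: {unmapped}")
```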
age | default | balance | housing | loan | day | duration | campaign | pdays | previous | y | job_admin. | job_blue-collar | job_entrepreneur | job_housemaid | job_management | job_retired | job_self-employed | job_services | job_student | job_technician | job_unemployed | job_unknown | education_primary | education_secondary | education_tertiary | education_unknown | marital_divorced | marital_married | marital_single | month_apr | month_aug | month_dec | month_feb | month_jan | month_jul | month_jun | month_mar | month_may | month_nov | month_oct | month_sep | poutcome_failure | poutcome_other | poutcome_success | poutcome_unknown | contact_cellular | contact_telephone | contact_unknown | |
19134 | 38 | 0 | 202 | 0 | 0 | 5 | 1311 | 2 | -1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
2906 | 49 | 0 | 387 | 1 | 0 | 14 | 347 | 1 | -1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
6083 | 51 | 0 | 51 | 0 | 0 | 27 | 121 | 2 | -1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
22974 | 52 | 0 | 60 | 0 | 1 | 26 | 196 | 2 | -1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
21472 | 58 | 0 | 2180 | 0 | 0 | 18 | 273 | 4 | -1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
12657 | 27 | 0 | 35 | 0 | 0 | 4 | 255 | 1 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
34582 | 39 | 0 | 782 | 1 | 0 | 5 | 542 | 4 | 47 | 11 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
10749 | 56 | 0 | 21664 | 0 | 0 | 17 | 214 | 8 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
41068 | 26 | 0 | 535 | 0 | 0 | 14 | 125 | 1 | -1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
5847 | 31 | 0 | -452 | 1 | 0 | 26 | 182 | 5 | -1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
32866 | 35 | 0 | 520 | 1 | 0 | 17 | 476 | 1 | -1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
39582 | 34 | 0 | 23076 | 0 | 0 | 26 | 243 | 2 | 35 | 6 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
9398 | 58 | 0 | 1657 | 0 | 0 | 6 | 172 | 2 | -1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
40101 | 27 | 0 | 538 | 0 | 0 | 4 | 124 | 2 | -1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
31400 | 24 | 0 | 1250 | 0 | 0 | 25 | 411 | 2 | -1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
7868 | 35 | 0 | 2708 | 1 | 0 | 30 | 178 | 2 | -1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
308 | 44 | 0 | 129 | 1 | 1 | 5 | 189 | 2 | -1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
18300 | 46 | 0 | 272 | 0 | 0 | 31 | 149 | 4 | -1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
5307 | 34 | 0 | 2144 | 1 | 0 | 23 | 157 | 1 | -1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
4807 | 35 | 0 | -15 | 1 | 0 | 21 | 554 | 3 | -1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
# normalize select data
columns_to_scale = ['age','balance','day','duration','campaign','pdays','previous']
scaler = MinMaxScaler()
df[columns_to_scale] = scaler.fit_transform(df[columns_to_scale])
age | default | balance | housing | loan | day | duration | campaign | pdays | previous | y | job_admin. | job_blue-collar | job_entrepreneur | job_housemaid | job_management | job_retired | job_self-employed | job_services | job_student | job_technician | job_unemployed | job_unknown | education_primary | education_secondary | education_tertiary | education_unknown | marital_divorced | marital_married | marital_single | month_apr | month_aug | month_dec | month_feb | month_jan | month_jul | month_jun | month_mar | month_may | month_nov | month_oct | month_sep | poutcome_failure | poutcome_other | poutcome_success | poutcome_unknown | contact_cellular | contact_telephone | contact_unknown | |
0 | 0.519481 | 0 | 0.092259 | 1 | 0 | 0.133333 | 0.053070 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
1 | 0.337662 | 0 | 0.073067 | 1 | 0 | 0.133333 | 0.030704 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
2 | 0.194805 | 0 | 0.072822 | 1 | 1 | 0.133333 | 0.015453 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
3 | 0.376623 | 0 | 0.086476 | 1 | 0 | 0.133333 | 0.018707 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
4 | 0.194805 | 0 | 0.072812 | 0 | 0 | 0.133333 | 0.040260 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
45206 | 0.428571 | 0 | 0.080293 | 0 | 0 | 0.533333 | 0.198658 | 0.032258 | 0.000000 | 0.000000 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
45207 | 0.688312 | 0 | 0.088501 | 0 | 0 | 0.533333 | 0.092721 | 0.016129 | 0.000000 | 0.000000 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
45208 | 0.701299 | 0 | 0.124689 | 0 | 0 | 0.533333 | 0.229158 | 0.064516 | 0.212156 | 0.010909 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
45209 | 0.506494 | 0 | 0.078868 | 0 | 0 | 0.533333 | 0.103294 | 0.048387 | 0.000000 | 0.000000 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
45210 | 0.246753 | 0 | 0.099777 | 0 | 0 | 0.533333 | 0.073404 | 0.016129 | 0.216743 | 0.040000 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
45211 rows × 49 columns
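MinMaxScaler maps each column to the range [0, 1] using (x - min) / (max - min). The sketch below reproduces that formula with NumPy on made-up numbers. It also fits the minimum and maximum on training rows only and reuses them for test rows, which is a common alternative to fitting on the full dataframe as done above; it is shown as an option, not as what this notebook does. All names and values are hypothetical.

```python
import numpy as np

def fit_min_max(train):
    """Return the per-column minimum and maximum of the training rows."""
    return train.min(axis=0), train.max(axis=0)

def apply_min_max(values, lo, hi):
    """Rescale with (x - min) / (max - min); constant columns keep a span of 1."""
    span = np.where(hi > lo, hi - lo, 1.0)
    return (values - lo) / span

# Hypothetical rows: column 0 is an age, column 1 is a balance.
train = np.array([[18.0, 0.0], [58.0, 500.0], [95.0, 1000.0]])
test = np.array([[44.0, 250.0]])

lo, hi = fit_min_max(train)
train_scaled = apply_min_max(train, lo, hi)
test_scaled = apply_min_max(test, lo, hi)

print(train_scaled)
print(test_scaled)
```

Test values outside the training range would fall outside [0, 1] with this approach, which is expected behavior rather than a bug.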
for col in df:
print(f'{col}: {df[col].unique()}')
age: [0.519 0.338 0.195 0.377 0.221 0.13 0.312 0.325 0.299 0.143 0.455 0.506
0.429 0.351 0.545 0.494 0.182 0.091 0.286 0.273 0.442 0.364 0.234 0.403
0.532 0.247 0.416 0.468 0.481 0.39 0.078 0.26 0.169 0.156 0.117 0.208
0.065 0.104 0.558 0.052 0.039 0.026 0.623 0.571 0.844 0.74 0.636 0.675
0.61 0.649 0.597 0.662 0.701 0.688 0.013 0.753 0.87 0.584 0.935 0.831
0.714 0.727 0.779 0.805 0.987 0.792 0.766 0.883 1. 0.818 0. 0.922
0.857 0.896 0.961 0.974 0.909]
default: [0 1]
balance: [0.092 0.073 0.073 ... 0.147 0.202 0.221]
housing: [1 0]
loan: [0 1]
day: [0.133 0.167 0.2 0.233 0.267 0.367 0.4 0.433 0.467 0.5 0.6 0.633
0.667 0.733 0.833 0.867 0.9 0.933 0.967 0.033 0.067 0.1 0.333 0.533
0.567 0.767 0.8 0. 0.3 0.7 1. ]
duration: [0.053 0.031 0.015 ... 0.264 0.253 0.316]
campaign: [0. 0.016 0.032 0.065 0.048 0.081 0.097 0.113 0.129 0.145 0.161 0.177
0.194 0.29 0.21 0.371 0.242 0.5 0.274 0.339 0.226 0.258 0.387 0.323
0.677 0.806 1. 0.645 0.403 0.435 0.871 0.79 0.597 0.355 0.306 0.452
0.484 0.581 0.468 0.726 0.419 0.919 0.516 0.548 0.532 0.565 0.613 0.694]
pdays: [0. 0.174 0.192 0.106 0.1 0.165 0.17 0.103 0.162 0.203 0.117 0.201
0.196 0.193 0.225 0.19 0.149 0.217 0.226 0.198 0.136 0.138 0.12 0.197
0.135 0.189 0.153 0.151 0.142 0.183 0.214 0.128 0.133 0.134 0.2 0.205
0.127 0.175 0.111 0.119 0.173 0.202 0.222 0.209 0.213 0.178 0.167 0.159
0.146 0.208 0.126 0.182 0.194 0.112 0.21 0.147 0.15 0.224 0.144 0.122
0.118 0.031 0.206 0.033 0.211 0.179 0.13 0.139 0.158 0.143 0.216 0.219
0.131 0.187 0.155 0.195 0.218 0.01 0.166 0.22 0.212 0.204 0.007 0.115
0.154 0.108 0.107 0.013 0.116 0.18 0.228 0.123 0.177 0.169 0.148 0.009
0.14 0.185 0.124 0.104 0.032 0.227 0.157 0.161 0.141 0.181 0.172 0.156
0.036 0.132 0.114 0.221 0.188 0.04 0.11 0.163 0.037 0.229 0.109 0.125
0.034 0.308 0.284 0.291 0.26 0.281 0.275 0.282 0.235 0.266 0.274 0.297
0.265 0.292 0.305 0.083 0.257 0.283 0.288 0.306 0.276 0.236 0.3 0.298
0.278 0.299 0.269 0.289 0.259 0.186 0.273 0.302 0.286 0.294 0.253 0.261
0.237 0.258 0.287 0.271 0.263 0.303 0.003 0.311 0.267 0.29 0.239 0.231
0.31 0.268 0.295 0.314 0.313 0.279 0.304 0.24 0.247 0.256 0.312 0.234
0.255 0.233 0.249 0.232 0.296 0.264 0.242 0.25 0.087 0.245 0.085 0.088
0.307 0.243 0.248 0.089 0.272 0.095 0.008 0.241 0.315 0.002 0.28 0.244
0.317 0.093 0.318 0.011 0.321 0.015 0.322 0.102 0.319 0.099 0.097 0.252
0.029 0.025 0.325 0.048 0.338 0.057 0.378 0.353 0.349 0.381 0.354 0.345
0.075 0.361 0.33 0.38 0.382 0.347 0.372 0.366 0.383 0.07 0.375 0.385
0.36 0.359 0.351 0.374 0.376 0.386 0.356 0.377 0.37 0.046 0.364 0.336
0.339 0.357 0.352 0.368 0.365 0.333 0.067 0.369 0.164 0.39 0.346 0.362
0.388 0.384 0.391 0.367 0.021 0.086 0.171 0.392 0.344 0.396 0.393 0.373
0.397 0.398 0.35 0.323 0.394 0.389 0.017 0.399 0.018 0.335 0.4 0.401
0.328 0.403 0.327 0.03 0.326 0.32 0.094 0.006 0.101 0.096 0.092 0.081
0.016 0.337 0.044 0.091 0.073 0.026 0.341 0.408 0.077 0.023 0.041 0.414
0.411 0.407 0.404 0.416 0.412 0.42 0.343 0.329 0.419 0.417 0.055 0.415
0.331 0.421 0.409 0.405 0.413 0.342 0.422 0.406 0.423 0.049 0.334 0.078
0.427 0.425 0.424 0.058 0.042 0.429 0.43 0.428 0.358 0.431 0.435 0.069
0.436 0.047 0.022 0.05 0.024 0.08 0.045 0.443 0.065 0.064 0.052 0.45
0.084 0.448 0.038 0.072 0.459 0.452 0.076 0.433 0.454 0.446 0.447 0.444
0.071 0.474 0.466 0.499 0.453 0.439 0.528 0.506 0.456 0.44 0.079 0.53
0.531 0.532 0.485 0.06 0.525 0.494 0.508 0.463 0.522 0.492 0.451 0.471
0.461 0.545 0.546 0.548 0.549 0.063 0.547 0.437 0.55 0.053 0.054 0.569
0.068 0.056 0.595 0.061 0.592 0.597 0.587 0.616 0.445 0.251 0.039 0.625
0.5 0.501 0.638 0.498 0.513 0.641 0.539 0.708 0.644 0.635 0.442 0.68
0.537 0.672 0.552 0.484 0.766 0.719 0.49 0.683 0.438 0.432 0.744 0.599
0.519 0.516 0.727 0.458 0.062 0.529 0.769 0.633 0.476 0.64 0.789 0.464
0.748 0.788 0.489 0.579 0.664 0.774 0.478 0.673 0.472 0.868 0.517 0.856
0.591 0.479 0.487 0.891 0.455 0.784 0.608 0.505 0.477 0.524 0.468 0.526
0.611 0.553 0.908 0.805 0.61 0.909 0.475 0.511 0.615 0.9 0.482 0.523
0.564 0.495 0.623 0.54 0.542 0.823 0.502 0.005 0.898 0.836 0.951 0.602
0.646 0.874 0.565 0.89 0.665 0.567 0.533 0.873 0.536 0.534 0.753 0.954
0.563 0.497 0.752 0.491 0.86 0.962 0.883 0.674 0.893 0.981 0.894 0.976
0.885 0.682 0.967 0.677 0.693 0.556 0.562 0.558 0.47 0.51 0.781 0.928
0.557 0.578 0.792 0.886 0.889 0.604 0.483 0.607 0.575 0.948 0.923 0.584
0.628 0.924 0.622 0.624 1. 0.632 0.609]
previous: [0. 0.011 0.004 0.015 0.007 0.04 0.058 0.022 0.018 0.036 0.044 0.025
0.065 0.033 0.076 0.029 0.051 0.055 0.095 0.135 0.047 0.091 0.073 0.098
0.062 0.084 0.138 0.105 0.087 0.185 1. 0.08 0.069 0.109 0.211 0.102
0.116 0.145 0.2 0.127 0.149]
y: [0 1]
job_admin.: [0 1]
job_blue-collar: [0 1]
job_entrepreneur: [0 1]
job_housemaid: [0 1]
job_management: [1 0]
job_retired: [0 1]
job_self-employed: [0 1]
job_services: [0 1]
job_student: [0 1]
job_technician: [0 1]
job_unemployed: [0 1]
job_unknown: [0 1]
education_primary: [0 1]
education_secondary: [0 1]
education_tertiary: [1 0]
education_unknown: [0 1]
marital_divorced: [0 1]
marital_married: [1 0]
marital_single: [0 1]
month_apr: [0 1]
month_aug: [0 1]
month_dec: [0 1]
month_feb: [0 1]
month_jan: [0 1]
month_jul: [0 1]
month_jun: [0 1]
month_mar: [0 1]
month_may: [1 0]
month_nov: [0 1]
month_oct: [0 1]
month_sep: [0 1]
poutcome_failure: [0 1]
poutcome_other: [0 1]
poutcome_success: [0 1]
poutcome_unknown: [1 0]
contact_cellular: [0 1]
contact_telephone: [0 1]
contact_unknown: [1 0]
df.y, df.y.unique()
(0 0
1 0
2 0
3 0
4 0
45206 1
45207 1
45208 1
45209 0
45210 0
Name: y, Length: 45211, dtype: int64,
array([0, 1]))
# split target y from dataframe X
y = df.y
X = df.drop('y',axis='columns')
0 0
1 0
2 0
3 0
4 0
45206 1
45207 1
45208 1
45209 0
45210 0
Name: y, Length: 45211, dtype: int64
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45211 entries, 0 to 45210
Data columns (total 48 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 45211 non-null float64
1 default 45211 non-null int64
2 balance 45211 non-null float64
3 housing 45211 non-null int64
4 loan 45211 non-null int64
5 day 45211 non-null float64
6 duration 45211 non-null float64
7 campaign 45211 non-null float64
8 pdays 45211 non-null float64
9 previous 45211 non-null float64
10 job_admin. 45211 non-null uint8
11 job_blue-collar 45211 non-null uint8
12 job_entrepreneur 45211 non-null uint8
13 job_housemaid 45211 non-null uint8
14 job_management 45211 non-null uint8
15 job_retired 45211 non-null uint8
16 job_self-employed 45211 non-null uint8
17 job_services 45211 non-null uint8
18 job_student 45211 non-null uint8
19 job_technician 45211 non-null uint8
20 job_unemployed 45211 non-null uint8
21 job_unknown 45211 non-null uint8
22 education_primary 45211 non-null uint8
23 education_secondary 45211 non-null uint8
24 education_tertiary 45211 non-null uint8
25 education_unknown 45211 non-null uint8
26 marital_divorced 45211 non-null uint8
27 marital_married 45211 non-null uint8
28 marital_single 45211 non-null uint8
29 month_apr 45211 non-null uint8
30 month_aug 45211 non-null uint8
31 month_dec 45211 non-null uint8
32 month_feb 45211 non-null uint8
33 month_jan 45211 non-null uint8
34 month_jul 45211 non-null uint8
35 month_jun 45211 non-null uint8
36 month_mar 45211 non-null uint8
37 month_may 45211 non-null uint8
38 month_nov 45211 non-null uint8
39 month_oct 45211 non-null uint8
40 month_sep 45211 non-null uint8
41 poutcome_failure 45211 non-null uint8
42 poutcome_other 45211 non-null uint8
43 poutcome_success 45211 non-null uint8
44 poutcome_unknown 45211 non-null uint8
45 contact_cellular 45211 non-null uint8
46 contact_telephone 45211 non-null uint8
47 contact_unknown 45211 non-null uint8
dtypes: float64(7), int64(3), uint8(38)
memory usage: 5.1 MB
# prep training and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=15, stratify=y)
X_train.shape, X_test.shape
((36168, 48), (9043, 48))
y_train.shape, y_test.shape
((36168,), (9043,))
y_train.value_counts(), y.value_counts()
(0 31937
1 4231
Name: y, dtype: int64,
0 39922
1 5289
Name: y, dtype: int64)
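The `stratify=y` argument is meant to keep the class proportions the same in every split. The sketch below checks this from the counts printed above; the test counts are derived by subtraction, and the helper name is illustrative.

```python
# Counts copied from the outputs above.
full_counts = {0: 39922, 1: 5289}
train_counts = {0: 31937, 1: 4231}

def positive_share(counts):
    """Fraction of rows that belong to class 1."""
    return counts[1] / (counts[0] + counts[1])

# The remainder of each class ends up in the test split.
test_counts = {label: full_counts[label] - train_counts[label] for label in full_counts}

for name, counts in [("full", full_counts), ("train", train_counts), ("test", test_counts)]:
    print(f"{name}: {positive_share(counts):.4f}")
```

The derived test counts match the support column of the baseline classification report further down in this notebook.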
age | default | balance | housing | loan | day | duration | campaign | pdays | previous | job_admin. | job_blue-collar | job_entrepreneur | job_housemaid | job_management | job_retired | job_self-employed | job_services | job_student | job_technician | job_unemployed | job_unknown | education_primary | education_secondary | education_tertiary | education_unknown | marital_divorced | marital_married | marital_single | month_apr | month_aug | month_dec | month_feb | month_jan | month_jul | month_jun | month_mar | month_may | month_nov | month_oct | month_sep | poutcome_failure | poutcome_other | poutcome_success | poutcome_unknown | contact_cellular | contact_telephone | contact_unknown | |
33460 | 0.194805 | 0 | 0.134603 | 0 | 0 | 0.633333 | 0.033144 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
43272 | 0.103896 | 0 | 0.106668 | 1 | 0 | 0.333333 | 0.018910 | 0.000000 | 0.113532 | 0.014545 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
5529 | 0.519481 | 0 | 0.072803 | 1 | 0 | 0.733333 | 0.028060 | 0.080645 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
11861 | 0.337662 | 0 | 0.075082 | 0 | 0 | 0.633333 | 0.002847 | 0.080645 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
29271 | 0.142857 | 0 | 0.075718 | 1 | 1 | 0.033333 | 0.020740 | 0.016129 | 0.000000 | 0.000000 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
33020 | 0.311688 | 0 | 0.087992 | 1 | 0 | 0.533333 | 0.129321 | 0.032258 | 0.000000 | 0.000000 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
16965 | 0.532468 | 0 | 0.071151 | 0 | 1 | 0.800000 | 0.031110 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
16975 | 0.272727 | 0 | 0.086349 | 1 | 0 | 0.800000 | 0.224888 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
17911 | 0.233766 | 0 | 0.113486 | 0 | 0 | 0.966667 | 0.020130 | 0.016129 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
29558 | 0.194805 | 0 | 0.088455 | 0 | 0 | 0.066667 | 0.051240 | 0.080645 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
age | default | balance | housing | loan | day | duration | campaign | pdays | previous | job_admin. | job_blue-collar | job_entrepreneur | job_housemaid | job_management | job_retired | job_self-employed | job_services | job_student | job_technician | job_unemployed | job_unknown | education_primary | education_secondary | education_tertiary | education_unknown | marital_divorced | marital_married | marital_single | month_apr | month_aug | month_dec | month_feb | month_jan | month_jul | month_jun | month_mar | month_may | month_nov | month_oct | month_sep | poutcome_failure | poutcome_other | poutcome_success | poutcome_unknown | contact_cellular | contact_telephone | contact_unknown | |
0 | 0.519481 | 0 | 0.092259 | 1 | 0 | 0.133333 | 0.053070 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
1 | 0.337662 | 0 | 0.073067 | 1 | 0 | 0.133333 | 0.030704 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
2 | 0.194805 | 0 | 0.072822 | 1 | 1 | 0.133333 | 0.015453 | 0.0 | 0.0 | 0.0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
3 | 0.376623 | 0 | 0.086476 | 1 | 0 | 0.133333 | 0.018707 | 0.0 | 0.0 | 0.0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
4 | 0.194805 | 0 | 0.072812 | 0 | 0 | 0.133333 | 0.040260 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
5 | 0.220779 | 0 | 0.074901 | 1 | 0 | 0.133333 | 0.028264 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
6 | 0.129870 | 0 | 0.076862 | 1 | 1 | 0.133333 | 0.044124 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
7 | 0.311688 | 1 | 0.072822 | 1 | 0 | 0.133333 | 0.077267 | 0.0 | 0.0 | 0.0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
8 | 0.519481 | 0 | 0.073902 | 1 | 0 | 0.133333 | 0.010167 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
9 | 0.324675 | 0 | 0.078187 | 1 | 0 | 0.133333 | 0.011183 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
# count columns to determine TF model input shape
# save original df
df_orig = df.copy()
# build tensorflow layers
def ANN(X_train, y_train, X_test, y_test, loss, weights):
    # 48 input features -> two hidden ReLU layers -> sigmoid output for binary classification
    model = keras.Sequential([
        keras.layers.Dense(48, input_dim=48, activation='relu'),
        keras.layers.Dense(24, activation='relu'),
        keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss=loss, metrics=['accuracy'])
    # weights == -1 means train without class weights
    if weights == -1:
        model.fit(X_train, y_train, epochs=50)
    else:
        model.fit(X_train, y_train, epochs=50, class_weight=weights)
    print(model.evaluate(X_test, y_test))
    y_preds = model.predict(X_test)
    # threshold the sigmoid probabilities at 0.5
    y_preds = np.round(y_preds)
    print("Classification Report: \n", classification_report(y_test, y_preds))
    return y_preds
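The `weights` argument of `ANN` is forwarded to Keras as `class_weight`, but the calls shown in this section all pass -1. As a hedged sketch of what a weight dictionary could look like, the block below applies the common "balanced" heuristic, n_samples / (n_classes * class_count), to the training counts from the stratified split. This is one possible choice, not a setting taken from the notebook.

```python
# Training-set class counts from the stratified split above.
train_counts = {0: 31937, 1: 4231}

n_samples = sum(train_counts.values())
n_classes = len(train_counts)

# "Balanced" heuristic: weight = n_samples / (n_classes * class_count).
class_weights = {
    label: n_samples / (n_classes * count)
    for label, count in train_counts.items()
}
print(class_weights)

# Hypothetical call, reusing the ANN function defined above:
# y_preds = ANN(X_train, y_train, X_test, y_test, 'binary_crossentropy', class_weights)
```

With these weights each class contributes the same total weight to the loss, so errors on the rare "yes" class count more heavily per sample.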
y_preds = ANN(X_train, y_train, X_test, y_test, 'binary_crossentropy', -1)
Metal device set to: Apple M1
Epoch 1/50
1131/1131 [==============================] - 6s 5ms/step - loss: 0.2855 - accuracy: 0.8910
1131/1131 [==============================] - 5s 4ms/step - loss: 0.1627 - accuracy: 0.9288
283/283 [==============================] - 1s 3ms/step - loss: 0.2238 - accuracy: 0.9016
[0.2238490879535675, 0.9015812873840332]
283/283 [==============================] - 0s 1ms/step
Classification Report:
precision recall f1-score support
0 0.93 0.96 0.94 7985
1 0.60 0.49 0.54 1058
accuracy 0.90 9043
macro avg 0.77 0.72 0.74 9043
weighted avg 0.89 0.90 0.90 9043
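The f1-score column is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall values in the report above; because the inputs are already rounded, recomputed values can in general differ from a report in the last digit. The function name is illustrative.

```python
def f1_score_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Rounded precision/recall values from the report above.
f1_class_0 = f1_score_from(0.93, 0.96)
f1_class_1 = f1_score_from(0.60, 0.49)
print(f"class 0: {f1_class_0:.2f}, class 1: {f1_class_1:.2f}")
```

The low class 1 value shows that the baseline model misses about half of the actual purchasers even though overall accuracy is 0.90.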
df = df_orig
count_class_0, count_class_1 = df.y.value_counts()
# Divide by class
df_class_0 = df[df['y'] == 0]
df_class_1 = df[df['y'] == 1]
# Undersample 0 class in y and combine DataFrames of both class 0 and 1
df_class_0_under = df_class_0.sample(count_class_1)
df_test_under = pd.concat([df_class_0_under, df_class_1], axis=0)
print('Random undersampling:')
Random undersampling:
0 5289
1 5289
Name: y, dtype: int64
X = df_test_under.drop('y',axis='columns')
y = df_test_under['y']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=15, stratify=y)
# Number of samples per class in the training data
y_train.value_counts()
0 4231
1 4231
Name: y, dtype: int64
y_preds = ANN(X_train, y_train, X_test, y_test, 'binary_crossentropy', -1)
Epoch 1/50
265/265 [==============================] - 1s 5ms/step - loss: 0.6106 - accuracy: 0.6615
67/67 [==============================] - 0s 3ms/step - loss: 0.3852 - accuracy: 0.8521
[0.38524705171585083, 0.8520793914794922]
67/67 [==============================] - 0s 1ms/step
Classification Report:
              precision    recall  f1-score   support
           0       0.88      0.82      0.85      1058
           1       0.83      0.89      0.86      1058
    accuracy                           0.85      2116
   macro avg       0.85      0.85      0.85      2116
weighted avg       0.85      0.85      0.85      2116
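# The class-1 F1 score improves from 0.54 on the default dataset to 0.86 with undersampling.
# Note that this run is scored on a balanced test split (1058 rows per class), not on the original class distribution.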
df = df_orig
# Oversample 1-class and concat the DataFrames of both classes
df_class_1_over = df_class_1.sample(count_class_0, replace=True)
df_test_over = pd.concat([df_class_0, df_class_1_over], axis=0)
print('Random over-sampling:')
print(df_test_over.y.value_counts())
Random over-sampling:
0 39922
1 39922
Name: y, dtype: int64
# split X and y
X = df_test_over.drop('y',axis='columns')
y = df_test_over['y']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=15, stratify=y)
# Number of samples per class in the training data
y_train.value_counts()
1 31938
0 31937
Name: y, dtype: int64
loss = keras.losses.BinaryCrossentropy()
weights = -1
y_preds = ANN(X_train, y_train, X_test, y_test, 'binary_crossentropy', -1)
Epoch 1/50
1997/1997 [==============================] - 9s 4ms/step - loss: 0.4215 - accuracy: 0.8104
500/500 [==============================] - 2s 3ms/step - loss: 0.2614 - accuracy: 0.9022
[0.2613811194896698, 0.9022481441497803]
500/500 [==============================] - 1s 1ms/step
Classification Report:
              precision    recall  f1-score   support
           0       0.94      0.86      0.90      7985
           1       0.87      0.94      0.91      7984
    accuracy                           0.90     15969
   macro avg       0.90      0.90      0.90     15969
weighted avg       0.90      0.90      0.90     15969
# Oversampling gives the best F1 scores so far (0.90 for class 0, 0.91 for class 1), ahead of both the default dataset and undersampling.
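One caveat with the oversampling run: the minority rows are duplicated before `train_test_split`, so copies of the same customer can land in both the training and the test split, which tends to inflate the test scores. A minimal sketch of an alternative, assuming the split is done first and only the training rows are resampled, is shown below. The helper name `oversample_training_rows` and the toy DataFrame are illustrative and not part of the original notebook.

```python
import pandas as pd

def oversample_training_rows(df_train, target='y', seed=15):
    """Resample every smaller class with replacement up to the largest class size."""
    counts = df_train[target].value_counts()
    n_max = int(counts.max())
    parts = []
    for label, count in counts.items():
        rows = df_train[df_train[target] == label]
        if count < n_max:
            rows = rows.sample(n_max, replace=True, random_state=seed)
        parts.append(rows)
    return pd.concat(parts, axis=0)

# Toy training split: 6 negatives, 2 positives
toy_train = pd.DataFrame({'x': range(8), 'y': [0, 0, 0, 0, 0, 0, 1, 1]})
balanced = oversample_training_rows(toy_train)
print(balanced['y'].value_counts().to_dict())
```

With this approach the test split keeps the original class distribution, so its scores are directly comparable to the baseline run on the default dataset.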
df = df_orig
0 39922
1 5289
Name: y, dtype: int64
# Split the majority class into three undersampled sets, one per model
# Full dataset counts:
# 0 39922
# 1 5289
# model1 --> class1 (all training rows) + class0[0:5289]
# model2 --> class1 (all training rows) + class0[5290:10579]
# model3 --> class1 (all training rows) + class0[10579:15868]
# rebuild X y split
X = df.drop('y',axis='columns')
y = df['y']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=15, stratify=y)
0 31937
1 4231
Name: y, dtype: int64
df = X_train.copy()
df['y'] = y_train
| age | default | balance | housing | loan | day | duration | campaign | pdays | previous | job_admin. | job_blue-collar | job_entrepreneur | job_housemaid | job_management | job_retired | job_self-employed | job_services | job_student | job_technician | job_unemployed | job_unknown | education_primary | education_secondary | education_tertiary | education_unknown | marital_divorced | marital_married | marital_single | month_apr | month_aug | month_dec | month_feb | month_jan | month_jul | month_jun | month_mar | month_may | month_nov | month_oct | month_sep | poutcome_failure | poutcome_other | poutcome_success | poutcome_unknown | contact_cellular | contact_telephone | contact_unknown | y |
33460 | 0.194805 | 0 | 0.134603 | 0 | 0 | 0.633333 | 0.033144 | 0.000000 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 |
43272 | 0.103896 | 0 | 0.106668 | 1 | 0 | 0.333333 | 0.018910 | 0.000000 | 0.113532 | 0.014545 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
5529 | 0.519481 | 0 | 0.072803 | 1 | 0 | 0.733333 | 0.028060 | 0.080645 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
11861 | 0.337662 | 0 | 0.075082 | 0 | 0 | 0.633333 | 0.002847 | 0.080645 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
29271 | 0.142857 | 0 | 0.075718 | 1 | 1 | 0.033333 | 0.020740 | 0.016129 | 0.000000 | 0.000000 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
df_class0 = df[df.y==0]
df_class1 = df[df.y==1]
def get_train_batch(df_majority, df_minority, start, end):
    df_train = pd.concat([df_majority[start:end], df_minority], axis=0)
    X_train = df_train.drop('y', axis='columns')
    y_train = df_train.y
    return X_train, y_train
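The `start` and `end` arguments are positional slice bounds into the majority-class frame. As a small sketch, contiguous half-open bounds can be generated so that no majority row is skipped or reused between models; the helper name `slice_bounds` is hypothetical.

```python
def slice_bounds(chunk_size, n_chunks):
    """Return (start, end) pairs for consecutive half-open slices [start, end)."""
    return [(i * chunk_size, (i + 1) * chunk_size) for i in range(n_chunks)]

print(slice_bounds(5289, 3))
# [(0, 5289), (5289, 10578), (10578, 15867)]
```

Each pair could be unpacked into a call such as `get_train_batch(df_class0, df_class1, start, end)`.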
# train the first model
X_train, y_train = get_train_batch(df_class0, df_class1, 0, 5289)
y_pred1 = ANN(X_train, y_train, X_test, y_test, 'binary_crossentropy', -1)
Epoch 1/50
298/298 [==============================] - 2s 5ms/step - loss: 0.5743 - accuracy: 0.7086
283/283 [==============================] - 1s 3ms/step - loss: 0.4620 - accuracy: 0.7975
[0.4620147943496704, 0.7975229024887085]
283/283 [==============================] - 0s 1ms/step
Classification Report:
              precision    recall  f1-score   support
           0       0.99      0.78      0.87      7985
           1       0.36      0.91      0.51      1058
    accuracy                           0.80      9043
   macro avg       0.67      0.85      0.69      9043
weighted avg       0.91      0.80      0.83      9043
# train the second model
X_train, y_train = get_train_batch(df_class0, df_class1, 5290, 10579)
y_pred2 = ANN(X_train, y_train, X_test, y_test, 'binary_crossentropy', -1)
Epoch 1/50
298/298 [==============================] - 2s 5ms/step - loss: 0.5872 - accuracy: 0.6886
283/283 [==============================] - 1s 3ms/step - loss: 0.4857 - accuracy: 0.7945
[0.48568180203437805, 0.7945371866226196]
283/283 [==============================] - 0s 1ms/step
Classification Report:
              precision    recall  f1-score   support
           0       0.98      0.78      0.87      7985
           1       0.35      0.91      0.51      1058
    accuracy                           0.79      9043
   macro avg       0.67      0.84      0.69      9043
weighted avg       0.91      0.79      0.83      9043
# train the third model
X_train, y_train = get_train_batch(df_class0, df_class1, 10579, 15868)
y_pred3 = ANN(X_train, y_train, X_test, y_test, 'binary_crossentropy', -1)
Epoch 1/50
298/298 [==============================] - 2s 5ms/step - loss: 0.5789 - accuracy: 0.7042
283/283 [==============================] - 1s 3ms/step - loss: 0.4903 - accuracy: 0.7897
[0.49025654792785645, 0.7896715402603149]
283/283 [==============================] - 0s 1ms/step
Classification Report:
              precision    recall  f1-score   support
           0       0.99      0.77      0.87      7985
           1       0.35      0.92      0.50      1058
    accuracy                           0.79      9043
   macro avg       0.67      0.84      0.69      9043
weighted avg       0.91      0.79      0.82      9043
# combine the three training results by majority vote
y_pred_final = y_pred1.copy()
for i in range(len(y_pred1)):
    n_ones = y_pred1[i] + y_pred2[i] + y_pred3[i]
    # predict 1 when at least two of the three models predict 1
    if n_ones > 1:
        y_pred_final[i] = 1
    else:
        y_pred_final[i] = 0
cl_rep = classification_report(y_test, y_pred_final)
print(cl_rep)
              precision    recall  f1-score   support
           0       0.99      0.78      0.87      7985
           1       0.36      0.93      0.52      1058
    accuracy                           0.80      9043
   macro avg       0.67      0.86      0.70      9043
weighted avg       0.91      0.80      0.83      9043
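# The ensemble's class-1 F1 score (0.52) is well below oversampling (0.91) and undersampling (0.86).
# Its class-0 F1 score (0.87) falls between undersampling (0.85) and oversampling (0.90).
# Note that the ensemble is scored on the original imbalanced test split (7985 vs 1058 rows), unlike the two resampled runs.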
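The element-wise loop used to combine the three models is a majority vote: a test row is labelled 1 when at least two of the three models predict 1. As a sketch, the same rule can be written in vectorised NumPy. The function name `majority_vote` and the toy prediction arrays are illustrative, and the arrays are assumed to hold 0/1 values with identical shapes, as returned by `ANN`.

```python
import numpy as np

def majority_vote(*predictions):
    """Label a row 1 when more than half of the 0/1 prediction arrays say 1."""
    stacked = np.stack(predictions, axis=0)
    votes = stacked.sum(axis=0)
    return (votes > len(predictions) / 2).astype(stacked.dtype)

# Toy predictions shaped like Keras output: (n_samples, 1)
p1 = np.array([[1.], [0.], [1.], [0.]])
p2 = np.array([[1.], [0.], [0.], [1.]])
p3 = np.array([[0.], [0.], [1.], [1.]])
print(majority_vote(p1, p2, p3).ravel())
# [1. 0. 1. 1.]
```

Using an odd number of models avoids ties, which this rule would otherwise resolve as 0.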
- Add early stopping for training
- Explore binning of some features
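For the binning item, one option is `pandas.cut`, which maps a numeric column onto labelled intervals. The sketch below bins a hypothetical raw `age` column; the bin edges and labels are arbitrary choices for illustration. In this notebook the binning would presumably need to happen before the Min-Max scaling step, since `age` already appears scaled to [0, 1] in the training frame.

```python
import pandas as pd

ages = pd.Series([19, 27, 35, 48, 61, 80], name='age')

# Illustrative bin edges; intervals are right-closed, e.g. (30, 45]
edges = [0, 30, 45, 60, 120]
labels = ['30_or_less', '31_to_45', '46_to_60', 'over_60']
age_group = pd.cut(ages, bins=edges, labels=labels)

# One-hot encode the bins so they match the other categorical features
age_dummies = pd.get_dummies(age_group, prefix='age')
print(age_group.tolist())
```

Whether binned features help would need to be checked against the F1 scores reported above.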
I am grateful for the amazing content posted by Kaggle developers. I have relied on content and tips from the following resources to develop this notebook.