This repo contains the original code of a Kaggle notebook for the SMS Spam Collection Dataset. On top of the notebook, this code additionally lets you:
- manage arguments, callbacks, and other features of `pl.Trainer` with a single YAML file
- configure all hyperparameters of an experiment with a single YAML file
- review past configurations through Hydra's logs
- track model training with wandb
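As a rough sketch of what a single-file experiment config could look like (the key names below are illustrative assumptions, not the repo's actual schema in `exp_0.yaml`):

```yaml
# Illustrative only: key names are assumptions, not this repo's actual schema.
trainer:          # forwarded to pl.Trainer
  max_epochs: 10
  accelerator: gpu
model:
  lr: 2e-5
  warmup_steps: 500
```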
| | F1 score | Accuracy |
|---|---|---|
| ham | 99.77 | - |
| spam | 98.65 | - |
| total | 99.61 | 99.61 |
The model is tested on 517 samples, selected as 10% of the entire dataset. The validation dataset is randomly sampled each time training runs and is not involved in the model's learning.
As the objective, I adopted focal loss to deal with the positive/negative class imbalance. Instead of following the paper's binary formulation, I implemented it as multi-class classification, which makes training more stable. For additional training stability, I adopted weight-decay regularization via AdamW, a warmup start, and a linearly decreasing learning-rate scheduler.
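The multi-class focal loss idea above can be sketched as follows. This is a minimal, dependency-free illustration of the formula FL = -(1 - p_t)^gamma * log(p_t), not the repo's actual implementation (which presumably operates on batched tensors):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits, target, gamma=2.0):
    # Multi-class focal loss: cross-entropy on the true class,
    # down-weighted by (1 - p_t)^gamma so easy examples contribute less.
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# A confident correct prediction is down-weighted far more than an
# uncertain one, which is what counteracts the class imbalance:
easy = focal_loss([4.0, 0.0], target=0)  # model already sure of "ham"
hard = focal_loss([0.0, 0.0], target=0)  # model undecided
print(easy < hard)  # True
```

With `gamma=0` this reduces to ordinary cross-entropy, so the modulating factor is the only difference from the standard loss.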
To train the model in this repo, run:

python train.py --config-name exp_0
If you want to test a checkpoint, set the checkpoint file path in exp_0.yaml and run:

python test.py --config-name exp_0
BSD 3-Clause License Copyright (c) 2022