Multilingual cyber abuse detection using advanced transformer architecture (presented at IEEE TENCON 2019)
This repository contains the source code for pre-processing code-mixed text and training the models described in our paper:
Aditya Malte, Pratik Ratadiya, "Multilingual cyber abuse detection using advanced transformer architecture", IEEE TENCON 2019
Dataset: TRAC-1 code-mixed dataset for detection of cyber abuse
Models: BERT (Base/Large/Multi) and XLNet, trained with various hyperparameters
Pre-processing: demojization, transliteration, normalization and so on.
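As a rough illustration of the demojization and normalization steps named above, here is a minimal sketch that uses only the Python standard library. It is not the code from the paper: the function names (`demojize`, `normalize`, `preprocess`), the `:name:` token format, and the rule that squeezes repeated characters are all assumptions made for this example. Transliteration is left out because it needs a language-specific mapping or an external library.

```python
import re
import unicodedata


def demojize(text: str) -> str:
    """Replace emoji-like symbols with a :name: token built from the Unicode name.

    Assumption: Unicode category "So" (Symbol, other) is used as a rough
    heuristic for "is an emoji"; a dedicated emoji library would be more exact.
    """
    out = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            if name:
                out.append(" :" + name.lower().replace(" ", "_") + ": ")
                continue
        out.append(ch)
    return "".join(out)


def normalize(text: str) -> str:
    """Apply simple, hypothetical normalization rules for noisy social media text."""
    text = unicodedata.normalize("NFKC", text)   # unify compatibility forms
    text = text.lower()                          # case-fold
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)   # squeeze 3+ repeated chars to 2
    text = re.sub(r"\s+", " ", text).strip()     # collapse whitespace
    return text


def preprocess(text: str) -> str:
    """Demojize first so emoji names also pass through normalization."""
    return normalize(demojize(text))


if __name__ == "__main__":
    print(preprocess("Sooooo   BAD 😡"))
```

Turning emoji into text tokens keeps their sentiment signal available to a tokenizer that may not cover emoji code points, which is one plausible motivation for a demojization step.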
- State-of-the-art performance on the Hindi dataset
- Excellent performance (top-5) on the English dataset
Colaboratory notebooks will be added soon.