Motamot: A Dataset for Revealing the Supremacy of Large Language Models over Transformer Models in Bengali Political Sentiment Analysis
Sentiment analysis is the process of identifying and categorizing people’s emotions or opinions regarding various topics. Analyzing political sentiment is critical for understanding the complexities of public opinion processes, especially during election seasons. It gives significant information on voter preferences, attitudes, and current trends. In this study, we investigate political sentiment analysis during Bangladeshi elections, specifically examining how effectively Pre-trained Language Models (PLMs) and Large Language Models (LLMs) capture complex sentiment characteristics. Our study centers on the creation of the "Motamot" dataset, comprising 7,058 instances annotated with positive and negative sentiments, sourced from diverse online newspaper portals, forming a comprehensive resource for political sentiment analysis. We meticulously evaluate the performance of various PLMs including BanglaBERT, Bangla BERT Base, XLM-RoBERTa, mBERT, and sahajBERT, alongside LLMs such as Gemini 1.5 Pro and GPT 3.5 Turbo. Moreover, we explore zero-shot and few-shot learning strategies to enhance our understanding of political sentiment analysis methodologies. Our findings underscore BanglaBERT’s commendable accuracy of 88.10% among PLMs. However, the exploration into LLMs reveals even more promising results. Through the adept application of Few-Shot learning techniques, Gemini 1.5 Pro achieves an impressive accuracy of 96.33%, surpassing the remarkable performance of GPT 3.5 Turbo, which stands at 94%. This underscores Gemini 1.5 Pro’s status as the superior performer in this comparison.
Explore our research on Bengali Political Sentiment Analysis to understand the nuances of political discourse in the Bengali language. The full paper is available at https://arxiv.org/abs/2407.19528.
We named the dataset "Motamot" (মতামত), the Bengali word for "opinion". It was compiled from a range of online newspapers covering political events and conversations during Bangladeshi elections. Our data collection process involved scraping articles and opinion pieces from reputable news sources to obtain a diverse and representative sample of political discourse. "Motamot" offers a broad view of the opinions and conversations that shape Bangladesh's political environment. The dataset can be accessed from here.
Class | Train | Test | Validation |
---|---|---|---|
Total | 5647 | 706 | 705 |
Positive | 3306 | 413 | 413 |
Negative | 2341 | 293 | 292 |

Class distribution by split:

Split | Positive | Negative |
---|---|---|
Train | 3306 | 2341 |
Test | 413 | 293 |
Validation | 413 | 292 |

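The split sizes above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the counts are copied from the tables, while the `SPLITS` dictionary and the `summarize` helper are names introduced here, not part of the dataset release.

```python
# (positive, negative) counts per split, copied from the tables above.
# SPLITS and summarize are illustrative names, not part of the dataset release.
SPLITS = {
    "train": (3306, 2341),
    "test": (413, 293),
    "validation": (413, 292),
}


def summarize(splits):
    """Return per-split totals and positive-class shares, plus the grand total."""
    rows = {}
    for name, (pos, neg) in splits.items():
        total = pos + neg
        rows[name] = {"total": total, "positive_share": pos / total}
    grand_total = sum(row["total"] for row in rows.values())
    return rows, grand_total


rows, grand_total = summarize(SPLITS)
for name, row in rows.items():
    print(f"{name:<10} total={row['total']:>5} positive={row['positive_share']:.1%}")
print("all splits:", grand_total)
```

The positive class makes up a little under 59% of every split, so the three splits share roughly the same class balance, and the totals add up to the 7,058 instances reported for the dataset.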
Model | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
BanglaBERT | 0.8204 | 0.8222 | 0.8204 | 0.8203 |
Bangla BERT Base | 0.6803 | 0.6907 | 0.6812 | 0.6833 |
DistilBERT | 0.6320 | 0.6358 | 0.6320 | 0.6317 |
mBERT | 0.6427 | 0.6496 | 0.6428 | 0.6153 |
sahajBERT | 0.6708 | 0.6791 | 0.6709 | 0.6707 |
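The table does not say how precision, recall, and F1 are averaged over the two classes. Recall is equal or nearly equal to accuracy in every row, which is what support-weighted averaging produces, so the sketch below assumes that scheme. It is a plain-Python illustration on invented toy labels, not the authors' evaluation code, and `weighted_metrics` is a name introduced here.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1 (assumed scheme)."""
    n = len(y_true)
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n

    precision = recall = f1 = 0.0
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        p_l = tp / predicted if predicted else 0.0
        r_l = tp / actual if actual else 0.0
        f_l = 2 * p_l * r_l / (p_l + r_l) if (p_l + r_l) else 0.0
        weight = actual / n  # each class counts in proportion to its support
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels invented for illustration; they are not taken from Motamot.
y_true = ["positive", "positive", "positive", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "negative", "positive"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

With support weighting, recall always equals accuracy, because each class's recall is weighted by the share of examples that belong to it.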
LLMs | Metric | Zero-shot | 5-shot | 10-shot | 15-shot |
---|---|---|---|---|---|
GPT 3.5 Turbo | Accuracy | 0.8500 | 0.8900 | 0.9133 | 0.9400 |
GPT 3.5 Turbo | Precision | 0.8467 | 0.8867 | 0.9200 | 0.9467 |
GPT 3.5 Turbo | Recall | 0.8533 | 0.8926 | 0.9079 | 0.9342 |
GPT 3.5 Turbo | F1-Score | 0.8495 | 0.8896 | 0.9139 | 0.9404 |
Gemini 1.5 Pro | Accuracy | 0.8608 | 0.8981 | 0.9200 | 0.9633 |
Gemini 1.5 Pro | Precision | 0.8931 | 0.8846 | 0.9333 | 0.9667 |
Gemini 1.5 Pro | Recall | 0.8477 | 0.9205 | 0.9091 | 0.9603 |
Gemini 1.5 Pro | F1-Score | 0.8698 | 0.9022 | 0.9211 | 0.9635 |
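The zero-shot and k-shot columns differ only in how many labelled examples are placed in the prompt before the text to be classified. The sketch below shows one way such prompts could be assembled. The instruction wording, the `build_prompt` helper, and the placeholder example pool are all assumptions made for illustration; the prompts actually used in the study are not reproduced here, and no model API is called.

```python
# Hypothetical instruction text; the wording used in the study is not given here.
INSTRUCTION = "Classify the political sentiment of the text as Positive or Negative."


def build_prompt(examples, query, k):
    """Assemble a k-shot classification prompt; k=0 yields a zero-shot prompt."""
    if k < 0 or k > len(examples):
        raise ValueError("k must be between 0 and the number of available examples")
    blocks = [INSTRUCTION]
    for text, label in examples[:k]:
        blocks.append(f"Text: {text}\nSentiment: {label}")
    # The final block leaves the label empty for the model to complete.
    blocks.append(f"Text: {query}\nSentiment:")
    return "\n\n".join(blocks)


# Placeholder pool standing in for labelled training instances (not real data).
pool = [
    (f"<labelled training text {i}>", "Positive" if i % 2 == 0 else "Negative")
    for i in range(15)
]

for k in (0, 5, 10, 15):
    prompt = build_prompt(pool, "<text to classify>", k)
    print(f"{k}-shot prompt: {prompt.count('Text:')} text blocks")
```

In practice the pool would be drawn from the training split, and the returned string would be sent to the chosen LLM, with the completion parsed as the predicted label.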
For any questions, collaboration opportunities, or further inquiries, please feel free to reach out:
- Fatema Tuj Johora Faria
  - Email: fatema.faria142@gmail.com
- Mukaffi Bin Moin
  - Email: mukaffi28@gmail.com
- Rabeya Islam Mumu
  - Email: rabeya.islammomo@gmail.com
- Md Mahabubul Alam Abir
  - Email: mahbubabir09@gmail.com
- Abrar Nawar Alfy
  - Email: abraralfy49@gmail.com
@misc{faria2024motamotdatasetrevealingsupremacy,
title={Motamot: A Dataset for Revealing the Supremacy of Large Language Models over Transformer Models in Bengali Political Sentiment Analysis},
author={Fatema Tuj Johora Faria and Mukaffi Bin Moin and Rabeya Islam Mumu and Md Mahabubul Alam Abir and Abrar Nawar Alfy and Mohammad Shafiul Alam},
year={2024},
eprint={2407.19528},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2407.19528},
}