PyTorch Implementation of GoEmotions with Hugging Face Transformers
A dataset of 58,000 Reddit comments labeled with 28 emotions
- admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise + neutral
- Uses `bert-base-cased` (same as the paper's code).
- The paper uses three taxonomies. I've also prepared the data with new taxonomy labels for `hierarchical grouping` and `ekman`.
  - Original GoEmotions (27 emotions + neutral)
  - Hierarchical Grouping (positive, negative, ambiguous + neutral)
  - Ekman (anger, disgust, fear, joy, sadness, surprise + neutral)
- I've replaced `[unused1]` and `[unused2]` with `[NAME]` and `[RELIGION]` in the vocab, respectively.

  ```
  [PAD]
  [NAME]
  [RELIGION]
  [unused3]
  [unused4]
  ...
  ```
- I've also set `special_tokens_map.json` as below, so the tokenizer won't split `[NAME]` or `[RELIGION]` into word pieces.

  ```json
  {
    "unk_token": "[UNK]",
    "sep_token": "[SEP]",
    "pad_token": "[PAD]",
    "cls_token": "[CLS]",
    "mask_token": "[MASK]",
    "additional_special_tokens": ["[NAME]", "[RELIGION]"]
  }
  ```
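The vocab change above can be reproduced with a few lines of Python. This is a minimal sketch, not the script used for this repo: `patch_vocab` is a hypothetical helper, and it works on an in-memory list of vocab entries (one entry per line of `vocab.txt`). Swapping entries in place, rather than appending new ones, keeps every token id unchanged, so the pretrained embedding rows still line up.

```python
def patch_vocab(vocab, replacements):
    """Return a copy of `vocab` with placeholder tokens swapped in place.

    `patch_vocab` is a hypothetical helper for illustration.
    Swapping in place (rather than appending) keeps every token id stable.
    """
    patched = list(vocab)
    for old, new in replacements.items():
        if new in patched:
            raise ValueError(f"{new} is already in the vocab")
        # list.index raises ValueError if `old` is missing from the vocab.
        patched[patched.index(old)] = new
    return patched


# First few vocab entries before the change (placeholders still present).
vocab = ["[PAD]", "[unused1]", "[unused2]", "[unused3]", "[unused4]"]
patched = patch_vocab(vocab, {"[unused1]": "[NAME]", "[unused2]": "[RELIGION]"})
print(patched)
```

The same tokens then need to be listed under `additional_special_tokens`, as in the JSON above, so the tokenizer treats them as single units.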
- torch==1.4.0
- transformers==2.11.0
- attrdict==2.0.1
You can change the hyperparameters in the JSON files in the `config` directory.
Parameter | Value |
---|---|
Learning rate | 5e-5 |
Warmup proportion | 0.1 |
Epochs | 10 |
Max Seq Length | 50 |
Batch size | 16 |
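To make the warmup proportion concrete: with a proportion of 0.1, the first 10% of optimizer steps ramp the learning rate up from zero to its peak. The sketch below assumes linear warmup followed by linear decay to zero (the shape produced by `get_linear_schedule_with_warmup` in transformers). Whether training here uses exactly this schedule is an assumption, and `num_train_examples` is a made-up number for illustration.

```python
import math


def linear_warmup_decay_lr(step, total_steps, warmup_steps, peak_lr):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return peak_lr * remaining


# Values from the table above.
peak_lr = 5e-5
warmup_proportion = 0.1
epochs = 10
batch_size = 16

num_train_examples = 40_000  # hypothetical; use the real training set size
steps_per_epoch = math.ceil(num_train_examples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = int(total_steps * warmup_proportion)

# Print the learning rate at the start, mid-warmup, peak, and end of training.
for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(step, linear_warmup_decay_lr(step, total_steps, warmup_steps, peak_lr))
```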
For the taxonomy, choose `original`, `group`, or `ekman`.

```bash
$ python3 run_goemotions.py --taxonomy {$TAXONOMY}
$ python3 run_goemotions.py --taxonomy original
$ python3 run_goemotions.py --taxonomy group
$ python3 run_goemotions.py --taxonomy ekman
```
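The `group` and `ekman` taxonomies are coarser relabelings of the original 27 emotions. As a rough sketch of how such a relabeling works, the snippet below maps original labels to the three sentiment groups. The mapping is written from the grouping described in the GoEmotions paper and should be verified against the dataset's own mapping files. `to_group_labels` is a hypothetical helper, not a function from this repo.

```python
# Sentiment grouping of the 27 emotions (assumed from the GoEmotions paper;
# verify against the dataset's mapping files before relying on it).
GROUP_MAPPING = {
    "positive": [
        "admiration", "amusement", "approval", "caring", "desire", "excitement",
        "gratitude", "joy", "love", "optimism", "pride", "relief",
    ],
    "negative": [
        "anger", "annoyance", "disappointment", "disapproval", "disgust",
        "embarrassment", "fear", "grief", "nervousness", "remorse", "sadness",
    ],
    "ambiguous": ["confusion", "curiosity", "realization", "surprise"],
}

# Invert to emotion -> group for quick lookup.
EMOTION_TO_GROUP = {
    emotion: group
    for group, emotions in GROUP_MAPPING.items()
    for emotion in emotions
}


def to_group_labels(emotions):
    """Map original labels to sorted, de-duplicated group labels.

    `neutral` is kept as-is; unknown labels raise KeyError.
    """
    groups = set()
    for emotion in emotions:
        groups.add("neutral" if emotion == "neutral" else EMOTION_TO_GROUP[emotion])
    return sorted(groups)


print(to_group_labels(["curiosity", "love"]))
```

Because the task is multi-label, one comment can end up with more than one group label after relabeling.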
Best results by Macro F1
Macro F1 (%) | Dev | Test |
---|---|---|
original | 50.16 | 50.30 |
group | 69.41 | 70.06 |
ekman | 62.59 | 62.38 |
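Macro F1 averages the per-label F1 scores with equal weight, so rare emotions count as much as frequent ones. The sketch below computes it for multi-label predictions in plain Python. It is for illustration only and is not the evaluation code of this repo. Scoring a label with no true and no predicted positives as 0 is an assumption of this sketch, and the data is made up.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 for multi-label data.

    `y_true` and `y_pred` are lists of equal-length 0/1 rows, one column per label.
    """
    num_labels = len(y_true[0])
    f1_scores = []
    for j in range(num_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumption: a label with no positives at all contributes an F1 of 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_labels


# Toy example with three labels (made-up data).
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
print(round(macro_f1(y_true, y_pred), 4))
```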
- Inference for multi-label classification is supported through a new `MultiLabelPipeline` class.
- The fine-tuned models are already uploaded to Huggingface S3:
  - Original GoEmotions Taxonomy: `monologg/bert-base-cased-goemotions-original`
  - Hierarchical Group Taxonomy: `monologg/bert-base-cased-goemotions-group`
  - Ekman Taxonomy: `monologg/bert-base-cased-goemotions-ekman`
Original GoEmotions taxonomy:

```python
from transformers import BertTokenizer
from model import BertForMultiLabelClassification
from multilabel_pipeline import MultiLabelPipeline
from pprint import pprint

tokenizer = BertTokenizer.from_pretrained("monologg/bert-base-cased-goemotions-original")
model = BertForMultiLabelClassification.from_pretrained("monologg/bert-base-cased-goemotions-original")

goemotions = MultiLabelPipeline(
    model=model,
    tokenizer=tokenizer,
    threshold=0.3
)

texts = [
    "Hey that's a thought! Maybe we need [NAME] to be the celebrity vaccine endorsement!",
    "it’s happened before?! love my hometown of beautiful new ken 😂😂",
    "I love you, brother.",
    "Troll, bro. They know they're saying stupid shit. The motherfucker does nothing but stink up libertarian subs talking shit",
]

pprint(goemotions(texts))

# Output
[{'labels': ['neutral'], 'scores': [0.9750906]},
 {'labels': ['curiosity', 'love'], 'scores': [0.9694574, 0.9227462]},
 {'labels': ['love'], 'scores': [0.993483]},
 {'labels': ['anger'], 'scores': [0.99225825]}]
```
Hierarchical group taxonomy:

```python
from transformers import BertTokenizer
from model import BertForMultiLabelClassification
from multilabel_pipeline import MultiLabelPipeline
from pprint import pprint

tokenizer = BertTokenizer.from_pretrained("monologg/bert-base-cased-goemotions-group")
model = BertForMultiLabelClassification.from_pretrained("monologg/bert-base-cased-goemotions-group")

goemotions = MultiLabelPipeline(
    model=model,
    tokenizer=tokenizer,
    threshold=0.3
)

texts = [
    "Hey that's a thought! Maybe we need [NAME] to be the celebrity vaccine endorsement!",
    "it’s happened before?! love my hometown of beautiful new ken 😂😂",
    "I love you, brother.",
    "Troll, bro. They know they're saying stupid shit. The motherfucker does nothing but stink up libertarian subs talking shit",
]

pprint(goemotions(texts))

# Output
[{'labels': ['positive'], 'scores': [0.9989434]},
 {'labels': ['ambiguous', 'positive'], 'scores': [0.99801123, 0.99845874]},
 {'labels': ['positive'], 'scores': [0.99930394]},
 {'labels': ['negative'], 'scores': [0.9984231]}]
```
Ekman taxonomy:

```python
from transformers import BertTokenizer
from model import BertForMultiLabelClassification
from multilabel_pipeline import MultiLabelPipeline
from pprint import pprint

tokenizer = BertTokenizer.from_pretrained("monologg/bert-base-cased-goemotions-ekman")
model = BertForMultiLabelClassification.from_pretrained("monologg/bert-base-cased-goemotions-ekman")

goemotions = MultiLabelPipeline(
    model=model,
    tokenizer=tokenizer,
    threshold=0.3
)

texts = [
    "Hey that's a thought! Maybe we need [NAME] to be the celebrity vaccine endorsement!",
    "it’s happened before?! love my hometown of beautiful new ken 😂😂",
    "I love you, brother.",
    "Troll, bro. They know they're saying stupid shit. The motherfucker does nothing but stink up libertarian subs talking shit",
]

pprint(goemotions(texts))

# Output
[{'labels': ['joy', 'neutral'], 'scores': [0.30459446, 0.9217335]},
 {'labels': ['joy', 'surprise'], 'scores': [0.9981395, 0.99863845]},
 {'labels': ['joy'], 'scores': [0.99910116]},
 {'labels': ['anger'], 'scores': [0.9984291]}]
```
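The `threshold=0.3` argument in the examples above controls which labels are returned. A plausible reading, sketched below in plain Python, is that each logit goes through a sigmoid independently and every label whose score exceeds the threshold is kept. That would explain why one text can receive several labels and why the scores do not sum to 1. The actual `MultiLabelPipeline` may differ in its details; `select_labels`, the label subset, and the logits here are illustrative assumptions.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def select_labels(logits, id2label, threshold=0.3):
    """Keep every label whose independent sigmoid score exceeds `threshold`.

    Hypothetical helper; not the repo's MultiLabelPipeline implementation.
    """
    scores = [sigmoid(z) for z in logits]
    kept = [(id2label[i], s) for i, s in enumerate(scores) if s > threshold]
    return {
        "labels": [label for label, _ in kept],
        "scores": [score for _, score in kept],
    }


id2label = {0: "anger", 1: "joy", 2: "neutral"}  # illustrative subset of labels
logits = [-4.0, -0.8, 2.5]  # made-up model outputs for one text
print(select_labels(logits, id2label))
```

Raising the threshold trades recall for precision: with `threshold=0.5`, the borderline `joy` score in this toy example would be dropped.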