A Flair based Bengali collections which provide different bengali flair embeddings and Bengali flair trained NER, POS, Text classification model.
pip install -r requirements.txt
Here we have trained Flair character based language model for Bengali Wiki dataset.
-
- Total wikipedia artcles: 110449
- Train epoch: 5 Epochs
- Validation loss: 1.5366
- Validation perplexity: 4.6490
-
- Total wikipedia artcles: 110449
- Train epoch: 5 Epochs
- Validation loss: 1.4717
- Validation perplexity: 4.3566
Here we have trained Bengali NER model for wikiann Bengali NER dataset.
- Total wikiann train data: 1000
- Total wikiann validation data: 100
- TOTAL wikiann test data: 100
- Train epoch: 70 Epochs
- Score in Test data
- F-score (micro) 0.7751
- F-score (macro) 0.775
- Accuracy 0.7364
- For details log check here
- To generate flair embedding using any Bengali text
from flair.data import Sentence
sentence = Sentence('রামপ্রসাদ সেন জন্মগ্রহণ করেছিলেন গাঙ্গেয় পশ্চিমবঙ্গের এক তান্ত্রিক বৈদ্যব্রাহ্মণ পরিবারে।')
# init embeddings from your trained LM
char_lm_embeddings = FlairEmbeddings('models/embeddings/wikipedia/bnwiki_forward.pt')
# embed sentence
char_lm_embeddings.embed(sentence)
- To fine-tune for training flair based NER, POS, Text classification model
from flair.embeddings import StackedEmbeddings
embedding_types = [
FlairEmbeddings('models/embeddings/wikipedia/bnwiki_forward.pt'),
FlairEmbeddings('models/embeddings/wikipedia/bnwiki_backward.pt')
]
embeddings = StackedEmbeddings(embeddings=embedding_types)
- To use NER model
from flair.data import Sentence
from flair.models import SequenceTagger
text = "কবিরঞ্জন রামপ্রসাদ সেন (১৭১৮ বা ১৭২৩ – ১৭৭৫) ছিলেন অষ্টাদশ শতাব্দীর এক বিশিষ্ট বাঙালি শাক্ত কবি ও সাধক।"
ner_model_path = "models/ner/wikiann.pt"
ner_model = SequenceTagger.load(ner_model_path)
sentence = Sentence(text)
ner_model.predict(sentence)
entities = sentence.get_spans('ner')
for entity in entities:
print(entity)
# output: Span[0:3]: "কবিরঞ্জন রামপ্রসাদ সেন" → PER (0.5903)