sagorbrur/bnflair

BNFLAIR

A Flair-based Bengali collection that provides Bengali Flair embeddings and Flair-trained Bengali NER, POS, and text classification models.

Installation

pip install -r requirements.txt

Embeddings

Bengali Wiki Flair embeddings

Here we have trained a Flair character-based language model on the Bengali Wikipedia dataset.

  • Forward LM

    • Total Wikipedia articles: 110449
    • Training epochs: 5
    • Validation loss: 1.5366
    • Validation perplexity: 4.6490
  • Backward LM

    • Total Wikipedia articles: 110449
    • Training epochs: 5
    • Validation loss: 1.4717
    • Validation perplexity: 4.3566
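
The perplexities listed above appear to be the exponential of the corresponding validation losses, which is the usual relationship when the loss is an average cross-entropy in nats. A minimal sketch to check the numbers; the function name `perplexity` is illustrative and not part of this repository.

```python
import math


def perplexity(loss: float) -> float:
    """Convert an average cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss)


# Validation losses reported above for the forward and backward LMs
forward_ppl = perplexity(1.5366)
backward_ppl = perplexity(1.4717)

print(f"forward: {forward_ppl:.4f}, backward: {backward_ppl:.4f}")
```

The results should match the reported perplexities up to rounding of the four-decimal loss values.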

Bengali NER Model

Wikiann Model

Here we have trained a Bengali NER model on the WikiANN Bengali NER dataset.

  • Total WikiANN training data: 1000
  • Total WikiANN validation data: 100
  • Total WikiANN test data: 100
  • Training epochs: 70
  • Scores on test data
    • F-score (micro) 0.7751
    • F-score (macro) 0.775
    • Accuracy 0.7364
  • For the detailed training log, check here
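
The scores above include both micro- and macro-averaged F-scores. As a rough illustration of the difference, the sketch below computes both from per-class counts. The counts are hypothetical and are not taken from this model's evaluation; the function names are illustrative as well.

```python
from typing import Dict, Tuple

# Hypothetical per-class counts: (true positives, false positives, false negatives).
# These are NOT the model's actual counts; they only illustrate the two averages.
counts: Dict[str, Tuple[int, int, int]] = {
    "PER": (40, 10, 10),
    "LOC": (30, 5, 15),
    "ORG": (10, 10, 5),
}


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_class: Dict[str, Tuple[int, int, int]]) -> float:
    """Pool the counts over all classes, then compute a single F1."""
    tp = sum(c[0] for c in per_class.values())
    fp = sum(c[1] for c in per_class.values())
    fn = sum(c[2] for c in per_class.values())
    return f1(tp, fp, fn)


def macro_f1(per_class: Dict[str, Tuple[int, int, int]]) -> float:
    """Compute F1 per class, then take the unweighted mean."""
    return sum(f1(*c) for c in per_class.values()) / len(per_class)


print(f"micro: {micro_f1(counts):.4f}, macro: {macro_f1(counts):.4f}")
```

Micro averaging weights every entity equally, so frequent classes dominate; macro averaging weights every class equally, so a weak rare class pulls the score down more.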

Usage

Embeddings

  • To generate Flair embeddings for any Bengali text
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

sentence = Sentence('রামপ্রসাদ সেন জন্মগ্রহণ করেছিলেন গাঙ্গেয় পশ্চিমবঙ্গের এক তান্ত্রিক বৈদ্যব্রাহ্মণ পরিবারে।')

# init embeddings from your trained LM
char_lm_embeddings = FlairEmbeddings('models/embeddings/wikipedia/bnwiki_forward.pt')

# embed sentence
char_lm_embeddings.embed(sentence)
  • To stack the forward and backward embeddings for training a Flair-based NER, POS, or text classification model
from flair.embeddings import FlairEmbeddings, StackedEmbeddings

embedding_types = [
    FlairEmbeddings('models/embeddings/wikipedia/bnwiki_forward.pt'),
    FlairEmbeddings('models/embeddings/wikipedia/bnwiki_backward.pt')
]

embeddings = StackedEmbeddings(embeddings=embedding_types)

NER

  • To use the NER model
from flair.data import Sentence
from flair.models import SequenceTagger

text = "কবিরঞ্জন রামপ্রসাদ সেন (১৭১৮ বা ১৭২৩ – ১৭৭৫) ছিলেন অষ্টাদশ শতাব্দীর এক বিশিষ্ট বাঙালি শাক্ত কবি ও সাধক।"
ner_model_path = "models/ner/wikiann.pt"

ner_model = SequenceTagger.load(ner_model_path)

sentence = Sentence(text)
ner_model.predict(sentence)
entities = sentence.get_spans('ner')

for entity in entities:
    print(entity)

# output: Span[0:3]: "কবিরঞ্জন রামপ্রসাদ সেন" → PER (0.5903)
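
The example output above carries a confidence score (0.5903), so it can be useful to drop low-confidence predictions. Below is a minimal sketch that filters plain `(text, label, score)` tuples. The helper `filter_entities` and the 0.5 threshold are assumptions for illustration, not part of this repository or of Flair.

```python
from typing import List, Tuple

Entity = Tuple[str, str, float]  # (text, label, confidence)


def filter_entities(entities: List[Entity], min_score: float = 0.5) -> List[Entity]:
    """Keep only entities whose confidence is at least min_score."""
    return [e for e in entities if e[2] >= min_score]


# With Flair, tuples of this shape could be built from the predicted spans, e.g.
#   [(s.text, s.get_label("ner").value, s.get_label("ner").score)
#    for s in sentence.get_spans("ner")]
# This assumes a Flair version where spans expose get_label; adjust for your version.

# The first tuple mirrors the example output above; the second is made up.
example: List[Entity] = [
    ("কবিরঞ্জন রামপ্রসাদ সেন", "PER", 0.5903),
    ("hypothetical entity", "LOC", 0.31),
]

kept = filter_entities(example, min_score=0.5)
for text, label, score in kept:
    print(f"{text} -> {label} ({score:.4f})")
```

Working on plain tuples keeps the filtering logic independent of the Flair version in use.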