An implementation of bidirectional LSTM-CRF for Named Entity Recognition on a custom corpus with custom word embeddings

sudhamstarun/AwesomeNER

AwesomeNER

Introduction

This repository contains a bidirectional LSTM-CRF implementation for Named Entity Recognition (NER) on a custom corpus with custom word embeddings, written in TensorFlow. The implementation focuses on finance-domain training data, extracted through extensive data cleaning and processing.

It also contains an implementation of a state-of-the-art bidirectional LSTM-CNN-CRF architecture (published at ACL'16; see reference 2 under References) for Named Entity Recognition, written in PyTorch.

Motivation

A search for a BiLSTM-CRF NER implementation for finance-specific sentences turned up very little prior work in this area. For the implementation to be successful, task-specific word embeddings and tagged training data needed to be created. This project therefore walks through the whole process of generating word embeddings and training a BiLSTM-CRF model on the tagged training data.

However, given the nature of the data, it is quite difficult to collect and clean enough of it to train an LSTM-CRF model. Therefore, the implementation for the finance corpus in this specific repo is limited to a CRF.
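
A CRF-only tagger is usually fed per-token features rather than learned representations. The following is a minimal sketch of what such features could look like for finance sentences. The feature names, the currency check, and the example sentence are illustrative assumptions, not taken from this repository. The output is a list of feature dicts, one per token, which is the input shape that CRF libraries such as sklearn-crfsuite accept.

```python
def token_features(tokens, i):
    """Build a feature dict for tokens[i] from the token and its neighbours.

    All feature names here are hypothetical and chosen for illustration.
    """
    word = tokens[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "suffix3": word[-3:],
        "is_upper": word.isupper(),
        "is_title": word.istitle(),
        "has_digit": any(ch.isdigit() for ch in word),
        # Assumed finance-specific cue: currency symbols inside the token.
        "has_currency": any(ch in "$€£¥" for ch in word),
    }
    if i > 0:
        features["prev.lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        features["next.lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True  # end of sentence
    return features


def sentence_features(tokens):
    """Return one feature dict per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Made-up example sentence, not drawn from the project's corpus.
sentence = ["Apple", "raised", "$5", "billion", "in", "bonds"]
features = sentence_features(sentence)
```

Each sentence's feature list would be paired with a same-length list of tags (for example BIO labels) when training the CRF.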

Even though the project as such focuses on implementing a Bidirectional Long Short-Term Memory - Conditional Random Field (BiLSTM-CRF) model to label sentences, this notebook mostly explores a new methodology inspired by the paper. The key difference between the BiLSTM-CNN-CRF and our other implementation is that the former adds a CNN layer that generates character embeddings from the corpus, whereas the latter uses word embeddings only. Once the results are satisfactory, the implementation will be released as a pip library for future use.
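
To make the role of the CNN layer concrete, here is a minimal sketch of character-level feature extraction: each character is mapped to a small embedding, filters are convolved over windows of consecutive characters, and the maximum response of each filter is kept. It is written in plain Python with random weights so that it runs without dependencies. It is not the notebook's code, and the dimensions, alphabet, and function names are assumptions. In PyTorch the same idea would typically be built from `nn.Embedding`, `nn.Conv1d`, and a max over the character positions.

```python
import random


def build_char_cnn(alphabet, char_dim=4, num_filters=3, window=3, seed=0):
    """Create randomly initialised parameters for a toy character-level CNN."""
    rng = random.Random(seed)
    # Id 0 is reserved for padding and unknown characters (all-zero vector).
    char_to_id = {ch: i + 1 for i, ch in enumerate(alphabet)}
    embeddings = [[0.0] * char_dim] + [
        [rng.uniform(-0.5, 0.5) for _ in range(char_dim)] for _ in alphabet
    ]
    # Each filter covers `window` consecutive character embeddings.
    filters = [
        [[rng.uniform(-0.5, 0.5) for _ in range(char_dim)] for _ in range(window)]
        for _ in range(num_filters)
    ]
    return char_to_id, embeddings, filters


def char_cnn_features(word, char_to_id, embeddings, filters):
    """Convolve every filter over the word and max-pool over positions."""
    window = len(filters[0])
    pad = window - 1  # zero-pad both sides so short words still yield windows
    ids = [0] * pad + [char_to_id.get(ch, 0) for ch in word] + [0] * pad
    vectors = [embeddings[i] for i in ids]
    features = []
    for filt in filters:
        responses = []
        for start in range(len(vectors) - window + 1):
            total = 0.0
            for offset in range(window):
                for a, b in zip(vectors[start + offset], filt[offset]):
                    total += a * b
            responses.append(total)
        features.append(max(responses))  # max-over-time pooling
    return features


char_to_id, embeddings, filters = build_char_cnn("abcdefghijklmnopqrstuvwxyz")
char_vec = char_cnn_features("bond", char_to_id, embeddings, filters)

# Placeholder word embedding; in a BiLSTM-CNN-CRF the two vectors are
# concatenated before being passed to the BiLSTM.
word_vec = [0.1, -0.2, 0.3]
token_input = word_vec + char_vec
```

The resulting vector has one entry per filter regardless of word length, which is what lets it be concatenated with a fixed-size word embedding.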

References

  1. Sequence Tagging with Tensorflow using bi-LSTM + CRF with character embeddings for NER and POS by Guillaume Genthial Link
  2. End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany Link
  3. Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on. An implementation by Hironsan Nakayama Link

Author

Tarun Sudhams
