naetherm / NLP Public


A collection of NLP projects I've worked on to deepen my experience with the research field of NLP.


55 commits

chatbot
dependency-parsing
entity-tagging
examples-with-transformers
language-detection
named-entity-recognition
non-deep-learning
ocr
pos-tagging
question-answering
sentence-similarity
sentiment-analysis
spacy_examples
spam_detection
stemming
text-classification
text-generation
text-mining
text-summarization
transcription
word-segmentation
.gitignore
LICENSE
README.md
requirements.txt


NLP

This repository is a collection of machine learning and deep learning approaches to different tasks in the field of NLP (Natural Language Processing).

Note

Some of the code, e.g. the BERT models for entity tagging and POS tagging, still uses TensorFlow <= 1.15. I am currently in the process of upgrading it to 2.x.

Table of Contents

Language Detection
POS Tagging
Entity Tagging
Sentiment Analysis
Word Segmentation
Chatbot

Content

Language Detection

All models were trained and evaluated on the Tatoeba dataset.

The following implementations are available:

Baseline implementation using the Python langdetect module (00_langdetect.py)
Character N-Gram implementation (01_nsec_langdetect.py)
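
To make the character n-gram idea concrete, here is a minimal sketch that builds one n-gram count profile per language and picks the most similar profile by cosine similarity. It is not taken from 01_nsec_langdetect.py; the function names, the trigram size, and the toy sentences are assumptions chosen for illustration, and the real script may use a different model on top of the n-grams.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return a Counter of overlapping character n-grams, padded with spaces."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def build_profiles(samples, n=3):
    """Aggregate n-gram counts per language from (language, sentence) pairs."""
    profiles = {}
    for lang, sentence in samples:
        profiles.setdefault(lang, Counter()).update(char_ngrams(sentence, n))
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sum(c * c for c in a.values()) ** 0.5
    norm_b = sum(c * c for c in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect(text, profiles, n=3):
    """Return the language whose profile is most similar to the text."""
    grams = char_ngrams(text, n)
    return max(profiles, key=lambda lang: cosine(grams, profiles[lang]))


# Toy (language, sentence) pairs; a real run would read them from Tatoeba.
SAMPLES = [
    ("eng", "the quick brown fox jumps over the lazy dog"),
    ("eng", "this is a short sentence in the english language"),
    ("deu", "der schnelle braune fuchs springt über den faulen hund"),
    ("deu", "das ist ein kurzer satz in der deutschen sprache"),
]
PROFILES = build_profiles(SAMPLES)
```

Calling detect("the dog is in the house", PROFILES) compares the input's trigrams against both profiles. With only four training sentences the result is fragile, so treat this as a demonstration of the feature extraction rather than a usable detector.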

POS Tagging

All models were trained and evaluated on the CoNLL POS dataset.

The following implementations are available:

Basic BERT language model approach (10_bert.py)
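
CoNLL-style files store one token per line with whitespace-separated columns and a blank line between sentences. The sketch below reads such data into (tokens, tags) pairs. The column positions, the function name read_conll, and the sample rows are assumptions for illustration; check them against the actual data files used by 10_bert.py.

```python
def read_conll(lines, token_col=0, tag_col=1):
    """Group column-formatted lines into (tokens, tags) pairs per sentence."""
    sentences, tokens, tags = [], [], []
    for line in lines:
        line = line.strip()
        # Blank lines and document markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        cols = line.split()
        tokens.append(cols[token_col])
        tags.append(cols[tag_col])
    if tokens:  # flush the last sentence if the input has no trailing blank line
        sentences.append((tokens, tags))
    return sentences


# Made-up rows in a four-column layout: token, POS tag, chunk tag, entity tag.
SAMPLE = """\
-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
. . O O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
""".splitlines()

POS_SENTENCES = read_conll(SAMPLE, tag_col=1)
```

Passing a different tag_col selects another annotation layer from the same file, for example the last column for entity tags.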

Entity Tagging

All models were trained and evaluated on the CoNLL POS dataset.

The following implementations are available:

Basic BERT language model approach (10_bert.py)
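
BERT works on subword pieces, so word-level tags have to be spread over the pieces of each word. A common convention is to keep the tag on the first piece and give the remaining pieces a placeholder label that is ignored in the loss. The sketch below shows that alignment with a toy greedy longest-match tokenizer; the vocabulary, the "X" placeholder, and both function names are assumptions, and 10_bert.py may handle this differently.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark word-internal pieces
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:  # no piece matches: the whole word is unknown
            return [unk]
        pieces.append(piece)
        start = end
    return pieces


def align_labels(words, labels, vocab, pad_label="X"):
    """Keep each word's label on its first piece, pad_label on the rest."""
    tokens, aligned = [], []
    for word, label in zip(words, labels):
        pieces = wordpiece(word, vocab)
        tokens.extend(pieces)
        aligned.extend([label] + [pad_label] * (len(pieces) - 1))
    return tokens, aligned


# Hypothetical toy vocabulary; a real model ships its own vocabulary file.
TOY_VOCAB = {"peter", "black", "##burn", "visit", "##ed", "berlin"}
TOKENS, LABELS = align_labels(
    ["peter", "blackburn", "visited", "berlin"],
    ["B-PER", "I-PER", "O", "B-LOC"],
    TOY_VOCAB,
)
```

At prediction time the placeholder positions are dropped again, so the model's output has one tag per original word.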

Sentiment Analysis

All models were trained and evaluated on the IMDB dataset.

The following implementations are available:

LSTM model (01_lstm.py)
Bidirectional LSTM model (02_bilstm.py)
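
Recurrent models such as the LSTMs above expect fixed-length sequences of integer IDs rather than raw text. The following sketch shows one plausible preprocessing step: building a frequency-ranked vocabulary and encoding each review with truncation and padding. The names build_vocab and encode, the reserved IDs, and the whitespace tokenization are assumptions, not the scripts' actual pipeline.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved IDs for padding and out-of-vocabulary words


def build_vocab(texts, max_size=10000):
    """Map the most frequent lowercase words to integer IDs starting at 2."""
    counts = Counter(word for text in texts for word in text.lower().split())
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for word, _ in counts.most_common(max_size - 2):
        vocab[word] = len(vocab)
    return vocab


def encode(text, vocab, max_len=8):
    """Convert text to exactly max_len IDs: truncate long input, pad short."""
    ids = [vocab.get(word, UNK) for word in text.lower().split()][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Made-up reviews standing in for the IMDB texts.
REVIEWS = ["a great movie", "a boring movie with a weak plot"]
VOCAB = build_vocab(REVIEWS)
```

The padded sequences can then be fed to an embedding layer followed by the LSTM or bidirectional LSTM. Padding ID 0 is a common choice because many frameworks can mask it.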

Word Segmentation

All models were trained on the first 30,000 lines of the English portion of the OSCAR corpus.

The following implementations are available:

LSTM based model (01_lstm.py)
Bidirectional LSTM based model (02_bilstm.py)
CNN based model (03_cnn.py)
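
Word segmentation can be framed as per-character classification: strip the spaces from a line and label each character by whether a word ends there. The sketch below derives such training pairs from ordinary spaced text and shows how predicted labels turn back into words. This framing and the names to_training_pair and segment are assumptions; the scripts above may encode the task differently.

```python
def to_training_pair(line):
    """Turn a spaced line into (unspaced characters, boundary labels).

    Label 1 marks a character that ends a word, 0 any other character.
    """
    chars, labels = [], []
    for word in line.split():
        for i, ch in enumerate(word):
            chars.append(ch)
            labels.append(1 if i == len(word) - 1 else 0)
    return "".join(chars), labels


def segment(chars, labels):
    """Re-insert a space after every character labelled as a word end."""
    out = []
    for ch, label in zip(chars, labels):
        out.append(ch)
        if label == 1:
            out.append(" ")
    return "".join(out).strip()
```

A sequence model (LSTM, bidirectional LSTM, or CNN) would be trained to predict the label list from the character string. Applying segment to its predictions then recovers the spaced text.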

Chatbot

All models in the chatbot section were trained on the Cornell Movie-Dialogs Corpus. The required files from the corpus are already included in the repository.

The following implementations are available:

Basic RNN model (01_seq2seq_rnn.py)
LSTM model (02_seq2seq_lstm.py)
GRU model (03_seq2seq_gru.py)
Bidirectional Basic RNN model (04_seq2seq_birnn.py)
Bidirectional LSTM model (05_seq2seq_bilstm.py)
Bidirectional GRU model (06_seq2seq_bigru.py)
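
Sequence-to-sequence chatbots are trained on (input, response) pairs. In the public release of the corpus, movie_lines.txt and movie_conversations.txt use " +++$+++ " as the field separator, and each conversation row ends with a list of line IDs. The sketch below turns such rows into consecutive utterance pairs. The helper names and the sample utterances are made up for illustration, and whether the scripts above build their pairs this way is an assumption.

```python
import ast

SEP = " +++$+++ "


def load_lines(rows):
    """Map line ID -> utterance text from movie_lines.txt style rows."""
    lines = {}
    for row in rows:
        parts = row.rstrip("\n").split(SEP)
        if len(parts) == 5:  # ID, character ID, movie ID, name, text
            lines[parts[0]] = parts[4]
    return lines


def build_pairs(conversation_rows, lines):
    """Pair each utterance with the one that follows it in a conversation."""
    pairs = []
    for row in conversation_rows:
        parts = row.rstrip("\n").split(SEP)
        if len(parts) != 4:
            continue
        line_ids = ast.literal_eval(parts[3])  # e.g. ['L194', 'L195']
        for first, second in zip(line_ids, line_ids[1:]):
            if first in lines and second in lines:
                pairs.append((lines[first], lines[second]))
    return pairs


# Made-up rows that follow the corpus layout.
LINE_ROWS = [
    "L194 +++$+++ u0 +++$+++ m0 +++$+++ ANNA +++$+++ Are you coming tonight?",
    "L195 +++$+++ u2 +++$+++ m0 +++$+++ BEN +++$+++ I have to work late.",
    "L196 +++$+++ u0 +++$+++ m0 +++$+++ ANNA +++$+++ Then tomorrow.",
]
CONVERSATION_ROWS = ["u0 +++$+++ u2 +++$+++ m0 +++$+++ ['L194', 'L195', 'L196']"]

PAIRS = build_pairs(CONVERSATION_ROWS, load_lines(LINE_ROWS))
```

Each pair then serves as one encoder input and decoder target, regardless of which recurrent cell (RNN, LSTM, or GRU) the model uses.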