This project was carried out as part of an introductory course on Deep Learning applied to NLP. Given the sentences that make up a text and the grammatical tag of each word, the task was to use Part-Of-Speech Tagging (POS-Tagging) and Shallow-Parsing (chunking) models to extract "hidden" information and the relations between the words of a text. The objective is to use Deep Learning to quantify:
- The difference between a POS-Tagging model and a Shallow-Parsing model;
- The contribution of a pre-trained embedding layer, and of back-propagating into the embedding vectors (that is, updating them during training);
- The impact on the models' predictive capacity of providing more or less context around the word whose grammatical tag is to be predicted, by varying the n-gram range;
- The difference between a per-task architecture (one model per task) and a multi-task architecture;
- The difference between a simple multi-task model and a hierarchical multi-task model, built as a cascade in which the tasks are attached at different depths of the neural network.
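
To make the notion of "more or less context" concrete, here is a minimal sketch of how a fixed-size window of neighbouring words could be built around each word before it is fed to a tagger. It is an illustration only: the function name `context_windows`, the `n_context` parameter and the `<pad>` token are assumptions, not names taken from the notebook, and the notebook's own windowing may differ.

```python
from typing import List

# Hypothetical padding token used at sentence boundaries (an assumption).
PAD = "<pad>"


def context_windows(tokens: List[str], n_context: int) -> List[List[str]]:
    """Return one window per token, made of the token itself plus
    n_context neighbours on each side (2 * n_context + 1 items)."""
    # Pad both ends so that the first and last words also get full windows.
    padded = [PAD] * n_context + list(tokens) + [PAD] * n_context
    width = 2 * n_context + 1
    # Window i is centred on tokens[i], which sits at padded[i + n_context].
    return [padded[i:i + width] for i in range(len(tokens))]


sentence = ["The", "cat", "sat"]
for window in context_windows(sentence, 1):
    print(window)
# ['<pad>', 'The', 'cat']
# ['The', 'cat', 'sat']
# ['cat', 'sat', '<pad>']
```

Increasing `n_context` widens the window, giving the model more surrounding words for each prediction; `n_context = 0` reduces the input to the word alone.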
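To set up the environment:
- Python version 3.9.7
- Install the packages listed in requirements.txt (located in the requirements folder), from the keras-POS-Tagging folder:
$ pip install -r requirements/requirements.txt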
Main files and folders:
- requirements
  - This folder contains a .txt file listing all the packages and versions needed to run the project.
- NLP_from_scratch
  - This is the .ipynb notebook containing the practical work (TP).
- data_utils
  - This folder contains the Python files that are used as a package in the notebook.
- data
  - This folder contains the data.
Here is the project layout:
- project
  > keras-POS-Tagging
    > requirements
      - requirements.txt
    > image
      - MLP.PNG
      - mtl_images.PNG
      - pos_tagging.PNG
    - NLP_from_scratch.ipynb
    > data_utils
      - pos.py
      - utils.py
    > data
      - test.txt
      - test_chunk.txt
      - train.txt
      - train_chunk.txt
      - vocab.txt
      - wordVectors.txt