
Introduction to part-of-speech tagging and shallow parsing with Keras


lprtk/keras-pos-tagging

POS Tagging & Shallow Parsing with Keras

Content

This project was carried out as part of an introductory course on Deep Learning applied to NLP. Given the sentences that make up a text and the grammatical tag of each word, we had to build Part-Of-Speech Tagging (POS-Tagging) and Shallow-Parsing (chunking) models in order to extract "hidden" information and the relations between the words of a text. The objective is to use Deep Learning to quantify:

  • The difference between a POS-Tagging model and a Shallow-Parsing model;

  • The contribution, within the architecture, of a pre-trained embedding layer, and of updating that layer's vector representations through back-propagation;

  • The impact on the models' predictive capacity of providing more or less context around the word whose grammatical tag is to be predicted, by varying the n-gram range;

  • The difference between an architecture with one model per task and a multi-task architecture;

  • The difference between a simple multi-task model and a hierarchical multi-task model, that is, a cascade architecture in which the tasks are not attached at the same depth of the neural network.
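To make the idea of "more or less context around the word" concrete, here is a minimal sketch of how a fixed-size context window could be built for each word of a sentence. It is an illustration under stated assumptions, not the code used in the notebook: the function name `context_windows`, the `<pad>` token, and the convention that `n` counts the words kept on each side of the target word are all hypothetical, and the notebook's own definition of the n-gram range may differ.

```python
from typing import List

# Hypothetical padding token (an assumption, not taken from the project).
PAD = "<pad>"


def context_windows(tokens: List[str], n: int) -> List[List[str]]:
    """Return one window of 2 * n + 1 tokens per word, centered on that word.

    Positions that fall before the start or after the end of the sentence
    are filled with the padding token.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    padded = [PAD] * n + list(tokens) + [PAD] * n
    # Window i covers padded[i : i + 2n + 1], whose middle element is tokens[i].
    return [padded[i:i + 2 * n + 1] for i in range(len(tokens))]


if __name__ == "__main__":
    sentence = ["The", "cat", "sleeps"]
    for window in context_windows(sentence, 1):
        print(window)
```

With `n = 0` the model sees only the word to be tagged; increasing `n` widens the window, which is one way to vary the amount of context when comparing models.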

Requirements

  • Python version 3.9.7
  • Install the packages listed in requirements/requirements.txt
$ pip install -r requirements/requirements.txt

File details

  • requirements
    • This folder contains a .txt file with all the packages and versions needed to run the project.
  • NLP_from_scratch
    • This is the .ipynb notebook containing the practical work (TP).
  • data_utils
    • This folder contains Python files that are used as a package in the notebook.
  • data
    • This folder contains the data.

Here is the project structure:

- project 
    > keras-POS-Tagging
        > requirements 
            - requirements.txt
        > image 
            - MLP.PNG
            - mtl_images.PNG
            - pos_tagging.PNG
        - NLP_from_scratch.ipynb 
        > data_utils 
            - pos.py
            - utils.py
        > data 
            - test.txt
            - test_chunk.txt
            - train.txt
            - train_chunk.txt
            - vocab.txt
            - wordVectors.txt
