Imagine that you are building software for transcribing speech to text. The speech transcription part works perfectly, but it cannot transcribe punctuation. The task is to train a predictive model that ingests a sequence of text and adds punctuation (period, comma, or question mark) in the appropriate locations. This task is important for all downstream data-processing jobs.
Example input:
this is a string of text with no punctuation this is a new sentence
Example output:
this is a string of text with no punctuation <period> this is a new sentence <period>
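One way to frame this is as per-word tagging: every word is labeled with the punctuation token that should follow it, or a "none" label. Below is a minimal Python sketch of that framing; it is only an illustration of the label encoding, not the repo's actual preprocessing, and the <none> label name is my own assumption.

def text_to_examples(annotated_text):
    # Convert text annotated with <period>/<comma>/<question_mark> tokens
    # into (word, label) pairs suitable for a sequence tagger.
    punctuation = {"<period>", "<comma>", "<question_mark>"}
    tokens = annotated_text.split()
    examples = []
    for i, token in enumerate(tokens):
        if token in punctuation:
            continue
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        label = nxt if nxt in punctuation else "<none>"
        examples.append((token, label))
    return examples

print(text_to_examples(
    "this is a string of text with no punctuation <period> "
    "this is a new sentence <period>"))
# [('this', '<none>'), ..., ('punctuation', '<period>'), ..., ('sentence', '<period>')]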
My solution is largely based on "Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration."
The architecture is defined as follows:
- Obtain word embeddings from GloVe.
- The word embeddings are then processed by densely connected Bi-LSTM layers.
- These Bi-LSTM layers are followed by an RNN with an attention mechanism and a conditional random field (CRF) log-likelihood loss.
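To make the architecture concrete, here is a minimal Keras sketch of the tagging model. It is a simplified stand-in under several assumptions (vocabulary size, embedding dimension, sequence length, two plain Bi-LSTM layers, a per-token softmax); the actual model uses densely connected Bi-LSTM layers, an attention mechanism, and a CRF loss, which are omitted here for brevity.

import tensorflow as tf

VOCAB_SIZE = 50000   # assumed vocabulary size
EMBED_DIM = 100      # assumed GloVe embedding dimension
NUM_TAGS = 4         # <none>, <period>, <comma>, <question_mark>
MAX_LEN = 200        # assumed maximum sequence length

inputs = tf.keras.Input(shape=(MAX_LEN,), dtype="int32")
# In the real setup, the embedding matrix would be initialized from GloVe.
x = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(inputs)
# Two stacked Bi-LSTM layers standing in for the densely connected stack.
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True))(x)
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True))(x)
# Per-token softmax over punctuation tags (the repo uses a CRF layer instead).
outputs = tf.keras.layers.Dense(NUM_TAGS, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()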
The experiments are performed on the IWSLT dataset, which consists of TED Talk transcripts.
The detailed analysis can be found in this notebook.
First step, clone the repo:
git clone https://github.com/k9luo/Punctuation-Restoration.git
Second step, download the pretrained GloVe word embeddings and create a new conda virtual environment with setup.sh, or do these steps manually yourself. Note that running setup.sh will install the GPU version of TensorFlow:
sh setup.sh
Third step, activate the virtual environment:
conda activate restore_punct
Fourth step, add the new virtual environment to Jupyter Notebook:
python -m ipykernel install --user --name=restore_punct
Fifth step, run python main.py.
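Once training finishes, the trained tagger can be applied to new, unpunctuated text. The sketch below is hypothetical: model, word_to_id, and id_to_tag are assumed to come out of the training run and are not part of the repo's documented API.

import numpy as np

def restore_punctuation(text, model, word_to_id, id_to_tag, max_len=200):
    words = text.lower().split()
    ids = [word_to_id.get(w, word_to_id["<unk>"]) for w in words]
    padded = np.zeros((1, max_len), dtype="int32")
    padded[0, :len(ids)] = ids[:max_len]
    # Predict a punctuation tag for every word position.
    probs = model.predict(padded, verbose=0)[0]
    tags = [id_to_tag[int(t)] for t in probs.argmax(axis=-1)][:len(words)]
    restored = []
    for word, tag in zip(words, tags):
        restored.append(word)
        if tag != "<none>":
            restored.append(tag)
    return " ".join(restored)

# Example (assuming the objects above exist):
# restore_punctuation("this is a string of text with no punctuation this is a new sentence",
#                     model, word_to_id, id_to_tag)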