This repo contains (or will shortly contain) a PyTorch implementation of the Transformer architecture (Vaswani et al., 2017), as well as experiments with generative pre-training (Radford et al., 2018; Devlin et al., 2018).
The repo also contains slides for a presentation given at the Scientific Discussions at Intact Data Lab.
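As a rough illustration of what the implementation will build on, here is a minimal sketch of scaled dot-product attention, the core operation of the Transformer (Vaswani et al., 2017). It is not the repo's code, just a self-contained reference; the function name and tensor layout are illustrative.

```python
import math
import torch

def scaled_dot_product_attention(query, key, value, mask=None):
    """Compute softmax(Q K^T / sqrt(d_k)) V as in Vaswani et al. (2017).

    query, key, value: tensors of shape (batch, heads, seq_len, d_k).
    mask: optional boolean tensor broadcastable to (batch, heads, seq_len, seq_len);
          positions where mask is False are excluded from attention.
    """
    d_k = query.size(-1)
    # Similarity scores between every query and key position, scaled by sqrt(d_k).
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Masked positions get -inf so their softmax weight becomes zero.
        scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, value), weights
```

In a multi-head attention layer, the queries, keys, and values come from learned linear projections of the input, and the per-head outputs are concatenated and projected back to the model dimension.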
- Create a training setup similar to the one described in Vaswani et al. (2017)
- Add dropout
- Use byte-pair encoding (BPE) to tokenize sentences
- Preprocess text with spaCy
- Train on WMT and the Cornell Movie-Dialogs Corpus
- Add label smoothing (see the sketch after this list)
- Implement beam search for decoding
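For the label smoothing item, a minimal sketch of the common KL-divergence formulation used for Transformer training (Vaswani et al. use a smoothing value of 0.1) is shown below. The `vocab_size` and `padding_idx` arguments are placeholders for whatever vocabulary and padding token the final pipeline ends up using.

```python
import torch
import torch.nn as nn

class LabelSmoothingLoss(nn.Module):
    """KL-divergence loss against a smoothed target distribution.

    The true class gets probability 1 - smoothing; the remaining mass is
    spread uniformly over the other (non-padding) classes.
    """
    def __init__(self, vocab_size, padding_idx, smoothing=0.1):
        super().__init__()
        self.criterion = nn.KLDivLoss(reduction="sum")
        self.vocab_size = vocab_size
        self.padding_idx = padding_idx
        self.smoothing = smoothing

    def forward(self, log_probs, target):
        # log_probs: (batch * seq_len, vocab_size) log-probabilities
        # target:    (batch * seq_len,) gold token indices
        smooth_dist = torch.full_like(log_probs, self.smoothing / (self.vocab_size - 2))
        smooth_dist.scatter_(1, target.unsqueeze(1), 1.0 - self.smoothing)
        # Never assign probability mass to the padding token.
        smooth_dist[:, self.padding_idx] = 0.0
        # Zero out rows whose target is padding so they contribute no loss.
        pad_rows = (target == self.padding_idx).unsqueeze(1)
        smooth_dist = smooth_dist.masked_fill(pad_rows, 0.0)
        return self.criterion(log_probs, smooth_dist)
```

The loss expects log-probabilities, so the decoder output should go through `log_softmax` before being passed in.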
- Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems, 2017.
- Radford, Alec, et al. "Improving language understanding by generative pre-training." 2018. URL: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
- Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805, 2018.