This is a reproduction of the Transformer architecture from the paper "Attention Is All You Need".
The aim of this repository is to give insight into the details of implementing the Transformer, without the distraction of data preprocessing.
The structure of the Transformer is illustrated below.
We therefore build the network hierarchically. From the top level to the bottom level:
Transformer -> Fused_Embedding, Encoder, Decoder -> Encoder_layer, Decoder_layer -> Multiheaded Attention, PositionWise_FeedForwardNetwork
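The bottom level of this hierarchy, multi-headed attention and the position-wise feed-forward network, can be sketched in PyTorch as follows. This is a minimal illustration of the formulas in the paper, not the code in Modules.py; the class names MultiHeadAttentionSketch and PositionwiseFFNSketch are hypothetical, and the default sizes (d_model=512, 8 heads, d_ff=2048) are assumed from the paper's base model.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadAttentionSketch(nn.Module):
    """Hypothetical sketch of multi-head scaled dot-product attention."""

    def __init__(self, d_model=512, n_heads=8, dropout=0.1):
        super().__init__()
        assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
        self.n_heads = n_heads
        self.d_k = d_model // n_heads
        # Projections for queries, keys and values, plus the output projection.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, query, key, value, mask=None):
        # Inputs: (batch, seq_len, d_model); mask broadcastable to
        # (batch, n_heads, len_q, len_k), with 0 marking forbidden positions.
        batch = query.size(0)

        def split_heads(x):
            # (batch, seq, d_model) -> (batch, n_heads, seq, d_k)
            return x.view(batch, -1, self.n_heads, self.d_k).transpose(1, 2)

        q = split_heads(self.w_q(query))
        k = split_heads(self.w_k(key))
        v = split_heads(self.w_v(value))
        # softmax(Q K^T / sqrt(d_k)) V, computed for all heads at once.
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.d_k)
        if mask is not None:
            # A large negative value (not -inf) avoids NaN on fully masked rows.
            scores = scores.masked_fill(mask == 0, -1e9)
        attn = self.dropout(F.softmax(scores, dim=-1))
        out = torch.matmul(attn, v)
        # Concatenate heads: (batch, n_heads, seq, d_k) -> (batch, seq, d_model)
        out = out.transpose(1, 2).contiguous().view(batch, -1, self.n_heads * self.d_k)
        return self.w_o(out)


class PositionwiseFFNSketch(nn.Module):
    """Hypothetical sketch: FFN(x) = max(0, x W1 + b1) W2 + b2 at every position."""

    def __init__(self, d_model=512, d_ff=2048, dropout=0.1):
        super().__init__()
        self.w_1 = nn.Linear(d_model, d_ff)
        self.w_2 = nn.Linear(d_ff, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # The same two linear maps are applied to each position independently.
        return self.w_2(self.dropout(F.relu(self.w_1(x))))
```

Each higher level then only wires these pieces together: in the paper, an encoder layer wraps one self-attention sublayer and one feed-forward sublayer, each with a residual connection followed by layer normalization.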
The module tree is shown below, with the source file of each module in parentheses:
-Transformer.py
--Fus_Embeddings (AggregationModel.py)
---Word embedding vectors
---Positional Encoding (Modules.py)
--Encoder (AggregationModel.py)
---Encoder Layer (Model.py)
----MultiHeadedAttention (Modules.py)
----PostionWiseFFN (Modules.py)
--Decoder (AggregationModel.py)
---Decoder Layer (Model.py)
----MultiHeadedAttention (Modules.py)
----PostionWiseFFN (Modules.py)
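The Fus_Embeddings node in the tree combines word embedding vectors with a positional encoding. Below is a minimal PyTorch sketch, assuming the fixed sinusoidal encoding from the paper; the repository's Positional Encoding in Modules.py may differ in details, and sinusoid_table and FusedEmbeddingSketch are hypothetical names.

```python
import math

import torch
import torch.nn as nn


def sinusoid_table(max_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same angle)."""
    assert d_model % 2 == 0, "d_model must be even to interleave sin and cos"
    position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)  # (max_len, 1)
    # 1 / 10000^(2i/d_model) for i = 0 .. d_model/2 - 1, computed in log space.
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float) * (-math.log(10000.0) / d_model)
    )
    table = torch.zeros(max_len, d_model)
    table[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    table[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return table


class FusedEmbeddingSketch(nn.Module):
    """Hypothetical sketch: word embedding scaled by sqrt(d_model) plus fixed positions."""

    def __init__(self, vocab_size, d_model=512, max_len=256, dropout=0.1, pad_idx=0):
        super().__init__()
        self.d_model = d_model
        self.word_emb = nn.Embedding(vocab_size, d_model, padding_idx=pad_idx)
        # A buffer is saved with the model and follows .to(device), but is not trained.
        self.register_buffer("pos_table", sinusoid_table(max_len, d_model))
        self.dropout = nn.Dropout(dropout)

    def forward(self, tokens):
        # tokens: (batch, seq_len) token ids, with seq_len <= max_len.
        seq_len = tokens.size(1)
        x = self.word_emb(tokens) * math.sqrt(self.d_model)
        x = x + self.pos_table[:seq_len].unsqueeze(0)  # broadcast over the batch
        return self.dropout(x)  # (batch, seq_len, d_model)
```

A fixed table adds no trainable parameters; the paper reports that learned positional embeddings gave nearly identical results.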
Requirements:
- PyTorch 1.1.0
- Python 3.6.8
- torchtext 0.5.0
- tqdm
- dill
- The data have already been processed with byte-pair encoding (BPE), so you can focus on the structure of the Transformer itself.
- Train the model:
  python train.py -data_pkl ./bpe_deen/bpe_vocab.pkl -train_path ./bpe_deen/deen-train -val_path ./bpe_deen/deen-val -log deen_bpe -label_smoothing -save_model trained -b 256 -warmup 128000 -epoch 400
- GPU requirement: 4 TITAN X GPUs
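The training command enables -label_smoothing and sets -warmup 128000. Assuming train.py follows the schedule from the paper, the learning rate grows linearly over the warmup steps and then decays with the inverse square root of the step number. The plain-Python sketch below writes this out together with one common form of label smoothing. The function names, d_model=512 and eps=0.1 (the paper's value) are assumptions, and this is not the code in train.py.

```python
def transformer_lr(step, d_model=512, warmup=128000):
    """lrate = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5), as in the paper."""
    step = max(step, 1)  # avoid 0 ** -0.5 on the very first call
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)


def smoothed_targets(gold, n_classes, eps=0.1):
    """Assumed variant: 1 - eps on the gold class, eps shared evenly by the others."""
    off = eps / (n_classes - 1)
    return [1.0 - eps if c == gold else off for c in range(n_classes)]


def smoothed_nll(log_probs, gold, eps=0.1):
    """Cross-entropy between model log-probabilities and the smoothed target."""
    target = smoothed_targets(gold, len(log_probs), eps)
    return -sum(t * lp for t, lp in zip(target, log_probs))
```

Under these assumptions (d_model=512, no extra learning-rate multiplier), the learning rate peaks at step 128000 at about 1.2e-4, and smoothing keeps the model from pushing all probability onto the gold token.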
Acknowledgements:
- The data interface is borrowed from "A PyTorch implementation of the Transformer model in 'Attention is All You Need'".
- Another outstanding work, "The Annotated Transformer", inspired me while I was writing the code.