# Transformer-backbone
This is a reproduction of the Transformer architecture from the paper ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762).
The aim of this repository is to help those who want insight into the details of a Transformer implementation, without being bothered by data preprocessing.
The structure of the Transformer is illustrated below:
<img src="./pic/structure.png" width="400" height="600" alt='' align=center />
Thus, we build the network hierarchically. From the top level down, the hierarchy is: Transformer → (Fused_Embedding, Encoder, Decoder) → (Encoder_layer, Decoder_layer) → (Multiheaded Attention, PositionWise_FeedForwardNetwork).
The module tree is shown below:
- _Transformer.py_
  - _Fus_Embeddings (AggregationModel.py)_
    - _Word Embedding Vectors_
    - _Positional Encoding (Modules.py)_
  - _Encoder (AggregationModel.py)_
    - _Encoder Layer (Model.py)_
      - _MultiHeadedAttention (Modules.py)_
      - _PostionWiseFFN (Modules.py)_
  - _Decoder (AggregationModel.py)_
    - _Decoder Layer (Model.py)_
      - _MultiHeadedAttention (Modules.py)_
      - _PostionWiseFFN (Modules.py)_
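
To make the bottom of this tree concrete, here is a minimal, self-contained sketch of the sinusoidal positional encoding, multi-head attention, and position-wise feed-forward network, following the formulas in the paper. The class names mirror the tree above, but the constructor signatures and spellings are illustrative and may differ from the actual code in `Modules.py`.

```python
import math
import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding from the paper:
    PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))."""
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe.unsqueeze(0))   # shape (1, max_len, d_model)

    def forward(self, x):                             # x: (batch, seq_len, d_model)
        return x + self.pe[:, :x.size(1)]


class MultiHeadedAttention(nn.Module):
    """Scaled dot-product attention run in parallel over n_head subspaces of size d_model // n_head."""
    def __init__(self, d_model, n_head):
        super().__init__()
        assert d_model % n_head == 0
        self.d_k, self.n_head = d_model // n_head, n_head
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, q, k, v, mask=None):
        batch = q.size(0)
        # Project, then split into heads: (batch, n_head, seq_len, d_k)
        q, k, v = [w(x).view(batch, -1, self.n_head, self.d_k).transpose(1, 2)
                   for w, x in ((self.w_q, q), (self.w_k, k), (self.w_v, v))]
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float('-inf'))
        out = torch.softmax(scores, dim=-1) @ v        # (batch, n_head, seq_len, d_k)
        # Concatenate the heads back together and apply the output projection
        out = out.transpose(1, 2).contiguous().view(batch, -1, self.n_head * self.d_k)
        return self.w_o(out)


class PositionWiseFFN(nn.Module):
    """Two linear layers applied independently at every position:
    FFN(x) = W2 * relu(W1 * x + b1) + b2."""
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))

    def forward(self, x):
        return self.net(x)
```

An encoder layer then wraps one attention block and one feed-forward block with residual connections and layer normalization, and a decoder layer adds a masked self-attention plus an attention over the encoder output, as in the figure above.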
# Environment Configuration
* pytorch 1.1.0
* python 3.6.8
* torchtext 0.5.0
* tqdm
* dill
# Usage
## WMT'17 Multimodal Translation: de-en BPE
1. The byte-pair encoding has already been applied, so you can focus on the structure of the Transformer itself.
2. Train the model (the `-warmup` flag is explained in the note after this list):
   ```
   python train.py -data_pkl ./bpe_deen/bpe_vocab.pkl -train_path ./bpe_deen/deen-train -val_path ./bpe_deen/deen-val -log deen_bpe -label_smoothing -save_model trained -b 256 -warmup 128000 -epoch 400
   ```
3. GPU requirement: 4 Titan X GPUs
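
A note on `-warmup 128000`: in the paper, the learning rate grows linearly for the first `warmup_steps` updates and then decays proportionally to the inverse square root of the step number (Eq. 3). The sketch below assumes this repository follows that formula; the actual scheduler in `train.py` may differ in constants or details.

```python
def transformer_lr(step, d_model=512, warmup=128000):
    """Learning rate from "Attention Is All You Need", Eq. (3):
    lrate = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    step = max(step, 1)  # guard against step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

# The rate rises linearly until `warmup` steps, then decays as 1/sqrt(step):
for s in (1000, 64000, 128000, 512000):
    print(s, transformer_lr(s))
```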
# Performance
## Loss & Accuracy
<img src="./pic/loss.png" width="300" height="300" alt='' align=center /><img src="./pic/acc.png" width="300" height="300" alt='' align=center />
# Acknowledgement
* The data interface is borrowed from ["A PyTorch implementation of the Transformer model in "Attention is All You Need""](https://github.com/jadore801120/attention-is-all-you-need-pytorch).
* Another outstanding work, ["The Annotated Transformer"](http://nlp.seas.harvard.edu/2018/04/03/attention.html), inspired me during the coding process.