This repository was made for the NLP course project (2023).
- Translating English text to Persian using Fairseq-py.
en-fa-MT_model1
- LSTM encoder-decoder architecture with one encoder layer and one decoder layer, plus an attention mechanism
- A SentencePiece model is used to apply byte-pair encoding (BPE) to the data
- The model is trained with Fairseq-py (see the sketch below)
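
A rough sketch of this pipeline is shown below. The file names, vocabulary size, and hyperparameters are assumptions for illustration only, not the exact values used in this repository; only standard SentencePiece and Fairseq options are used.

```python
# Sketch: train a SentencePiece BPE model, encode the parallel data,
# then binarize and train an LSTM model with the Fairseq CLI.
import subprocess
import sentencepiece as spm

# Train one BPE model on the concatenated English + Persian training text
# (file names and vocab size are assumptions).
spm.SentencePieceTrainer.train(
    input="train.en,train.fa",
    model_prefix="spm_bpe",
    vocab_size=8000,
    model_type="bpe",
)
sp = spm.SentencePieceProcessor(model_file="spm_bpe.model")

def encode_file(src, dst):
    """Write the BPE-encoded version of `src` to `dst`, one sentence per line."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(" ".join(sp.encode(line.strip(), out_type=str)) + "\n")

for split in ("train", "valid"):
    for lang in ("en", "fa"):
        encode_file(f"{split}.{lang}", f"{split}.bpe.{lang}")

# Binarize the data and train the LSTM model (the `lstm` architecture in
# Fairseq uses decoder attention by default).
subprocess.run([
    "fairseq-preprocess", "--source-lang", "en", "--target-lang", "fa",
    "--trainpref", "train.bpe", "--validpref", "valid.bpe",
    "--destdir", "data-bin",
], check=True)
subprocess.run([
    "fairseq-train", "data-bin",
    "--arch", "lstm",
    "--encoder-layers", "1", "--decoder-layers", "1",
    "--optimizer", "adam", "--lr", "1e-3", "--lr-scheduler", "fixed",
    "--max-tokens", "4000",
    "--save-dir", "checkpoints/model1",
], check=True)
```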
en-fa-MT_model2
- LSTM encoder-decoder architecture with one encoder layer and one decoder layer, plus an attention mechanism
- The BERT-multilingual-base tokenizer is used to tokenize the data
- The model is trained with Fairseq-py, using the embedding-layer weights of BERT-multilingual-base as the initial values of the model's embedding weights (see the sketch below)
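
A minimal sketch of the tokenization and embedding-initialization idea is shown below, assuming the `bert-base-multilingual-cased` checkpoint from Hugging Face Transformers (the exact variant and the checkpoint surgery used in this repository may differ):

```python
# Sketch: tokenize with the multilingual BERT tokenizer and extract BERT's
# token-embedding matrix so it can initialize the LSTM model's embedding layers.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
bert = BertModel.from_pretrained("bert-base-multilingual-cased")

# Tokenize a parallel pair with mBERT's WordPiece vocabulary.
en_tokens = tokenizer.tokenize("This is an example sentence.")
fa_tokens = tokenizer.tokenize("این یک جمله نمونه است.")  # Persian: "This is an example sentence."
print(en_tokens)
print(fa_tokens)

# BERT's token-embedding matrix, shape (vocab_size, hidden_size) = (119547, 768).
bert_embeddings = bert.get_input_embeddings().weight.detach().clone()
print(bert_embeddings.shape)

# Assumed initialization step: copy these weights into the LSTM model's
# encoder/decoder embedding layers before training (dimensions and the shared
# mBERT vocabulary must match the Fairseq dictionaries), e.g.:
# model.encoder.embed_tokens.weight.data.copy_(bert_embeddings)
# model.decoder.embed_tokens.weight.data.copy_(bert_embeddings)
```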
The AFEC dataset contains aligned Persian and English sentences produced by human translators. For more information about the AFEC dataset, see its paper.