This survey examines the current state of natural language processing (NLP) for four Ethiopian languages: Amharic, Afaan Oromo, Tigrinya, and Wolaytta. In the paper, we identify key challenges and opportunities for NLP research in Ethiopia. We also provide a centralized repository on GitHub that contains publicly available resources for various NLP tasks in these languages. This repository can be updated periodically with contributions from other researchers. Our objective is to disseminate information to NLP researchers interested in Ethiopian languages and to encourage future research in this domain.
Tool name | Task | Language support | Resource link |
---|---|---|---|
amseg | Segmentation, tokenization, transliteration, romanization, and normalization | Amharic | amseg |
HornMorpho | Morphological analysis | Amharic, Afaan Oromo, Tigrinya | HornMorpho |
lemma | Lemmatization | Amharic | lemma |
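
The table lists normalization among the tasks amseg supports. As a rough illustration of what Amharic normalization often involves, the sketch below folds homophone Ethiopic characters (different letters that are pronounced the same) into a single series. It is not amseg's implementation: the function name `normalize_amharic`, the choice of target series, and the restriction to the seven basic vowel orders are all assumptions made for this example.

```python
# Minimal sketch of Amharic homophone normalization (illustrative only).
# Keys are the first code point of a homophone series; values are the first
# code point of the series it is folded into. The mapping direction is an
# assumption; projects differ on which variant they keep.
HOMOPHONE_BASES = {
    0x1210: 0x1200,  # ሐ-series -> ሀ-series
    0x1280: 0x1200,  # ኀ-series -> ሀ-series
    0x1220: 0x1230,  # ሠ-series -> ሰ-series
    0x12D0: 0x12A0,  # ዐ-series -> አ-series
    0x1340: 0x1338,  # ፀ-series -> ጸ-series
}


def build_table():
    """Build a code-point translation table covering the seven basic orders."""
    table = {}
    for src_base, dst_base in HOMOPHONE_BASES.items():
        for order in range(7):
            table[src_base + order] = dst_base + order
    return table


_TABLE = build_table()


def normalize_amharic(text):
    """Replace homophone characters with their chosen canonical variant."""
    return text.translate(_TABLE)
```

With this mapping, two spellings of the same word that differ only in a homophone letter normalize to the same string, which reduces vocabulary sparsity before tokenization or training. Labialized forms and punctuation are left untouched in this sketch.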
We discuss machine translation (MT) progress for Ethiopian languages in three categories: English-centric (work that pairs one of the target Ethiopian languages with English), Ethiopian-Ethiopian (work on pairs of Ethiopian languages that involves no other language), and multilingual MT (work that covers Ethiopian languages together with other languages in a multilingual setting).
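
The three categories can be expressed as a small rule over the set of languages a work covers. The sketch below is only an illustration of the grouping described above: the helper name `mt_category`, the `ETHIOPIAN` set, and the label strings are hypothetical and are not part of the repository.

```python
# Hypothetical helper that mirrors the three MT categories described above.
# The language set is an assumption; extend it for other Ethiopian languages.
ETHIOPIAN = {"Amharic", "Afaan Oromo", "Tigrinya", "Wolaytta", "Awngi"}


def mt_category(languages):
    """Return the MT category for the languages covered by one work."""
    langs = set(languages)
    if len(langs) < 2 or not (langs & ETHIOPIAN):
        raise ValueError("expected at least two languages, one of them Ethiopian")
    if len(langs) == 2:
        if langs <= ETHIOPIAN:
            return "Ethiopian-Ethiopian"
        if "English" in langs:
            return "English-centric"
    # Anything else involves other languages or more than one pair.
    return "Multilingual"
```

For example, `mt_category(["English", "Amharic"])` falls under English-centric, while a benchmark that spans several African languages falls under multilingual MT.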
- Parallel Corpora Preparation for English-Amharic Machine Translation
- Extended Parallel Corpus for Amharic-English Machine Translation
- Context based machine translation with recurrent neural network for English–Amharic translation
- Offline Corpus Augmentation for English-Amharic Machine Translation
- The Effect of Normalization for Bi-directional Amharic-English Neural Machine Translation
- Optimal Alignment for Bi-directional Afaan Oromo-English Statistical Machine Translation
- English-Afaan Oromo Statistical Machine Translation
- English-Oromo Machine Translation: An Experiment Using a Statistical Approach
- Crowdsourcing Parallel Corpus for English-Oromo Neural Machine Translation using Community Engagement Platform
- Machine Learning Approach to English-Afaan Oromo Text-Text Translation: Using Attention based Neural Machine Translation
- The effect of shallow segmentation on English-Tigrinya statistical machine translation
- Morphological Segmentation for English-to-Tigrinya Statistical Machine Translation
- Enhancing Bi-directional English-Tigrigna Machine Translation Using Hybrid Approach
- Statistical Machine Translator For English To Tigrigna Translation
- An Exploration of Data Augmentation Techniques for Improving English to Tigrinya Translation
- A Parallel Corpora for bi-directional Neural Machine Translation for Low Resourced Ethiopian Languages
- Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data
- English-Ethiopian Languages Statistical Machine Translation
- Amharic-Awngi Machine Translation: An Experiment Using Statistical Approach
- Experimenting Statistical Machine Translation for Ethiopic Semitic Languages: The Case of Amharic-Tigrigna
- Context based machine translation with recurrent neural network for English-Amharic translation
- Low resource neural machine translation: A benchmark for five African languages
- WebCrawl African : A Multilingual Parallel Corpora for African Languages
- A comparative study on different techniques for Thai part-of-speech tagging
- Machine Learning Approaches for Amharic Parts-of-speech Tagging
- Towards improving Brill’s tagger lexical and transformation rule for Afaan Oromo language
- Deep learning-based part-of-speech tagging of the Tigrinya language
- Introducing various Semantic Models for Amharic: Experimentation and Evaluation with multiple Tasks and Datasets