
OCTOPUS

Octopus is a neural machine generation toolkit for Arabic Natural Language Generation (NLG), described in our ArabicNLP 2023 paper: OCTOPUS: A Multitask Model and Toolkit for Arabic Natural Language Generation.

Octopus is designed for eight machine generation tasks, including diacritization, grammatical error correction, news headline generation, paraphrasing, question answering, question generation, and transliteration. The package includes a Python library along with associated command-line scripts.


Requirements and Installation

  • To install octopus directly from the GitHub repo using pip:
    pip install -U git+https://github.com/UBC-NLP/octopus.git
  • To install octopus and develop locally:
    git clone https://github.com/UBC-NLP/octopus.git
    cd octopus
    pip install .
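
After either install route, a quick sanity check is to confirm that the package and the command-line script are visible from Python. The sketch below uses only the standard library. It assumes the importable module and the console script are both named `octopus`, as this README suggests; adjust the names if your environment differs.

```python
import importlib.util
import shutil


def check_install(module_name="octopus", cli_name="octopus"):
    """Report whether a module is importable and a CLI script is on PATH.

    Both default names are assumptions taken from this README.
    """
    return {
        # find_spec returns None when no top-level module has this name
        "module_found": importlib.util.find_spec(module_name) is not None,
        # shutil.which returns None when no executable with this name is on PATH
        "cli_found": shutil.which(cli_name) is not None,
    }


if __name__ == "__main__":
    print(check_install())
```

If either value is `False`, the usual cause is that pip installed into a different Python environment than the one you are running.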

Getting Started

The full documentation contains instructions for getting started, generating text with different decoding methods, and integrating OCTOPUS with your code, along with more examples.

Colab Examples

(1) Command Line Interface

Command: octopus
Content:
  • Usage and Arguments
  • Using greedy search
  • Using beam search (default)
  • Using sampling search
  • Read text from file
Colab link: colab

Command: octopus_interactive
Content:
  • Usage and Arguments
  • Examples
Colab link: colab
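
The three search methods listed above differ in how the next token is chosen at each decoding step. The following is a minimal, self-contained sketch on a toy next-token table. It is not Octopus's implementation, and every name in it (`TOY_MODEL`, `num_beams`, `max_len`, `seed`) is illustrative only.

```python
import math
import random

# Toy next-token table: maps the previous token to {next_token: probability}.
# Purely illustrative; a real model computes these probabilities with a network.
TOY_MODEL = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.4, "y": 0.35, "</s>": 0.25},
    "b": {"z": 0.9, "</s>": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
    "z": {"</s>": 1.0},
}


def greedy_search(model, max_len=5):
    """Pick the single most probable next token at every step."""
    seq = ["<s>"]
    while seq[-1] != "</s>" and len(seq) <= max_len:
        nxt = model[seq[-1]]
        seq.append(max(nxt, key=nxt.get))
    return seq


def beam_search(model, num_beams=2, max_len=5):
    """Keep the num_beams highest-scoring partial sequences at every step."""
    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, tokens)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == "</s>":
                candidates.append((score, seq))  # finished; carry over unchanged
                continue
            for tok, p in model[seq[-1]].items():
                candidates.append((score + math.log(p), seq + [tok]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:num_beams]
        if all(seq[-1] == "</s>" for _, seq in beams):
            break
    return beams[0][1]


def sampling_search(model, seed=None, max_len=5):
    """Draw each next token at random, weighted by its probability."""
    rng = random.Random(seed)
    seq = ["<s>"]
    while seq[-1] != "</s>" and len(seq) <= max_len:
        nxt = model[seq[-1]]
        seq.append(rng.choices(list(nxt), weights=list(nxt.values()))[0])
    return seq


if __name__ == "__main__":
    print(greedy_search(TOY_MODEL))  # ['<s>', 'a', 'x', '</s>']
    print(beam_search(TOY_MODEL))    # ['<s>', 'b', 'z', '</s>']
    print(sampling_search(TOY_MODEL, seed=0))
```

On this table, greedy search commits to `a` first and ends with a sequence of probability 0.24, while a beam of width 2 keeps `b` alive and finds the sequence of probability 0.36. Sampling trades that determinism for variety, which is why its output changes with the seed.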

(2) Integrate Octopus with your Python code

Functions: generate, generate_from_file
Content:
  • Install Octopus
  • Initialize the octopus object
  • Using greedy search
  • Using beam search (default)
  • Using sampling search
  • Read text from file
Colab link: colab
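
When the inputs live in a text file, a common pattern is one input sentence per line. The sketch below shows that pattern in isolation, under the assumption of a UTF-8 file with one input per line. `generate_fn` is a hypothetical stand-in for whatever callable produces text in your setup; it is not part of the Octopus API, whose actual function signatures are covered in the Colab notebook and the full documentation.

```python
from pathlib import Path


def read_inputs(path):
    """Return the non-empty, stripped lines of a UTF-8 text file."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def generate_for_file(path, generate_fn):
    """Apply generate_fn to each input line and pair it with its output.

    generate_fn is a hypothetical placeholder (any callable taking a string);
    it does not correspond to a documented Octopus function.
    """
    return [(line, generate_fn(line)) for line in read_inputs(path)]
```

Reading with an explicit `encoding="utf-8"` matters for Arabic text, because the platform default encoding is not UTF-8 everywhere.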

License

octopus(-py) is Apache-2.0 licensed. The license applies to the pre-trained models as well.

Citation

If you use the OCTOPUS toolkit or the pre-trained models in a scientific publication, or if you find the resources in this repository useful, please cite our paper as follows (to be updated):

 @misc{elmadany2023octopus,
      title={Octopus: A Multitask Model and Toolkit for Arabic Natural Language Generation}, 
      author={AbdelRahim Elmadany and El Moatez Billah Nagoudi and Muhammad Abdul-Mageed},
      year={2023},
      eprint={2310.16127},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Acknowledgments

We gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC; RGPIN-2018-04267), the Social Sciences and Humanities Research Council of Canada (SSHRC; 435-2018-0576; 895-2020-1004; 895-2021-1008), the Canada Foundation for Innovation (CFI; 37771), Compute Canada (CC), UBC ARC-Sockeye, and Advanced Micro Devices, Inc. (AMD). Any opinions, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NSERC, SSHRC, CFI, CC, AMD, or UBC ARC-Sockeye.