
ftokenizer

Flutter Tokenizer for NLP models

Usage

Make sure to initialize the tokenizer before use:

    await FTokenizer.init();

and to dispose of it when you are finished:

    FTokenizer.dispose();
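
In a typical Flutter app, one reasonable place for the initialization call is the entry point. The sketch below is only one possible arrangement, not a requirement of the package, and the import path is an assumption:

    import 'package:flutter/widgets.dart';

    import 'package:ftokenizer/ftokenizer.dart'; // assumed import path

    Future<void> main() async {
      // Ensure bindings exist in case init uses platform channels before runApp.
      WidgetsFlutterBinding.ensureInitialized();
      await FTokenizer.init();
      runApp(const Center(
        child: Text('ftokenizer ready', textDirection: TextDirection.ltr),
      ));
    }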

If using FTokenizer inside an Isolate, make sure to call await FTokenizer.init(); at the beginning of the Isolate and FTokenizer.dispose(); before the Isolate is closed, as in the sketch below.
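
A minimal sketch, assuming a short-lived Isolate created with Isolate.run; the import path and the placeholder tokenization step are assumptions, and only the init/dispose calls come from the notes above:

    import 'dart:isolate';

    import 'package:ftokenizer/ftokenizer.dart'; // assumed import path

    Future<String> runTokenizerInIsolate(String text) {
      // Isolate.run spawns an isolate, runs the closure, and returns its result.
      return Isolate.run(() async {
        await FTokenizer.init(); // initialize at the beginning of the isolate
        try {
          // ... do the actual tokenization work here (API calls omitted) ...
          return 'tokenized: $text';
        } finally {
          FTokenizer.dispose(); // dispose before the isolate closes
        }
      });
    }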

FTokenizer uses rust_tokenizer. From the rust_tokenizer description: Rust-tokenizer is a drop-in replacement for the tokenization methods from the Transformers library. It includes a broad range of tokenizers for state-of-the-art transformer architectures, including:

Sentence Piece (unigram model)

Sentence Piece (BPE model)

BERT

ALBERT

DistilBERT

RoBERTa

GPT

GPT2

ProphetNet

CTRL

Pegasus

MBart50

M2M100

NLLB

DeBERTa

DeBERTa (v2)

The wordpiece-based tokenizers include both single-threaded and multi-threaded processing. The Byte-Pair-Encoding tokenizers favor the use of a shared cache and are only available as single-threaded tokenizers. Using the tokenizers requires manually downloading the required files (vocabulary and/or merge files); these can be found in the Transformers library.
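
One way to provide these files in a Flutter app is to bundle them as assets, copy them to a local path at runtime, and point the tokenizer at that path. The sketch below is illustrative only: the asset name, the loadBert constructor and the encode method are hypothetical placeholders (the package's real loading API is not shown in this README), and path_provider is an assumed dependency; only init and dispose come from the usage notes above.

    import 'dart:io';

    import 'package:flutter/services.dart' show rootBundle;
    import 'package:path_provider/path_provider.dart';
    import 'package:ftokenizer/ftokenizer.dart'; // assumed import path

    Future<void> tokenizeExample() async {
      await FTokenizer.init();
      try {
        // Copy the bundled BERT vocabulary (declared under assets: in pubspec.yaml)
        // to a real file path that the native tokenizer can read.
        final bytes = await rootBundle.load('assets/bert-base-uncased-vocab.txt');
        final dir = await getApplicationSupportDirectory();
        final vocabFile = File('${dir.path}/bert-base-uncased-vocab.txt');
        await vocabFile.writeAsBytes(
            bytes.buffer.asUint8List(bytes.offsetInBytes, bytes.lengthInBytes));

        // Hypothetical API: build a BERT tokenizer from the vocabulary file and
        // encode a sentence. Check the package documentation for the real calls.
        final tokenizer = await FTokenizer.loadBert(vocabPath: vocabFile.path);
        final encoding = await tokenizer.encode('Hello from Flutter!');
        print(encoding);
      } finally {
        FTokenizer.dispose();
      }
    }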
