ftokenizer

Flutter Tokenizer for NLP models

Usage

Call init once before using the tokenizer:

    await FTokenizer.init();

and call dispose when you are done with it:

    FTokenizer.dispose();

When using FTokenizer inside an Isolate, make sure to call await FTokenizer.init(); at the start of the Isolate and FTokenizer.dispose(); before the Isolate closes.
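The Isolate lifecycle described above could be sketched as follows. This is a minimal illustration, not the package's documented pattern: the import path and the tokenizeInBackground helper are assumptions, and the body of the try block is a placeholder for whatever tokenizer calls your code makes. Only FTokenizer.init() and FTokenizer.dispose() come from this README. Isolate.run requires Dart 2.19 or later.

```dart
import 'dart:isolate';

// Assumed import path; check the package's lib folder for the actual entry point.
import 'package:ftokenizer/ftokenizer.dart';

/// Hypothetical helper: runs tokenizer work in a short-lived Isolate.
Future<int> tokenizeInBackground(String text) {
  return Isolate.run(() async {
    // Initialize at the beginning of the Isolate.
    await FTokenizer.init();
    try {
      // Placeholder for real tokenizer calls; here we only return the
      // input length so the sketch has something to send back.
      return text.length;
    } finally {
      // Dispose before the Isolate closes, even if the work above throws.
      FTokenizer.dispose();
    }
  });
}
```

Wrapping the work in try/finally keeps dispose paired with init on every exit path, which is the point of the rule above.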

FTokenizer uses rust_tokenizer. From the rust_tokenizer description:

Rust-tokenizer is a drop-in replacement for the tokenization methods from the Transformers library. It includes a broad range of tokenizers for state-of-the-art transformers architectures, including:

Sentence Piece (unigram model)

Sentence Piece (BPE model)

BERT

ALBERT

DistilBERT

RoBERTa

GPT

GPT2

ProphetNet

CTRL

Pegasus

MBart50

M2M100

NLLB

DeBERTa

DeBERTa (v2)

The WordPiece-based tokenizers include both single-threaded and multi-threaded processing. The Byte-Pair-Encoding tokenizers favor the use of a shared cache and are only available as single-threaded tokenizers.

Using the tokenizers requires manually downloading the required tokenizer files (vocabulary or merge files). These can be found in the Transformers library.
