[docs] Updates djl model converter document (#3385)
Co-authored-by: nobody <nobody@localhost>
frankfliu and nobody authored Aug 5, 2024
1 parent 904c971 commit a03ab68
Showing 1 changed file (extensions/tokenizers/README.md) with 61 additions and 33 deletions.
You can also build the latest javadocs locally using the following command:
```sh
./gradlew javadoc
```

The javadocs output is built in the `build/doc/javadoc` folder.

## Installation

You can pull the module from the central Maven repository by including the following dependency in your `pom.xml` file:

```xml

<dependency>
<groupId>ai.djl.huggingface</groupId>
<artifactId>tokenizers</artifactId>
</dependency>
```

## Usage

### Use DJL HuggingFace model converter

If you want to convert a complete HuggingFace (transformers) model,
you can use our all-in-one conversion solution to bring it to Java.

Currently, this converter supports the following tasks:

- fill-mask
- question-answering
- sentence-similarity
- text-classification
- token-classification

#### Install `djl-converter`

You can install `djl-converter` from the djl master branch, or clone the repository and install it from source:

```sh
# install release version of djl-converter
pip install https://publish.djl.ai/djl_converter/djl_converter-0.30.0-py3-none-any.whl

# install from djl master branch
pip install "git+https://github.com/deepjavalibrary/djl.git#subdirectory=extensions/tokenizers/src/main/python"

# install djl-converter from a local djl repo
git clone https://github.com/deepjavalibrary/djl.git
cd djl/extensions/tokenizers/src/main/python
python3 -m pip install -e .

# install optimum if you want to convert to OnnxRuntime
pip install optimum

# convert a single model to TorchScript, OnnxRuntime or Rust
djl-convert --help

# import models into the DJL model zoo
djl-import --help
```

#### Convert Huggingface model to TorchScript

```bash
djl-convert -m deepset/bert-base-cased-squad2
```

You will find the converted model in the `model/bert-base-cased-squad2/` folder.

#### Convert Huggingface model to OnnxRuntime

```bash
djl-convert -m deepset/bert-base-cased-squad2 -f OnnxRuntime
```

#### Convert Huggingface model to Rust

```bash
djl-convert -m deepset/bert-base-cased-squad2 -f Rust
```
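Whichever format you convert to, the model is loaded the same way from Java; only the engine differs. The following is a minimal sketch of pinning the engine explicitly when loading a converted model — the `optEngine` value and the local path are assumptions based on the commands above, not part of the original document:

```java
import java.nio.file.Paths;

import ai.djl.modality.nlp.qa.QAInput;
import ai.djl.repository.zoo.Criteria;

// Sketch: pin the engine for an OnnxRuntime-converted model.
// "OnnxRuntime" is the DJL engine name for ONNX models; the path
// assumes the default djl-convert output folder shown above.
Criteria<QAInput, String> criteria = Criteria.builder()
        .setTypes(QAInput.class, String.class)
        .optModelPath(Paths.get("model/bert-base-cased-squad2/"))
        .optEngine("OnnxRuntime")
        .build();
```

For a TorchScript or Rust conversion, the corresponding engine names would be `"PyTorch"` and `"Rust"`; omitting `optEngine` lets DJL pick an engine from the model artifacts.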
#### Load converted model

Then all you need to do is load the model in DJL:

```java
Criteria<QAInput, String> criteria = Criteria.builder()
        .setTypes(QAInput.class, String.class)
        .optModelPath(Paths.get("model/bert-base-cased-squad2/"))
        .optTranslatorFactory(new DeferredTranslatorFactory())
        .optProgress(new ProgressBar()).build();
```
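With the criteria in place, loading the model and running a prediction is a few more lines. A minimal sketch — the question and context strings are made up for illustration:

```java
import ai.djl.inference.Predictor;
import ai.djl.modality.nlp.qa.QAInput;
import ai.djl.repository.zoo.ZooModel;

// Load the model described by the criteria above and answer one question.
try (ZooModel<QAInput, String> model = criteria.loadModel();
        Predictor<QAInput, String> predictor = model.newPredictor()) {
    QAInput input = new QAInput(
            "When was BERT introduced?",                // hypothetical question
            "BERT was introduced by Google in 2018.");  // hypothetical context
    String answer = predictor.predict(input);
    System.out.println(answer);
}
```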

#### Import multiple Huggingface Hub models into DJL model zoo

```sh
djl-import -m deepset/bert-base-cased-squad2
```

This will generate a zip file in your local DJL model zoo folder structure:

```
model/nlp/question_answer/ai/djl/huggingface/pytorch/deepset/bert-base-cased-squad2/0.0.1/bert-base-cased-squad2.zip
```
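Because DJL can read a model straight from a zip archive, the generated file can also be loaded without unpacking it. A sketch, assuming the zip stays at the path shown above:

```java
import java.nio.file.Paths;

import ai.djl.modality.nlp.qa.QAInput;
import ai.djl.repository.zoo.Criteria;
import ai.djl.training.util.ProgressBar;
import ai.djl.translate.DeferredTranslatorFactory;

// Point optModelPath directly at the zip produced by djl-import.
Criteria<QAInput, String> criteria = Criteria.builder()
        .setTypes(QAInput.class, String.class)
        .optModelPath(Paths.get(
                "model/nlp/question_answer/ai/djl/huggingface/pytorch/deepset/"
                        + "bert-base-cased-squad2/0.0.1/bert-base-cased-squad2.zip"))
        .optTranslatorFactory(new DeferredTranslatorFactory())
        .optProgress(new ProgressBar()).build();
```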

### From HuggingFace AutoTokenizer

In most cases, you can easily use a pre-existing tokenizer in DJL:

Python

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/msmarco-distilbert-dot-v5")
```

Java

```java
HuggingFaceTokenizer tokenizer = HuggingFaceTokenizer.newInstance("sentence-transformers/msmarco-distilbert-dot-v5");
```
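Once created, the tokenizer can encode and decode text directly. A minimal sketch — the input sentence is only an example:

```java
import ai.djl.huggingface.tokenizers.Encoding;
import ai.djl.huggingface.tokenizers.HuggingFaceTokenizer;

// Tokenize a sentence and inspect the result.
try (HuggingFaceTokenizer tokenizer =
        HuggingFaceTokenizer.newInstance("sentence-transformers/msmarco-distilbert-dot-v5")) {
    Encoding encoding = tokenizer.encode("What is deep learning?");
    long[] ids = encoding.getIds();         // numeric token ids
    String[] tokens = encoding.getTokens(); // token strings, e.g. word pieces
    String text = tokenizer.decode(ids);    // round-trip back to text
}
```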

This approach requires a network connection to the HuggingFace repo.
To determine whether you can use it, look at "Files and versions"
in the [HuggingFace model tab](https://huggingface.co/sentence-transformers/msmarco-distilbert-dot-v5)
and check whether there is a `tokenizer.json`.

If there is a `tokenizer.json`, you can get it directly through DJL. Otherwise, use the other way below to obtain
a `tokenizer.json`.

### From HuggingFace Pipeline

