Add quantization and validation on device

ENOT-AutoDL · Sep 5, 2023 · 82cfac8
1 parent 12251a7
commit 82cfac8

Showing 20 changed files with 547 additions and 38 deletions.
diff --git a/README.md b/README.md
@@ -78,15 +78,15 @@ If you want to use your own path for dataset, change `data_root` parameter in `c
 To train baseline model, run:
 
 ```bash
-bash commands/baseline/train_baseline.sh
+bash commands/baseline/train.sh
 ```
 
 The result of this command is the `model_best.pth` checkpoint in the `runs/baseline` directory.
 
 Use this command to verify baseline accuracy:
 
 ```bash
-bash commands/baseline/test_baseline.sh
+bash commands/baseline/test.sh
 ```
 
 ## Model optimization (Jetson)
@@ -100,26 +100,34 @@ To optimize a model by latency for Jetson, run our latency server on Jetson (see
 To optimize a model by latency for Jetson, run the corresponding script (x2/x3 means latency acceleration):
 
 ```bash
-bash commands/x2_jetson/prune_x2.sh
-bash commands/x3_jetson/prune_x3.sh
+bash commands/x2_jetson/prune.sh
+bash commands/x3_jetson/prune.sh
 ```
 
-### Tune
+### Model tuning
 
 After pruning, the model should be tuned with the following command:
 
 ```bash
-bash commands/x2_jetson/tune_x2.sh
-bash commands/x3_jetson/tune_x3.sh
+bash commands/x2_jetson/tune.sh
+bash commands/x3_jetson/tune.sh
+```
+
+### Quantization
+
+To use the INT8 data type for model inference, run our quantization pipeline:
+
+```bash
+bash commands/x3_jetson/quant.sh
 ```
 
 ### Accuracy and latency verification
 
 Use this command to verify the optimized model accuracy:
 
 ```bash
-bash commands/x2_jetson/test_x2.sh
-bash commands/x3_jetson/test_x3.sh
+bash commands/x2_jetson/test.sh
+bash commands/x3_jetson/test.sh
 ```
 
 Use this command to verify the optimized model latency:
@@ -131,7 +139,7 @@ bash commands/x3_jetson/measure.sh
 
 ### Our optimization results
 
-Download our checkpoints from [Google Drive](https://drive.google.com/file/d/1wxK4UcTS-5Y2yUt2oOJR2ubyfzJiJ7uc/view?usp=sharing).
+Download our checkpoints from [Google Drive](https://drive.google.com/file/d/1uDzWVkCwWnY5XZ8CH80b0vGVxRSmDU9G/view?usp=sharing).
 
 To extract `checkpoints` use the following command:
 
@@ -148,18 +156,67 @@ python test.py configs/tusimple_res18.py --model_ckpt checkpoints/x3_jetson/mode
 ```
 
 To check metrics on ONNX run:
+
 ```bash
 python test.py configs/tusimple_res18.py --onnx_path checkpoints/baseline/model_best.onnx --batch_size 1
 python test.py configs/tusimple_res18.py --onnx_path checkpoints/x2_jetson/model_best.onnx --batch_size 1
 python test.py configs/tusimple_res18.py --onnx_path checkpoints/x3_jetson/model_best.onnx --batch_size 1
 ```
 
+> **_NOTE:_** We recommend checking the metrics for `quantized_model.onnx` on the target device (see the instructions in [Validation on Jetson AGX Orin device](#validation-on-jetson-agx-orin-device)).
+
 To check their latency, run the following commands:
 
 ```bash
 python measure.py --model_ckpt checkpoints/baseline/model_best.pth --host <jetson-server-host> --port 15003
 python measure.py --model_ckpt checkpoints/x2_jetson/model_best.pth --host <jetson-server-host> --port 15003
 python measure.py --model_ckpt checkpoints/x3_jetson/model_best.pth --host <jetson-server-host> --port 15003
+python measure.py --onnx checkpoints/x3_jetson/quantized_model.onnx --host <jetson-server-host> --port 15003
+```
+
+### Validation on Jetson AGX Orin device
+
+To make sure that your model's accuracy is not affected by FP16 or INT8 computations on the Jetson device, follow this validation pipeline:
+
+⚠️ On PC where you run scripts from this repository:
+
+1. Create a dataset in the pickle format:
+
+   ```bash
+   python pickle_dataset.py configs/tusimple_res18.py --pickle_data_path pickle_data
+   ```
+
+1. Send an ONNX model, `pickle_data`, and `inference_on_device.py` to the Jetson device using `scp`:
+
+   ```bash
+   scp -P <jetson-port> -r path/to/model.onnx pickle_data inference_on_device.py <user-name>@<jetson-host>:/your/location/
+   ```
+
+⚠️ On Jetson device:
+
+1. Install the ONNX Runtime package with the TensorRT backend using the following commands:
+
+   ```bash
+   wget https://nvidia.box.com/shared/static/mvdcltm9ewdy2d5nurkiqorofz1s53ww.whl -O onnxruntime_gpu-1.15.1-cp38-cp38-linux_aarch64.whl
+   pip3 install onnxruntime_gpu-1.15.1-cp38-cp38-linux_aarch64.whl
+   ```
+
+1. Run inference on the pickled dataset from the directory that contains the previously copied ONNX model, `pickle_data`, and `inference_on_device.py`:
+
+   ```bash
+   python3 inference_on_device.py -m your/model.onnx -i pickle_data -o out_pickle --device jetson
+   ```
+
+1. Send the resulting `out_pickle` directory to your PC to the `ultra-fast-lane-detector-v2` repository root using `scp`:
+
+   ```bash
+   scp -P <pc-port> -r out_pickle <user-name>@<pc-host>:/path/to/ultra-fast-lane-detector-v2/
+   ```
+
+⚠️ On PC where you run scripts from this repository:
+
+```bash
+python test_on_pickles.py configs/tusimple_res18.py --batch_size 1 --pickled_inference_results out_pickle
 ```
 
 ## Model optimization (TI)
@@ -176,23 +233,23 @@ Use our [instruction](https://github.com/ENOT-AutoDL/latency-server-ti-tda4-j721
 To optimize a model by latency for TI, run the corresponding script (x4 means latency acceleration):
 
 ```bash
-bash commands/x4_ti/prune_x4.sh
+bash commands/x4_ti/prune.sh
 ```
 
-### Tune
+### Model tuning
 
 After pruning, the model should be tuned with the following command:
 
 ```bash
-bash commands/x4_ti/tune_x4.sh
+bash commands/x4_ti/tune.sh
 ```
 
 ### Accuracy and latency verification
 
 Use this command to verify the optimized model accuracy:
 
 ```bash
-bash commands/x4_ti/test_x4.sh
+bash commands/x4_ti/test.sh
 ```
 
 Use this command to verify the optimized model latency:
@@ -203,7 +260,7 @@ bash commands/x4_ti/measure.sh
 
 ### Our optimization results
 
-Download our checkpoints from [Google Drive](https://drive.google.com/file/d/1wxK4UcTS-5Y2yUt2oOJR2ubyfzJiJ7uc/view?usp=sharing).
+Download our checkpoints from [Google Drive](https://drive.google.com/file/d/1uDzWVkCwWnY5XZ8CH80b0vGVxRSmDU9G/view?usp=sharing).
 
 To extract `checkpoints` use the following command:
 
@@ -222,6 +279,7 @@ python test.py configs/tusimple_res18.py --model_ckpt checkpoints/x4_ti/model_be
 > **_NOTE:_** Model `checkpoints/x3_ti/model_best.pth` was obtained on Jetson (`checkpoints/x2_jetson/model_best.pth`) and has x3 acceleration on TI device.
 
 To check metrics on ONNX run:
+
 ```bash
 python test.py configs/tusimple_res18.py --onnx_path checkpoints/baseline/model_best.onnx --batch_size 1
 python test.py configs/tusimple_res18.py --onnx_path checkpoints/x3_ti/model_best.onnx --batch_size 1
@@ -261,18 +319,18 @@ To make sure that your model accuracy is not affected by computations in FX8, fo
 
    > **_NOTE:_** It takes about 60 min or more to calibrate and compile the baseline model on our x86_PC.
 
-1. Send `compiled_artifacts`, `pickle_data`, and `inference_on_ti.py` to the TI device using `scp`:
+1. Send `compiled_artifacts`, `pickle_data`, and `inference_on_device.py` to the TI device using `scp`:
 
    ```bash
-   scp -P <ti-port> -r compiled_artifacts pickle_data inference_on_ti.py <user-name>@<ti-host>:/your/location/
+   scp -P <ti-port> -r compiled_artifacts pickle_data inference_on_device.py <user-name>@<ti-host>:/your/location/
    ```
 
 ⚠️ On TI device:
 
-1. Run inference on pickled dataset from the directory with previously copied `compiled_artifacts`, `pickle_data`, and `inference_on_ti.py`:
+1. Run inference on the pickled dataset from the directory with previously copied `compiled_artifacts`, `pickle_data`, and `inference_on_device.py`:
 
    ```bash
-   TIDL_TOOLS_PATH=/opt/latency_server/tidl_tools python3 inference_on_ti.py -m compiled_artifacts -i pickle_data -o out_pickle
+   TIDL_TOOLS_PATH=/opt/latency_server/tidl_tools python3 inference_on_device.py -m compiled_artifacts -i pickle_data -o out_pickle --device ti
    ```
 
 1. Send the resulting `out_pickle` directory to your PC to the `ultra-fast-lane-detector-v2` repository root using `scp`:
@@ -284,5 +342,5 @@ To make sure that your model accuracy is not affected by computations in FX8, fo
 ⚠️ On PC where you run scripts from this repository:
 
 ```bash
-python test_on_ti_data.py configs/tusimple_res18.py --batch_size 1 --ti_inference_results out_pickle
+python test_on_pickles.py configs/tusimple_res18.py --batch_size 1 --pickled_inference_results out_pickle
 ```
diff --git a/commands/baseline/test.sh b/commands/baseline/test.sh
@@ -0,0 +1,5 @@
+#!/usr/bin/env bash
+
+python test.py \
+    configs/tusimple_res18.py \
+    --model_ckpt runs/baseline/model_best.pth
diff --git a/commands/baseline/train.sh b/commands/baseline/train.sh
@@ -0,0 +1,5 @@
+#!/usr/bin/env bash
+
+python train.py \
+    configs/tusimple_res18.py \
+    --log_path runs/baseline
diff --git a/commands/x2_jetson/prune.sh b/commands/x2_jetson/prune.sh
@@ -0,0 +1,11 @@
+#!/usr/bin/env bash
+
+python prune.py \
+    configs/tusimple_res18.py \
+    --log_path runs/jetson/x2/prune \
+    --latency_type server \
+    --acceleration 2.0 \
+    --n_search_steps 200 \
+    --host localhost \
+    --port 15003 \
+    --model_ckpt runs/baseline/model_best.pth
diff --git a/commands/x2_jetson/test.sh b/commands/x2_jetson/test.sh
@@ -0,0 +1,5 @@
+#!/usr/bin/env bash
+
+python test.py \
+    configs/tusimple_res18.py \
+    --model_ckpt runs/jetson/x2/tune/model_best.pth
diff --git a/commands/x2_jetson/tune.sh b/commands/x2_jetson/tune.sh
@@ -0,0 +1,9 @@
+#!/usr/bin/env bash
+
+python train.py \
+    configs/tusimple_res18_tune.py \
+    --log_path runs/jetson/x2/tune \
+    --model_ckpt runs/jetson/x2/prune/model_best.pth \
+    --teacher runs/baseline/model_best.pth \
+    --distill_loss 2.0 \
+    --epoch 200
diff --git a/commands/x3_jetson/measure_quant.sh b/commands/x3_jetson/measure_quant.sh
@@ -0,0 +1,6 @@
+#!/usr/bin/env bash
+
+python measure.py \
+    --onnx runs/jetson/x3/quant/quantized_model.onnx \
+    --host localhost \
+    --port 15003
diff --git a/commands/x3_jetson/prune.sh b/commands/x3_jetson/prune.sh
@@ -0,0 +1,11 @@
+#!/usr/bin/env bash
+
+python prune.py \
+    configs/tusimple_res18.py \
+    --log_path runs/jetson/x3/prune \
+    --latency_type server \
+    --acceleration 3.0 \
+    --n_search_steps 200 \
+    --host localhost \
+    --port 15003 \
+    --model_ckpt runs/baseline/model_best.pth
diff --git a/commands/x3_jetson/quant.sh b/commands/x3_jetson/quant.sh
@@ -0,0 +1,7 @@
+#!/usr/bin/env bash
+
+python quant.py \
+    configs/tusimple_res18.py \
+    --log_path runs/jetson/x3/quant \
+    --model_ckpt checkpoints/x3_jetson/model_best.pth \
+    --epoch 0
diff --git a/commands/x3_jetson/test.sh b/commands/x3_jetson/test.sh
@@ -0,0 +1,5 @@
+#!/usr/bin/env bash
+
+python test.py \
+    configs/tusimple_res18.py \
+    --model_ckpt runs/jetson/x3/tune/model_best.pth
diff --git a/commands/x3_jetson/tune.sh b/commands/x3_jetson/tune.sh
@@ -0,0 +1,9 @@
+#!/usr/bin/env bash
+
+python train.py \
+    configs/tusimple_res18_tune.py \
+    --log_path runs/jetson/x3/tune \
+    --model_ckpt runs/jetson/x3/prune/model_best.pth \
+    --teacher runs/baseline/model_best.pth \
+    --distill_loss 2.0 \
+    --epoch 200
diff --git a/commands/x4_ti/prune.sh b/commands/x4_ti/prune.sh
@@ -0,0 +1,12 @@
+#!/usr/bin/env bash
+
+python prune.py \
+    configs/tusimple_res18.py \
+    --log_path runs/ti/x4/prune \
+    --latency_type server \
+    --acceleration 1.12 \
+    --n_search_steps 200 \
+    --host localhost \
+    --port 15003 \
+    --model_ckpt checkpoints/x3_jetson/model_best.pth \
+    --ti_compatible
diff --git a/commands/x4_ti/test.sh b/commands/x4_ti/test.sh
@@ -0,0 +1,5 @@
+#!/usr/bin/env bash
+
+python test.py \
+    configs/tusimple_res18.py \
+    --model_ckpt runs/ti/x4/tune/model_best.pth
diff --git a/commands/x4_ti/tune.sh b/commands/x4_ti/tune.sh
@@ -0,0 +1,9 @@
+#!/usr/bin/env bash
+
+python train.py \
+    configs/tusimple_res18_tune.py \
+    --log_path runs/ti/x4/tune \
+    --model_ckpt runs/ti/x4/prune/model_best.pth \
+    --teacher checkpoints/baseline/model_best.pth \
+    --distill_loss 2.0 \
+    --epoch 200
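
The per-target scripts added in this commit share one layout, `commands/<target>/<stage>.sh`. As a minimal sketch, the stages for one target could be chained so that the run stops at the first failing step. The `run_pipeline` helper and its `DRY_RUN` variable are hypothetical and not part of the commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the commit: runs the stage scripts of one
# target in the given order. With DRY_RUN=1 (the default in this sketch) it
# only prints the commands instead of executing them.
set -euo pipefail

run_pipeline() {
    local target="$1"; shift
    local stage script
    for stage in "$@"; do
        script="commands/${target}/${stage}.sh"
        if [[ "${DRY_RUN:-1}" == "1" ]]; then
            echo "bash ${script}"
        else
            # With `set -e`, a failing stage aborts the remaining stages.
            bash "${script}"
        fi
    done
}

# Example: print the pruning, tuning, and test steps for the x3 Jetson target.
run_pipeline x3_jetson prune tune test
```

Setting `DRY_RUN=0` would execute the scripts from the repository root. Note that the README in this commit lists a `quant` stage only for `x3_jetson`.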