Merge pull request #82 from alxdsptr/mlperf-inference-results-scc24

MLPerf inference results SCC24 PKU 2
Showing 35 changed files with 40,715 additions and 0 deletions.
TBD
3 changes: 3 additions & 0 deletions in ...erf_inference_lry_40-nvidia_original-gpu-tensorrt-vdefault-scc24-main/README.md
| Model | Scenario | Accuracy | Throughput | Latency (in ms) |
|---------------------|----------|----------------------|------------|-----------------|
| stable-diffusion-xl | offline | (14.02827, 84.33062) | 8.281 | - |
60 changes: 60 additions & 0 deletions in ...original-gpu-tensorrt-vdefault-scc24-main/stable-diffusion-xl/offline/README.md
This experiment is generated using the [MLCommons Collective Mind automation framework (CM)](https://github.com/mlcommons/cm4mlops).

*Check the [CM MLPerf docs](https://docs.mlcommons.org/inference) for more details.*

## Host platform

* OS version: Linux-5.14.0-427.33.1.el9_4.x86_64-x86_64-with-glibc2.29
* CPU version: x86_64
* Python version: 3.8.10 (default, Sep 11 2024, 16:02:53) [GCC 9.4.0]
* MLCommons CM version: 3.4.1

## CM Run Command

See the [CM installation guide](https://docs.mlcommons.org/inference/install/).

```bash
pip install -U cmind

cm rm cache -f

cm pull repo mlcommons@cm4mlops --checkout=852b297c18a90edb8a9c975dd7ee7cf731e1e347

cm run script \
    --tags=run-mlperf,inference,_r4.1-dev,_scc24-main \
    --model=sdxl \
    --implementation=nvidia \
    --max_query_count=5000 \
    --min_query_count=504 \
    --framework=tensorrt \
    --category=datacenter \
    --scenario=Offline \
    --execution_mode=test \
    --device=cuda \
    --max_batchsize=8 \
    --quiet \
    --rerun
```

*Note that if you want to use the [latest automation recipes](https://docs.mlcommons.org/inference) for MLPerf (CM scripts), you should simply reload mlcommons@cm4mlops without a checkout and clean the CM cache as follows:*

```bash
cm rm repo mlcommons@cm4mlops
cm pull repo mlcommons@cm4mlops
cm rm cache -f
```

## Results

Platform: mlperf_inference_lry_40-nvidia_original-gpu-tensorrt-vdefault-scc24-main

Model Precision: int8

### Accuracy Results
`CLIP_SCORE`: `14.02827`, required accuracy for closed division: `>= 31.68632` and `<= 31.81332`
`FID_SCORE`: `84.33062`, required accuracy for closed division: `>= 23.01086` and `<= 23.95008`

### Performance Results
`Samples per second`: `8.2807`
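
Both reported scores fall outside the closed-division windows listed above (this is a test-mode run). As a minimal sketch, the range comparison can be reproduced with a small helper script; `in_range` is a hypothetical function for illustration, not part of CM:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of CM): check a metric against a
# closed-division accuracy window [LOW, HIGH].
in_range() {  # usage: in_range VALUE LOW HIGH
  awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN { exit !(v >= lo && v <= hi) }'
}

# Scores and bounds taken from the Accuracy Results section above.
in_range 14.02827 31.68632 31.81332 && echo "CLIP_SCORE: PASS" || echo "CLIP_SCORE: FAIL"
in_range 84.33062 23.01086 23.95008 && echo "FID_SCORE: PASS" || echo "FID_SCORE: FAIL"
```

Both checks print `FAIL` for this run: the CLIP score is below its window and the FID score is above its window.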
88 changes: 88 additions & 0 deletions in ...riginal-gpu-tensorrt-vdefault-scc24-main/stable-diffusion-xl/offline/accuracy_console.out
[2024-11-18 22:46:22,137 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC0. Skipping.
[2024-11-18 22:46:22,217 main.py:229 INFO] Detected system ID: KnownSystem.sc1
[2024-11-18 22:46:24,991 generate_conf_files.py:107 INFO] Generated measurements/ entries for sc1_TRT/stable-diffusion-xl/Offline
[2024-11-18 22:46:24,991 __init__.py:46 INFO] Running command: python3 -m code.stable-diffusion-xl.tensorrt.harness --logfile_outdir="/home/lry/CM/repos/local/cache/6c0ba4746fa74e77/test_results/mlperf_inference_lry_40-nvidia_original-gpu-tensorrt-vdefault-scc24-main/stable-diffusion-xl/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=5000 --test_mode="AccuracyOnly" --gpu_batch_size=8 --mlperf_conf_path="/home/lry/CM/repos/local/cache/3e2d12440d5a4a93/inference/mlperf.conf" --tensor_path="build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/" --use_graphs=true --user_conf_path="/home/lry/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/0fe769204cb64955852be59f43b33ad5.conf" --gpu_inference_streams=1 --gpu_copy_streams=1 --gpu_engines="./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan,./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan,./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan,./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan" --scenario Offline --model stable-diffusion-xl
[2024-11-18 22:46:24,991 __init__.py:53 INFO] Overriding Environment
[2024-11-18 22:46:27,765 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC0. Skipping.
2024-11-18 22:46:30,481 INFO worker.py:1567 -- Connecting to existing Ray cluster at address: 10.0.0.1:6379...
2024-11-18 22:46:30,489 INFO worker.py:1743 -- Connected to Ray cluster. View the dashboard at http://127.0.0.1:8265
[2024-11-18 22:46:30,733 harness.py:207 INFO] Start Warm Up!
(SDXLCore pid=220850) [2024-11-18 22:46:34,300 backend.py:428 INFO] initialized
(SDXLCore pid=220850) [2024-11-18 22:46:34,402 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
(SDXLCore pid=220850) [2024-11-18 22:46:34,654 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
(SDXLCore pid=220850) [2024-11-18 22:46:35,018 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan.
(SDXLCore pid=220850) [2024-11-18 22:46:39,168 backend.py:97 INFO] Enabling cuda graphs for unet
(SDXLCore pid=220850) [2024-11-18 22:46:39,604 backend.py:155 INFO] captured graph for BS=1
(SDXLCore pid=18778, ip=10.0.0.3) [2024-11-18 22:46:32,459 backend.py:428 INFO] initialized [repeated 7x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
(SDXLCore pid=220848) [2024-11-18 22:46:37,641 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan. [repeated 29x across cluster]
(SDXLCore pid=220850) [2024-11-18 22:46:40,416 backend.py:155 INFO] captured graph for BS=2
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:46:40,449 backend.py:97 INFO] Enabling cuda graphs for unet [repeated 8x across cluster]
(SDXLCore pid=220852) [2024-11-18 22:46:44,655 backend.py:155 INFO] captured graph for BS=6 [repeated 45x across cluster]
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:46:36,615 backend.py:428 INFO] initialized
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:46:38,113 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan. [repeated 4x across cluster]
[2024-11-18 22:47:06,644 harness.py:209 INFO] Warm Up Done!
[2024-11-18 22:47:06,644 harness.py:211 INFO] Start Test!
[2024-11-18 22:47:06,794 backend.py:852 INFO] 500
(SDXLCore pid=18774, ip=10.0.0.3) [2024-11-18 22:47:03,586 backend.py:630 INFO] generate_images
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:46:45,509 backend.py:155 INFO] captured graph for BS=8 [repeated 25x across cluster]
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:47:11,266 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:47:18,975 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:47:26,675 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=220848) [2024-11-18 22:47:35,394 backend.py:630 INFO] generate_images [repeated 4x across cluster]
(SDXLCore pid=220848) [2024-11-18 22:47:45,024 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:47:49,957 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=220848) [2024-11-18 22:48:04,367 backend.py:630 INFO] generate_images [repeated 9x across cluster]
[2024-11-18 22:48:16,996 backend.py:901 INFO] [Server] Received 500 total samples
[2024-11-18 22:48:16,999 backend.py:911 INFO] [Device 0] Reported 56 samples
[2024-11-18 22:48:17,001 backend.py:911 INFO] [Device 1] Reported 56 samples
[2024-11-18 22:48:17,002 backend.py:911 INFO] [Device 2] Reported 56 samples
[2024-11-18 22:48:17,004 backend.py:911 INFO] [Device 3] Reported 56 samples
[2024-11-18 22:48:17,006 backend.py:911 INFO] [Device 4] Reported 56 samples
[2024-11-18 22:48:17,008 backend.py:911 INFO] [Device 5] Reported 55 samples
[2024-11-18 22:48:17,009 backend.py:911 INFO] [Device 6] Reported 55 samples
[2024-11-18 22:48:17,011 backend.py:911 INFO] [Device 7] Reported 55 samples
[2024-11-18 22:48:17,013 backend.py:911 INFO] [Device 8] Reported 55 samples
[2024-11-18 22:48:17,013 harness.py:214 INFO] Test Done!
[2024-11-18 22:48:17,013 harness.py:216 INFO] Destroying SUT...
[2024-11-18 22:48:17,013 harness.py:219 INFO] Destroying QSL...
(SDXLCore pid=220847) [2024-11-18 22:48:06,100 backend.py:630 INFO] generate_images [repeated 4x across cluster]
benchmark : Benchmark.SDXL
buffer_manager_thread_count : 0
data_dir : /home/lry/CM/repos/local/cache/d2b9079c1073417b/data
gpu_batch_size : 8
gpu_copy_streams : 1
gpu_inference_streams : 1
input_dtype : int32
input_format : linear
log_dir : /home/lry/CM/repos/local/cache/3443882dd9374096/repo/closed/NVIDIA/build/logs/2024.11.18-22.46.18
mlperf_conf_path : /home/lry/CM/repos/local/cache/3e2d12440d5a4a93/inference/mlperf.conf
model_path : /home/lry/CM/repos/local/cache/d2b9079c1073417b/models/SDXL/
offline_expected_qps : 0.0
precision : int8
preprocessed_data_dir : /home/lry/CM/repos/local/cache/d2b9079c1073417b/preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='AMD EPYC 9684X 96-Core Processor', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=96, threads_per_core=2): 2}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=791.59486, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=791594860000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA H100 80GB HBM3', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=79.6474609375, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=85520809984), max_power_limit=700.0, pci_id='0x233010DE', compute_sm=90): 5})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=2), system_id='sc1')
tensor_path : build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/
test_mode : AccuracyOnly
use_graphs : True
user_conf_path : /home/lry/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/0fe769204cb64955852be59f43b33ad5.conf
system_id : sc1
config_name : sc1_stable-diffusion-xl_Offline
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
num_profiles : 1
config_ver : custom_k_99_MaxP
accuracy_level : 99%
inference_server : custom
skip_file_checks : False
power_limit : None
cpu_freq : None
(SDXLCore pid=220850) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
(SDXLCore pid=220850) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
(SDXLCore pid=220850) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan
(SDXLCore pid=18776, ip=10.0.0.3) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan [repeated 30x across cluster]
(SDXLCore pid=18776, ip=10.0.0.3) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan [repeated 3x across cluster]
[2024-11-18 22:48:18,480 run_harness.py:166 INFO] Result: Accuracy run detected.
======================== Result summaries: ========================
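
As a quick sanity check on the accuracy log above, the per-device sample counts (five devices reporting 56 samples each, four reporting 55 each) account for all 500 samples received by the server:

```bash
#!/usr/bin/env bash
# Sum the per-device sample counts reported by the harness log above:
# Devices 0-4 reported 56 samples each, Devices 5-8 reported 55 each.
total=0
for c in 56 56 56 56 56 55 55 55 55; do
  total=$(( total + c ))
done
echo "total=$total"   # prints total=500, matching the server's received count
```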