Merge pull request #82 from alxdsptr/mlperf-inference-results-scc24

MLPerf inference results SCC24 PKU 2
Showing 35 changed files with 40,715 additions and 0 deletions.
TBD
3 changes: 3 additions & 0 deletions in ...erf_inference_lry_40-nvidia_original-gpu-tensorrt-vdefault-scc24-main/README.md
| Model | Scenario | Accuracy | Throughput | Latency (in ms) |
|---------------------|----------|----------------------|------------|-----------------|
| stable-diffusion-xl | offline | (14.02827, 84.33062) | 8.281 | - |
60 changes: 60 additions & 0 deletions in ...original-gpu-tensorrt-vdefault-scc24-main/stable-diffusion-xl/offline/README.md
This experiment is generated using the [MLCommons Collective Mind automation framework (CM)](https://github.com/mlcommons/cm4mlops).

*Check the [CM MLPerf docs](https://docs.mlcommons.org/inference) for more details.*

## Host platform

* OS version: Linux-5.14.0-427.33.1.el9_4.x86_64-x86_64-with-glibc2.29
* CPU version: x86_64
* Python version: 3.8.10 (default, Sep 11 2024, 16:02:53) [GCC 9.4.0]
* MLCommons CM version: 3.4.1

## CM Run Command

See the [CM installation guide](https://docs.mlcommons.org/inference/install/).

```bash
pip install -U cmind

cm rm cache -f

cm pull repo mlcommons@cm4mlops --checkout=852b297c18a90edb8a9c975dd7ee7cf731e1e347

cm run script \
    --tags=run-mlperf,inference,_r4.1-dev,_scc24-main \
    --model=sdxl \
    --implementation=nvidia \
    --max_query_count=5000 \
    --min_query_count=504 \
    --framework=tensorrt \
    --category=datacenter \
    --scenario=Offline \
    --execution_mode=test \
    --device=cuda \
    --max_batchsize=8 \
    --quiet \
    --rerun
```

*Note that if you want to use the [latest automation recipes](https://docs.mlcommons.org/inference) for MLPerf (CM scripts), you should simply reload mlcommons@cm4mlops without a checkout and clean the CM cache as follows:*

```bash
cm rm repo mlcommons@cm4mlops
cm pull repo mlcommons@cm4mlops
cm rm cache -f
```

## Results

Platform: mlperf_inference_lry_40-nvidia_original-gpu-tensorrt-vdefault-scc24-main

Model Precision: int8

### Accuracy Results
`CLIP_SCORE`: `14.02827`, required accuracy for closed division: `>= 31.68632` and `<= 31.81332`
`FID_SCORE`: `84.33062`, required accuracy for closed division: `>= 23.01086` and `<= 23.95008`

### Performance Results
`Samples per second`: `8.2807`
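
Both reported scores fall outside the closed-division windows listed above (this is a test-mode run). As a minimal sketch, the range comparison can be reproduced with a small helper script; `in_range` is a hypothetical function for illustration, not part of CM:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of CM): check a metric against a
# closed-division accuracy window [LOW, HIGH].
in_range() {  # usage: in_range VALUE LOW HIGH
  awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN { exit !(v >= lo && v <= hi) }'
}

# Scores and bounds taken from the Accuracy Results section above.
in_range 14.02827 31.68632 31.81332 && echo "CLIP_SCORE: PASS" || echo "CLIP_SCORE: FAIL"
in_range 84.33062 23.01086 23.95008 && echo "FID_SCORE: PASS" || echo "FID_SCORE: FAIL"
```

Both checks print `FAIL` for this run: the CLIP score is below its window and the FID score is above its window.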
88 changes: 88 additions & 0 deletions in ...riginal-gpu-tensorrt-vdefault-scc24-main/stable-diffusion-xl/offline/accuracy_console.out
[2024-11-18 22:46:22,137 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC0. Skipping.
[2024-11-18 22:46:22,217 main.py:229 INFO] Detected system ID: KnownSystem.sc1
[2024-11-18 22:46:24,991 generate_conf_files.py:107 INFO] Generated measurements/ entries for sc1_TRT/stable-diffusion-xl/Offline
[2024-11-18 22:46:24,991 __init__.py:46 INFO] Running command: python3 -m code.stable-diffusion-xl.tensorrt.harness --logfile_outdir="/home/lry/CM/repos/local/cache/6c0ba4746fa74e77/test_results/mlperf_inference_lry_40-nvidia_original-gpu-tensorrt-vdefault-scc24-main/stable-diffusion-xl/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=5000 --test_mode="AccuracyOnly" --gpu_batch_size=8 --mlperf_conf_path="/home/lry/CM/repos/local/cache/3e2d12440d5a4a93/inference/mlperf.conf" --tensor_path="build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/" --use_graphs=true --user_conf_path="/home/lry/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/0fe769204cb64955852be59f43b33ad5.conf" --gpu_inference_streams=1 --gpu_copy_streams=1 --gpu_engines="./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan,./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan,./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan,./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan" --scenario Offline --model stable-diffusion-xl
[2024-11-18 22:46:24,991 __init__.py:53 INFO] Overriding Environment
[2024-11-18 22:46:27,765 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC0. Skipping.
2024-11-18 22:46:30,481 INFO worker.py:1567 -- Connecting to existing Ray cluster at address: 10.0.0.1:6379...
2024-11-18 22:46:30,489 INFO worker.py:1743 -- Connected to Ray cluster. View the dashboard at http://127.0.0.1:8265
[2024-11-18 22:46:30,733 harness.py:207 INFO] Start Warm Up!
(SDXLCore pid=220850) [2024-11-18 22:46:34,300 backend.py:428 INFO] initialized
(SDXLCore pid=220850) [2024-11-18 22:46:34,402 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
(SDXLCore pid=220850) [2024-11-18 22:46:34,654 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
(SDXLCore pid=220850) [2024-11-18 22:46:35,018 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan.
(SDXLCore pid=220850) [2024-11-18 22:46:39,168 backend.py:97 INFO] Enabling cuda graphs for unet
(SDXLCore pid=220850) [2024-11-18 22:46:39,604 backend.py:155 INFO] captured graph for BS=1
(SDXLCore pid=18778, ip=10.0.0.3) [2024-11-18 22:46:32,459 backend.py:428 INFO] initialized [repeated 7x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
(SDXLCore pid=220848) [2024-11-18 22:46:37,641 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan. [repeated 29x across cluster]
(SDXLCore pid=220850) [2024-11-18 22:46:40,416 backend.py:155 INFO] captured graph for BS=2
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:46:40,449 backend.py:97 INFO] Enabling cuda graphs for unet [repeated 8x across cluster]
(SDXLCore pid=220852) [2024-11-18 22:46:44,655 backend.py:155 INFO] captured graph for BS=6 [repeated 45x across cluster]
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:46:36,615 backend.py:428 INFO] initialized
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:46:38,113 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan. [repeated 4x across cluster]
[2024-11-18 22:47:06,644 harness.py:209 INFO] Warm Up Done!
[2024-11-18 22:47:06,644 harness.py:211 INFO] Start Test!
[2024-11-18 22:47:06,794 backend.py:852 INFO] 500
(SDXLCore pid=18774, ip=10.0.0.3) [2024-11-18 22:47:03,586 backend.py:630 INFO] generate_images
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:46:45,509 backend.py:155 INFO] captured graph for BS=8 [repeated 25x across cluster]
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:47:11,266 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:47:18,975 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:47:26,675 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=220848) [2024-11-18 22:47:35,394 backend.py:630 INFO] generate_images [repeated 4x across cluster]
(SDXLCore pid=220848) [2024-11-18 22:47:45,024 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:47:49,957 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=220848) [2024-11-18 22:48:04,367 backend.py:630 INFO] generate_images [repeated 9x across cluster]
[2024-11-18 22:48:16,996 backend.py:901 INFO] [Server] Received 500 total samples
[2024-11-18 22:48:16,999 backend.py:911 INFO] [Device 0] Reported 56 samples
[2024-11-18 22:48:17,001 backend.py:911 INFO] [Device 1] Reported 56 samples
[2024-11-18 22:48:17,002 backend.py:911 INFO] [Device 2] Reported 56 samples
[2024-11-18 22:48:17,004 backend.py:911 INFO] [Device 3] Reported 56 samples
[2024-11-18 22:48:17,006 backend.py:911 INFO] [Device 4] Reported 56 samples
[2024-11-18 22:48:17,008 backend.py:911 INFO] [Device 5] Reported 55 samples
[2024-11-18 22:48:17,009 backend.py:911 INFO] [Device 6] Reported 55 samples
[2024-11-18 22:48:17,011 backend.py:911 INFO] [Device 7] Reported 55 samples
[2024-11-18 22:48:17,013 backend.py:911 INFO] [Device 8] Reported 55 samples
[2024-11-18 22:48:17,013 harness.py:214 INFO] Test Done!
[2024-11-18 22:48:17,013 harness.py:216 INFO] Destroying SUT...
[2024-11-18 22:48:17,013 harness.py:219 INFO] Destroying QSL...
(SDXLCore pid=220847) [2024-11-18 22:48:06,100 backend.py:630 INFO] generate_images [repeated 4x across cluster]
benchmark : Benchmark.SDXL
buffer_manager_thread_count : 0
data_dir : /home/lry/CM/repos/local/cache/d2b9079c1073417b/data
gpu_batch_size : 8
gpu_copy_streams : 1
gpu_inference_streams : 1
input_dtype : int32
input_format : linear
log_dir : /home/lry/CM/repos/local/cache/3443882dd9374096/repo/closed/NVIDIA/build/logs/2024.11.18-22.46.18
mlperf_conf_path : /home/lry/CM/repos/local/cache/3e2d12440d5a4a93/inference/mlperf.conf
model_path : /home/lry/CM/repos/local/cache/d2b9079c1073417b/models/SDXL/
offline_expected_qps : 0.0
precision : int8
preprocessed_data_dir : /home/lry/CM/repos/local/cache/d2b9079c1073417b/preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='AMD EPYC 9684X 96-Core Processor', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=96, threads_per_core=2): 2}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=791.59486, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=791594860000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA H100 80GB HBM3', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=79.6474609375, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=85520809984), max_power_limit=700.0, pci_id='0x233010DE', compute_sm=90): 5})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=2), system_id='sc1')
tensor_path : build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/
test_mode : AccuracyOnly
use_graphs : True
user_conf_path : /home/lry/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/0fe769204cb64955852be59f43b33ad5.conf
system_id : sc1
config_name : sc1_stable-diffusion-xl_Offline
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
num_profiles : 1
config_ver : custom_k_99_MaxP
accuracy_level : 99%
inference_server : custom
skip_file_checks : False
power_limit : None
cpu_freq : None
(SDXLCore pid=220850) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
(SDXLCore pid=220850) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
(SDXLCore pid=220850) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan
(SDXLCore pid=18776, ip=10.0.0.3) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan [repeated 30x across cluster]
(SDXLCore pid=18776, ip=10.0.0.3) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan [repeated 3x across cluster]
[2024-11-18 22:48:18,480 run_harness.py:166 INFO] Result: Accuracy run detected.
======================== Result summaries: ========================
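
As a quick sanity check on the accuracy log above, the per-device sample counts (five devices reporting 56 samples each, four reporting 55 each) account for all 500 samples received by the server:

```bash
#!/usr/bin/env bash
# Sum the per-device sample counts reported by the harness log above:
# Devices 0-4 reported 56 samples each, Devices 5-8 reported 55 each.
total=0
for c in 56 56 56 56 56 55 55 55 55; do
  total=$(( total + c ))
done
echo "total=$total"   # prints total=500, matching the server's received count
```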