From 743113088dff0a01f92e1f13bf24018db155b44b Mon Sep 17 00:00:00 2001
From: Raymond Kim <109366641+tt-rkim@users.noreply.github.com>
Date: Mon, 10 Jun 2024 23:57:00 -0400
Subject: [PATCH] #8764: Part 3 of docs and model demos changes (#9350)

* #8764: Fix stable_diffusion interactive_demo typo and add note about WH_ARCH_YAML

* #8764: Clean up the bert README a bit more and get rid of extra metal_BERT_large_11 test_demo.py call in the nightly scripts

* #8764: Run batch 7 on WH B0 (X2 since model runners are defined as X2 still) again

* #8764: Re-word WH_ARCH_YAML warning so that it's clearer and add specific working batches for BERT WH

* #8764: MAYBE - correct output for 3rd test_demo.py metal_BERT_11 batch

* #8764: Re-skip squadv2 demo because it seems to ND hang

* #8764: Bump down single card demo significantly since we don't need 2.5hrs

* #8764: Clean up metal_BERT_large_11 README significantly with clear notes and a table of what works on GS/WH

* #8764: Lower squadv2 BERT demo to 20 iterations because 50 is sooo many and make command for wh x2 BERT demo more generic w/ batch_7, since batch_7 is widely supported on X2

* #8764: Correct batch support for WH table
---
 .github/workflows/single-card-demo-tests.yaml |  2 +-
 models/demos/metal_BERT_large_11/README.md    | 51 ++++++++++++++-----
 models/demos/metal_BERT_large_11/demo/demo.py |  2 +-
 .../metal_BERT_large_11/tests/test_demo.py    |  3 +-
 .../demos/wormhole/stable_diffusion/README.md |  8 +++
 .../wormhole/stable_diffusion/demo/demo.py    |  2 +-
 .../single_card/nightly/run_gs_only.sh        |  2 -
 .../run_demos_single_card_n300_tests.sh       |  2 +-
 8 files changed, 51 insertions(+), 21 deletions(-)
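
Note on the `-k batch_7` selection used in this patch: the sketch below is a rough, pure-Python approximation of how stacked `@pytest.mark.parametrize` ids combine into a test id and how a `-k` keyword then filters them. `build_node_ids` and `select` are hypothetical helpers written for illustration, not pytest APIs, and real `-k` matching also considers markers and boolean expressions.

```python
from itertools import product


def build_node_ids(test_name, id_groups):
    """Hypothetical helper: join one id from each parametrize group with '-'.

    id_groups is ordered from the decorator closest to the function outward,
    which is assumed here to be the order pytest uses when composing ids.
    """
    return [f"{test_name}[{'-'.join(combo)}]" for combo in product(*id_groups)]


def select(node_ids, keyword):
    """Rough stand-in for `pytest -k keyword`: case-insensitive substring match."""
    return [n for n in node_ids if keyword.lower() in n.lower()]


# test_demo_squadv2 after this patch: loop_count=(20,), batch ids batch_7/batch_12.
squadv2_ids = build_node_ids(
    "test_demo_squadv2", [["20"], ["batch_7", "batch_12"]]
)
selected = select(squadv2_ids, "batch_7")
print(selected)
```

Because the match is by substring, a keyword such as `batch_1` would also pick up `batch_12`, so the full id (`batch_7`, `batch_12`) is the safer thing to pass.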

diff --git a/.github/workflows/single-card-demo-tests.yaml b/.github/workflows/single-card-demo-tests.yaml
index f572c656ffe..a0de6285537 100644
--- a/.github/workflows/single-card-demo-tests.yaml
+++ b/.github/workflows/single-card-demo-tests.yaml
@@ -56,7 +56,7 @@ jobs:
         run: tar -xvf ttm_${{ matrix.test-group.arch }}.tar
       - uses: ./.github/actions/install-python-deps
       - name: Run demo regression tests
-        timeout-minutes: 150
+        timeout-minutes: 45
         run: |
           source ${{ github.workspace }}/python_env/bin/activate
           cd $TT_METAL_HOME
diff --git a/models/demos/metal_BERT_large_11/README.md b/models/demos/metal_BERT_large_11/README.md
index 0ce1ecdc619..6bf69decefc 100644
--- a/models/demos/metal_BERT_large_11/README.md
+++ b/models/demos/metal_BERT_large_11/README.md
@@ -1,26 +1,33 @@
 # metal_BERT_large 11 Demo
 
+> [!WARNING]
+>
+> This model demo does not work on N150 Wormhole cards.
+
 ## How to Run
 
-The optimized demos will parallelize batch on one of the device grid dimensions.The grid size used is batch X 8 or 8 X batch depending on your device grid.
-For unharvested Grayskull it supports batch 2 - 12, so you can use `batch_12` for the following commands.
-For Wormhole N300 it supports batch 2 - 7, so you can use `batch_7` for the following commands. N300 can also support batch 8, if `WH_ARCH_YAML=wormhole_b0_80_arch_eth_dispatch.yaml` is added to the env variables, `batch_8` can be added to the command.
+### Batch support for all architectures
 
-Replace `BATCH_SIZE` with the appropriate size depending on your device
-Use `pytest --disable-warnings models/demos/metal_BERT_large_11/demo/demo.py::test_demo[models/demos/metal_BERT_large_11/demo/input_data.json-1-BATCH_SIZE]` to run the demo for Grayskull.
-If you wish to run the demo with a different input use `pytest --disable-warnings models/demos/metal_BERT_large_11/demo/demo.py::test_demo[address_to_your_json_file.json-1-BATCH_SIZE]`. This file is expected to have exactly `BATCH_SIZE` inputs.
+The optimized demos will parallelize batch on one of the device grid dimensions. The grid size used is `batch x 8` or `8 x batch` depending on your device grid.
 
-Our second demo is designed to run SQuADV2 dataset, run this with `pytest --disable-warnings models/demos/metal_BERT_large_11/demo/demo.py::test_demo_squadv2 -k BATCH_SIZE`.
+For E150 (unharvested) Grayskull, the model demo supports batch 2 - 12, so you can use `batch_12` as `BATCH_SIZE` in the following commands.
 
-Expected device perf: `~410 Inferences/Second`
+For Wormhole N300, the model demo supports batch 2 - 7, so you can use `batch_7` as `BATCH_SIZE` in the following commands.
 
-To get the device performance, run `./tt_metal/tools/profiler/profile_this.py -c "pytest --disable-warnings models/demos/metal_BERT_large_11/tests/test_bert.py::test_bert[BERT_LARGE-BATCH_SIZE-BFLOAT8_B-SHARDED]"`.
-This will generate a CSV report under `<this repo dir>/generated/profiler/reports/ops/<report name>`. The report name will be shown at the end of the run.
-<!-- csv_example = "images/BERT-Large-device-profile.png" -->
+Replace `BATCH_SIZE` with the appropriate size depending on your device.
+Use `pytest --disable-warnings models/demos/metal_BERT_large_11/demo/demo.py::test_demo -k BATCH_SIZE` to run the demo for Grayskull.
 
-Expected end-to-end perf: `Ranges from 337 to 364 Inferences/Second, depending on the machine`
+If you wish to run the demo with a different input, use `pytest --disable-warnings models/demos/metal_BERT_large_11/demo/demo.py::test_demo[address_to_your_json_file.json-1-BATCH_SIZE]`. This file is expected to have exactly `BATCH_SIZE` inputs.
+
+Our second demo is designed to run the SQuADV2 dataset. Run it with `pytest --disable-warnings models/demos/metal_BERT_large_11/demo/demo.py::test_demo_squadv2 -k BATCH_SIZE`.
 
-To get the end-to-end performance, run `pytest --disable-warnings models/demos/metal_BERT_large_11/tests/test_perf_bert11.py::test_perf_bare_metal -k BATCH_SIZE`.
+The table below summarizes the information above.
+
+| Batch size | Supported on Grayskull (E150) | Supported on Wormhole (N300)         |
+|------------|-------------------------------|--------------------------------------|
+| 7          | :x:                           | :white_check_mark:                   |
+| 8          | :x:                           | See under construction section below |
+| 12         | :white_check_mark:            | :x:                                  |
 
 ## Inputs
 
@@ -35,3 +42,21 @@ The entry point to metal bert model is `TtBertBatchDram` in `bert_model.py`. The
 For fast model loading, we have cached preprocessed weights for TT tensors on Weka. These weights are directly read in and loaded to device.
 
 If your machine does not have access to Weka, during model loading it will preprocess and convert the pytorch weights from huggingface to TT tensors before placing on device.
+
+## Under construction
+
+> [!NOTE]
+>
+> This section is under construction and is not guaranteed to work under all conditions.
+>
+> If you are using Wormhole, you must set the `WH_ARCH_YAML` environment variable to use the following batch sizes:
+>
+> - `batch_8`
+>
+> ```
+> export WH_ARCH_YAML=wormhole_b0_80_arch_eth_dispatch.yaml
+> ```
+
+We currently do not have demos that show batch sizes other than 7 or 12.
+
+N300 can also theoretically support batch 8: if `WH_ARCH_YAML=wormhole_b0_80_arch_eth_dispatch.yaml` is added to the environment variables, `batch_8` can be used in the command.
\ No newline at end of file
diff --git a/models/demos/metal_BERT_large_11/demo/demo.py b/models/demos/metal_BERT_large_11/demo/demo.py
index e1a6949406d..bb45935468f 100644
--- a/models/demos/metal_BERT_large_11/demo/demo.py
+++ b/models/demos/metal_BERT_large_11/demo/demo.py
@@ -394,7 +394,7 @@ def test_demo(
 @pytest.mark.parametrize("batch", (7, 12), ids=["batch_7", "batch_12"])
 @pytest.mark.parametrize(
     "loop_count",
-    ((50),),
+    ((20),),
 )
 def test_demo_squadv2(model_location_generator, device, use_program_cache, batch, loop_count):
     disable_persistent_kernel_cache()
diff --git a/models/demos/metal_BERT_large_11/tests/test_demo.py b/models/demos/metal_BERT_large_11/tests/test_demo.py
index 0e95fafd17a..7798a186955 100644
--- a/models/demos/metal_BERT_large_11/tests/test_demo.py
+++ b/models/demos/metal_BERT_large_11/tests/test_demo.py
@@ -10,7 +10,6 @@
 
 
 @skip_for_grayskull()
-@skip_for_wormhole_b0(reason_str="#7525: hangs on wh b0")
 @pytest.mark.parametrize("batch", (7,), ids=["batch_7"])
 @pytest.mark.parametrize(
     "input_path",
@@ -25,7 +24,7 @@ def test_demo_batch_7(batch, input_path, model_location_generator, device, use_p
         0: "scientific archaeology",
         1: "Richard I",
         2: "males",
-        3: "married outside their immediate French communities,",
+        3: "The Huguenots adapted quickly and often married outside their immediate French communities,",
         4: "biostratigraphers",
         5: "chemotaxis,",
         6: "1992,",
diff --git a/models/demos/wormhole/stable_diffusion/README.md b/models/demos/wormhole/stable_diffusion/README.md
index 265049f5204..fc3d77bfd75 100644
--- a/models/demos/wormhole/stable_diffusion/README.md
+++ b/models/demos/wormhole/stable_diffusion/README.md
@@ -11,6 +11,14 @@ Inputs by default are provided from `input_data.json`. If you wish to change the
 
 ## How to Run
 
+> [!NOTE]
+>
+> If you are using Wormhole, you must set the `WH_ARCH_YAML` environment variable.
+>
+> ```
+> export WH_ARCH_YAML=wormhole_b0_80_arch_eth_dispatch.yaml
+> ```
+
 To run the demo, make sure to build the project, activate the environment, and set the appropriate environment variables.
 For more information, refer [installation and build guide](https://github.com/tenstorrent/tt-metal/blob/main/INSTALLING.md).
 
diff --git a/models/demos/wormhole/stable_diffusion/demo/demo.py b/models/demos/wormhole/stable_diffusion/demo/demo.py
index 88f0de59632..2c7f516cf3b 100644
--- a/models/demos/wormhole/stable_diffusion/demo/demo.py
+++ b/models/demos/wormhole/stable_diffusion/demo/demo.py
@@ -644,5 +644,5 @@ def test_demo_diffusiondb(device, reset_seeds, input_path, num_prompts, num_infe
     "image_size",
     ((512, 512),),
 )
-def test_interactve_demo(device, num_inference_steps, image_size):
+def test_interactive_demo(device, num_inference_steps, image_size):
     return run_interactive_demo_inference(device, num_inference_steps, image_size)
diff --git a/tests/scripts/single_card/nightly/run_gs_only.sh b/tests/scripts/single_card/nightly/run_gs_only.sh
index 36ed969d4a0..c5bcc9f9745 100755
--- a/tests/scripts/single_card/nightly/run_gs_only.sh
+++ b/tests/scripts/single_card/nightly/run_gs_only.sh
@@ -9,8 +9,6 @@ fi
 
 echo "Running model nightly tests for GS only"
 
-env pytest models/demos/metal_BERT_large_11/tests/test_demo.py
-
 env pytest models/demos/resnet/tests/test_metal_resnet50_performant.py
 
 env pytest models/demos/resnet/tests/test_metal_resnet50_2cqs_performant.py
diff --git a/tests/scripts/single_card/run_demos_single_card_n300_tests.sh b/tests/scripts/single_card/run_demos_single_card_n300_tests.sh
index 438a4392260..01ebeab32b5 100755
--- a/tests/scripts/single_card/run_demos_single_card_n300_tests.sh
+++ b/tests/scripts/single_card/run_demos_single_card_n300_tests.sh
@@ -14,4 +14,4 @@ source tests/scripts/single_card/run_demos_single_card_n150_tests.sh
 # Not working on N150, working on N300
 unset WH_ARCH_YAML
 rm -rf built
-pytest --disable-warnings models/demos/metal_BERT_large_11/demo/demo.py::test_demo[models/demos/metal_BERT_large_11/demo/input_data.json-1-batch_7]
+pytest --disable-warnings models/demos/metal_BERT_large_11/demo/demo.py -k batch_7