[Bug Report] Stable Diffusion interactive demo is hanging on N150 #15436
Comments
On my latest test from main (commit: 78075c6): It did not hang but failed with an error at 12%.
|
@mywoodstock to help with first level triage. Removing other owners. |
@dvartaniansTT @mbahnasTT Is this issue non-deterministic in where it hangs/fails? Since this is a regression, we should identify the commit that broke it and revert it. |
@mywoodstock @dvartaniansTT Looking at the last 2 weeks of pipelines, it seems the failure is present about 50% of the time. I'm going to try to determine whether it is machine-dependent or truly non-deterministic. |
I am getting a deterministic failure right at the beginning (after the prompt). Narrowed down this regression to this PR: #15394 |
@esmalTT re-assigning to you :) |
The PR consisted of 2 commits. Do you know if it was the first or second commit? The first commit enables correct asserts/warnings but doesn't change the functionality of any op/infra. The second commit fixes 2 ops to properly size CBs. Could it be that something was undersized previously (concat?)? |
@mywoodstock what is the deterministic failure you see? Is it the same as the one mentioned in this issue? |
@tt-aho The failure is different from the one mentioned in this issue:
This happens for the config tensor: its shape is [64, 24], with alignment [64, 32], height sharded across 64 cores. |
It's probably an "oversizing" issue here: |
Yes, it should be size per core. Right now it only asserts in debug mode, and will log a warning in release. Will be changed to assert in release once existing tests are fixed. |
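For anyone following along, here is a rough sketch of the arithmetic behind "size per core" for the tensor described above. It is not taken from the tt-metal code: the helper name `height_sharded_sizes`, the reading of "alignment [64, 32]" as a width padded to a multiple of 32, and the 2-byte element size are all assumptions made only for illustration.

```python
def round_up(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple (integer math only)."""
    return -(-value // multiple) * multiple


def height_sharded_sizes(shape, width_alignment, num_cores, bytes_per_element):
    """Return (bytes per core, bytes for the whole tensor).

    Hypothetical helper: assumes a 2D tensor whose rows are split evenly
    across cores (height sharding) and whose width is padded up to
    width_alignment elements.
    """
    rows, cols = shape
    aligned_cols = round_up(cols, width_alignment)
    rows_per_core = -(-rows // num_cores)  # ceiling division
    per_core = rows_per_core * aligned_cols * bytes_per_element
    whole_tensor = rows * aligned_cols * bytes_per_element
    return per_core, whole_tensor


# Shape [64, 24], width aligned to 32, 64 cores; 2-byte elements are assumed.
per_core, whole_tensor = height_sharded_sizes((64, 24), 32, 64, 2)
print(per_core, whole_tensor)  # 64 4096
```

Under these assumptions each core holds a single padded row, so a buffer sized from the whole tensor would be 64 times larger than one sized per core.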
Describe the bug
The Stable Diffusion interactive demo:
pytest models/demos/wormhole/stable_diffusion/demo/demo.py::test_interactive_demo
is hanging. I have tested with both firmware versions: 80.13.2.0 and 80.10.0.0.
At best it runs up to 98%, then crashes (see attached).
To Reproduce
Steps to reproduce the behavior:
pytest models/demos/wormhole/stable_diffusion/demo/demo.py::test_interactive_demo
Expected behavior
The interactive demo is also used for the web demo. We need this issue resolved to enable the SD web demo.