Fix wmma api parity #6
Please change datatypes and enums also to rocwmma equivalents. I have commented on one of them above.
csrc/kernels.hip
Outdated
wmma::fragment<wmma::matrix_b, 8, 32, 16, half, wmma::col_major> b_frag;
wmma::fragment<wmma::accumulator, 8, 32, 16, half> c_frag;
wmma::fill_fragment(c_frag, 0.0f);
rocwmma::fragment<wmma::matrix_a, 8, 32, 16, half, wmma::row_major> a_frag;
Replace matrix_a and row_major with rocwmma equivalents
Fixed!
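For anyone following the review thread above, here is a minimal sketch of the rename being asked for: every `wmma::` qualifier (fragment roles such as `matrix_a`, layouts such as `row_major`, and calls such as `fill_fragment`) becomes its `rocwmma::` counterpart. The helper `to_rocwmma` is hypothetical and written only for illustration; it is plain C++ string rewriting, not part of hipify or rocWMMA, and it assumes each rocWMMA name matches the CUDA one after the namespace swap.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of hipify or rocWMMA): rewrite CUDA-style
// `wmma::` / `nvcuda::wmma::` qualifiers in a source line to `rocwmma::`.
std::string to_rocwmma(const std::string& src) {
    const std::string nv = "nvcuda::wmma::";
    const std::string from = "wmma::";
    const std::string to = "rocwmma::";
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        // Only match at an identifier boundary, so that the `wmma::` inside
        // an existing `rocwmma::` is left untouched.
        const bool at_boundary =
            i == 0 ||
            !(std::isalnum(static_cast<unsigned char>(src[i - 1])) ||
              src[i - 1] == '_');
        if (at_boundary && src.compare(i, nv.size(), nv) == 0) {
            out += to;
            i += nv.size();
        } else if (at_boundary && src.compare(i, from.size(), from) == 0) {
            out += to;
            i += from.size();
        } else {
            out += src[i++];
        }
    }
    return out;
}
```

Applied to the flagged line, this yields `rocwmma::fragment<rocwmma::matrix_a, 8, 32, 16, half, rocwmma::row_major> a_frag;`. Element types such as `half` are not touched by this sketch and would still need to be checked by hand.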
Hello, I am sorry to disturb you, but I cannot find anywhere to report an issue related to this library.
Hi @seungduk-yanolja, please try installing it from the rocm_enabled branch; the instructions are on that page. Please be aware that full enablement is still pending. You can report any future issues on the https://github.com/ROCm/rocm repo.
Reported the issue here: ROCm/ROCm#2885
Hi @pnunna93, yes, I installed it from the
The backtrace provided indicates that the core dump resulted from a segmentation fault (SIGABRT) triggered within the Python process. Specifically, the crash occurs during the dynamic loading of a shared library related to the
The key points in the backtrace indicating the source of the issue are:
Given the complexity of debugging segmentation faults in dynamically loaded libraries, especially within the context of GPU computing, resolving such issues can sometimes require deep technical knowledge of the libraries and the underlying hardware. Collaboration with the community or seeking support from the developers of the libraries involved may be necessary.
Hi @seungduk-yanolja, please reinstall hipblaslt with these steps: You may need to copy and relink the hipblaslt .so files from the build dir to /opt/rocm/lib/ if they are not replaced automatically after the build.
It looks like the same command lines as described in the
Update: I tried to install hipBLASLt again, but there was an error (invalid memory access) and the whole filesystem became read-only. I rebooted the machine, and then it did not recognize the GPUs correctly. I rebooted the IPMI, and then it returned to normal. At the moment, all I can do with this machine (MI300X) is run vLLM on 4 of the 8 GPUs, because the output became very strange when I used all 8 GPUs. I will try to explore further what I can do.
Hey all! I'm Titus, one of the bitsandbytes maintainers. We currently have a strong push underway to officially support hardware backends other than CUDA in BNB. Would you be willing to help us get the AMD part right and consolidate the codebases?
Hi @Titus-von-Koeller, sure! We were planning to reach out to you once we had closed some internal dependencies. Is there a forum where we can discuss this?
Hi @seungduk-yanolja, it sounds like there is an issue with the hipblaslt build/linking. The version I pointed to has the ExtOpMasterLibrary class, but something else is going wrong in the build. Please check back on the ROCm issue; they should be able to help. Thanks.
Thank you all. I no longer have access to the machine, since it was a short-term PoC. Another PoC is scheduled for next month, so I will try again then. Thanks again.
hipify the wmma api call with rocwmma