Description
After calling optimize_model() on a GPT2Model instance from HuggingFace's transformers, the model's forward pass raises RuntimeError: CUDA error: operation not permitted when stream is capturing.
A very similar issue is #15002, but it was closed without a solution.
Steps to reproduce
from transformers import GPT2Config, GPT2Model
import torch
from kernl.model_optimization import optimize_model

model = GPT2Model(GPT2Config()).eval().cuda()
optimize_model(model)  # removing this line makes the forward pass succeed

with torch.cuda.amp.autocast():
    print(model(torch.tensor([[0]], device="cuda")))
Expected Behavior
The model's output should be printed, as it would be without the optimize_model(model) line.
As far as I know, around PyTorch 2.1 there was a change to how PyTorch represents the computation graph; it now uses SymPy. I don't think kernl has been updated to accept that new graph format. However, since you are using a common Hugging Face model, torch.compile, DeepSpeed, or BetterTransformer will probably work instead.
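For reference, a drop-in replacement based on torch.compile might look like the sketch below (my own sketch, not from this thread; it assumes PyTorch >= 2.0 and simply swaps torch.compile in for kernl's optimize_model on the same repro model):

from transformers import GPT2Config, GPT2Model
import torch

# Same model as in the repro, but optimized with torch.compile
# instead of kernl's optimize_model().
model = GPT2Model(GPT2Config()).eval().cuda()
compiled_model = torch.compile(model)

with torch.inference_mode(), torch.cuda.amp.autocast():
    print(compiled_model(torch.tensor([[0]], device="cuda")))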
That's a bit odd: torch.compile (with dynamic=True) gives a 1.6x speedup for me (Google Colab A100), certainly not a slowdown. What is your hardware?
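For context, a rough way to check that number on other hardware is sketched below (my own benchmarking harness, not from this thread; the time_forward helper, the 128-token input, and the iteration counts are arbitrary choices, and absolute numbers will vary by GPU and PyTorch version):

import torch
from transformers import GPT2Config, GPT2Model

def time_forward(m, ids, iters=50):
    # Warm up, then time the forward pass with CUDA events (milliseconds per call).
    for _ in range(3):
        m(ids)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        m(ids)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

model = GPT2Model(GPT2Config()).eval().cuda()
ids = torch.randint(0, 50257, (1, 128), device="cuda")

with torch.inference_mode():
    eager_ms = time_forward(model, ids)
    compiled = torch.compile(model, dynamic=True)  # dynamic=True avoids recompiles when sequence length changes
    compiled_ms = time_forward(compiled, ids)

print(f"eager: {eager_ms:.2f} ms/call, compiled: {compiled_ms:.2f} ms/call")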
@CorentinJ Also, maybe unrelated: I noticed you work at Resemble AI. Are you trying to speed up tortoise-tts by making HF GPT2 faster for its autoregressive part? I'm working on a similar model to tortoise-tts and have it deployed: https://voicegen.org/. Maybe we can compare notes and help each other out. My email is wilson97@gmail.com if you're interested.