Unify XPU and CPU backends and use paged attention #1009
base: main
Conversation
* Refine the `IPEXPagedCache` update method
* Replace tensors on XPU with lists to avoid memory copies
* Split `IPEXPagedCache`'s update function into `update_for_prefill` and `update_for_decode`
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
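For readers unfamiliar with that refactor, the prefill/decode split typically looks something like the following. This is a minimal sketch only: the class name, tensor shapes, and slot-mapping convention are assumptions for illustration, not the actual `IPEXPagedCache` implementation.

```python
import torch


class PagedKVCacheSketch:
    """Toy paged KV cache with separate prefill and decode update paths."""

    def __init__(self, num_blocks: int, block_size: int, num_heads: int, head_dim: int):
        self.block_size = block_size
        # flat pools of fixed-size blocks, addressed by (block index, offset in block)
        self.key_cache = torch.zeros(num_blocks, block_size, num_heads, head_dim)
        self.value_cache = torch.zeros(num_blocks, block_size, num_heads, head_dim)

    def _locate(self, slot_mapping: torch.Tensor):
        # translate flat slot ids into (block, offset) coordinates
        return slot_mapping // self.block_size, slot_mapping % self.block_size

    def update_for_prefill(self, key, value, slot_mapping):
        # prefill: scatter all prompt tokens of the batch into the cache in one shot
        # key/value: (num_tokens, num_heads, head_dim), slot_mapping: (num_tokens,)
        blocks, offsets = self._locate(slot_mapping)
        self.key_cache[blocks, offsets] = key
        self.value_cache[blocks, offsets] = value

    def update_for_decode(self, key, value, slot_mapping):
        # decode: exactly one new token per sequence, one slot written per batch entry
        # key/value: (batch, num_heads, head_dim), slot_mapping: (batch,)
        blocks, offsets = self._locate(slot_mapping)
        self.key_cache[blocks, offsets] = key
        self.value_cache[blocks, offsets] = value
```

Keeping the two paths separate avoids per-call branching and lets the decode path skip the variable-length prompt bookkeeping that only prefill needs.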
* Enable QKV
* Split key/value into two lists
(#979)
* Enable gpt2; falcon has a core dump error in PagedAttention.single_query_cached_kv_attention
* Enable new_decoder_arch falcon
* Only keep one config
* Remove autocast
* Fix a bug when running IPEXCausalModel forward directly; fix a bug when using `save_pretrained`
* Add LinearGelu op support for XPU
* Fix unit test error
* Adjust unit test case
* Fix bug
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
* Skip assisted decoding unit tests for models using paged attention
* XPU CI tests almost all pass
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* Fix CI config
* Fix test versions
* Fix IPEX version
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Use Python 3.9 for tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Change the IPEX transformers version limit in setup
* Fix INC tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@IlyasMoutawwakil @echarlaix, please help review; we can also have a meeting to review it if needed. Thanks.
* Fix BERT and ViT patch
* Fix ViT and BERT save
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@yao-matrix reviewing right now.
Hi @IlyasMoutawwakil, please also merge PR #1024. Thanks!
* Fix reorder cache for non-patched models
* Disable torch < 2.3 tests; we won't use torch < 2.4
* Fix beam search test
* Fix cache selection
* Upgrade to transformers 4.46
* Change IPEX test YAML transformers version to 4.46
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Set the device to be the same as the original model's
* Fix device
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Hi @IlyasMoutawwakil, I have replied to and addressed your comments; please do a second round of review. Thanks!
* Simplify forward and save_pretrained since there is no JIT support anymore
* Fix format
* Remove warmup because there is no JIT mode anymore
* Simplify forward for causal LM models
* Fix paged past-key-values forward
* Disable use_cache when only running forward
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
if isinstance(model, torch.jit.RecursiveScriptModule):
TorchScript models will not be compatible anymore, which is an important breaking change; we need to catch this to inform users.
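A minimal sketch of what such a check could look like (the helper name, exception type, and message wording are assumptions, not the merged implementation):

```python
import torch


def _reject_torchscript_model(model):
    # TorchScript (jit) models are no longer supported by the unified paged-attention
    # backend, so fail early with an actionable message instead of a confusing error later.
    if isinstance(model, torch.jit.RecursiveScriptModule):
        raise ValueError(
            "Loading TorchScript models is no longer supported. "
            "Please re-export the original transformers model with the latest optimum-intel."
        )
```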
Also, we need to update the documentation:
For now, support is only enabled for CPUs and the original model will be exported via TorchScript. In the future `torch.compile` will be used and model exported via TorchScript will get deprecated.
optimum/intel/ipex/modeling_base.py
return cls(model, config=config, model_save_dir=model_save_dir, **kwargs)
task = cls.export_feature
model = TasksManager.get_model_from_task(
Why not use `cls.auto_model_class`?
def test_compare_with_and_without_past_key_values(self):
    model_id = "echarlaix/tiny-random-gpt2-torchscript"
It would be great to add a test to make sure an IPEX model that has been saved and then pushed to the Hub is still compatible with the latest optimum-intel version, and that the model can still be correctly loaded and used for inference. It could also make sense to check the resulting output (comparing it with a transformers model), wdyt?
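Such a test could be shaped roughly like this (a sketch only: the tiny model id is a placeholder, a local save/reload stands in for the Hub round trip, and the tolerances are arbitrary):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from optimum.intel import IPEXModelForCausalLM


def test_saved_ipex_model_matches_transformers(tmp_path):
    model_id = "hf-internal-testing/tiny-random-gpt2"  # placeholder tiny model
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    inputs = tokenizer("Paged attention test", return_tensors="pt")

    # save and reload through the IPEX integration (stand-in for a Hub round trip)
    IPEXModelForCausalLM.from_pretrained(model_id).save_pretrained(tmp_path)
    reloaded = IPEXModelForCausalLM.from_pretrained(tmp_path)

    reference = AutoModelForCausalLM.from_pretrained(model_id)
    with torch.no_grad():
        ipex_logits = reloaded(**inputs).logits
        ref_logits = reference(**inputs).logits

    # outputs should match the plain transformers model within a small tolerance
    torch.testing.assert_close(ipex_logits, ref_logits, atol=1e-4, rtol=1e-4)
```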
I am afraid we cannot support the previously exported language models anymore, because these jit models use an IAKV cache, which has different logic from paged attention. Supporting both would make the code extremely large and hard to maintain, and it would confuse users too. Besides, these jit models are also out of date compared to the current transformers version.
The only way is to fall back from IPEXModelForCausalLM to TSModelForCausalLM when loading a jit model, but that requires `config.torchscript == True` so we can tell it is a TorchScript model. So you might need to update the "echarlaix/tiny-random-gpt2-torchscript" config. I have updated the model's config, please check here.
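Concretely, that fallback could be wired up roughly as follows. Only the `config.torchscript` check and the `TSModelForCausalLM.from_pretrained` call come from the discussion; the helper name and the import location are assumptions.

```python
def _from_pretrained_with_torchscript_fallback(cls, model_id, config, **kwargs):
    # previously exported jit models are marked with config.torchscript == True,
    # so route them to the legacy TorchScript class instead of the paged-attention path
    if getattr(config, "torchscript", False):
        from optimum.intel.generation import TSModelForCausalLM  # assumed import path
        return TSModelForCausalLM.from_pretrained(model_id, **kwargs)

    # otherwise load a plain transformers model through the mapped auto class
    model = cls.auto_model_class.from_pretrained(model_id, **kwargs)
    return cls(model, config=model.config, **kwargs)
```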
What I'm suggesting here is to create a new model with this implementation, push it to the Hub, and add a test to make sure this model stays compatible and can be correctly loaded, and that inference works as expected (pushing a model other than "echarlaix/tiny-random-gpt2-torchscript").
* Nice code
* Device type adjustment
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
* Enable compile for non-generation tasks
* Add no_grad in forward
* Warm up the compiled model
* Disable compile for models that are not ready
* Set system-level optimizations for torch.compile
* Fix typo
* Add comments
* Set minimum torch version for compiling
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
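The compile-and-warmup flow from that commit boils down to something like the sketch below (the helper name, the minimum version, and the single-pass warmup are assumptions for illustration):

```python
import torch
from packaging import version


def maybe_compile_and_warm_up(model, example_inputs, min_torch="2.4"):
    # only compile when the installed torch is recent enough for stable torch.compile
    if version.parse(torch.__version__) < version.parse(min_torch):
        return model

    compiled = torch.compile(model)
    # run one forward pass so graph compilation happens before real requests arrive
    with torch.no_grad():
        compiled(**example_inputs)
    return compiled
```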
Hi @echarlaix @IlyasMoutawwakil, please review the new changes. Thanks!
)
return TSModelForCausalLM.from_pretrained(model_id, **kwargs)
An instance of `TSModelForCausalLM` will be created for every `IPEXModel` (even for encoder models), which doesn't really make sense to me. Also, it's not tested anywhere from what I see; I would prefer to raise an error here instead of keeping support that we're not sure works or is compatible with the previous integration.
return cls(model, config=config, model_save_dir=model_save_dir, **kwargs)
model = cls.auto_model_class.from_pretrained(model_id, **kwargs)
return cls(model, config=model.config, export=True, **kwargs)
Why would `export` be needed?
Suggested change:
- return cls(model, config=model.config, export=True, **kwargs)
+ return cls(model, config=model.config, **kwargs)