unify xpu and cpu backend and use paged attention #1009

Open · wants to merge 28 commits into base: main

Conversation

sywangyi (Collaborator)

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

sywangyi and others added 14 commits October 8, 2024 22:57
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* refine class IPEXPagedCache's update method

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* replace tensor on xpu to List to avoid memory copy

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* split IPEXPagedCache's update function into `update_for_prefill` and `update_for_decode`

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
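
The squashed commit above reworks IPEXPagedCache so its update path is split by phase and XPU tensors are held as Python lists. A minimal sketch of that shape, with every name and signature assumed rather than taken from the PR:

import torch

class PagedCacheSketch:
    # hypothetical illustration of a paged KV cache with split update paths
    def __init__(self, num_layers: int):
        # per-layer lists of tensors instead of one big tensor, so appending
        # new blocks does not force a device-side memory copy on XPU
        self.key_cache = [[] for _ in range(num_layers)]
        self.value_cache = [[] for _ in range(num_layers)]

    def update_for_prefill(self, key: torch.Tensor, value: torch.Tensor, layer_idx: int):
        # prefill: the whole prompt arrives at once and is stored in one shot
        self.key_cache[layer_idx].append(key)
        self.value_cache[layer_idx].append(value)

    def update_for_decode(self, key: torch.Tensor, value: torch.Tensor, layer_idx: int):
        # decode: only a single new token per sequence is appended each step
        self.key_cache[layer_idx].append(key)
        self.value_cache[layer_idx].append(value)
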
* enable qkv

* split key value into 2 lists
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
(#979)

* enable gpt2; falcon has a core dump error in PagedAttention.single_query_cached_kv_attention

* enable new_decoder_arch falcon

* only keep 1 config

* rm autocast
* fix bug when running IPEXCausalModel forward directly; fix bug when using `save_pretrained`

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* add LinearGelu Op support for XPU

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* fix unit test error

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* adjust unit test case

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* fix bug

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
* skip assisted decoding unit test for models using paged attention (see the sketch after this commit list)

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* almost all XPU CI tests now pass

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
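
The assisted-decoding skip mentioned above could look roughly like this in the test suite (class and attribute names are hypothetical, not the PR's actual code):

import pytest

class IPEXCausalLMTestSketch:
    # hypothetical set of architectures that route through paged attention
    PAGED_ATTENTION_ARCHITECTURES = {"llama", "falcon", "gpt2"}

    def test_assisted_decoding(self, model_arch: str):
        if model_arch in self.PAGED_ATTENTION_ARCHITECTURES:
            pytest.skip("assisted decoding is not supported with paged attention")
        # ... the regular assisted-decoding assertions would follow here
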
sywangyi changed the title from "Paged attn" to "unify xpu and cpu backend and use paged attention" on Nov 22, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix ci config

* fix test versions

* fix ipex version

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
sywangyi marked this pull request as draft on November 22, 2024 01:34
* use python3.9 test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
sywangyi marked this pull request as ready for review on November 22, 2024 03:00
jiqing-feng and others added 2 commits November 22, 2024 13:11
* change the ipex transformers version limits in setup (see the illustration below)
* fix inc tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
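
For illustration only, the version limits these commits mention might be expressed in setup.py along these lines (the package list and exact bounds are assumptions, not the PR's diff):

EXTRAS_REQUIRE = {
    # keep transformers inside the range the ipex backend is validated against
    "ipex": ["intel-extension-for-pytorch>=2.4", "transformers>=4.46,<4.47"],
}
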
yao-matrix

@IlyasMoutawwakil @echarlaix, please help review; we can also set up a meeting to review it if needed. Thanks.

* fix bert and vit patch
* fix vit and bert save


Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
IlyasMoutawwakil (Member)

@yao-matrix reviewing right now

jiqing-feng (Collaborator)

Hi @IlyasMoutawwakil, please also merge PR #1024. Thanks!

* fix reorder cache for non-patch models

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* disable torch < 2.3 tests; we won't use torch < 2.4

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix test beam search

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix cache selection

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* upgrade to transformers 4.46

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* change ipex test yaml transformers version to 4.46

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
setup.py (review thread outdated and resolved)
* set device as the same as origin model
* fix device

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
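
A small sketch of the device handling these commits describe, i.e. deriving the wrapper's device from the original model instead of assuming CPU (the helper name is hypothetical):

import torch

def infer_model_device(model: torch.nn.Module) -> torch.device:
    # read the device off the model's own parameters so an XPU model
    # stays reported as XPU and a CPU model as CPU
    return next(model.parameters()).device
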
jiqing-feng (Collaborator)

Hi @IlyasMoutawwakil, I have replied to and addressed your comments; please take a second round of review. Thanks!

* simplify forward and save pretrained since no jit support

* fix format

* rm warmup because no jit mode anymore

* simplify forward for causal lm model

* fix paged pkv forward

* disable use_cache when just running forward

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
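
A hedged sketch of the simplification these commits describe: with jit support gone, the wrapper can delegate straight to the eager transformers model and switch the KV cache off for plain forward calls (class and argument names assumed):

import torch

class CausalLMWrapperSketch:
    def __init__(self, model: torch.nn.Module):
        self.model = model

    @torch.no_grad()
    def forward(self, input_ids, attention_mask=None, **kwargs):
        # no jit trace or warmup anymore: call the eager model directly,
        # and disable the KV cache since it only helps during generation
        kwargs["use_cache"] = False
        return self.model(input_ids=input_ids, attention_mask=attention_mask, **kwargs)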

if isinstance(model, torch.jit.RecursiveScriptModule):
Collaborator:

TorchScript models will no longer be compatible, which is an important breaking change; we need to catch this and inform users.

Collaborator:

Also, we need to update the documentation, which currently reads:

For now, support is only enabled for CPUs and the original model will be exported via TorchScript. In the future `torch.compile` will be used and model exported via TorchScript will get deprecated.
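
One way to catch the break, sketched under the assumption that loading still reaches an isinstance check like the one quoted above (log text illustrative):

import logging
import torch

logger = logging.getLogger(__name__)

def warn_if_torchscript(model):
    # TorchScript exports from earlier optimum-intel versions cannot run
    # through the paged-attention path; surface that to the user explicitly
    if isinstance(model, torch.jit.RecursiveScriptModule):
        logger.warning(
            "TorchScript (jit) models are no longer supported by IPEXModel; "
            "please re-export from the original eager checkpoint."
        )
    return model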


return cls(model, config=config, model_save_dir=model_save_dir, **kwargs)
task = cls.export_feature
model = TasksManager.get_model_from_task(
Collaborator:

why not use `cls.auto_model_class`?
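
The reviewer's alternative would presumably collapse to a single call, assuming `auto_model_class` is the transformers auto class already set on each IPEXModel subclass:

model = cls.auto_model_class.from_pretrained(model_id, **kwargs)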

def test_compare_with_and_without_past_key_values(self):
model_id = "echarlaix/tiny-random-gpt2-torchscript"
Collaborator:

Would be great to add a test to make sure an IPEX model that has been saved and then pushed to the Hub is still compatible with the latest optimum-intel version, and that the model can still be correctly loaded and used for inference. It could also make sense to check the resulting output (comparing it with a transformers model), wdyt?

jiqing-feng (Collaborator), Nov 27, 2024:

I am afraid we cannot support previously exported language models anymore, because those jit models use the IAKV (indirect access KV) cache, which follows different logic from paged attention. Supporting both would make the code extremely large and hard to maintain, and it would confuse users too. Besides, those jit models are also out of date compared to the current transformers version.

jiqing-feng (Collaborator), Nov 27, 2024:

The only way is to fall back from IPEXModelForCausalLM to TSModelForCausalLM when loading a jit model, but that requires `config.torchscript == True` so we can tell it is a TorchScript model. So you might need to update the "echarlaix/tiny-random-gpt2-torchscript" config parameter. I have updated the model's config, please check here.
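
A sketch of the fallback described in this comment, assuming `config.torchscript` is the discriminator; the control flow is illustrative, not the PR's exact code, and the two model classes are those discussed in this thread:

from transformers import AutoConfig

def load_causal_lm(model_id: str, **kwargs):
    config = AutoConfig.from_pretrained(model_id)
    if getattr(config, "torchscript", False):
        # legacy TorchScript export: fall back to the jit-based wrapper
        return TSModelForCausalLM.from_pretrained(model_id, **kwargs)
    return IPEXModelForCausalLM.from_pretrained(model_id, **kwargs)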

Collaborator:

What I'm suggesting here is to create a new model with this implementation, push it to the Hub, and have a test to make sure this model is still compatible / can be correctly loaded and that inference works as expected (pushing another model than "echarlaix/tiny-random-gpt2-torchscript").

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
* nice code
* device type adjustment

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
* enable compile for non-generation tasks
* add no_grad in forward
* warmup compiled model
* disable compile not ready models
* set system level optimize for torch.compile
* fix typo
* add comments
* set torch minimum version for compiling

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
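
Roughly, the compile-and-warmup pattern the commit list above describes (a sketch, not the PR's code; the version gate and warmup inputs are assumptions):

import torch
from packaging import version

def maybe_compile(model: torch.nn.Module, example_inputs: dict) -> torch.nn.Module:
    # only compile on torch versions where the backend is considered ready
    if version.parse(torch.__version__) >= version.parse("2.4"):
        model = torch.compile(model)
        with torch.no_grad():
            model(**example_inputs)  # first call triggers compilation (warmup)
    return model
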
jiqing-feng (Collaborator)

Hi @echarlaix @IlyasMoutawwakil, please review the new changes. Thanks!

)
return TSModelForCausalLM.from_pretrained(model_id, **kwargs)
Collaborator:

An instance of TSModelForCausalLM will be created for every IPEXModel (even for encoder models), which doesn't really make sense to me. Also, it's not tested anywhere from what I can see; I'd prefer to raise an error here instead of keeping support that we're not sure works / is compatible with the previous integration.
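
The stricter behavior suggested here, sketched against the quoted isinstance check (the error message is illustrative):

import torch

if isinstance(model, torch.jit.RecursiveScriptModule):
    raise ValueError(
        "Loading TorchScript exports into IPEXModel is no longer supported; "
        "please re-export the model from the original transformers checkpoint."
    )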


return cls(model, config=config, model_save_dir=model_save_dir, **kwargs)
model = cls.auto_model_class.from_pretrained(model_id, **kwargs)
return cls(model, config=model.config, export=True, **kwargs)
Collaborator:

why would `export` be needed?

Suggested change
return cls(model, config=model.config, export=True, **kwargs)
return cls(model, config=model.config, **kwargs)


7 participants