
Load diffusers in native FP16/BF16 precision to reduce the memory usage #1033

Open · mvafin wants to merge 10 commits into main
Conversation

mvafin

@mvafin mvafin commented Nov 26, 2024

What does this PR do?

Load diffusers models in native float16 or bfloat16 precision and apply 16-bit patching in that case.

| Model                       | Before  | After  |
|-----------------------------|---------|--------|
| FLUX-schnell                | ~120 GB | ~60 GB |
| stable-diffusion-3.5-medium | ~38 GB  | ~18 GB |
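Loading in the checkpoint's native precision relies on the dtype being recorded in the safetensors header, which can be read without touching any weight data. A minimal sketch of that idea (the helper name and the hand-built demo file are mine, not the PR's actual code):

```python
import json
import struct

def read_safetensors_dtypes(path):
    # Read tensor dtypes from a safetensors file header without
    # loading any weights: the first 8 bytes are a little-endian u64
    # header length, followed by a JSON header describing each tensor.
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]
        header = json.loads(f.read(header_len))
    return {name: meta["dtype"]
            for name, meta in header.items()
            if name != "__metadata__"}

# Build a tiny valid safetensors file by hand for demonstration:
# one fp16 tensor of shape [2] (2 elements * 2 bytes = 4 data bytes).
meta = {"weight": {"dtype": "F16", "shape": [2], "data_offsets": [0, 4]}}
blob = json.dumps(meta).encode()
with open("demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(blob)) + blob + b"\x00" * 4)

print(read_safetensors_dtypes("demo.safetensors"))  # {'weight': 'F16'}
```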

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

mvafin and others added 2 commits November 26, 2024 13:09
Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>
Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
@mvafin mvafin marked this pull request as ready for review November 26, 2024 15:59
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@slyalin
Copy link
Contributor

slyalin commented Nov 29, 2024

@mvafin, any chance to merge it soon? What is the blocker?

Collaborator

@nikita-savelyevv nikita-savelyevv left a comment


Cool! Are non-diffusion models already exported efficiently like this?

@@ -332,6 +333,48 @@ class StoreAttr(object):
return model

GPTQQuantizer.post_init_model = post_init_model
elif library_name == "diffusers" and is_safetensors_available() and is_openvino_version(">=", "2024.6"):
if Path(model_name_or_path).is_dir():
Collaborator

Can you please extract and encapsulate this code to a function with a meaningful name?
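One way to hoist the inline condition from the diff above into a named helper; this is a hypothetical sketch (the function name and signature are mine, not from the PR):

```python
from pathlib import Path

def is_native_16bit_export_applicable(library_name, model_name_or_path,
                                      openvino_version,
                                      safetensors_available=True):
    # Hypothetical encapsulation of the inline `elif` condition above:
    # diffusers + safetensors available + OpenVINO >= 2024.6, plus the
    # local-directory check that guards the extracted body.
    major_minor = tuple(int(p) for p in str(openvino_version).split(".")[:2])
    return (library_name == "diffusers"
            and safetensors_available
            and major_minor >= (2024, 6)
            and Path(model_name_or_path).is_dir())

print(is_native_16bit_export_applicable("diffusers", ".", "2024.6"))  # True
```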

@mvafin
Author

mvafin commented Nov 29, 2024

Cool! Are non-diffusion models already exported efficiently like this?

Only for text generation models. But it's even better there, since we do not use memory to load weights; we just copy them from one location on disk to another using mmap. For diffusers we couldn't achieve this yet.
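The mmap-based copy described here can be sketched roughly like this (an illustrative helper, not the actual optimum-intel implementation):

```python
import mmap

def copy_weights_via_mmap(src, dst):
    # Stream a weights file from src to dst through a read-only mmap:
    # pages flow through the OS page cache instead of being fully
    # materialized in the Python process's heap.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        with mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            fout.write(mm)

# Demo on a small file.
with open("weights.bin", "wb") as f:
    f.write(b"\x00\x01" * 1024)
copy_weights_via_mmap("weights.bin", "weights_copy.bin")
```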

7 participants