Load diffusers in native FP16/BF16 precision to reduce the memory usage #1033

mvafin · 2024-11-26T11:32:19Z

What does this PR do?

Load diffusers models in their native float16 or bfloat16 precision and apply 16-bit patching in that case.

Model	Memory usage before	Memory usage after
FLUX-schnell	~120 GB	~60 GB
stable-diffusion-3.5-medium	~38 GB	~18 GB
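
To load a checkpoint in its native precision, the exporter first has to know which precision the weights were saved in. The sketch below is not the code from this PR; it only illustrates one way to find the dtype of the first floating-point tensor by reading the JSON header of a safetensors file with the standard library. The function name `first_float_dtype` and the `FLOAT_DTYPES` set are assumptions made for illustration.

```python
import json
import struct
from pathlib import Path
from typing import Optional

# Safetensors dtype tags treated as floating point here (illustrative subset).
FLOAT_DTYPES = {"F16", "BF16", "F32", "F64"}


def first_float_dtype(path: Path) -> Optional[str]:
    """Return the dtype tag of the first floating-point tensor listed in the header."""
    with open(path, "rb") as f:
        # A safetensors file starts with an unsigned 64-bit little-endian
        # header length, followed by that many bytes of JSON metadata.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        if info["dtype"] in FLOAT_DTYPES:
            return info["dtype"]
    return None  # no floating-point tensor found
```

Skipping integer tensors matters because a checkpoint may list buffers such as position IDs before any weights. Only the header is read, so the check itself does not load tensor data into memory.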

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

HuggingFaceDocBuilderDev · 2024-11-27T08:07:34Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

slyalin · 2024-11-29T12:25:14Z

@mvafin, any chance to merge it soon? What is the blocker?

nikita-savelyevv

Cool! Are non-diffusion models already exported efficiently like this?

AlexKoff88 · 2024-11-29T13:13:16Z

optimum/exporters/openvino/__main__.py

@@ -332,6 +333,48 @@ class StoreAttr(object):
                return model

            GPTQQuantizer.post_init_model = post_init_model
+    elif library_name == "diffusers" and is_safetensors_available() and is_openvino_version(">=", "2024.6"):
+        if Path(model_name_or_path).is_dir():


Can you please extract this code into a function with a meaningful name?

mvafin · 2024-11-29T18:09:45Z

> Cool! Are non-diffusion models already exported efficiently like this?

Only for text generation models. But it is even better there, since we do not use memory to load the weights: we just copy them from one location on disk to another using mmap. For diffusers we couldn't achieve this yet.
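
As a rough illustration of the mmap idea mentioned above (not the actual export code), the sketch below copies a byte range from one file to another through a read-only memory map, so only a small chunk is held in Python memory at a time. The function name `copy_range_mmap` and the 1 MiB chunk size are assumptions made for illustration.

```python
import mmap
from pathlib import Path

CHUNK = 1 << 20  # copy at most 1 MiB per write (illustrative value)


def copy_range_mmap(src: Path, dst: Path, offset: int, length: int) -> None:
    """Copy `length` bytes starting at `offset` in `src` into a new file `dst`."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # Map the source read-only; the OS pages data in lazily instead of
        # the whole file being read into the process up front.
        with mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            end = offset + length
            for start in range(offset, end, CHUNK):
                # Slicing the map yields a bytes object of at most CHUNK bytes.
                fout.write(mm[start:min(start + CHUNK, end)])
```

With this pattern the peak Python-side allocation is bounded by the chunk size rather than by the size of the weights file.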

Load diffusers in native FP16/BF16 precision to reduce the memory usage

e6033d2

eaidova reviewed Nov 26, 2024

eaidova reviewed Nov 26, 2024

eaidova reviewed Nov 26, 2024

mvafin and others added 2 commits November 26, 2024 13:09

Apply suggestions from code review

1177875

Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>

Fix code

4d251a0

rkazants reviewed Nov 26, 2024

Update optimum/exporters/openvino/__main__.py

73b15bf

Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>

mvafin requested review from rkazants and eaidova November 26, 2024 12:56

mvafin added 2 commits November 26, 2024 14:39

Find first floating point tensor instead of first tensor

cb85de2

Fix style

36d3c99

mvafin marked this pull request as ready for review November 26, 2024 15:59

eaidova reviewed Nov 27, 2024

mvafin commented Nov 27, 2024

Update optimum/exporters/openvino/__main__.py

4ce2feb

mvafin commented Nov 27, 2024

Update optimum/exporters/openvino/__main__.py

d5f13db

eaidova approved these changes Nov 27, 2024

mvafin added 2 commits November 27, 2024 09:21

Check if safetensors available

33e7deb

Fix style

55a72fc

eaidova requested review from nikita-savelyevv and AlexKoff88 and removed request for rkazants November 29, 2024 12:27

slyalin mentioned this pull request Nov 29, 2024

Loading pipeline in precision it was saved in huggingface/diffusers#9797

Open

nikita-savelyevv approved these changes Nov 29, 2024

AlexKoff88 reviewed Nov 29, 2024

AlexKoff88 requested a review from echarlaix November 29, 2024 13:37

AlexKoff88 requested a review from IlyasMoutawwakil November 29, 2024 13:38
