Added notebook to showcase quantization of Sentence Transformers model #955
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks @AlexKoff88 this is a great example.
```
],
"source": [
    "# FP32 baseline model\n",
    "!benchmark_app -m all-MiniLM-L6-v2/openvino_model.xml -shape \"input_ids[1,384],attention_mask[1,384],token_type_ids[1,384]\" -api sync -niter 200"
]
```
This reshapes the model to static shapes, which gives a large speedup, especially for INT8. But most people will not use static shapes in practice, and padding/truncating to 384 is not always desired. IMO it is fairer to compare performance by looping over a dataset (e.g. modifying the evaluate function to add timings), but then the performance difference is not as large. If we keep benchmark_app, it would be good to at least explain the static shapes. (Using data_shape instead of shape in benchmark_app does not reshape the model, but you would still use the same sequence length everywhere, so it is still not a standard use case.)
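The "loop over a dataset and add timings" alternative could be sketched roughly like this. This is a minimal illustration, not code from the notebook: `run_inference` is a hypothetical stand-in for the model forward pass, and the timing helper is an assumption about how one might extend the evaluate function.

```python
import time

def run_inference(sample):
    # Hypothetical stand-in for the model forward pass; illustrative only.
    return sum(ord(c) for c in sample) % 97

def evaluate_with_timings(dataset):
    """Run inference over a dataset, collecting per-sample latency."""
    latencies = []
    for sample in dataset:
        start = time.perf_counter()
        run_inference(sample)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "samples": len(latencies),
        "median_ms": 1000 * latencies[len(latencies) // 2],
        "total_s": sum(latencies),
    }

stats = evaluate_with_timings(["short text", "a somewhat longer input text"])
```

Because each sample keeps its natural length, this measures the dynamic-shape case that most users would actually hit, at the cost of less controlled conditions than benchmark_app.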
Thanks, Helena. I would disagree in this specific case: the tokenizer truncates the data anyway, so the model effectively runs with a static shape. But I can add information about it.
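The point about the tokenizer can be illustrated with a small sketch. The `pad_or_truncate` helper and `MAX_LEN` below are hypothetical, not from the notebook; they just mimic what a tokenizer does with `padding="max_length"` and `truncation=True`: every sequence comes out with the same length, so the model only ever sees one input shape.

```python
MAX_LEN = 384  # hypothetical, matching the [1,384] shape passed to benchmark_app
PAD_ID = 0

def pad_or_truncate(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Mimic tokenizer behavior with padding to max_length and truncation."""
    token_ids = token_ids[:max_len]                     # truncate long inputs
    return token_ids + [pad_id] * (max_len - len(token_ids))  # pad short ones

short = pad_or_truncate([101, 2023, 102])   # a short sequence gets padded
long = pad_or_truncate(list(range(1000)))   # a long sequence gets truncated
assert len(short) == len(long) == MAX_LEN   # same static shape either way
```

Under that preprocessing, reshaping the model to a static `[1, 384]` matches what the model receives at inference time anyway.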
Co-authored-by: Helena Kloosterman <helena.kloosterman@intel.com>
Why is `DATASET_NAME = "squad"` used in `dataset = datasets.load_dataset(DATASET_NAME)`?
Thanks @l-bat. Fixed.
PR is ready.
Looks great, thanks for the addition @AlexKoff88. It could also be added to https://github.com/huggingface/optimum-intel/blob/v1.20.0/notebooks/openvino/README.md
Will do in the follow-up PR.