
Support for Quantization Aware Training #3031

Open
lejinvarghese opened this issue Nov 3, 2024 · 1 comment

Comments

@lejinvarghese

Is it possible to perform Quantization Aware Training with Sentence Transformers, beyond the fp16 and bf16 options that are directly supported in the transformers training_args? Are there other options for doing binary quantization during training, other than using the Intel Neural Compressor INCTrainer or the OpenVINO OVTrainer?

@tomaarsen
Collaborator

Hello!

I'm afraid the INCTrainer/OVTrainer/ORTTrainer aren't directly compatible with Sentence Transformers. Beyond that, you can load models with a specific quantization configuration using bitsandbytes via model_kwargs, and you can use PEFT as well (though that doesn't do quantization per se).
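
As a rough sketch of the bitsandbytes route (the model name and the 4-bit settings below are just placeholders, and this assumes a CUDA GPU with the bitsandbytes package installed; model_kwargs is forwarded to the underlying transformers from_pretrained call):

```python
import torch
from transformers import BitsAndBytesConfig
from sentence_transformers import SentenceTransformer

# Example only: load an embedding model with 4-bit bitsandbytes quantization
# by passing a quantization_config through model_kwargs.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = SentenceTransformer(
    "mixedbread-ai/mxbai-embed-large-v1",  # placeholder model name
    model_kwargs={"quantization_config": bnb_config},
)
embeddings = model.encode(["An example sentence to embed"])
```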

I do want to preface that there's a difference between quantization for Speeding up Inference and Embedding Quantization. The former allows for faster inference, and the latter post-processes the output embeddings so that downstream tasks (e.g. retrieval) are faster.

So there's a difference between:

  • Quantization-aware training, i.e. training such that the model has minimal performance loss when quantizing the model weights for faster inference.
  • Quantization-aware training, i.e. training such that the model has minimal performance loss when quantizing the output embeddings for faster downstream tasks.

For the first one, there aren't great options out of the box to my knowledge; for the latter, you can consider the Binary Passage Retrieval loss (bpr_loss), which is compatible with Sentence Transformers.
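
Note that, even without any quantization-aware training, embedding quantization can be applied purely as a post-processing step on the output embeddings. A minimal sketch (the model name is just an example):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

# Example only: encode as usual, then post-process the float embeddings
# into packed binary embeddings for faster downstream retrieval.
model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model name
embeddings = model.encode(["An example query", "An example passage"])
binary_embeddings = quantize_embeddings(embeddings, precision="binary")
# Binary quantization packs 8 dimensions per byte, so the second axis shrinks by 8x.
print(binary_embeddings.shape, binary_embeddings.dtype)
```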

  • Tom Aarsen
