Is it possible to perform Quantization Aware Training on Sentence Transformers, beyond the fp16 and bf16 options that are directly supported through the transformers training arguments? Are there other options for doing binary quantization during training, other than using Intel Neural Compressor's INCTrainer or OpenVINO's OVTrainer?
I'm afraid the INCTrainer/OVTrainer/ORTTrainer aren't directly compatible with Sentence Transformers. Beyond that, you can load models in specific quantized formats with bitsandbytes via model_kwargs, and you can use PEFT as well (though that doesn't do quantization per se).
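For illustration, here is a minimal sketch of the bitsandbytes route via model_kwargs. The model name and the specific BitsAndBytesConfig settings are just placeholder assumptions, and this requires the bitsandbytes package and a CUDA GPU:

```python
# Sketch: loading a Sentence Transformer with 4-bit bitsandbytes quantization
# through model_kwargs. Model name and config values are example assumptions.
import torch
from transformers import BitsAndBytesConfig
from sentence_transformers import SentenceTransformer

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = SentenceTransformer(
    "sentence-transformers/all-MiniLM-L6-v2",  # any ST-compatible model
    model_kwargs={"quantization_config": bnb_config},
)

embeddings = model.encode(["An example sentence to embed."])
print(embeddings.shape)
```

Note that this quantizes the model weights for loading/inference; it isn't quantization-aware *training* by itself, though it can be combined with PEFT-style finetuning on top of the quantized base.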
I do want to preface this by noting that there's a difference between quantization for speeding up inference and embedding quantization. The former allows for faster inference, while the latter is a post-processing of the output embeddings so that downstream tasks (e.g. retrieval) are faster.
So there's a difference between:

1. Quantization-aware training for inference, i.e. training such that the model suffers minimal performance loss when the model weights are quantized for faster inference.
2. Quantization-aware training for embeddings, i.e. training such that the model suffers minimal performance loss when the output embeddings are quantized for faster downstream tasks.
For the first one, there aren't great options out of the box to my knowledge; for the latter, you can consider the Binary Passage Retrieval loss (bpr_loss), which is compatible with Sentence Transformers.
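For reference, the second kind (quantizing the output embeddings) can also be applied purely as a post-processing step at inference time, without special training. A minimal sketch, assuming a sentence-transformers version that ships quantize_embeddings (2.6 or newer); the model name is just an example:

```python
# Sketch: binary embedding quantization as post-processing.
# Assumes sentence-transformers >= 2.6; model name is an arbitrary example.
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
float_embeddings = model.encode(["How do I quantize embeddings?"])

# Each dimension's sign is packed into bits: 384 float32 dims -> 48 int8 values.
binary_embeddings = quantize_embeddings(float_embeddings, precision="binary")
print(float_embeddings.shape, binary_embeddings.shape)  # (1, 384) (1, 48)
```

In recent versions you should also be able to pass precision="binary" directly to model.encode for the same effect. Training with a loss like bpr_loss is then about minimizing the retrieval quality drop that this binarization causes.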