Is it possible to perform Quantization Aware Training on Sentence Transformers, beyond the fp16 and bf16 options that are directly supported through the transformers training arguments? Are there other options for doing binary quantization during training, other than using Intel Neural Compressor's INCTrainer or OpenVINO's OVTrainer?
I'm afraid the INCTrainer/OVTrainer/ORTTrainer aren't directly compatible with Sentence Transformers. Beyond that, you can load models in specific quantized formats with bitsandbytes via model_kwargs, and you can use PEFT as well (though that doesn't do quantization per se).
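For illustration, here is a minimal sketch of the bitsandbytes route via model_kwargs. The model name and the specific BitsAndBytesConfig settings are just placeholder assumptions, and this requires the bitsandbytes package and a CUDA GPU:

```python
# Sketch: loading a Sentence Transformer with 4-bit bitsandbytes quantization
# through model_kwargs. Model name and config values are example assumptions.
import torch
from transformers import BitsAndBytesConfig
from sentence_transformers import SentenceTransformer

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = SentenceTransformer(
    "sentence-transformers/all-MiniLM-L6-v2",  # any ST-compatible model
    model_kwargs={"quantization_config": bnb_config},
)

embeddings = model.encode(["An example sentence to embed."])
print(embeddings.shape)
```

Note that this quantizes the model weights for loading/inference; it isn't quantization-aware *training* by itself, though it can be combined with PEFT-style finetuning on top of the quantized base.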
I do want to preface this by noting that there's a difference between quantization for speeding up inference and embedding quantization. The former allows for faster inference, while the latter is a post-processing of the output embeddings so that downstream tasks (e.g. retrieval) are faster.
So there's a difference between:

1. Quantization-aware training for inference, i.e. training such that the model suffers minimal performance loss when the model weights are quantized for faster inference.
2. Quantization-aware training for embeddings, i.e. training such that the model suffers minimal performance loss when the output embeddings are quantized for faster downstream tasks.
For the first one, there aren't great options out of the box to my knowledge; for the latter, you can consider the Binary Passage Retrieval loss (bpr_loss), which is compatible with Sentence Transformers.
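For reference, the second kind (quantizing the output embeddings) can also be applied purely as a post-processing step at inference time, without special training. A minimal sketch, assuming a sentence-transformers version that ships quantize_embeddings (2.6 or newer); the model name is just an example:

```python
# Sketch: binary embedding quantization as post-processing.
# Assumes sentence-transformers >= 2.6; model name is an arbitrary example.
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
float_embeddings = model.encode(["How do I quantize embeddings?"])

# Each dimension's sign is packed into bits: 384 float32 dims -> 48 int8 values.
binary_embeddings = quantize_embeddings(float_embeddings, precision="binary")
print(float_embeddings.shape, binary_embeddings.shape)  # (1, 384) (1, 48)
```

In recent versions you should also be able to pass precision="binary" directly to model.encode for the same effect. Training with a loss like bpr_loss is then about minimizing the retrieval quality drop that this binarization causes.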