
query adapter native in training #3084

Open
achibb opened this issue Nov 25, 2024 · 3 comments
achibb commented Nov 25, 2024

Hi there!

Now that using adapters works, would it make sense to add native support for using an adapter for the query (and/or sentence2) side during training with model.train?

@tomaarsen (Collaborator)

Hello!

That's kind of a cool setup - I'm more familiar with e.g. finetuning 2 adapters (one for queries, one for docs), but you can indeed also do 1 query adapter and just the normal SentenceTransformerTrainer for the documents. Just intuitively, you might get the best performance if you first finetune some model with your documents normally and then finetune an adapter on top of the document-finetuned model.
Otherwise you have to make your adapter on the base model - that might do a bit worse.

Having said that, these are all just guesses. I'm still training PEFT models myself to get a feel for the performance & so I can write documentation. Here's a sneak peek:

Here the PEFT model reached 0.4705 NDCG@10 on the NanoBEIR datasets, whereas the base model reached 0.4728 NDCG@10 on the same datasets. At the same time, the PEFT model requires a lot less memory during training. I still have to try and scale this up to a larger model.

In short: I can't really say right now - I'm not familiar enough with PEFT and embedding models. If you'd really like to know, perhaps you can ask the Jina folks; they've trained a few models with PEFT, like https://huggingface.co/jinaai/jina-embeddings-v3.

  • Tom Aarsen


achibb commented Nov 25, 2024

Thanks as always for the help and the quick reply - sounds great:

https://weaviate.io/papers/axn

Here they explain a cool paper where they use a query adapter (paired with iterations and a cross-encoder) to achieve cross-encoder quality. Might be generally interesting :-)

@tomaarsen (Collaborator)

I hadn't heard of that one yet - fascinating! Knowledge distillation is very powerful; we use it here as well for MarginMSELoss: https://sbert.net/examples/training/ms_marco/README.html#marginmse

But our solution is a fair bit simpler, hah. I don't think their approach works out of the box: there's currently no way (as far as I know) to add an adapter but apply it to only a portion of inference (i.e. only the queries). It would require a custom loss/trainer/model, I reckon.
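As a rough illustration, a manual workaround at inference time might look like the following (untested sketch; it assumes the adapter is already loaded, and the enable_adapters/disable_adapters toggles come from transformers' PEFT integration, which may vary by version):

```python
# Hedged sketch: apply a loaded adapter only to the query side at inference.
def encode_asymmetric(model, queries, docs):
    # Assumes `model` is a SentenceTransformer whose first module wraps a
    # PEFT-enabled Hugging Face transformer model.
    transformer = model[0].auto_model

    transformer.enable_adapters()   # adapter ON for queries
    query_embeddings = model.encode(queries)

    transformer.disable_adapters()  # adapter OFF for documents
    doc_embeddings = model.encode(docs)

    transformer.enable_adapters()   # restore the default state
    return query_embeddings, doc_embeddings
```

Training with this kind of asymmetry would still need the custom loss/trainer mentioned above; this only covers the inference side.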

  • Tom Aarsen
