
query adapter native in training #3084

Open
achibb opened this issue Nov 25, 2024 · 3 comments
achibb commented Nov 25, 2024

Hi there!

Now that using adapters works, would it make sense to add native support for using an adapter for the query (and/or sentence2) side during training with model.train?

@tomaarsen (Collaborator)

Hello!

That's kind of a cool setup - I'm more familiar with e.g. finetuning 2 adapters (one for queries, one for docs), but you can indeed also do 1 query adapter and just the normal SentenceTransformerTrainer for the documents. Just intuitively, you might get the best performance if you first finetune some model with your documents normally and then finetune an adapter on top of the document-finetuned model.
Otherwise you have to make your adapter on the base model - that might do a bit worse.

Having said that, these are all just guesses. I'm still training PEFT models myself to get a feel for the performance & so I can write documentation. Here's a sneak peek:

Here the PEFT model reached 0.4705 NDCG@10 on the NanoBEIR datasets, whereas the base model reached 0.4728 NDCG@10 on the same datasets. At the same time, the PEFT model requires a lot less memory during training. I still have to try and scale this up to a larger model.

In short: I can't really say right now - I'm not familiar enough with PEFT and embedding models. If you'd really like to know, perhaps you can ask the Jina folks; they've trained a few models with PEFT, like https://huggingface.co/jinaai/jina-embeddings-v3.

  • Tom Aarsen


achibb commented Nov 25, 2024

Thanks as always for the help and the quick reply - sounds great:

https://weaviate.io/papers/axn

Here they explain a cool paper where they use a query adapter (paired with iterations and a cross-encoder) to achieve cross-encoder quality. Might be generally interesting :-)

@tomaarsen (Collaborator)

I hadn't heard of that one yet - fascinating! Knowledge distillation is very powerful; we use it here as well for MarginMSELoss: https://sbert.net/examples/training/ms_marco/README.html#marginmse

But our solution is a fair bit simpler, hah. I don't think their approach works out of the box: there's currently no way (as far as I know) to add an adapter but apply it to only a portion of inference (i.e. only the queries). It would require a custom loss/trainer/model, I reckon.
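As a rough illustration, a manual workaround at inference time might look like the following (untested sketch; it assumes the adapter is already loaded, and the enable_adapters/disable_adapters toggles come from transformers' PEFT integration, which may vary by version):

```python
# Hedged sketch: apply a loaded adapter only to the query side at inference.
def encode_asymmetric(model, queries, docs):
    # Assumes `model` is a SentenceTransformer whose first module wraps a
    # PEFT-enabled Hugging Face transformer model.
    transformer = model[0].auto_model

    transformer.enable_adapters()   # adapter ON for queries
    query_embeddings = model.encode(queries)

    transformer.disable_adapters()  # adapter OFF for documents
    doc_embeddings = model.encode(docs)

    transformer.enable_adapters()   # restore the default state
    return query_embeddings, doc_embeddings
```

Training with this kind of asymmetry would still need the custom loss/trainer mentioned above; this only covers the inference side.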

  • Tom Aarsen
