FSDP Training with Sentence Transformer #3023
I took a stab at training with FSDP and encountered quite a few issues: huggingface/accelerate#3201
Hello! There are some details here that I needed to get it running originally: https://sbert.net/docs/sentence_transformer/training/distributed.html#fsdp
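For reference, here is a minimal sketch of what that setup can look like when FSDP is configured through the regular `transformers`-style training arguments. The dataset, model, and the `BertLayer` wrap class are placeholder assumptions that depend on what you actually train; the script would be launched with `accelerate launch` or `torchrun`.

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Placeholder model/dataset; swap in your own.
model = SentenceTransformer("bert-base-uncased")
train_dataset = load_dataset("sentence-transformers/all-nli", "pair", split="train")

args = SentenceTransformerTrainingArguments(
    output_dir="output/fsdp-run",
    per_device_train_batch_size=32,
    # FSDP is configured via the standard transformers arguments:
    fsdp=["full_shard", "auto_wrap"],
    # The layer class to wrap depends on the backbone (BertLayer for BERT models).
    fsdp_config={"transformer_layer_cls_to_wrap": "BertLayer"},
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()
```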
Thanks! I am working in this direction now and would like to hear your input! When you subclass the Transformers Trainer class to create the Sentence Transformers trainer, is there a guideline you follow to write the customized trainer? I notice that you override compute_loss, prepare_inputs, and other class methods. Are you following some template or guidelines, or do you just check the Trainer source code line by line to make changes?
I don't really check it line by line, but I'm somewhat familiar with the overall structure of the transformers Trainer, so I mostly only override the methods that need Sentence Transformers-specific behaviour. That's why the Sentence Transformers trainer file is only ~900 lines long, compared to ~5k for the base Trainer.
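To make that pattern concrete, here is a rough sketch (not the actual SentenceTransformerTrainer code) of subclassing the Hugging Face Trainer and overriding only what must differ. `collect_features` is a hypothetical helper standing in for the real input preparation, and `loss_fn` is assumed to be a Sentence Transformers-style loss module:

```python
from transformers import Trainer


class SketchSentenceTrainer(Trainer):
    """Sketch: subclass Trainer and override only the methods that must differ."""

    def __init__(self, *args, loss_fn=None, **kwargs):
        super().__init__(*args, **kwargs)
        # A Sentence Transformers-style loss module, constructed around the model.
        self.loss_fn = loss_fn

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # The base Trainer expects the model itself to return a loss; Sentence
        # Transformers losses instead take (features, labels), so this method
        # is replaced rather than reused.
        features, labels = self.collect_features(inputs)
        loss = self.loss_fn(features, labels)
        return (loss, {"loss": loss}) if return_outputs else loss

    def collect_features(self, inputs):
        # Hypothetical helper: group the flat batch dict into one tokenized
        # feature dict per input column (anchor, positive, ...) plus the labels.
        raise NotImplementedError("grouping logic omitted in this sketch")
```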
Hi @tomaarsen, if the model is wrapped, can we directly update the model in the loss function with the wrapped model?
Based on my experiments, the evaluator cannot work out of the box with FSDP, and it keeps throwing errors.
I have successfully fine-tuned Llama 3 for text embedding with FSDP and Sentence Transformers with some modifications.
The core issue is to make the model inside the loss function aware of the FSDP wrapping. I may create a new PR if necessary.
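As a rough illustration of that idea (a sketch, not the library's implementation), Sentence Transformers losses typically keep the model as `self.model`, so one option is to point that attribute at the wrapped model inside `compute_loss`. This builds on the `SketchSentenceTrainer` sketch above:

```python
class FSDPAwareTrainer(SketchSentenceTrainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # `model` is whatever the Trainer loop hands us; under FSDP this is the
        # wrapped module rather than the original SentenceTransformer. If the loss
        # still holds a reference to the unwrapped model, its forward passes bypass
        # FSDP, so redirect that reference before computing the loss.
        if self.loss_fn is not None and getattr(self.loss_fn, "model", None) is not model:
            self.loss_fn.model = model
        return super().compute_loss(model, inputs, return_outputs=return_outputs, **kwargs)
```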
Apologies for the delay! The loss functions hold their own reference to the model, and under FSDP that reference needs to point at the wrapped model; this is why we have to override the original compute_loss. As for the evaluator: FSDP splits the model up into pieces and separates it across the various devices. The relevant evaluator call is in sentence_transformers/sentence_transformers/trainer.py, lines 441 to 442 (at commit 68dfbe6).
Ideally, we'd have e.g. DDP support here where we can split the computations for the evaluator across devices, but that hasn't been implemented. FSDP would indeed be even nicer, but a lot more complex as well. Either way, the evaluator breaks with FSDP because we only run it on the first process. Nice work on getting it to work! I'm open to PRs.
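If someone does pick this up, one possible (untested) direction for the evaluator is to gather the full parameters on every rank before the rank-0-only evaluation, e.g. with `FSDP.summon_full_params`. This is only a sketch, under the assumption that the trainer's prepared model is an FSDP module and that the trainer exposes an `accelerate` Accelerator:

```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def evaluate_with_fsdp(trainer, evaluator):
    """Sketch: run a SentenceEvaluator while parameters are sharded with FSDP."""
    model = trainer.model  # assumed to be the FSDP-wrapped model after preparation
    # Every rank must enter summon_full_params, even though only rank 0 evaluates,
    # because gathering the shards is a collective operation.
    with FSDP.summon_full_params(model, writeback=False):
        if trainer.accelerator.is_main_process:
            unwrapped = trainer.accelerator.unwrap_model(model)
            return evaluator(unwrapped)
    return None
```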
Given that there are so many LLM-based models at the top of the MTEB benchmark nowadays, is there a canonical way to train with FSDP now? I'm trying to explore in this direction, but I want to ask whether some examples already exist before I reinvent the wheel.