update train_sts_seed_optimization with SentenceTransformerTrainer #3092
Conversation
  # Configure the training. We skip evaluation in this example
- warmup_steps = math.ceil(len(train_dataloader) * num_epochs * 0.1)  # 10% of train data for warm-up
+ warmup_steps = math.ceil(len(train_dataset) * num_epochs * 0.1)  # 10% of train data for warm-up
The SentenceTransformerTrainingArguments has a warmup_ratio=0.1 that we can use instead.
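A quick sketch of the equivalence being suggested here: the old script computes warmup steps by hand as 10% of the training length, which is exactly what a warmup ratio of 0.1 expresses directly (the dataset size below is illustrative, not from the script):

```python
import math

# Manual warm-up computation, as in the old script (size is illustrative,
# e.g. roughly the STSb train split)
train_dataset_len = 5_749
num_epochs = 1
warmup_steps = math.ceil(train_dataset_len * num_epochs * 0.1)  # 10% warm-up

# warmup_ratio expresses the same intent as a fraction of total training,
# so no manual computation is needed
warmup_ratio = warmup_steps / (train_dataset_len * num_epochs)
```

With `warmup_ratio=0.1` passed to the training arguments, the trainer derives the step count itself, so the manual `math.ceil(...)` line can be dropped entirely.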
  # Stopping and Evaluating after 30% of training data (less than 1 epoch)
  # We find from (Dodge et al.) that 20-30% is often ideal for convergence of random seed
- steps_per_epoch = math.ceil(len(train_dataloader) * stop_after)
+ steps_per_epoch = math.ceil(len(train_dataset) * stop_after)
I don't think this is used right now.
- steps_per_epoch=steps_per_epoch,
- evaluation_steps=1000,
+ # 5. Define the training arguments
+ args = SentenceTransformerTrainingArguments(
I think the stop_after isn't actually making it stop after this many steps. Normally you can use max_steps, but I think that messes with the scheduler; ideally we want the scheduler to be "normal" but still stop after stop_after steps, though I'm not sure if that's the old behaviour either.
I'm also curious what you mean by the outdated docs - I'd also like that to be fixed if possible.
ahh yeah, I'll change those. Sorry about the docs - I was accidentally referring to the old fit method here - https://www.sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#:~:text=steps_per_epoch%20%E2%80%93%20Number%20of%20training%20steps,the%20DataLoader%20size%20from%20train_objectives.&text=warmup_steps%20%E2%80%93%20Behavior%20depends%20on%20the,to%20the%20maximal%20learning%20rate. - and saw args like steps_per_epoch and warmup_steps that weren't there in the Trainer.
Also, I don't quite understand the stop_after bit either - is a custom callback expected?
Makes sense, this is a little confusing. I think the idea is that we create 1 epoch of e.g. 100k steps. The seed (e.g. for data sampling) has been shown to be fairly important for training embedding models, so we want to train e.g. just the first 30k steps out of the 100k and then see where we're at. Then we can pick the seed that performed the best after just a bit of training. But if we use max_steps, the scheduler will treat the run as only 30k steps long. Instead, we want the scheduler to think that we're doing 100k steps, but have the training actually stop after 30k (i.e. the stop_after fraction of the total) steps.
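One way to get "full-length scheduler, early stop" is a small callback that flips the trainer's stop flag once the step counter reaches the cut-off, leaving max_steps (and thus the LR schedule) untouched. A minimal sketch, assuming the standard transformers callback API; the class name StopAfterStepsCallback is hypothetical, not from the script:

```python
from transformers import TrainerCallback


class StopAfterStepsCallback(TrainerCallback):
    """Stop training after `stop_step` optimizer steps without shortening
    the learning-rate schedule (which still sees the full step count)."""

    def __init__(self, stop_step: int):
        self.stop_step = stop_step

    def on_step_end(self, args, state, control, **kwargs):
        # state.global_step counts completed optimizer steps
        if state.global_step >= self.stop_step:
            control.should_training_stop = True
        return control
```

The callback would then be passed via the trainer's `callbacks` argument, e.g. with `stop_step=math.ceil(total_steps * stop_after)`, so the scheduler plans for the full run while training halts at the 30% mark.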
This was my final log with this script at the default parameters:
So there's indeed a pretty big difference: 0.827 vs. 0.851.
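The overall shape of the seed search those numbers come from can be sketched in a few lines. This is only an illustration of the loop structure; `train_and_eval` below is a hypothetical stand-in for a partial training run, not anything from the actual script:

```python
import random


def train_and_eval(seed: int, stop_after: float = 0.3) -> float:
    """Hypothetical stand-in for training the first `stop_after` fraction
    of an epoch with this seed and returning an evaluation score."""
    random.seed(seed)
    return random.random()  # placeholder for the real Spearman correlation


# Train each candidate seed for a fraction of an epoch, keep the best one
scores = {seed: train_and_eval(seed) for seed in range(5)}
best_seed = max(scores, key=scores.get)
```

In the real script the winning seed would then be used for a full-length training run, which is where the 0.827 vs. 0.851 gap between seeds shows up.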
ah thanks for explaining, this makes sense.
A difference? Between the evaluation scores you mean?
ahh okay okay
Thanks for tackling this!
This PR updates the example script train_sts_seed_optimization.py with SentenceTransformerTrainer.
I also noticed the documentation was quite outdated when I was referring to it for some args - should we try to update it too?
@tomaarsen