[Q] LT-SFT and enc-dec models? #2
Comments
Hi Adam, I've run some experiments on BART with LT-SFT and can confirm that it works, so I'm fairly confident T5 will work as well. You should be able to use LotteryTicketSparseFineTuner without modification, although the boilerplate code in the example scripts will likely need some adjustment for generative models. Note that, as with BERT-style models, you should generally decouple the input and output embedding matrices and freeze the output embeddings to achieve good performance.
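The embedding advice above can be illustrated with a minimal sketch in plain PyTorch. It is not taken from this repository or from the BART experiments: `ToySeq2Seq` and `untie_and_freeze_output_embeddings` are hypothetical names, and the tiny model only stands in for an encoder-decoder LM whose output projection shares weights with its input embeddings. For Hugging Face models, the `tie_word_embeddings` config flag controls the same sharing, but how best to untie a given pretrained checkpoint is left as an assumption to verify.

```python
import torch
from torch import nn


class ToySeq2Seq(nn.Module):
    """Hypothetical stand-in for an enc-dec LM with tied input/output embeddings."""

    def __init__(self, vocab_size: int = 32, d_model: int = 8):
        super().__init__()
        self.shared = nn.Embedding(vocab_size, d_model)  # input embeddings
        self.body = nn.Linear(d_model, d_model)  # placeholder for the encoder/decoder stack
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)  # output embeddings
        self.lm_head.weight = self.shared.weight  # weight tying: one shared Parameter

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = torch.tanh(self.body(self.shared(input_ids)))
        return self.lm_head(hidden)


def untie_and_freeze_output_embeddings(model: ToySeq2Seq) -> ToySeq2Seq:
    # Give the output projection its own copy of the embedding weights,
    # created as a non-trainable Parameter so fine-tuning never updates it.
    copied = model.shared.weight.detach().clone()
    model.lm_head.weight = nn.Parameter(copied, requires_grad=False)
    return model


torch.manual_seed(0)
model = untie_and_freeze_output_embeddings(ToySeq2Seq())

# One backward pass: only the input embeddings (and body) receive gradients.
logits = model(torch.randint(0, 32, (2, 5)))
logits.sum().backward()
```

After this step the input embeddings remain trainable, so a sparse fine-tuning method can still select entries from them, while the output matrix keeps its pretrained values.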
@AlanAnsell thank you for the quick reply. Could you share the scripts from your BART experiments? They would be a great starting point for further experimentation and for adapting the method to the T5 architecture.
Unfortunately I can't share those experiments right now, but I expect the adaptation shouldn't be too difficult. For example, for BART I replaced DataCollatorForLanguageModeling with the DataCollatorForDenoisingTasks I found here: https://github.com/morganmcg1/rotobart/blob/main/data_collator.py.
I'm wondering whether this method should (theoretically) work with enc-dec models. Have you tried training such models with the code from this repository? I'm interested in using this approach with the T5 model.