Vote on new features in Discussions #694
Hi developers,

Firstly, thanks for the great work demonstrating the power of PyTorch's newly released features! I just have one point of confusion about the usage of FSDP2.

To put it more clearly: in most LLM training use cases, such as Llama 2, the precision of RMSNorm needs to be fp32 while the rest of the model runs in bf16. From the profiling results, we found this approach (wrapping RMSNorm into its own FSDP unit) …

Apart from that, there are also some other use cases: the dtype of MoE gating layers is required to be fp32.

So, does mixed precision within a single FSDP module work?

Thanks!
@zigzagcai RMSNorm only has activations in fp32; the weights are still bf16.
cc: @awgu
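For context, here is a minimal sketch of how per-module mixed precision can be expressed with FSDP2, assuming a recent PyTorch where `fully_shard` and `MixedPrecisionPolicy` are exposed under `torch.distributed.fsdp` (older versions place them under `torch.distributed._composable.fsdp`). `TinyBlock` and the dimensions are hypothetical stand-ins, and a distributed process group is assumed to be initialized:

```python
import torch
import torch.nn as nn
# Public path in recent PyTorch; older releases use
# torch.distributed._composable.fsdp instead.
from torch.distributed.fsdp import fully_shard, MixedPrecisionPolicy

class TinyBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.RMSNorm(dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return self.proj(self.norm(x))

model = nn.Sequential(*[TinyBlock(256) for _ in range(4)])

# bf16 compute with fp32 gradient reduction for most of the model
bf16_policy = MixedPrecisionPolicy(param_dtype=torch.bfloat16,
                                   reduce_dtype=torch.float32)
# full fp32 for a precision-sensitive module (e.g. a gating layer)
fp32_policy = MixedPrecisionPolicy(param_dtype=torch.float32)

for i, block in enumerate(model):
    # each fully_shard call gets its own policy: first block fp32, rest bf16
    fully_shard(block, mp_policy=fp32_policy if i == 0 else bf16_policy)
fully_shard(model, mp_policy=bf16_policy)
```

As far as I understand, the policy applies per `fully_shard` call, so all parameters in one FSDP parameter group share a dtype; mixing dtypes means wrapping the fp32-sensitive module separately, or upcasting only the activations inside its forward (which is how RMSNorm is handled above).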
It should be simple, but gradient accumulation is very useful for SFTing big models.
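For reference, a hedged sketch (not an official torchtitan recipe) of how gradient accumulation can pair with FSDP2: modules wrapped with `fully_shard` expose `set_requires_gradient_sync`, which lets you skip the gradient reduce-scatter on all but the last microbatch. `dataloader`, `optimizer`, and `accum_steps` are assumed names here:

```python
accum_steps = 4  # microbatches per optimizer step
for step, batch in enumerate(dataloader):
    is_last_microbatch = (step + 1) % accum_steps == 0
    # Skip the gradient reduce-scatter (and its comm cost) until the
    # final microbatch of this accumulation window.
    model.set_requires_gradient_sync(is_last_microbatch)
    loss = model(batch).mean() / accum_steps  # scale for accumulation
    loss.backward()
    if is_last_microbatch:
        optimizer.step()
        optimizer.zero_grad()
```

The trade-off is memory for communication: with sync disabled, unsharded gradients accumulate locally instead of being reduce-scattered each microbatch.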
Hi torchtitanists,
Thank you for your interest in torchtitan!
We created #693 for the community to add feature requests and vote on them. We'll try to prioritize the most requested features. Please share what you'd like to see next!