Issues: pytorch/torchtitan

Issues list

First Shard Group Save and Load Checkpoint for HSDP (labels: question)
#709 opened Nov 29, 2024 by qsh-zh
Is autocast needed with FSDP2? (labels: question)
#700 opened Nov 25, 2024 by garrett361
[rfc] torchtitan release practices (labels: release_blocking)
#688 opened Nov 22, 2024 by tianyu-l (milestone: torchtitan v1.0.0 release)
[Parallelism] Implement vocabulary parallelism (labels: enhancement)
#680 opened Nov 15, 2024 by casper-hansen
Any suggestion for Llama-3.1-70b(128k seq len) deploy mesh with torchtian? (labels: enhancement, question)
#678 opened Nov 15, 2024 by medivh-xp
Fine-Tuning Llama Model with Large Context and Customized Dataset Using Torchtitan (labels: enhancement, question)
#677 opened Nov 14, 2024 by Amerehei
Very low wps with H200 Gpus (labels: question)
#676 opened Nov 13, 2024 by aniltrkkn
[Config] Make the checkpoint step configurable. (labels: enhancement, good first issue)
#662 opened Oct 30, 2024 by casper-hansen
Questions about FSDP2 support and memory usage. (labels: question)
#658 opened Oct 29, 2024 by tangjiasheng
torch.distributed.breakpoint(rank=1) hangs because of --local-ranks-filter 0 (labels: documentation)
#652 opened Oct 25, 2024 by weifengpy
[Multimodal] Adding OBELICS DataLoader (labels: enhancement)
#650 opened Oct 24, 2024 by TJ-Solergibert
Convergence testing best practices (labels: documentation)
#648 opened Oct 22, 2024 by gnadathur (milestone: torchtitan v1.0.0 release)
[Config] Make FSDP reshard_after_forward: bool configurable (labels: enhancement)
#644 opened Oct 22, 2024 by awgu
What is the expected inference steps after I apply torchao in training? (labels: question)
#638 opened Oct 21, 2024 by goldhuang
add H100 in CI (labels: better_engineering, integration test)
#632 opened Oct 18, 2024 by tianyu-l
create a note on torchtitan official release (labels: documentation, release_blocking)
#631 opened Oct 18, 2024 by tianyu-l (milestone: torchtitan v1.0.0 release)
Non-DP runs default to float32 precision (labels: enhancement)
#630 opened Oct 18, 2024 by carmocca
add Llama 3.2 support (labels: enhancement)
#625 opened Oct 18, 2024 by tianyu-l
[Triton] Implement Liger Kernels (labels: enhancement)
#623 opened Oct 17, 2024 by casper-hansen
Ability to train based on epoch (labels: enhancement, good first issue)
#613 opened Oct 13, 2024 by abatilo
[Compile] Understand why FSDP2 saves both SDPA out and wo in for bwd (labels: question)
#610 opened Oct 11, 2024 by awgu