Issues: pytorch/torchtitan
Issues list
First Shard Group Save and Load Checkpoint for HSDP
question
Further information is requested
#709
opened Nov 29, 2024 by
qsh-zh
Is autocast needed with FSDP2?
question
Further information is requested
#700
opened Nov 25, 2024 by
garrett361
[rfc] torchtitan release practices
release_blocking
Issues that are blocking the milestone / release completion
torch.compile(sync_float8_amax_and_scale_history) not working with triton latest main
bug
Something isn't working
#681
opened Nov 19, 2024 by
goldhuang
[Parallelism] Implement vocabulary parallelism
enhancement
New feature or request
#680
opened Nov 15, 2024 by
casper-hansen
Any suggestion for Llama-3.1-70b (128k seq len) deploy mesh with torchtitan?
enhancement
New feature or request
question
Further information is requested
#678
opened Nov 15, 2024 by
medivh-xp
Fine-Tuning Llama Model with Large Context and Customized Dataset Using Torchtitan
enhancement
New feature or request
question
Further information is requested
#677
opened Nov 14, 2024 by
Amerehei
Very low wps with H200 GPUs
question
Further information is requested
#676
opened Nov 13, 2024 by
aniltrkkn
[Config] Make the checkpoint step configurable.
enhancement
New feature or request
good first issue
Good for newcomers
#662
opened Oct 30, 2024 by
casper-hansen
Questions about FSDP2 support and memory usage.
question
Further information is requested
#658
opened Oct 29, 2024 by
tangjiasheng
torch.distributed.breakpoint(rank=1) hangs because of --local-ranks-filter 0
documentation
Improvements or additions to documentation
#652
opened Oct 25, 2024 by
weifengpy
[Multimodal] Adding OBELICS DataLoader
enhancement
New feature or request
#650
opened Oct 24, 2024 by
TJ-Solergibert
[Config] Make FSDP reshard_after_forward: bool configurable
enhancement
New feature or request
#644
opened Oct 22, 2024 by
awgu
What is the expected inference steps after I apply torchao in training?
question
Further information is requested
#638
opened Oct 21, 2024 by
goldhuang
add H100 in CI
better_engineering
Repo code quality improvements
integration test
Adding integration tests
#632
opened Oct 18, 2024 by
tianyu-l
create a note on torchtitan official release
documentation
Improvements or additions to documentation
release_blocking
Issues that are blocking the milestone / release completion
Non-DP runs default to float32 precision
enhancement
New feature or request
#630
opened Oct 18, 2024 by
carmocca
[Triton] Implement Liger Kernels
enhancement
New feature or request
#623
opened Oct 17, 2024 by
casper-hansen
Ability to train based on epoch
enhancement
New feature or request
good first issue
Good for newcomers
#613
opened Oct 13, 2024 by
abatilo
[Compile] Understand why FSDP2 saves both SDPA out and wo in for bwd
question
Further information is requested
#610
opened Oct 11, 2024 by
awgu
Granular layer selection during Pipeline Parallelism
enhancement
New feature or request
#598
opened Oct 3, 2024 by
bhuvan777