v0.53.1-rc6
Pre-release
github-actions released this 30 Nov 01:58 · 2 commits to main since this release
Note
If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not the versions on the main branch. There may be differences between the latest main and the previous release.
The changelog follows, showing the changes since the last release.
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/12092456407
🚀 Features
📦 Uncategorized
- [CCL] Add negative dim support
- PR: #15305
- #12151: Replace avg_pool2d with global_avg_pool2d
- PR: #14330
- Update Qwen README.md to remove Llama references
- PR: #15409
- Optimized Llama 3.x perf with sharded residual
- PR: #15142
- #15247: Add unit test to show segfault with sharded config problem
- PR: #15249
- Fix num cores for dram sharded MM
- PR: #15373
- #0: Dispatch RTAs early in some cases
- PR: #15391
- #0: New RiscV architecture extension attributes
- PR: #15403
- #15361: Conv2d width sharded fails with tilized input
- PR: #15369
- #6659: remove dead code
- PR: #15427
- CB Size Validation Fix Rollout
- PR: #15394
- #12979: Merge erisc data & bss sections
- PR: #15267
- Fold batches into channels and use grouped convolutions in UNet Shallow
- PR: #14437
- [TT-Train] Added Yaml Configs support
- PR: #15352
- #7493: Accidentally added two tests that should have been deleted durin…
- PR: #15431
- #0: Add InsertBraces: true to .clang-format
- PR: #15438
- Update unary doc examples set2
- PR: #15424
- Update unary doc examples set3
- PR: #15425
- Add all gather perf to pipeline for TG
- PR: #15001
- #0: update ref links to eltwise pytorch2 sweeps
- PR: #15355
- Release metadata blocks on allocator destruction
- PR: #15410
- #15440: fix stack overflow in vc_packet_router
- PR: #15441
- Revert changes to Falcon7b matmul configs to fix CI tests after default matmul configs were modified
- PR: #15439
- Put the Git repo in a happy state before attempting to checkout
- PR: #15461
- Publish Release Images
- PR: #15013
- Rename Tutorial - Add Two Integers in a Baby RISC-V.md to Tutorial_Ad…
- PR: #15464
- Create Tutorial_Add_Two_Integers_in_a_Compute_Kernel.md
- PR: #15465
- UMD create_mock_cluster and fix PhysicalCoordinate
- PR: #15411
- Update Tutorial_Add_Two_Integers_in_a_Compute_Kernel.md
- PR: #15466
- Update Tutorial_Add_Two_Integers_in_a_Compute_Kernel.md
- PR: #15467
- #15123: fix performance unnecessary value param
- PR: #15187
- [tt-train] Add nanogpt tests with AdamW and MorehAdamW
- PR: #15443
- [tt-train] Removed slow odd option for fp16 host->device conversion.
- PR: #15470
- Remove dead param
- PR: #15472
- Allow builds to run to their natural end
- PR: #15477
- [skip ci] Update .clang-format
- PR: #15484
- Formatting pass on ttnn directory, only files not in open PRs
- PR: #15486
- Formatting pass on tt_metal directory, only files not in open PRs
- PR: #15487
- #15171: Better parallelization strategy
- PR: #15172
- [skip ci] Add clang-format to CI
- PR: #15489
- Update .clang-format-ignore
- PR: #15494
- Update ignore revs
- PR: #15495
- Add perf report docs and slightly improve output
- PR: #14664
- #13332: add ttnn implementation for Bert-Tiny model
- PR: #13471
- #0: LLM Tech Report: Intro
- PR: #15081
- Fix the test for whether to install the wheel, and also exit the script on the first error
- PR: #15480
- Populate the version based on Git Describe
- PR: #15400
- LLM tech report performance analysis
- PR: #15104
- #13875: fix tilize and attn matmul on BH
- PR: #15459
- Yolo Optimization
- PR: #15418
- #0: Port eltwise and some misc ops to use TensorSpec
- PR: #15471
- [tt-train] Implement composite LayerNorm
- PR: #15507
- #14974: ttnn::{full,empty}_like Tensor creation API for MeshDevice
- PR: #15333
- Add support to change FD cores from row to col placement
- PR: #15316
- [skip ci] Fix Formatting of host_api.hpp tables
- PR: #15515
- Use UMD's public API - no more fishing into private paths
- PR: #15322
- Use the post-merge commit of UMD
- PR: #15522
- [skip ci] Disable clang-format in pre-commit and move version ahead
- PR: #15521
- Support for new matmul1d op with gather_in0
- PR: #14964
- Remove dead includes that break without proper include paths
- PR: #15530
- More formatting of ttnn
- PR: #15519
- Make MEM_LOCAL_BASE accessible behind Hal
- PR: #15315
- Add support for rank-n tensors to tilize and untilize
- PR: #15520
- Add reduce scatter perf to tg
- PR: #15160
- Fix Circular Buffer Allocation in untilize_with_halo
- PR: #15492
- [skip ci] use git-clang-format in CI
- PR: #15528
- #13676: i1 op kernel implementation and improve i0_bw pcc
- PR: #15325
- Increase perf margin on unet
- PR: #15532
- #12558: TTNN implementation of MNIST model
- PR: #12647
- Remove unsupported shapes to make pipeline green
- PR: #15531
- #13401: Add data parallel support for Bert-Tiny model
- PR: #14033
- #15297: Allow MeshDevice to be initialized for chips without eth coordinates
- PR: #15475
- #0: Disable clang-format precommit check once again due to errors
- PR: #15556
- #15337: Fix incorrectly sized cb in remote cb microbenchmark
- PR: #15506
- [skip ci] Update CONTRIBUTING.md with pre-commit info
- PR: #15537
- Remove ClusterDescriptor path from constructor
- PR: #15554
- Add performance and accuracy configurations to Llama 3
- PR: #15545
- Disable upblock 3 and 4 unet unit tests
- PR: #15568
- Re-enable git-clang-format for pre-commit again
- PR: #15562
- Fix race condition in DRAM sharded MM
- PR: #15569
- Fix to concat support for tensors with tile padding
- PR: #15513
- Adjust perf test targets for Falcon7b-t3k-decode-noasync to account for CI instability
- PR: #15573