Release v0.53.1-rc6 · tenstorrent/tt-metal

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with that release, not the versions on the main branch. The latest main may differ from the previous release.

The changelog below lists the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/12092456407

🚀 Features

#15540: support bfp8_b 1d tensor in ttnn
- PR: #15541
#15542: Add dtype to to_torch
- PR: #15543

📦 Uncategorized

[CCL] Add negative dim support
- PR: #15305
#12151: Replace avg_pool2d with global_avg_pool2d
- PR: #14330
Update Qwen README.md to remove Llama references
- PR: #15409
Optimized Llama 3.x perf with sharded residual
- PR: #15142
#15247: Add unit test to show segfault with sharded config problem
- PR: #15249
Fix num cores for dram sharded MM
- PR: #15373
#0: Dispatch RTAs early in some cases
- PR: #15391
#0: New RiscV architecture extension attributes
- PR: #15403
#15361: Conv2d width sharded fails with tilized input
- PR: #15369
#6659: remove dead code
- PR: #15427
CB Size Validation Fix Rollout
- PR: #15394
#12979: Merge erisc data & bss sections
- PR: #15267
Fold batches into channels and use grouped convolutions in UNet Shallow
- PR: #14437
[TT-Train] Added Yaml Configs support
- PR: #15352
#7493: Accidentally added two tests that should have been deleted durin…
- PR: #15431
#0: Add InsertBraces: true to .clang-format
- PR: #15438
Update unary doc examples set2
- PR: #15424
Update unary doc examples set3
- PR: #15425
Add all gather perf to pipeline for TG
- PR: #15001
#0: update ref links to eltwise pytorch2 sweeps
- PR: #15355
Release metadata blocks on allocator destruction
- PR: #15410
#15440: fix stack overflow in vc_packet_router
- PR: #15441
Revert changes to Falcon7b matmul configs to fix CI tests after default matmul configs were modified
- PR: #15439
Put the Git repo in a happy state before attempting to checkout
- PR: #15461
Publish Release Images
- PR: #15013
Rename Tutorial - Add Two Integers in a Baby RISC-V.md to Tutorial_Ad…
- PR: #15464
Create Tutorial_Add_Two_Integers_in_a_Compute_Kernel.md
- PR: #15465
UMD create_mock_cluster and fix PhysicalCoordinate
- PR: #15411
Update Tutorial_Add_Two_Integers_in_a_Compute_Kernel.md
- PR: #15466
Update Tutorial_Add_Two_Integers_in_a_Compute_Kernel.md
- PR: #15467
#15123: fix performance unnecessary value param
- PR: #15187
[tt-train] Add nanogpt tests with AdamW and MorehAdamW
- PR: #15443
[tt-train] Removed slow odd option for fp16 host->device conversion.
- PR: #15470
Remove dead param
- PR: #15472
Allow builds to run to their natural end
- PR: #15477
[skip ci] Update .clang-format
- PR: #15484
Formatting pass on ttnn directory, only files not in open PRs
- PR: #15486
Formatting pass on tt_metal directory, only files not in open PRs
- PR: #15487
#15171: Better parallelization strategy
- PR: #15172
[skip ci] Add clang-format to CI
- PR: #15489
Update .clang-format-ignore
- PR: #15494
Update ignore revs
- PR: #15495
Add perf report docs and slightly improve output
- PR: #14664
#13332: add ttnn implementation for Bert-Tiny model
- PR: #13471
#0: LLM Tech Report: Intro
- PR: #15081
Fix the test for whether to install the wheel, and also exit the script on the first error
- PR: #15480
Populate the version based on Git Describe
- PR: #15400
LLM tech report performance analysis
- PR: #15104
#13875: fix tilize and attn matmul on BH
- PR: #15459
Yolo Optimization
- PR: #15418
#0: Port eltwise and some misc ops to use TensorSpec
- PR: #15471
[tt-train] Implement composite LayerNorm
- PR: #15507
#14974: ttnn::{full,empty}_like Tensor creation API for MeshDevice
- PR: #15333
Add support to change FD cores from row to col placement
- PR: #15316
[skip ci] Fix Formatting of host_api.hpp tables
- PR: #15515
Use UMD's public API - no more fishing into private paths
- PR: #15322
Use the post-merge commit of UMD
- PR: #15522
[skip ci] Disable clang-format in pre-commit and move version ahead
- PR: #15521
Support for new matmul1d op with gather_in0
- PR: #14964
Remove dead includes that break without proper include paths
- PR: #15530
More formatting of ttnn
- PR: #15519
Make MEM_LOCAL_BASE accessible behind Hal
- PR: #15315
Add support for rank-n tensors to tilize and untilize
- PR: #15520
Add reduce scatter perf to tg
- PR: #15160
Fix Circular Buffer Allocation in untilize_with_halo
- PR: #15492
[skip ci] use git-clang-format in CI
- PR: #15528
#13676: i1 op kernel implementation and improve i0_bw pcc
- PR: #15325
Increase perf margin on unet
- PR: #15532
#12558: TTNN implementation of MNIST model
- PR: #12647
Remove unsupported shapes to make pipeline green
- PR: #15531
#13401: Add data parallel support for Bert-Tiny model
- PR: #14033
#15297: Allow MeshDevice to be initialized for chips without eth coordinates
- PR: #15475
#0: Disable clang-format precommit check once again due to errors
- PR: #15556
#15337: Fix incorrectly sized cb in remote cb microbenchmark
- PR: #15506
[skip ci] Update CONTRIBUTING.md with pre-commit info
- PR: #15537
Remove ClusterDescriptor path from constructor
- PR: #15554
Add performance and accuracy configurations to Llama 3
- PR: #15545
Disable upblock 3 and 4 unet unit tests
- PR: #15568
Re-enable git-clang-format for pre-commit again
- PR: #15562
Fix race condition in DRAM sharded MM
- PR: #15569
Fix to concat support for tensors with tile padding
- PR: #15513
Adjust perf test targets for Falcon7b-t3k-decode-noasync to account for CI instability
- PR: #15573
