Release v0.53.0-rc14 · tenstorrent/tt-metal

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not the documentation on the main branch. There may be differences between the latest main and the previous release.

The changelog follows, showing the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/11301561247

📦 Uncategorized

#11962: remove uint8 unpack reconfig code
- PR: #13218
Update slack notification owner for t3k-model-perf-falcon7b
- PR: #13289
#12040: add transpose trace sweeps
- PR: #13252
Divanovic/llama tg demo
- PR: #13105
#0: Fix bug in perplexity script for Llama
- PR: #13301
#0: Update cast in ncrisc BH init code
- PR: #13295
#0: Move remote chip event synchronization to dispatch core
- PR: #13256
Vanilla Unet conv unit_test
- PR: #13267
#11740: Extend post commit coverage and add sweep test
- PR: #13040
#13269: Revise moreh_norm, moreh_norm_backward operations
- PR: #13270
#13140: Cleanup Binary Backward ops
- PR: #13286
#13315: Revise moreh_bmm, moreh_bmm_backward operations
- PR: #13316
#0: TG Llama3-70b - fix frequent tests
- PR: #13322
Revert "#11962: remove uint8 unpack reconfig code"
- PR: #13306
Llama3.1-8B continuous batching + Paged Attention support
- PR: #13205
#0: Remove demo output files from Llama3.1-8B
- PR: #13325
#11592: use the semaphore indices returned by CreateSemaphore
- PR: #13297
#9370: removed ndpcc workaround and debug code in sdpa decode and re-enabled CI
- PR: #13299
#0: Bump trace region size to 20MB for T3K LLAMA2
- PR: #13309
Not holding state for freshening profiler logs
- PR: #13335
#13136: Consolidate all_gather and line_all_gather to common api
- PR: #13148
#11005: Added CreateKernelFromString()
- PR: #12789
#11622: sweep concat traces
- PR: #13345
#0: Bump ttnn bert perf threshold to account for recent refactoring
- PR: #13346
#0: fix CCL nightly and frequent test regression suites
- PR: #13349
#13142: Add documentation for device ops, memory config
- PR: #13166
#13128: Add cmake options to control what tests get built
- PR: #13251
[skip ci] Update CODEOWNERS for CMakeLists.txt
- PR: #13221
Update matrix_engine.md
- PR: #13350
#13258: build_metal.sh enhancements
- PR: #13259
Flash decode improvements r3
- PR: #13351
#0: shortened flash decode tests to avoid potential timeout in fast dispatch
- PR: #13358
#12632: Migrate moreh_layer_norm operation from tt_eager to ttnn
- PR: #12633
#11844: Add dispatch_s for asynchronously sending go signals
- PR: #13069
#12805: Migrate moreh_sum_backward operation from tt_eager to ttnn
- PR: #12806
#13187: revise moreh_mean and moreh_mean_backward
- PR: #13260
#12687: port moreh_group_norm and moreh_group_norm_backward from tt_dnn to ttnn
- PR: #12755
#12694: Refactor moreh_linear and moreh_linear_backward
- PR: #12812
#13246: Remove unary_backward_op.hpp
- PR: #13247
#0: integrate distributed sharded layernorm with llama-tg
- PR: #13225
Add support for matmul 1D having L1 sharded weights
- PR: #13094
#11791: linker script cleanups
- PR: #13305
#0: Add copy sweep
- PR: #13356
#12214: refactor moreh_sgd from deprecated to ttnn
- PR: #12378
[Nightly fast dispatch CI] Fix Llama3.1-8B tests running out of memory
- PR: #13362
Update perf target for one falcon7b config due to CI variation
- PR: #13355
Add bitwise ops sweeps, add gen_rand_bitwise_left_shift function
- PR: #13366
Multiple watcher-related updates
- PR: #13029
#11621: add filler sweeps for expand, fill, split_with_sizes, index_select and .t
- PR: #13359
#13363: Surface job errors where Set up runner does not complete successfully
- PR: #13379
#13127: Remove shape_without_padding() pybinding and usage
- PR: #13369
#11208: Refactor ProgramCache to remove nested type erasure
- PR: #13216
#11208: Slotmap datastructure for creating resource pools
- PR: #13378
#13365: added program caching for page tensor for flash decode
- PR: #13381
Update llama ttft in README.md
- PR: #13389
#0: Add tech report for inf/nan handling
- PR: #13391
#11403: SubMesh Support + Porting/Stamping T3K Tests to Galaxy
- PR: #12962
Add new ttnn sweeps
- PR: #13239
Remove profiler core flat id look up
- PR: #13377
#11789: Fix firmware/kernel padding/alignment
- PR: #13367
#8534: Publish tt-metal docs to the central site
- PR: #10356
#0: Sweeps Logger Fixes
- PR: #13423
Mchiou/13011 dump firmware and system logs if ci jobs fail
- PR: #13231
#13419: Handle cases where a GitHub timeout on a job cuts off a test's data in a JUnit XML, leaving no data to use
- PR: #13425
#12605: Add governor notes and move models steps into separate steps
- PR: #12703
#13254: switch pgm dispatch to use trace, add it to CI
- PR: #13255
#10016: jit_build: link substitutes, tdma_xmov, noc
- PR: #13430
#11208: Slotmap datastructure for creating resource pools
- PR: #13427
#0: Dispatch_s + Launch Message Ring Buffer Bugfixes
- PR: #13393
#0: Reduce copy sweep to cover only bf16
- PR: #13436
#13394: Galaxy 2cq support
- PR: #13422
#0: Fix ncrisc code overflow problem
- PR: #13442
Add more pipelines to top-level "Choose your pipeline" workflows
- PR: #13446
#13127: Update ttnn::Shape struct to maintain API parity with existing tt::tt_metal::LegacyShape usages
- PR: #13382
#0: SegFormer on n150 - functional
- PR: #13384
#7091: Add git commit runbook to CONTRIBUTING.md
- PR: #13371
Moving DRAM/L1_UNRESERVED_BASE into HAL
- PR: #13296
#11401: Add supplementary tensor parallel example to regression
- PR: #12434
#13432: fix t3k ethernet tests
- PR: #13453
#0: fix mesh device fixture selection for test_distributed_layernorm
- PR: #13433
#13454: Refactor API for MeshDevice::enable_async
- PR: #13455
deprecate JAWBRIDGE
- PR: #13449
#8488: Update activation list in doc
- PR: #13282
#13424: Add documentation for opt output tensor and qid
- PR: #13443
#8428: Update sweep config and doc for polyval
- PR: #13196
#7712: Update elu, erf variant sweep config and doc
- PR: #13156
#7961: Update logical or doc and sweep config
- PR: #13188
Llama 3.1 8b DRAM-shard the LM head, 23.1 t/s/u
- PR: #13340
#12559: add ttnn implementation for convnet_mnist model
- PR: #12649
#13143: Add documentation for core, set_printoptions ops
- PR: #13199
#13144: Add documentation for tensor creation ops, matmul ops
- PR: #13155
Jvega/readme changes
- PR: #13431
#0: TG-Llama3-70b - Add compilation step to demo
- PR: #13416
TG Llama3-70b prefill frequent tests enabled
- PR: #13472
#11791: proper bss, stack only on firmware
- PR: #13375
Add more eltwise unary ops
- PR: #13465
#11307: Remove l1_buffer
- PR: #13451
Fix composite ops asserting on perf report generation
- PR: #13480
#11791: Implement Elf reading
- PR: #13388
#13482: Resolve 2CQ Trace Hangs on TG
- PR: #13484
Add DPRINT support for CB rd/wr pointers from BRISC/NCRISC
- PR: #13489
Refactor TT-NN / TT-Metal Mesh/Multi-device related into separate subdirectory
- PR: #13460
#13127: Add get_logical_shape/get_padded_shape to Tensor
- PR: #13372
#0: update CODEOWNERS for distributed subdirectories
- PR: #13503
#13127: Add simple tensor creation gtest
- PR: #13500
Fix compilation of test_create_tensor.cpp
- PR: #13506
#0: add is_ci_env to segformer model
- PR: #13497
New tests and updates of ttnn sweeps
- PR: #13417
#11307: Remove l1_data section
- PR: #13483
