
v0.51.0-rc8

Pre-release
Released by github-actions on 20 Jul 02:17 · 2503 commits to main since this release · d75e0eb

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not to the documentation on the main branch. There may be differences between the latest main and this release.

The changelog follows, showing the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/10016484977

📦 Uncategorized

  • Migrate Pad Device and All references
  • #0: Multi-CQ support for R-Chip
  • #10028: Remove skip and reduce test case for moreh_groupnorm test
  • #10005: Change input tensor parameter to optional in moreh_sum_backward
  • #10004: Revise bias tensor usage in moreh_linear_backward
  • #9663: support moreh_nll_loss_unreduced
  • #8865: Switch ported ops from tt_lib to ttnn for host dispatch time m…
  • #0: Update README.md grammar for idiomatic description of TT-NN
  • #9767: removed more no longer needed manually specified attributes for reflection
  • Add distributed layernorm kernel documentation
  • #10031: Fix -Werror=return-type error in composite_ops
  • #9492: update matmul path in CODEOWNERS
  • #9450: change silicon fixtures to session scope
  • Uplift UMD to grab support for configuring static TLBs and Hugepage for BH
  • #9441: add all typecasts to unit test
  • #9801: Add cb alignment fix for blackhole that was missed in rebase
  • #9973: Fix addrmod for reduce scalar, port over missing narrow tile c…
  • #10052: Add metal pack untilize test
  • Add ttnn matmul tests to TG unit tests
  • Add ssm_prefix_scan test coverage for N=16
  • Add PyBind to TTNN Slice (Formerly Referred to as Unpad in TT Lib)
  • #8450: Cleanup items pending from PR #9068
  • #10030: fix moreh_nll_loss hang
  • #7736: Remove unused reduce dim & type from reduce_init*
  • #9871: Update backward files
  • #9874: Move Unary Backward ops to TTNN
  • Update op_perf_results
  • #9962: Enable flags for profiler globals in jit build
  • Added prefill mode for mamba modules
  • Increase timeout for Mamba full model tests
  • Support multiple user indices in paged_update_cache
  • #10085: Make ttnn::Buffer deallocate execute without querying a potentially destroyed buffer instance
  • Pack runtime arguments across brisc/ncrisc/trisc
  • Llama Demo Refactor
  • #5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions
  • #0: Move t3k demo tests to perf pipeline because it requires perf governor
  • #5424: Delegated sfpu reciprocal calls to gs submodule functions
  • Add trace and multi cq implementations/tests for WH Resnet
  • #0: (MINOR) Update to v0.51.0
  • #0: bump python3.8 venv versioning since apt repos updated
  • #10099: fix semaphores init for packet mux/demux
  • #10112: Drop hard pin for installation instructions for python3.8-venv in dependencies
  • Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions"
  • #0: Remove stray assert forcing single CQ on R-Chips
  • #9490: Replace tt_dnn op's usage in C++ with TTNN
  • #9874: Merge Next set of unary backward ops to TTNN
  • #10073: Move unary backward ops to TTNN
  • Unary backward op migration
  • #10087: update tt-umd submodule
  • #9959: Migrated pad to ttnn sweeps
  • Adding distributed layernorm to llama prefill
  • Add pytest xdist multiprocess to single-chip demo tests
  • Revert "Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions""
  • #10071 : Move second set of Unary Backward ops to TTNN
  • #10083: added tt::stl::json::to_json and tt::stl::json::from_json
  • #10086: Add logic for splitting cmds that exceed the subcmd limit into separate cmds for semaphores
  • #5424: Delegated sqrt api call to thirdparty gs submodule sqrt call
  • #5424: Delegated sfpu api call to sqrt for wh to submodule sqrt call
  • #0: Fix galaxy eth dispatch init to only init the specified number of cqs (galaxy only supports single cq)
  • Fix undefined memory bug in ssm_prefix_scan
  • removed weight copies from DRAM to L1
  • fix syntax issues with test dispatch workflow
  • #9609: Reorganize libs into ttnn
  • #10165: Fix build error with g++-12
  • Adding support for dram sharded matmuls
  • #10076: Migrate Unary bw ops and replace tt_eager ops with ttnn ops
  • #10072: Move next set of Unary Backward ops to TTNN
  • #9082: ping individual falcon member since slack user group is not wo…
  • #8681: Add Floor, Trunc blocker ops
  • #9419: use memcpy to avoid mem misalignment
  • #10079: Move Unary Backward ops to TTNN
  • Migrate unary ops to TTNN
  • #9945: Skip SD for nightly FD, device perf tests, and single-card demos as it hangs on di/dt
  • #10045: use struct for matmul parameter passing and update doc string
  • #10045: remove use_1d_systolic_array from ttnn matmul
  • Ngrujic/profiling
  • #9319: Upload benchmark data for t3k falcon 7b tests
  • Aliu/build opt
  • #10107: Fix hangs w/ launch_msg size >32bytes
  • [CCL] Making buffer size dynamic to input slice
  • #7617: remove failing experimental model test
  • #7618: delete failing experimental model test
  • #0: fix prefill CI for mamba
  • Move Mamba tests to wh_b0_only_eth pipeline
  • #9747: Implement ttnn::tilize in C++
  • Aliu/prevent aho tanking
  • #10045: fix up missed parameter change in mamba block model
  • #9490: Added ttnn support for unary ops py file
  • #10101: [Blackhole Bringup] Revert Zeroacc to legacy behaviour
  • Update README.md
  • #0: Fix imports after tt_lib change
  • #10226: [Blackhole Bringup] Add new sfpu files
  • Suppress g++-12 build errors with -Wno flags
  • #0: Fix BH regression caused by unaligned L1_UNRESERVED_BASE
  • #10077: Migrate Unary comparison backward ops to TTNN with Overloading
  • #10175: Remove std::function and restructure ternary_bw
  • Falcon40b attn mask optimization
  • #10074: Move Unary backward ops to TTNN
  • Replace all TT Lib Unpad with TTNN Slice
  • #10082: Migrate unary bw ops to TTNN and remove std::function
  • #9715: Use build artifacts for profiler tests
  • #9021: adding resnet api into ci.
  • Update README.md
  • Move pad_on_host/unpad_on_host to host function in TTNN
  • #9874: Move polygamma_bw to TTNN
  • #5337: increase t3k frequent test timeout
  • Update falcon40b readme
  • #0: add layernorm rmsnorm pybind, move to ttnn
  • #0: Re-enable read cache in llama_model_optimized.
  • Update Mistral/Mixtral README files
  • #0: Update LLama2/3 readme with demo details
  • #0: resnet perf fix
  • Update Mamba README.md
  • OPT convs in RN50 to get better device perf
  • Increase timeout for N300 WH-only model pipeline
  • Prefill+Decode Demo Functional Implementation
  • [Falcon7b] Add wormhole demo perf mode and output verification tests
  • Update Falcon7/40b READMEs with details on model functionality and perf-mode
  • bump python 3.8 venv package version
  • Git bisect workflow on CI runners
  • #9613: scaffolding for weekly scheduled t3k perplexity tests
  • fix syntax issue with bisect script
  • #10231: Clean up t3k runs-on tags to minimum
  • #9490: Remove tt_eager unary ops and bindings
  • only build for arch that a dispatched workflow is running for
  • Allow overloading of job name with user-defined name for new dispatch workflows
  • #10242: Migrate unary bw ops with a generalized structure to TTNN
  • #10322: commented out failing t3k tests
  • #9491: Add structure for ternary ops in ttnn
  • Move downsample from tt_eager to ttnn
  • #10250: Migrate unary backward ops with a generalized structure to TTNN
  • #10280: Mistral README update
  • #9911: Add structure and migrate 20 composite unary ops
  • #0: fix rn50 block padding
  • #10300: get the correct operation id on subsequent run
  • #0: Move host tensor construction for halo into create_program to only happen on uncached runs
  • Flash decode v2
  • #9751: Restructure ttnn transformers to new folder structure
  • #10181: Disable test_reduce_h due to sporadic failures in slow dispatch
  • #10181: Disable test_reduce_h
  • Update README.md
  • move groupnorm from ttlib to ttnn
  • Update README.md
  • Update README.md - missing footnote
  • #0: Update ttnn resnet 2cq bound due to variability
  • #7528: add new ethernet microbenchmark, cleanup and re-enable others
  • #10238: migrate 7 unary ops into ttnn
  • #10333: Migrate prod_bw to TTNN
  • #10320: Enable falcon40b tests again
  • Add fused layernorm to falcon40b
  • #8342: Add info to matmul that tensors need to be on device
  • #10254: Enable preserve_fp32_precision flag in moreh_sum op
  • #10305: Add INSTALLING.md to release assets and create new custom release notes with an installation and pipeline ID
  • #9901: Refactoring moreh norm
  • Ngrujic/profiling
  • #9747: Implement ttnn.tilize(_with_val_padding) Python bindings
  • Add fixture for checking if in CI env and invoke Falcon7b demo tests with only filename
  • Add native caching for Mamba convolution/hidden states
  • Add skip-first option to op perf results script
  • #10083: added unit tests for JSON serialization
  • #10323: Reenable Llama perf test in CI
  • #9747: Delete tilize ops from tt_eager
  • #8764: More docs changes for WH readiness, Part 5
  • Move Mamba embeddings onto device
  • #10257: Add ttnn binding for UnaryWithParam, UnaryOpType
  • #10224: Update offsets for GO signal commands to use sizeof prefetch/dispatch cmd rather than pcie aligned size
  • #10166: add device mesh apis to query by row and col
  • #10380: Migrate set 1,2 complex ops to TTNN
  • #9527: continue removing bcast
  • #10334: Migrate 7 Type 2 unary complex bw ops to TTNN
  • #9806: Migrate Complex binary backward ops to TTNN
  • Relu max sweep migration - TTLIB to TTNN
  • Refactor: common RMSNorm for Mixtral and Mistral
  • #9874: Update clamp_bw to match PyTorch API
  • [Falcon7b] Add perplexity tests to new pipeline and restructure pytests to invoke with filenames
  • #10322: Re-enable Mixtral CI tests due to corrupted cache in CI machine
  • Migration of relu_min from tt_eager to ttnn
  • #10147: Migrate addcmul_bw to ttnn
  • #10403: Use is_ci_env fixture instead of env variable
  • Optimized MLP with W/H fracturing, sharding and ReduceScatter
  • Minor refactoring
  • Revert "Migration of relu_min from tt_eager to ttnn"
  • Add falcon40b demo test with token matching
  • #10130: Delete scan op in favor of ssm_prefix_scan
  • Support any sequence length in Mamba prefill
  • #10200: Update umd for mmio flush array overrun bugfix
  • #10467: Move tt_eager folder content into ttnn/experimental
  • Mixtral prefill 128-32k
  • #8865: Update reference times for dispatch time measuring
  • Migrate unpad sweep to TTNN
  • Update demo token matching reference for falcon40b
  • TTNN fmod sweeps added
  • #10147: Migrated eltwise_relu_min to ttnn
  • #0: Upgrade WH and T3000 WH KMD and FW versions to v1.27.1 and v80.10.0.0 respectively
  • #0: Move concatenate heads into ttnn experimental
  • #9628: Update Test files with golden function
  • #9490: Remove eltwise_unary in tt_eager
  • #10137: Add structure for composite binary ops in ttnn
  • Fix/re-enable a few watcher tests
  • #10471: Fixed GCC13 compile time issue
  • #7887: remove deprecated device_pool
  • Add llama galaxy mlp to TG frequent tests
  • #0: Update CODEOWNERS
  • Move Mamba demo to models/demos/wormhole
  • #10180: Use last column for FD on BH
  • #10052: [Blackhole bringup] Add pack untilize