Release v0.51.0-rc3 · tenstorrent/tt-metal

📦 Uncategorized

Migrate Pad Device and All references
- PR: #9891
#0: Multi-CQ support for R-Chip
- PR: #10002
#10028: Remove skip and reduce test case for moreh_groupnorm test
- PR: #10029
#10005: Change input tensor parameter to optional in moreh_sum_backward
- PR: #10007
#10004: Revise bias tensor usage in moreh_linear_backward
- PR: #10006
#9663: support moreh_nll_loss_unreduced
- PR: #9804
#8865: Switch ported ops from tt_lib to ttnn for host dispatch time m…
- PR: #10009
#0: Update README.md grammar for idiomatic description of TT-NN
- PR: #9827
#9767: removed more no longer needed manually specified attributes for reflection
- PR: #10023
Add distributed layernorm kernel documentation
- PR: #9982
#10031: Fix -Werror=return-type error in composite_ops
- PR: #10036
#9492: update matmul path in CODEOWNERS
- PR: #10022
#9450: change silicon fixtures to session scope
- PR: #10019
Uplift UMD to grab support for configuring static TLBs and Hugepage for BH
- PR: #9934
#9441: add all typecasts to unit test
- PR: #10046
#9801: Add cb alignment fix for blackhole that was missed in rebase
- PR: #10051
#9973: Fix addrmod for reduce scalar, port over missing narrow tile c…
- PR: #10047
#10052: Add metal pack untilize test
- PR: #10057
Add ttnn matmul tests to TG unit tests
- PR: #9477
Add ssm_prefix_scan test coverage for N=16
- PR: #10061
Add PyBind to TTNN Slice (Formerly Referred to Unpad in TT Lib)
- PR: #10056
#8450: Cleanup items pending from PR #9068
- PR: #10053
#10030: fix moreh_nll_loss hang
- PR: #10040
#7736: Remove unused reduce dim & type from reduce_init*
- PR: #10060
#9871: Update backward files
- PR: #10037
#9874: Move Unary Backward ops to TTNN
- PR: #9949
Update op_perf_results
- PR: #10042
#9962: Enable flags for profiler globals in jit build
- PR: #9964
Added prefill mode for mamba modules
- PR: #10063
Increase timeout for Mamba full model tests
- PR: #10064
Support multiple user indices in paged_update_cache
- PR: #10050
#10085: Make ttnn::Buffer deallocate execute without querying a potentially destroyed buffer instance
- PR: #10095
Pack runtime arguments across brisc/ncrisc/trisc
- PR: #9781
Llama Demo Refactor
- PR: #10018
#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions
- PR: #10103
#0: Move t3k demo tests to perf pipeline because it requires perf governor
- PR: #10106
#5424: Delegated sfpu reciprocal calls to gs submodule functions
- PR: #10105
Add trace and multi cq implementations/tests for WH Resnet
- PR: #10021
#0: (MINOR) Update to v0.51.0
- PR: #10114
#0: bump python3.8 venv versioning since apt repos updated
- PR: #10111
#10099: fix semaphores init for packet mux/demux
- PR: #10134
#10112: Drop hard pin for installation instructions for python3.8-venv in dependencies
- PR: #10113
Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions"
- PR: #10135
#0: Remove stray assert forcing single CQ on R-Chips
- PR: #10098
#9490: Replace tt_dnn op's usage in C++ with TTNN
- PR: #9821
#9874: Merge Next set of unary backward ops to TTNN
- PR: #10066
#10073: Move unary backward ops to TTNN
- PR: #10065
Unary backward op migration
- PR: #10078
#10087: update tt-umd submodule
- PR: #10092
#9959: Migrated pad to ttnn sweeps
- PR: #10067
Adding distributed layernorm to llama prefill
- PR: #10054
Add pytest xdist multiprocess to single-chip demo tests
- PR: #10162
Revert "Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions""
- PR: #10171
#10071: Move second set of Unary Backward ops to TTNN
- PR: #10038
#10083: added tt::stl::json::to_json and tt::stl::json::from_json
- PR: #10084
#10086: Add logic for splitting cmds that exceed the subcmd limit into separate cmds for semaphores
- PR: #10151
#5424: Delegated sqrt api call to thirdparty gs submodule sqrt call
- PR: #10183
#5424: Delegated sfpu api call to sqrt for wh to submodule sqrt call
- PR: #10185
#0: Fix galaxy eth dispatch init to only init the specified number of cqs (galaxy only supports single cq)
- PR: #10187
Fix undefined memory bug in ssm_prefix_scan
- PR: #10149
removed weight copies from DRAM to L1
- PR: #10189
fix syntax issues with test dispatch workflow
- PR: #10182
#9609: Reorganize libs into ttnn
- PR: #9870
#10165: Fix build error with g++-12
- PR: #10167
Adding support for dram sharded matmuls
- PR: #9878
#10076: Migrate Unary bw ops and replace tt_eager ops with ttnn ops
- PR: #10140
#10072: Move next set of Unary Backward ops to TTNN
- PR: #10080
#9082: ping individual falcon member since slack user group is not wo…
- PR: #10193
#8681: Add Floor, Trunc blocker ops
- PR: #9098
#9419: use memcpy to avoid mem misalignment
- PR: #10154
#10079: Move Unary Backward ops to TTNN
- PR: #10145
Migrate unary ops to TTNN
- PR: #10152
#9945: Skip SD for nightly FD, device perf tests, and single-card demos as it hangs on di/dt
- PR: #10179
#10045: use struct for matmul parameter passing and update doc string
- PR: #10153
#10045: remove use_1d_systolic_array from ttnn matmul
- PR: #10164
Ngrujic/profiling
- PR: #10150
#9319: Upload benchmark data for t3k falcon 7b tests
- PR: #10159
Aliu/build opt
- PR: #10096
#10107: Fix hangs w/ launch_msg size >32bytes
- PR: #10157
[CCL] Making buffer size dynamic to input slice
- PR: #10173
#7617: remove failing experimental model test
- PR: #10205
#7618: delete failing experimental model test
- PR: #10214
#0: fix prefill CI for mamba
- PR: #10227
Move Mamba tests to wh_b0_only_eth pipeline
- PR: #10206
#9747: Implement ttnn::tilize in C++
- PR: #10188
Aliu/prevent aho tanking
- PR: #10216
#10045: fix up missed parameter change in mamba block model
- PR: #10225
#9490: Added ttnn support for unary ops py file
- PR: #9883
#10101: [Blackhole Bringup] Revert Zeroacc to legacy behaviour
- PR: #10217
Update README.md
- PR: #10176
#0: Fix imports after tt_lib change
- PR: #10235
#10226: [Blackhole Bringup] Add new sfpu files
- PR: #10233
Suppress g++-12 build errors with -Wno flags
- PR: #10204
#0: Fix BH regression caused by unaligned L1_UNRESERVED_BASE
- PR: #10220
#10077: Migrate Unary comparison backward ops to TTNN with Overloading
- PR: #10198
#10175: Remove std::function and restructure ternary_bw
- PR: #10169
Falcon40b attn mask optimization
- PR: #10089
#10074: Move Unary backward ops to TTNN
- PR: #10196
Replace all TT Lib Unpad with TTNN Slice
- PR: #10104
#10082: Migrate unary bw ops to TTNN and remove std::function
- PR: #10239
#9715: Use build artifacts for profiler tests
- PR: #10218
#9021: adding resnet api into ci.
- PR: #10008
Update README.md
- PR: #10247
Move pad_on_host/unpad_on_host to host function in TTNN
- PR: #10178
#9874: Move polygamma_bw to TTNN
- PR: #10146
#5337: increase t3k frequent test timeout
- PR: #10202
Update falcon40b readme
- PR: #10261
#0: add layernorm rmsnorm pybind, move to ttnn
- PR: #10012
#0: Re-enable read cache in llama_model_optimized.
- PR: #10208
Update Mistral/Mixtral README files
- PR: #10259
#0: Update Llama2/3 readme with demo details
- PR: #10263
#0: resnet perf fix
- PR: #10273
Update Mamba README.md
- PR: #10262
OPT convs in RN50 to get better device perf
- PR: #10279
Increase timeout for N300 WH-only model pipeline
- PR: #10287
Prefill+Decode Demo Functional Implementation
- PR: #10281
[Falcon7b] Add wormhole demo perf mode and output verification tests
- PR: #10269
Update Falcon7/40b READMEs with details on model functionality and perf-mode
- PR: #10290
bump python 3.8 venv package version
- PR: #10315
Git bisect workflow on CI runners
- PR: #10316
#9613: scaffolding for weekly scheduled t3k perplexity tests
- PR: #10142
fix syntax issue with bisect script
- PR: #10328
#10231: Clean up t3k runs-on tags to minimum
- PR: #10232
#9490: Remove tt_eager unary ops and bindings
- PR: #10194
only build for arch that a dispatched workflow is running for
- PR: #10318
