v0.51.0-rc3
Pre-release
github-actions released this 16 Jul 02:20
📦 Uncategorized
- Migrate Pad Device and All references
- PR: #9891
- #0: Multi-CQ support for R-Chip
- PR: #10002
- #10028: Remove skip and reduce test case for moreh_groupnorm test
- PR: #10029
- #10005: Change input tensor parameter to optional in moreh_sum_backward
- PR: #10007
- #10004: Revise bias tensor usage in moreh_linear_backward
- PR: #10006
- #9663: support moreh_nll_loss_unreduced
- PR: #9804
- #8865: Switch ported ops from tt_lib to ttnn for host dispatch time m…
- PR: #10009
- #0: Update README.md grammar for idiomatic description of TT-NN
- PR: #9827
- #9767: removed more no longer needed manually specified attributes for reflection
- PR: #10023
- Add distributed layernorm kernel documentation
- PR: #9982
- #10031: Fix -Werror=return-type error in composite_ops
- PR: #10036
- #9492: update matmul path in CODEOWNERS
- PR: #10022
- #9450: change silicon fixtures to session scope
- PR: #10019
- Uplift UMD to grab support for configuring static TLBs and Hugepage for BH
- PR: #9934
- #9441: add all typecasts to unit test
- PR: #10046
- #9801: Add cb alignment fix for blackhole that was missed in rebase
- PR: #10051
- #9973: Fix addrmod for reduce scalar, port over missing narrow tile c…
- PR: #10047
- #10052: Add metal pack untilize test
- PR: #10057
- Add ttnn matmul tests to TG unit tests
- PR: #9477
- Add ssm_prefix_scan test coverage for N=16
- PR: #10061
- Add PyBind to TTNN Slice (Formerly Referred to Unpad in TT Lib)
- PR: #10056
- #8450: Cleanup items pending from PR #9068
- PR: #10053
- #10030: fix moreh_nll_loss hang
- PR: #10040
- #7736: Remove unused reduce dim & type from reduce_init*
- PR: #10060
- #9871: Update backward files
- PR: #10037
- #9874: Move Unary Backward ops to TTNN
- PR: #9949
- Update op_perf_results
- PR: #10042
- #9962: Enable flags for profiler globals in jit build
- PR: #9964
- Added prefill mode for mamba modules
- PR: #10063
- Increase timeout for Mamba full model tests
- PR: #10064
- Support multiple user indices in paged_update_cache
- PR: #10050
- #10085: Make ttnn::Buffer deallocate execute without querying a potentially destroyed buffer instance
- PR: #10095
- Pack runtime arguments across brisc/ncrisc/trisc
- PR: #9781
- Llama Demo Refactor
- PR: #10018
- #5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions
- PR: #10103
- #0: Move t3k demo tests to perf pipeline because it requires perf governor
- PR: #10106
- #5424: Delegated sfpu reciprocal calls to gs submodule functions
- PR: #10105
- Add trace and multi cq implementations/tests for WH Resnet
- PR: #10021
- #0: (MINOR) Update to v0.51.0
- PR: #10114
- #0: bump python3.8 venv versioning since apt repos updated
- PR: #10111
- #10099: fix semaphores init for packet mux/demux
- PR: #10134
- #10112: Drop hard pin for installation instructions for python3.8-venv in dependencies
- PR: #10113
- Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions"
- PR: #10135
- #0: Remove stray assert forcing single CQ on R-Chips
- PR: #10098
- #9490: Replace tt_dnn op's usage in C++ with TTNN
- PR: #9821
- #9874: Merge Next set of unary backward ops to TTNN
- PR: #10066
- #10073: Move unary backward ops to TTNN
- PR: #10065
- Unary backward op migration
- PR: #10078
- #10087: update tt-umd submodule
- PR: #10092
- #9959: Migrated pad to ttnn sweeps
- PR: #10067
- Adding distributed layernorm to llama prefill
- PR: #10054
- Add pytest xdist multiprocess to single-chip demo tests
- PR: #10162
- Revert "Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions""
- PR: #10171
- #10071: Move second set of Unary Backward ops to TTNN
- PR: #10038
- #10083: added tt::stl::json::to_json and tt::stl::json::from_json
- PR: #10084
- #10086: Add logic for splitting cmds that exceed the subcmd limit into separate cmds for semaphores
- PR: #10151
- #5424: Delegated sqrt api call to thirdparty gs submodule sqrt call
- PR: #10183
- #5424: Delegated sfpu api call to sqrt for wh to submodule sqrt call
- PR: #10185
- #0: Fix galaxy eth dispatch init to only init the specified number of cqs (galaxy only supports single cq)
- PR: #10187
- Fix undefined memory bug in ssm_prefix_scan
- PR: #10149
- removed weight copies from DRAM to L1
- PR: #10189
- fix syntax issues with test dispatch workflow
- PR: #10182
- #9609: Reorganize libs into ttnn
- PR: #9870
- #10165: Fix build error with g++-12
- PR: #10167
- Adding support for dram sharded matmuls
- PR: #9878
- #10076: Migrate Unary bw ops and replace tt_eager ops with ttnn ops
- PR: #10140
- #10072: Move next set of Unary Backward ops to TTNN
- PR: #10080
- #9082: ping individual falcon member since slack user group is not wo…
- PR: #10193
- #8681: Add Floor, Trunc blocker ops
- PR: #9098
- #9419: use memcpy to avoid mem misalignment
- PR: #10154
- #10079: Move Unary Backward ops to TTNN
- PR: #10145
- Migrate unary ops to TTNN
- PR: #10152
- #9945: Skip SD for nightly FD, device perf tests, and single-card demos as it hangs on di/dt
- PR: #10179
- #10045: use struct for matmul parameter passing and update doc string
- PR: #10153
- #10045: remove use_1d_systolic_array from ttnn matmul
- PR: #10164
- Ngrujic/profiling
- PR: #10150
- #9319: Upload benchmark data for t3k falcon 7b tests
- PR: #10159
- Aliu/build opt
- PR: #10096
- #10107: Fix hangs w/ launch_msg size >32bytes
- PR: #10157
- [CCL] Making buffer size dynamic to input slice
- PR: #10173
- #7617: remove failing experimental model test
- PR: #10205
- #7618: delete failing experimental model test
- PR: #10214
- #0: fix prefill CI for mamba
- PR: #10227
- Move Mamba tests to wh_b0_only_eth pipeline
- PR: #10206
- #9747: Implement ttnn::tilize in C++
- PR: #10188
- Aliu/prevent aho tanking
- PR: #10216
- #10045: fix up missed parameter change in mamba block model
- PR: #10225
- #9490: Added ttnn support for unary ops py file
- PR: #9883
- #10101: [Blackhole Bringup] Revert Zeroacc to legacy behaviour
- PR: #10217
- Update README.md
- PR: #10176
- #0: Fix imports after tt_lib change
- PR: #10235
- #10226: [Blackhole Bringup] Add new sfpu files
- PR: #10233
- Suppress g++-12 build errors with -Wno flags
- PR: #10204
- #0: Fix BH regression caused by unaligned L1_UNRESERVED_BASE
- PR: #10220
- #10077: Migrate Unary comparison backward ops to TTNN with Overloading
- PR: #10198
- #10175: Remove std::function and restructure ternary_bw
- PR: #10169
- Falcon40b attn mask optimization
- PR: #10089
- #10074: Move Unary backward ops to TTNN
- PR: #10196
- Replace all TT Lib Unpad with TTNN Slice
- PR: #10104
- #10082: Migrate unary bw ops to TTNN and remove std::function
- PR: #10239
- #9715: Use build artifacts for profiler tests
- PR: #10218
- #9021: adding resnet api into ci.
- PR: #10008
- Update README.md
- PR: #10247
- Move pad_on_host/unpad_on_host to host function in TTNN
- PR: #10178
- #9874: Move polygamma_bw to TTNN
- PR: #10146
- #5337: increase t3k frequent test timeout
- PR: #10202
- Update falcon40b readme
- PR: #10261
- #0: add layernorm rmsnorm pybind, move to ttnn
- PR: #10012
- #0: Re-enable read cache in llama_model_optimized.
- PR: #10208
- Update Mistral/Mixtral README files
- PR: #10259
- #0: Update LLama2/3 readme with demo details
- PR: #10263
- #0: resnet perf fix
- PR: #10273
- Update Mamba README.md
- PR: #10262
- OPT convs in RN50 to get better device perf
- PR: #10279
- Increase timeout for N300 WH-only model pipeline
- PR: #10287
- Prefill+Decode Demo Functional Implementation
- PR: #10281
- [Falcon7b] Add wormhole demo perf mode and output verification tests
- PR: #10269
- Update Falcon7/40b READMEs with details on model functionality and perf-mode
- PR: #10290
- bump python 3.8 venv package version
- PR: #10315
- Git bisect workflow on CI runners
- PR: #10316
- #9613: scaffolding for weekly scheduled t3k perplexity tests
- PR: #10142
- fix syntax issue with bisect script
- PR: #10328
- #10231: Clean up t3k runs-on tags to minimum
- PR: #10232
- #9490: Remove tt_eager unary ops and bindings
- PR: #10194
- only build for arch that a dispatched workflow is running for
- PR: #10318