Releases: tenstorrent/tt-metal

v0.51.0-rc5

17 Jul 18:24
Pre-release

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, rather than the documentation on the main branch. There may be differences between the latest main and the previous release.

The changelog follows, showing the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/9978853510

📦 Uncategorized

  • Migrate Pad Device and All references
  • #0: Multi-CQ support for R-Chip
  • #10028: Remove skip and reduce test case for moreh_groupnorm test
  • #10005: Change input tensor parameter to optional in moreh_sum_backward
  • #10004: Revise bias tensor usage in moreh_linear_backward
  • #9663: support moreh_nll_loss_unreduced
  • #8865: Switch ported ops from tt_lib to ttnn for host dispatch time m…
  • #0: Update README.md grammar for idiomatic description of TT-NN
  • #9767: removed more no longer needed manually specified attributes for reflection
  • Add distributed layernorm kernel documentation
  • #10031: Fix -Werror=return-type error in composite_ops
  • #9492: update matmul path in CODEOWNERS
  • #9450: change silicon fixtures to session scope
  • Uplift UMD to grab support for configuring static TLBs and Hugepage for BH
  • #9441: add all typecasts to unit test
  • #9801: Add cb alignment fix for blackhole that was missed in rebase
  • #9973: Fix addrmod for reduce scalar, port over missing narrow tile c…
  • #10052: Add metal pack untilize test
  • Add ttnn matmul tests to TG unit tests
  • Add ssm_prefix_scan test coverage for N=16
  • Add PyBind to TTNN Slice (Formerly Referred to as Unpad in TT Lib)
  • #8450: Cleanup items pending from PR #9068
  • #10030: fix moreh_nll_loss hang
  • #7736: Remove unused reduce dim & type from reduce_init*
  • #9871: Update backward files
  • #9874: Move Unary Backward ops to TTNN
  • Update op_perf_results
  • #9962: Enable flags for profiler globals in jit build
  • Added prefill mode for mamba modules
  • Increase timeout for Mamba full model tests
  • Support multiple user indices in paged_update_cache
  • #10085: Make ttnn::Buffer deallocate execute without querying a potentially destroyed buffer instance
  • Pack runtime arguments across brisc/ncrisc/trisc
  • Llama Demo Refactor
  • #5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions
  • #0: Move t3k demo tests to perf pipeline because it requires perf governor
  • #5424: Delegated sfpu reciprocal calls to gs submodule functions
  • Add trace and multi cq implementations/tests for WH Resnet
  • #0: (MINOR) Update to v0.51.0
  • #0: bump python3.8 venv versioning since apt repos updated
  • #10099: fix semaphores init for packet mux/demux
  • #10112: Drop hard pin for installation instructions for python3.8-venv in dependencies
  • Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions"
  • #0: Remove stray assert forcing single CQ on R-Chips
  • #9490: Replace tt_dnn op's usage in C++ with TTNN
  • #9874: Merge Next set of unary backward ops to TTNN
  • #10073: Move unary backward ops to TTNN
  • Unary backward op migration
  • #10087: update tt-umd submodule
  • #9959: Migrated pad to ttnn sweeps
  • Adding distributed layernorm to llama prefill
  • Add pytest xdist multiprocess to single-chip demo tests
  • Revert "Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions""
  • #10071: Move second set of Unary Backward ops to TTNN
  • #10083: added tt::stl::json::to_json and tt::stl::json::from_json
  • #10086: Add logic for splitting cmds that exceed the subcmd limit into separate cmds for semaphores
  • #5424: Delegated sqrt api call to thirdparty gs submodule sqrt call
  • #5424: Delegated sfpu api call to sqrt for wh to submodule sqrt call
  • #0: Fix galaxy eth dispatch init to only init the specified number of cqs (galaxy only supports single cq)
  • Fix undefined memory bug in ssm_prefix_scan
  • removed weight copies from DRAM to L1
  • fix syntax issues with test dispatch workflow
  • #9609: Reorganize libs into ttnn
  • #10165: Fix build error with g++-12
  • Adding support for dram sharded matmuls
  • #10076: Migrate Unary bw ops and replace tt_eager ops with ttnn ops
  • #10072: Move next set of Unary Backward ops to TTNN
  • #9082: ping individual falcon member since slack user group is not wo…
  • #8681: Add Floor, Trunc blocker ops
  • #9419: use memcpy to avoid mem misalignment
  • #10079: Move Unary Backward ops to TTNN
  • Migrate unary ops to TTNN
  • #9945: Skip SD for nightly FD, device perf tests, and single-card demos as it hangs on di/dt
  • #10045: use struct for matmul parameter passing and update doc string
  • #10045: remove use_1d_systolic_array from ttnn matmul
  • Ngrujic/profiling
  • #9319: Upload benchmark data for t3k falcon 7b tests
  • Aliu/build opt
  • #10107: Fix hangs w/ launch_msg size >32bytes
  • [CCL] Making buffer size dynamic to input slice
  • #7617: remove failing experimental model test
  • #7618: delete failing experimental model test
  • #0: fix prefill CI for mamba
  • Move Mamba tests to wh_b0_only_eth pipeline
  • #9747: Implement ttnn::tilize in C++
  • Aliu/prevent aho tanking
  • #10045: fix up missed parameter change in mamba block model
  • #9490: Added ttnn support for unary ops py file
  • #10101: [Blackhole Bringup] Revert Zeroacc to legacy behaviour
  • Update README.md
  • #0: Fix imports after tt_lib change
  • #10226: [Blackhole Bringup] Add new sfpu files
  • Suppress g++-12 build errors with -Wno flags
  • #0: Fix BH regression caused by unaligned L1_UNRESERVED_BASE
  • #10077: Migrate Unary comparison backward ops to TTNN with Overloading
  • #10175: Remove std::function and restructure ternary_bw
  • Falcon40b attn mask optimization
  • #10074: Move Unary backward ops to TTNN
  • Replace all TT Lib Unpad with TTNN Slice (see the slice sketch after this list)
  • #10082: Migrate unary bw ops to TTNN and remove std::function
  • #9715: Use build artifacts for profiler tests
  • #9021: adding resnet api into ci.
  • Update README.md
  • Move pad_on_host/unpad_on_host to host function in TTNN
  • #9874: Move polygamma_bw to TTNN
  • #5337: increase t3k frequent test timeout
  • Update falcon40b readme
  • #0: add layernorm rmsnorm pybind, move to ttnn
  • #0: Re-enable read cache in llama_model_optimized.
  • Update Mistral/Mixtral README files
  • #0: Update LLama2/3 readme with demo details
  • #0: resnet perf fix
  • Update Mamba README.md
  • OPT convs in RN50 to get better device perf
  • Increase timeout for N300 WH-only model pipeline
  • Prefill+Decode Demo Functional Implementation
  • [Falcon7b] Add wormhole demo perf mode and output verification tests
  • Update Falcon7/40b READMEs with details on model functionality and perf-mode
  • bump python 3.8 venv package version
  • Git bisect workflow on CI runners
  • #9613: scaffolding for weekly scheduled t3k perplexity tests
  • fix syntax issue with bisect script
  • #10231: Clean up t3k runs-on tags to minimum
  • #9490: Remove tt_eager unary ops and bindings
  • only build for arch that a dispatched workflow is running for
  • Allow overloading of job name with user-defined name for new dispatch workflows
  • #10242: Migrate unary bw ops with a generalized structure to TTNN
  • #10322: commented out failing t3k tests
  • #9491: Add structure for ternary ops in ttnn
  • Move downsample from tt_eager to ttnn
  • #10250: Migrate unary backward ops with a generalized structure to TTNN
  • #10280: Mistral README update
  • #9911: Add structure and migrate 20 composite unary ops
  • #0: fix rn50 block padding
  • #10300: get the correct operation id on subsequent run
  • #0: Move host tensor construction for halo into create_program to only happen on uncached runs
  • Flash decode v2
  • #9751: Restructure ttnn transformers to new folder structure
  • #10181: Disable test_reduce_h due to sporadic failures in slow dispatch
  • #10181: Disable test_reduce_h
  • Update README.md
…
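
The slice sketch referenced in the entry above: a minimal Python sketch of ttnn.slice, which replaces the old tt_lib unpad across this release series. It assumes a working ttnn build with one available device; the exclusive-end bounds convention is an assumption, not something these notes specify.

```python
# Minimal sketch of ttnn.slice (formerly tt_lib unpad).
# Assumes one available device; the [start, end) bounds convention
# is an assumption, not taken from these release notes.
import torch
import ttnn

device = ttnn.open_device(device_id=0)

host = torch.rand(1, 1, 64, 64, dtype=torch.bfloat16)
t = ttnn.from_torch(host, layout=ttnn.TILE_LAYOUT, device=device)

# Keep the top-left 32x32 tile of the last two dims (tile-aligned bounds).
sliced = ttnn.slice(t, (0, 0, 0, 0), (1, 1, 32, 32))
print(ttnn.to_torch(sliced).shape)  # expected: torch.Size([1, 1, 32, 32])

ttnn.close_device(device)
```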

v0.51.0-rc4

17 Jul 02:18
Pre-release

📦 Uncategorized

  • Migrate Pad Device and All references
  • #0: Multi-CQ support for R-Chip
  • #10028: Remove skip and reduce test case for moreh_groupnorm test
  • #10005: Change input tensor parameter to optional in moreh_sum_backward
  • #10004: Revise bias tensor usage in moreh_linear_backward
  • #9663: support moreh_nll_loss_unreduced
  • #8865: Switch ported ops from tt_lib to ttnn for host dispatch time m…
  • #0: Update README.md grammar for idiomatic description of TT-NN
  • #9767: removed more no longer needed manually specified attributes for reflection
  • Add distributed layernorm kernel documentation
  • #10031: Fix -Werror=return-type error in composite_ops
  • #9492: update matmul path in CODEOWNERS
  • #9450: change silicon fixtures to session scope
  • Uplift UMD to grab support for configuring static TLBs and Hugepage for BH
  • #9441: add all typecasts to unit test
  • #9801: Add cb alignment fix for blackhole that was missed in rebase
  • #9973: Fix addrmod for reduce scalar, port over missing narrow tile c…
  • #10052: Add metal pack untilize test
  • Add ttnn matmul tests to TG unit tests
  • Add ssm_prefix_scan test coverage for N=16
  • Add PyBind to TTNN Slice (Formerly Referred to as Unpad in TT Lib)
  • #8450: Cleanup items pending from PR #9068
  • #10030: fix moreh_nll_loss hang
  • #7736: Remove unused reduce dim & type from reduce_init*
  • #9871: Update backward files
  • #9874: Move Unary Backward ops to TTNN
  • Update op_perf_results
  • #9962: Enable flags for profiler globals in jit build
  • Added prefill mode for mamba modules
  • Increase timeout for Mamba full model tests
  • Support multiple user indices in paged_update_cache
  • #10085: Make ttnn::Buffer deallocate execute without querying a potentially destroyed buffer instance
  • Pack runtime arguments across brisc/ncrisc/trisc
  • Llama Demo Refactor
  • #5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions
  • #0: Move t3k demo tests to perf pipeline because it requires perf governor
  • #5424: Delegated sfpu reciprocal calls to gs submodule functions
  • Add trace and multi cq implementations/tests for WH Resnet
  • #0: (MINOR) Update to v0.51.0
  • #0: bump python3.8 venv versioning since apt repos updated
  • #10099: fix semaphores init for packet mux/demux
  • #10112: Drop hard pin for installation instructions for python3.8-venv in dependencies
  • Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions"
  • #0: Remove stray assert forcing single CQ on R-Chips
  • #9490: Replace tt_dnn op's usage in C++ with TTNN
  • #9874: Merge Next set of unary backward ops to TTNN
  • #10073: Move unary backward ops to TTNN
  • Unary backward op migration
  • #10087: update tt-umd submodule
  • #9959: Migrated pad to ttnn sweeps
  • Adding distributed layernorm to llama prefill
  • Add pytest xdist multiprocess to single-chip demo tests
  • Revert "Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions""
  • #10071: Move second set of Unary Backward ops to TTNN
  • #10083: added tt::stl::json::to_json and tt::stl::json::from_json
  • #10086: Add logic for splitting cmds that exceed the subcmd limit into separate cmds for semaphores
  • #5424: Delegated sqrt api call to thirdparty gs submodule sqrt call
  • #5424: Delegated sfpu api call to sqrt for wh to submodule sqrt call
  • #0: Fix galaxy eth dispatch init to only init the specified number of cqs (galaxy only supports single cq)
  • Fix undefined memory bug in ssm_prefix_scan
  • removed weight copies from DRAM to L1
  • fix syntax issues with test dispatch workflow
  • #9609: Reorganize libs into ttnn
  • #10165: Fix build error with g++-12
  • Adding support for dram sharded matmuls
  • #10076: Migrate Unary bw ops and replace tt_eager ops with ttnn ops
  • #10072: Move next set of Unary Backward ops to TTNN
  • #9082: ping individual falcon member since slack user group is not wo…
  • #8681: Add Floor, Trunc blocker ops
  • #9419: use memcpy to avoid mem misalignment
  • #10079: Move Unary Backward ops to TTNN
  • Migrate unary ops to TTNN
  • #9945: Skip SD for nightly FD, device perf tests, and single-card demos as it hangs on di/dt
  • #10045: use struct for matmul parameter passing and update doc string
  • #10045: remove use_1d_systolic_array from ttnn matmul
  • Ngrujic/profiling
  • #9319: Upload benchmark data for t3k falcon 7b tests
  • Aliu/build opt
  • #10107: Fix hangs w/ launch_msg size >32bytes
  • [CCL] Making buffer size dynamic to input slice
  • #7617: remove failing experimental model test
  • #7618: delete failing experimental model test
  • #0: fix prefill CI for mamba
  • Move Mamba tests to wh_b0_only_eth pipeline
  • #9747: Implement ttnn::tilize in C++
  • Aliu/prevent aho tanking
  • #10045: fix up missed parameter change in mamba block model
  • #9490: Added ttnn support for unary ops py file
  • #10101: [Blackhole Bringup] Revert Zeroacc to legacy behaviour
  • Update README.md
  • #0: Fix imports after tt_lib change
  • #10226: [Blackhole Bringup] Add new sfpu files
  • Suppress g++-12 build errors with -Wno flags
  • #0: Fix BH regression caused by unaligned L1_UNRESERVED_BASE
  • #10077: Migrate Unary comparison backward ops to TTNN with Overloading
  • #10175: Remove std::function and restructure ternary_bw
  • Falcon40b attn mask optimization
  • #10074: Move Unary backward ops to TTNN
  • Replace all TT Lib Unpad with TTNN Slice
  • #10082: Migrate unary bw ops to TTNN and remove std::function
  • #9715: Use build artifacts for profiler tests
  • #9021: adding resnet api into ci.
  • Update README.md
  • Move pad_on_host/unpad_on_host to host function in TTNN
  • #9874: Move polygamma_bw to TTNN
  • #5337: increase t3k frequent test timeout
  • Update falcon40b readme
  • #0: add layernorm rmsnorm pybind, move to ttnn
  • #0: Re-enable read cache in llama_model_optimized.
  • Update Mistral/Mixtral README files
  • #0: Update LLama2/3 readme with demo details
  • #0: resnet perf fix
  • Update Mamba README.md
  • OPT convs in RN50 to get better device perf
  • Increase timeout for N300 WH-only model pipeline
  • Prefill+Decode Demo Functional Implementation
  • [Falcon7b] Add wormhole demo perf mode and output verification tests
  • Update Falcon7/40b READMEs with details on model functionality and perf-mode
  • bump python 3.8 venv package version
  • Git bisect workflow on CI runners
  • #9613: scaffolding for weekly scheduled t3k perplexity tests
  • fix syntax issue with bisect script
  • #10231: Clean up t3k runs-on tags to minimum
  • #9490: Remove tt_eager unary ops and bindings
  • only build for arch that a dispatched workflow is running for
  • Allow overloading of job name with user-defined name for new dispatch workflows
  • #10242: Migrate unary bw ops with a generalized structure to TTNN
  • #10322: commented out failing t3k tests
  • #9491: Add structure for ternary ops in ttnn
  • Move downsample from tt_eager to ttnn
  • #10250: Migrate unary backward ops with a generalized structure to TTNN
  • #10280: Mistral README update
  • #9911: Add structure and migrate 20 composite unary ops
  • #0: fix rn50 block padding
  • #10300: get the correct operation id on subsequent run
  • #0: Move host tensor construction for halo into create_program to only happen on uncached runs
  • Flash decode v2
  • #9751: Restructure ttnn transformers to new folder structure
  • #10181: Disable test_reduce_h due to sporadic failures in slow dispatch
  • #10181: Disable test_reduce_h
  • Update README.md
  • move groupnorm from ttlib to ttnn
  • Update README.md
  • Update README.md - missing footnote
  • #0: Update ttnn resnet 2cq bound due to variability

v0.51.0-rc3

16 Jul 02:20
e1835e2
Pre-release

📦 Uncategorized

  • Migrate Pad Device and All references
  • #0: Multi-CQ support for R-Chip
  • #10028: Remove skip and reduce test case for moreh_groupnorm test
  • #10005: Change input tensor parameter to optional in moreh_sum_backward
  • #10004: Revise bias tensor usage in moreh_linear_backward
  • #9663: support moreh_nll_loss_unreduced
  • #8865: Switch ported ops from tt_lib to ttnn for host dispatch time m…
  • #0: Update README.md grammar for idiomatic description of TT-NN
  • #9767: removed more no longer needed manually specified attributes for reflection
  • Add distributed layernorm kernel documentation
  • #10031: Fix -Werror=return-type error in composite_ops
  • #9492: update matmul path in CODEOWNERS
  • #9450: change silicon fixtures to session scope
  • Uplift UMD to grab support for configuring static TLBs and Hugepage for BH
  • #9441: add all typecasts to unit test
  • #9801: Add cb alignment fix for blackhole that was missed in rebase
  • #9973: Fix addrmod for reduce scalar, port over missing narrow tile c…
  • #10052: Add metal pack untilize test
  • Add ttnn matmul tests to TG unit tests
  • Add ssm_prefix_scan test coverage for N=16
  • Add PyBind to TTNN Slice (Formerly Referred to as Unpad in TT Lib)
  • #8450: Cleanup items pending from PR #9068
  • #10030: fix moreh_nll_loss hang
  • #7736: Remove unused reduce dim & type from reduce_init*
  • #9871: Update backward files
  • #9874: Move Unary Backward ops to TTNN
  • Update op_perf_results
  • #9962: Enable flags for profiler globals in jit build
  • Added prefill mode for mamba modules
  • Increase timeout for Mamba full model tests
  • Support multiple user indices in paged_update_cache
  • #10085: Make ttnn::Buffer deallocate execute without querying a potentially destroyed buffer instance
  • Pack runtime arguments across brisc/ncrisc/trisc
  • Llama Demo Refactor
  • #5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions
  • #0: Move t3k demo tests to perf pipeline because it requires perf governor
  • #5424: Delegated sfpu reciprocal calls to gs submodule functions
  • Add trace and multi cq implementations/tests for WH Resnet
  • #0: (MINOR) Update to v0.51.0
  • #0: bump python3.8 venv versioning since apt repos updated
  • #10099: fix semaphores init for packet mux/demux
  • #10112: Drop hard pin for installation instructions for python3.8-venv in dependencies
  • Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions"
  • #0: Remove stray assert forcing single CQ on R-Chips
  • #9490: Replace tt_dnn op's usage in C++ with TTNN
  • #9874: Merge Next set of unary backward ops to TTNN
  • #10073: Move unary backward ops to TTNN
  • Unary backward op migration
  • #10087: update tt-umd submodule
  • #9959: Migrated pad to ttnn sweeps
  • Adding distributed layernorm to llama prefill
  • Add pytest xdist multiprocess to single-chip demo tests
  • Revert "Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions""
  • #10071: Move second set of Unary Backward ops to TTNN
  • #10083: added tt::stl::json::to_json and tt::stl::json::from_json
  • #10086: Add logic for splitting cmds that exceed the subcmd limit into separate cmds for semaphores
  • #5424: Delegated sqrt api call to thirdparty gs submodule sqrt call
  • #5424: Delegated sfpu api call to sqrt for wh to submodule sqrt call
  • #0: Fix galaxy eth dispatch init to only init the specified number of cqs (galaxy only supports single cq)
  • Fix undefined memory bug in ssm_prefix_scan
  • removed weight copies from DRAM to L1
  • fix syntax issues with test dispatch workflow
  • #9609: Reorganize libs into ttnn
  • #10165: Fix build error with g++-12
  • Adding support for dram sharded matmuls
  • #10076: Migrate Unary bw ops and replace tt_eager ops with ttnn ops
  • #10072: Move next set of Unary Backward ops to TTNN
  • #9082: ping individual falcon member since slack user group is not wo…
  • #8681: Add Floor, Trunc blocker ops
  • #9419: use memcpy to avoid mem misalignment
  • #10079: Move Unary Backward ops to TTNN
  • Migrate unary ops to TTNN
  • #9945: Skip SD for nightly FD, device perf tests, and single-card demos as it hangs on di/dt
  • #10045: use struct for matmul parameter passing and update doc string
  • #10045: remove use_1d_systolic_array from ttnn matmul
  • Ngrujic/profiling
  • #9319: Upload benchmark data for t3k falcon 7b tests
  • Aliu/build opt
  • #10107: Fix hangs w/ launch_msg size >32bytes
  • [CCL] Making buffer size dynamic to input slice
  • #7617: remove failing experimental model test
  • #7618: delete failing experimental model test
  • #0: fix prefill CI for mamba
  • Move Mamba tests to wh_b0_only_eth pipeline
  • #9747: Implement ttnn::tilize in C++
  • Aliu/prevent aho tanking
  • #10045: fix up missed parameter change in mamba block model
  • #9490: Added ttnn support for unary ops py file
  • #10101: [Blackhole Bringup] Revert Zeroacc to legacy behaviour
  • Update README.md
  • #0: Fix imports after tt_lib change
  • #10226: [Blackhole Bringup] Add new sfpu files
  • Suppress g++-12 build errors with -Wno flags
  • #0: Fix BH regression caused by unaligned L1_UNRESERVED_BASE
  • #10077: Migrate Unary comparison backward ops to TTNN with Overloading
  • #10175: Remove std::function and restructure ternary_bw
  • Falcon40b attn mask optimization
  • #10074: Move Unary backward ops to TTNN
  • Replace all TT Lib Unpad with TTNN Slice
  • #10082: Migrate unary bw ops to TTNN and remove std::function
  • #9715: Use build artifacts for profiler tests
  • #9021: adding resnet api into ci.
  • Update README.md
  • Move pad_on_host/unpad_on_host to host function in TTNN
  • #9874: Move polygamma_bw to TTNN
  • #5337: increase t3k frequent test timeout
  • Update falcon40b readme
  • #0: add layernorm rmsnorm pybind, move to ttnn
  • #0: Re-enable read cache in llama_model_optimized.
  • Update Mistral/Mixtral README files
  • #0: Update LLama2/3 readme with demo details
  • #0: resnet perf fix
  • Update Mamba README.md
  • OPT convs in RN50 to get better device perf
  • Increase timeout for N300 WH-only model pipeline
  • Prefill+Decode Demo Functional Implementation
  • [Falcon7b] Add wormhole demo perf mode and output verification tests
  • Update Falcon7/40b READMEs with details on model functionality and perf-mode
  • bump python 3.8 venv package version
  • Git bisect workflow on CI runners
  • #9613: scaffolding for weekly scheduled t3k perplexity tests
  • fix syntax issue with bisect script
  • #10231: Clean up t3k runs-on tags to minimum
  • #9490: Remove tt_eager unary ops and bindings
  • only build for arch that a dispatched workflow is running for

v0.51.0-rc2

15 Jul 02:19
Pre-release

📦 Uncategorized

  • Migrate Pad Device and All references
  • #0: Multi-CQ support for R-Chip
  • #10028: Remove skip and reduce test case for moreh_groupnorm test
  • #10005: Change input tensor parameter to optional in moreh_sum_backward
  • #10004: Revise bias tensor usage in moreh_linear_backward
  • #9663: support moreh_nll_loss_unreduced
  • #8865: Switch ported ops from tt_lib to ttnn for host dispatch time m…
  • #0: Update README.md grammar for idiomatic description of TT-NN
  • #9767: removed more no longer needed manually specified attributes for reflection
  • Add distributed layernorm kernel documentation
  • #10031: Fix -Werror=return-type error in composite_ops
  • #9492: update matmul path in CODEOWNERS
  • #9450: change silicon fixtures to session scope
  • Uplift UMD to grab support for configuring static TLBs and Hugepage for BH
  • #9441: add all typecasts to unit test
  • #9801: Add cb alignment fix for blackhole that was missed in rebase
  • #9973: Fix addrmod for reduce scalar, port over missing narrow tile c…
  • #10052: Add metal pack untilize test
  • Add ttnn matmul tests to TG unit tests
  • Add ssm_prefix_scan test coverage for N=16
  • Add PyBind to TTNN Slice (Formerly Referred to as Unpad in TT Lib)
  • #8450: Cleanup items pending from PR #9068
  • #10030: fix moreh_nll_loss hang
  • #7736: Remove unused reduce dim & type from reduce_init*
  • #9871: Update backward files
  • #9874: Move Unary Backward ops to TTNN
  • Update op_perf_results
  • #9962: Enable flags for profiler globals in jit build
  • Added prefill mode for mamba modules
  • Increase timeout for Mamba full model tests
  • Support multiple user indices in paged_update_cache
  • #10085: Make ttnn::Buffer deallocate execute without querying a potentially destroyed buffer instance
  • Pack runtime arguments across brisc/ncrisc/trisc
  • Llama Demo Refactor
  • #5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions
  • #0: Move t3k demo tests to perf pipeline because it requires perf governor
  • #5424: Delegated sfpu reciprocal calls to gs submodule functions
  • Add trace and multi cq implementations/tests for WH Resnet
  • #0: (MINOR) Update to v0.51.0
  • #0: bump python3.8 venv versioning since apt repos updated
  • #10099: fix semaphores init for packet mux/demux
  • #10112: Drop hard pin for installation instructions for python3.8-venv in dependencies
  • Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions"
  • #0: Remove stray assert forcing single CQ on R-Chips
  • #9490: Replace tt_dnn op's usage in C++ with TTNN
  • #9874: Merge Next set of unary backward ops to TTNN
  • #10073: Move unary backward ops to TTNN
  • Unary backward op migration
  • #10087: update tt-umd submodule
  • #9959: Migrated pad to ttnn sweeps
  • Adding distributed layernorm to llama prefill
  • Add pytest xdist multiprocess to single-chip demo tests
  • Revert "Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions""
  • #10071: Move second set of Unary Backward ops to TTNN
  • #10083: added tt::stl::json::to_json and tt::stl::json::from_json
  • #10086: Add logic for splitting cmds that exceed the subcmd limit into separate cmds for semaphores
  • #5424: Delegated sqrt api call to thirdparty gs submodule sqrt call
  • #5424: Delegated sfpu api call to sqrt for wh to submodule sqrt call
  • #0: Fix galaxy eth dispatch init to only init the specified number of cqs (galaxy only supports single cq)
  • Fix undefined memory bug in ssm_prefix_scan
  • removed weight copies from DRAM to L1
  • fix syntax issues with test dispatch workflow
  • #9609: Reorganize libs into ttnn
  • #10165: Fix build error with g++-12
  • Adding support for dram sharded matmuls
  • #10076: Migrate Unary bw ops and replace tt_eager ops with ttnn ops
  • #10072: Move next set of Unary Backward ops to TTNN
  • #9082: ping individual falcon member since slack user group is not wo…
  • #8681: Add Floor, Trunc blocker ops
  • #9419: use memcpy to avoid mem misalignment
  • #10079: Move Unary Backward ops to TTNN
  • Migrate unary ops to TTNN
  • #9945: Skip SD for nightly FD, device perf tests, and single-card demos as it hangs on di/dt
  • #10045: use struct for matmul parameter passing and update doc string
  • #10045: remove use_1d_systolic_array from ttnn matmul
  • Ngrujic/profiling
  • #9319: Upload benchmark data for t3k falcon 7b tests
  • Aliu/build opt
  • #10107: Fix hangs w/ launch_msg size >32bytes
  • [CCL] Making buffer size dynamic to input slice
  • #7617: remove failing experimental model test
  • #7618: delete failing experimental model test
  • #0: fix prefill CI for mamba
  • Move Mamba tests to wh_b0_only_eth pipeline
  • #9747: Implement ttnn::tilize in C++
  • Aliu/prevent aho tanking
  • #10045: fix up missed parameter change in mamba block model
  • #9490: Added ttnn support for unary ops py file
  • #10101: [Blackhole Bringup] Revert Zeroacc to legacy behaviour
  • Update README.md
  • #0: Fix imports after tt_lib change
  • #10226: [Blackhole Bringup] Add new sfpu files
  • Suppress g++-12 build errors with -Wno flags
  • #0: Fix BH regression caused by unaligned L1_UNRESERVED_BASE
  • #10077: Migrate Unary comparison backward ops to TTNN with Overloading
  • #10175: Remove std::function and restructure ternary_bw
  • Falcon40b attn mask optimization
  • #10074: Move Unary backward ops to TTNN
  • Replace all TT Lib Unpad with TTNN Slice
  • #10082: Migrate unary bw ops to TTNN and remove std::function
  • #9715: Use build artifacts for profiler tests
  • #9021: adding resnet api into ci.
  • Update README.md

v0.51.0-rc1

11 Jul 02:01
07aacde
Pre-release

📦 Uncategorized

  • Migrate Pad Device and All references
  • #0: Multi-CQ support for R-Chip
  • #10028: Remove skip and reduce test case for moreh_groupnorm test
  • #10005: Change input tensor parameter to optional in moreh_sum_backward
  • #10004: Revise bias tensor usage in moreh_linear_backward
  • #9663: support moreh_nll_loss_unreduced
  • #8865: Switch ported ops from tt_lib to ttnn for host dispatch time m…
  • #0: Update README.md grammar for idiomatic description of TT-NN
  • #9767: removed more no longer needed manually specified attributes for reflection
  • Add distributed layernorm kernel documentation
  • #10031: Fix -Werror=return-type error in composite_ops
  • #9492: update matmul path in CODEOWNERS
  • #9450: change silicon fixtures to session scope
  • Uplift UMD to grab support for configuring static TLBs and Hugepage for BH
  • #9441: add all typecasts to unit test
  • #9801: Add cb alignment fix for blackhole that was missed in rebase
  • #9973: Fix addrmod for reduce scalar, port over missing narrow tile c…
  • #10052: Add metal pack untilize test
  • Add ttnn matmul tests to TG unit tests
  • Add ssm_prefix_scan test coverage for N=16
  • Add PyBind to TTNN Slice (Formerly Referred to as Unpad in TT Lib)
  • #8450: Cleanup items pending from PR #9068
  • #10030: fix moreh_nll_loss hang
  • #7736: Remove unused reduce dim & type from reduce_init*
  • #9871: Update backward files
  • #9874: Move Unary Backward ops to TTNN
  • Update op_perf_results
  • #9962: Enable flags for profiler globals in jit build
  • Added prefill mode for mamba modules
  • Increase timeout for Mamba full model tests
  • Support multiple user indices in paged_update_cache
  • #10085: Make ttnn::Buffer deallocate execute without querying a potentially destroyed buffer instance
  • Pack runtime arguments across brisc/ncrisc/trisc
  • Llama Demo Refactor
  • #5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions
  • #0: Move t3k demo tests to perf pipeline because it requires perf governor
  • #5424: Delegated sfpu reciprocal calls to gs submodule functions
  • Add trace and multi cq implementations/tests for WH Resnet
  • #0: (MINOR) Update to v0.51.0

v0.50.0

10 Jul 22:04
f7c10a2

📦 Uncategorized

  • Fix issue with Mamba SSM A weight preprocessing
  • Make build key unique for mmio and remote devices with same harvest mask
  • #5337: Removed eth_dispatch yaml flag from mistral tests
  • New workflow for custom test dispatch on CI runners
  • #9312: Add single-header boost-ext/reflect library as dependency
  • Opt LayerNorm/RMSNorm with 2D reduce
  • Revert "#8630: support uint8 data type"
  • #0: Fix codeowners for metal bert
  • Revert "Revert "#8630: support uint8 data type""
  • #9642: fix matmul2d in1 sharded with batch>1
  • #0: add tile layout support for GN
  • FD2 packed binary commands
  • #9082: t3k demo with slack notifications for owners. split jobs
  • Rtawfik/issue 9142
  • #9688: Remove redundant left shift in DEBUG_SANITIZE_NOC_READ_TRANSACTION_FROM_STATE
  • #9500: Update eth_interface include in tt_cluster to not be hardcoded for WH
  • #9578: Add WITH_PYTHON_BINDINGS option to allow build w/o python
  • #9587: Update CB and worker Go signals to respect max sub cmd limit introduced by dispatch packed write local copy change
  • Add support for bfloat4 weights in Mamba
  • Use in-place binary operations in Mamba block
  • #5337: Relaxed Mistral expected compilation time in CI by 1 sec
  • Mo/9406 profiler build flags
  • Add support for single col/row/core output grid for matmul 2D
  • #9725: Set release candidate releases on GitHub to pre-release, not draft, to enable downstream users
  • add tagged docker image with releases
  • Rtawfik/issue 9164
  • #5562: resolve reduce scatter issues (nd hang and correctness)
  • Create benchmarking tools for saving run/measurement data (with Falcon7b example) and model-demo utilities for verifying tokens/perf
  • #0: Fix bug with var name in single-chip falcon7b demo tests
  • #9735: fix issues with including reflect library
  • #9527: Remove usage of bcast where multiply is used
  • Mchiou/9082 slack notification owners
  • #9681: set name attribute for ttnn operations when fast runtime m…
  • #9553: Add prefix scan op for Mamba prefill
  • #9628: Merge Binary backward ops from tt_eager to TTNN
  • Namhyeong kim/support fp32 dest acc in moreh adam
  • #0: Update t3k workflow timeouts (except freq pipeline)
  • Temporary update Mixtral perf times to pass CI
  • #9479: fix cpu core worker bug
  • #4858: add typecast fp32 <-> int32
  • #0: ViT demo fix
  • #9389: Add support for integer type in sum operation
  • Transfer llama2/3 from experimental to demo folder.
  • #9657: add topk multicore to support larger dimension sizes
  • #4858: add typecast bfp8_b
  • #9082: t3k model perf split tests with slack notifications, disabled cnn
  • #0: Add ttnn/cpp to packages to enable using ttnn kernels in tt_eager ops
  • #9741: Set stricter pytest timeouts
  • #9492: Change models matmul usage to ttnn
  • #9778: test prefetcher hanging with changes to test
  • #9490: TTNN eltwise/unary migration
  • Update timeout for falcon40b t3k demo test
  • #0: Remove extra t3k falcon40b matrix test group
  • #9044: Move dispatch core x y to be part of launch msg
  • Modify rot mat each iteration to avoid allocating 10k tensors upfront
  • Optimize bcast sharded op
  • Start using reflect library
  • #0: Properly delete source folders for wheel testing
  • #9479: Update Mixtral perf estimates
  • #0: Added github community issue workflow
  • #8729: Pytest multiprocess reset infrastructure
  • Enable switching between 1 and 2 cqs in the same process
  • Fixed failing tests for SD Conv tests for WH using new conv
  • #0: Switch org-membership check to an authenticated call
  • #0: Decrease num loops in trace stress tests
  • #9628: Support optional return tensor
  • #0: Use CV to wait for cq_reader in production mode. Remove enqueue_record_event for NB calls
  • #9628: Merge second set of binary backward op from tt_eager to TTNN
  • #0: Bump bert compile time threshold since it's been intermittently failing on ci
  • Mchiou/9792 t3k runner management
  • #0: Bump up Bert inference time due to instability on ci
  • #8865: For host dispatch time measuring increase failing reference t…
  • #9484: Add output_tensor queue_id to dependency ops
  • Adding the new op: Flash Decode!
  • #0: Add missing permissions to issue notification job
  • #9275: Fix Falcon7b demo failing to run by default on a Grayskull e75
  • #9801: Account for 64B BH PCIe alignment in cq cmd sizing
  • #0: Make prefetcher early exit after fetching/reading exec_buf
  • #8683: Add Unary bitwise AND, OR
  • Ngrujic/profiling
  • #9628: Merge third set of binary backward op from tt_eager to TTNN
  • #4858: add typecast uint32
  • Migrate Pad Host Code, Bindings, C++ Usages from TT Eager to TTNN
  • Support longer sequence lengths in ssm_prefix_scan
  • #9709: Add optional transpose_a and transpose_b to ttnn matmul and linear (see the sketch after this list)
  • #0: Only run batch 12 bert for GS profiling and tighten some bert/resnet thresholds
  • Asarje/resnet highres 20240624
  • #9492: replace falcon specific matmul calls
  • Extend ssm_eltwise_mul for num_users > 32
  • Update documentation for adding new ttnn operation
  • Extend ssm_1d_reduce for the batch>32
  • #0: rn50 fix add api
  • #9123: Add support for optional output tensors to run in the worker t…
  • #9861: support check_tensor helper_function
  • Fix syntax issues in custom test dispatch workflow
  • Add Mixtral accuracy tests and cleanup its other tests (CI-friendly)
  • #9876: Increase timeout on falcon7b perplexity tests.
  • #9492: Remove bmm/resnet_matmul from models
  • #9410: enable fp32 precision unpacking for interm. CBs
  • #9903: Fix conditional statements and indexing of y values in CoreRange::diff
  • #9860: fix test create device apis
  • #0: delete unused code
  • #9719: fixed l1 clear issue on nlp create qkv heads decode test case
  • Fixing typo in llama demo readme
  • #9892: Device only op report
  • #8704: define consts for registers that hold x-y coordinates and amount to shift address to get x-y coord
  • CODEOWNERS update
  • Abhullar/bh misc fix
  • Auto-register C++ ttnn operations in python
  • #9788: Remove TopK from TTLib and replace all references with the TTNN api
  • #0: add owners for resnet demo
  • 7-way split of eager tests
  • #9910: Improve Softplus kernel accuracy
  • #9818: Add cache check to op info V2
  • #0: update noc test bound
  • Fix branching bug in softplus kernel
  • propagate error upwards for tests in falcon 40b suite
  • #0: Fix falcon40b softmax import failure
  • #9755: move ttnn.concat to match the new file structure
  • #9837: Assign workers after performing ref count cleanup in async mode
  • #0: Make event_synchronize API safer
  • #0: Update buffer asserts to account for trace buffers
  • Clean up ttnn operation registration on python side
  • #9164: [Blackhole bringup] Add fix for unpack untilize
  • Aliu/no l1 clear
  • Restructure ttnn::permute to match the new standard format
  • #9815: Update host to pass packed write max unicast sub cmds to cq dispatch
  • Distributed layernorm op
  • #9831: re-enable test
  • #8835: cleaned up ttnn operation registration on C++ side
  • #9941: update dram/l1 to noc xy header to do the appropriate shift
  • #9336: Refactoring moreh layernorm
  • #9745: move unpad to slice ttnn cpp references
  • #9980: Update falcon updated outputs
  • Fix Main after Pad Merge
  • Update eltwise bcast unary ops to use memory_config and fix PCC issue for interleaved output
  • Update FD cmds to be PCIe aligned
  • Fix N150 product name to nebula_x1 even if it's unharvested.
  • #0: add a second codeowner for conv
  • #0: Get tt-metal to compile with gcc-12
  • #9492: Change to ttnn matmul in tests and tt_eager
  • #9441: add typecast uint16->uint32
  • Move ttnn::embedding to match new pybind structure and replace C++ ttlib embeddings usage with it
…
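
The sketch referenced in the #9709 entry above: a minimal example of the optional transpose flags on ttnn matmul. Only the transpose_a/transpose_b parameters come from the changelog entry; the rest is ordinary ttnn host code and assumes one available device.

```python
# Minimal sketch of the transpose flags added to ttnn.matmul (#9709).
import torch
import ttnn

device = ttnn.open_device(device_id=0)

a = ttnn.from_torch(torch.rand(1, 1, 64, 32, dtype=torch.bfloat16),
                    layout=ttnn.TILE_LAYOUT, device=device)
b = ttnn.from_torch(torch.rand(1, 1, 64, 32, dtype=torch.bfloat16),
                    layout=ttnn.TILE_LAYOUT, device=device)

# b is transposed on the fly, so the inner dims line up: (64, 32) x (32, 64).
out = ttnn.matmul(a, b, transpose_a=False, transpose_b=True)
print(ttnn.to_torch(out).shape)  # expected: torch.Size([1, 1, 64, 64])

ttnn.close_device(device)
```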

v0.49.0

12 Jun 14:05

📦 Uncategorized

  • #5044: Add optional output to addalpha
  • #9059: Fix matmul for single core grid
  • readme update
  • #0: (MINOR) Update to v0.49.0
  • #7586: Move common models for single-card nightly to ln model
  • Update Mamba README
  • TTLIB interval to sharded sweeps
  • #0: Update dataflow api comments
  • #9196: Merge new op: Fast reduce nc into main
  • #0: New resnet50 test skipped on WH since it's WIP
  • #9329: Restructure ttnn::argmax
  • #9323: Introduce template for new ttnn pull requests
  • #0: skip release build on GH runners, we already test it via build a…
  • Remove unused dependencies and fetch gtest via CPM
  • #8764: Part 3 of docs and model demos changes
  • Ngrujic/profiling
  • [Mistral-7B] Add flags for weight paths
  • Typecast int32->fp16b (see the sketch after this list)
  • #9258: Remove ARCH_NAME and TT_METAL_ENV from wheel testing
  • Implemented SD using new Conv API
  • #9258: Re-add wheel into release assets
  • #9361: Install Clang-17 and gdb 14.2
  • #7525: Re-skip demo batch 7 metal_BERT_large_11 on WH because it still hangs ND
  • #9206: add sfpu config reg init to llk sfpu inits
  • #9059: Avoid a couple of fatals in matmul
  • Add Galaxy support.
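
The sketch referenced in the typecast entry above: a minimal example of an on-device typecast. The ttnn.typecast entry point and the dtype spellings are assumptions based on the ttnn Python API, not something these notes pin down.

```python
# Minimal sketch of an on-device typecast (int32 -> bfloat16).
# ttnn.typecast and the dtype names are assumptions here.
import torch
import ttnn

device = ttnn.open_device(device_id=0)

ints = torch.arange(32 * 32, dtype=torch.int32).reshape(1, 1, 32, 32)
t = ttnn.from_torch(ints, layout=ttnn.TILE_LAYOUT, device=device)

cast = ttnn.typecast(t, ttnn.bfloat16)
print(ttnn.to_torch(cast).dtype)  # expected: torch.bfloat16

ttnn.close_device(device)
```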

v0.48.0

10 Jun 18:09

📦 Uncategorized

  • #7744: Add support for non-4D tensor in moreh_sum, moreh_sum_backward
  • #5544: Add output tensors parameter to moreh_nll_loss op
  • #5544: Add output tensors parameter to moreh_sgd op
  • #5544: Fix package build error
  • #5544: Add output tensors parameter to moreh_linear op
  • #5544: Prevent eager unit test failures
  • #7997: Support non-4D tensor in moreh_softmax
  • #7816: Bump SD perf target
  • #8098: Remove temp buffer copying when reading from hugepage to host buffer
  • #0: Specify DEBUG_STATUS as a string literal instead of multiple chars
  • #8212: Fix uneven shards for interleaved_to_sharded op
  • #0: Refactor unpad tile to modify rt args in place and remove dynamic…
  • #7838: Add support for non-4D tensor in moreh_linear OPs
  • #0: Use split_work_for_tilize in both tilize and untilize
  • #8131: resnet-50 fix for b20.
  • Add support for multiple parameters in EltwiseUnary
  • #7625: Enable multicore for tilize with padding by default
  • Trace Support
  • #0: Switch set runtime args assertion for if kernel was placed on core to TT_ASSERT
  • #7179: enabling test case. The issue was not reproducible on 8.12 dri…
  • #4625: Multicore runs for untilize with unpadding on interleaved tensors
  • #0: Cache program cmds, convert cb configs from write linear to write packed
  • #0: Make skip and xfail optional in defining sweep tests
  • Shwetank tt/bcast op
  • #8364: Disable implicit fallback for ttnn.pad
  • #8513: Add slack notifications to several more pipelines
  • #0: Update common RT args to use no stride flag for packed cmd.
  • #0: Option to write compile_commands.json from CMake
  • #8718: eltwise testing for bfloat8
  • Add support for bfloat8 input tensors in Mamba SSM block custom kernels
  • #8460: Enable Clang-17
  • #0: Remove overhead in calling functions wrapped in tensor_impl_wrapper
  • #0: Updating the perf threshold to incorporate the "Merge back uneven reshard" commit.
  • #6365: Add ttnn host tests
  • #6365: Revert "#6365: Add ttnn host tests (#8210)"
  • #4382: fix GH reported vulnerabilities
  • #0: bump C++ timeout limit to 45 minutes
  • update unpad doc for slice generality
  • Convert Falcon7b tt_lib ops and tensors to ttnn.experimental
  • #6365: Fix ttnn host wheel tests
  • Add git bisect script
  • #0: Move falcon40b ci unit tests to different pipeline
  • #8437: remove default matmul program config
  • #0: Add myself to ttnn codeowners
  • #0: Update README.md to include mention of TTNN_CONFIG_OVERRIDES
  • #0: Fix typos and add TTNN_CONFIG_OVERRIDES parameter descriptions to readme (see the sketch after this list)
  • #0: Add basic sanity checks during matmul program config creation
  • #8907: Sweep tests for tilize/untilize
  • #8902: Fixed program caching bug in nlp load slice op and added additional test cases for the op
  • #8917: Add sweep test for the fold op
  • #0: Properly support trivial single core case for 1D matmuls
  • #6343: updated test_perf with test for bloom causal_lm
  • #6343: Add functional_bloom test_demo
  • Update README.md
  • Enable optimised attention by default in falcon prefill.
  • Replace FreeList shared_ptr with local_shared_ptr
  • Add dummy_weights mode for mixtral tests
  • Refactor operation calls: Replace operation::run() with operation::launch_op()
  • Use HiFi2 to bump Falcon7b prefill PCC
  • #8902: add input and attn_mask del
  • #8930: Disable llama perf test
  • #0: Add third codeowner to matmul path
  • #0: Add create_venv.sh as environment option in installation instructions
  • #7083: Composite conv fix for relu called after matmul
  • #7525: Skip batch 7 metal BERT on WH B0 because it still hangs too often
  • #8871: Add initial infra/support for dram sharding
  • #8531: delete all makefiles
  • #0: Delete dead code from work_split.hpp
  • #8853: Uplift SFPI to latest w/ BH support
  • #8725: Warn user if kernel cache is enabled
  • #0: Minor test_prefetcher fixes
  • #5389: Move ttnn.repeat to c++
  • #8131: temp fix for PCC issue on W0.
  • Optimize e2e perf Falcon40b modifying layernorm
  • #0: Relax Falcon7b perf target
  • #0: Resolve segfault in llama async mode
  • Resnet Optimizations
  • Create Falcon7b perplexity test and utility functions for text-gen datasets
  • Revert "#8131: temp fix for PCC issue on W0."
  • bmm dram sharded opt
  • #8943: Clean up profiler python_env build flow
  • #8904: Add slack notifications for T3000 unit-tests
  • Add unet shallow functional, performance and demo test files
  • #8932: Multi-Device Mixtral Argmax Support
  • #8264: Worker thread optimizations:
  • TTNN tests for bf8 with mk tiled scalar
  • Ihamer/7468 inject noc delays
  • Support changed csv row orderings in Mixtral's op_perf_results.py
  • Correct merge issue in op_perf_results.py
  • #0: Add kernel groups to test_pgm_dispatch
  • #0: Add docs requirements to python env cache key because it can change the environment as well
  • #0: Add helper function to create CBs
  • #8973: Remove TT_METAL_ENV because we don't need it anymore
  • #5773: Move SD model to demo folder
  • #6938: Implement softplus as a single kernel
  • Model team/rotary embeddings llama
  • #8735: Fix hw/inc/blackhole files for compilation
  • Improve Mixtral perf with ttlib
  • Update README.md
  • #3712: fix old version of GN test
  • #0: Don't error on unused functions in compiler call
  • Revert " #8904: Add slack notifications for T3000 unit-tests"
  • Rtawfik/bh llk api
  • #0: Added interactive demo
  • Move Falcon7b before Mixtral in demo pipeline to workaround issue
  • #8112: Add support for ND tensors to matmul
  • #0: fix dram read benchmark
  • Fix bug in utility_functions::Profiler
  • Remove 1x1 matmul fallback on convolution and generalize convo…
  • #5389: Remove ttnn.split
  • #8767: decouple build folder name from build.cpp
  • #8735: Update common flags for BH build after sfpi module update
  • #8895: Fix ttnn.as_tensor(..) method for placing tensors on-device
  • #8539: Add cq_id to run_operation function args
  • #8632: Support fp32 dest acc en in moreh_sum and moreh_sum_backward
  • #5044: Add optional output tensor and remove autoformat in eltwise binary ops
  • #8895: Fix failing regression test in dump_tensor(...) API
  • More Resnet Optimizations
  • #4858: add typecast fp32 to uint32 op
  • #8995: refactoring moreh arange
  • #0: Add ccache option to build_metal.sh
  • Update Mixtral perf figures
  • #8349: Use BFP4_B for attention mask in falcon7b optimised prefill.
  • #0: Add CODEOWNERS for build_metal.sh
  • Rtawfik/add binary reuse metal
  • Update watcher.rst - use double backticks
  • Falcon40b tt_lib to ttnn.experimental
  • #0: fix dram sharded program cache
  • #7083: New halo fix for enabled program cache
  • #9051: Enable Llama model perf test
  • #8764: Single card WH demo tests
  • #8764: Various docs fixes for WH release
  • #0: Correct script locations for nightly single card
  • #8764: Use new device_l1_small_size fixture for SD demo interactive test
  • #9059: Update matmul test pcc
  • #0: Ensure weka mount is active for demo tests otherwise it won't run
  • #0: remove reserve to avoid bad alloc
  • #8764: Separate n150/n300 demo tests to not run BERT 11 on N150
  • Remove unnecessary llk sfpu param files
  • #9059: Add fallback for getting matmul program config
  • Add grouped convolution support
  • #8282: Support non-4d tensor and fp32_dest_acc_en for moreh nllloss backward
  • #8976: moreh_getitem receive signed integer index tensors
  • #9049: fix moreh_sgd callback and add callback test
  • #0: Remove argmax multi-device test due to segfault
  • #7724: Add prototype for autonomous streams for use in tunneller
  • #9036: GS & BH --> Combine llk param files using variable args
  • #0: optimize allgather for small tensor sizes
    ...
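
The sketch referenced in the TTNN_CONFIG_OVERRIDES entries above: the variable carries JSON and is read when ttnn is imported, so it must be set first. The specific keys shown are assumptions based on ttnn's CONFIG flags (compare the TTNN_ENABLE_LOGGING flag and CONFIG class mentioned elsewhere in these notes).

```python
# Minimal sketch: override ttnn configuration via the environment.
# Set the variable before importing ttnn; the keys are assumed flags.
import json
import os

os.environ["TTNN_CONFIG_OVERRIDES"] = json.dumps({
    "enable_fast_runtime_mode": False,  # assumed key: re-enable validation
    "enable_logging": True,             # assumed key: log each operation
})

import ttnn  # noqa: E402  # import after the override is in place

print(ttnn.CONFIG)  # assumed: the CONFIG object reflects the overrides
```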

v0.46.0

05 Apr 13:57

📦 Uncategorized

  • user-triggerable C++ post-commit suite
  • #6406: add missing position_ids/attention_mask to bert demo
  • #6282: Add AdamW
  • #6315: Fix dprint tests for T3000
  • FD2: prefetch stall, dispatch wait, linear read, delay and cleanup
  • #6609: update wording in demo section of main README.md
  • #6364: Autocomplete for pybinded types
  • Asarje/ttnn rn50 b20
  • FD2.0 Test - Fix l1 buffer not page-size aligned in after FD-on-eth changes to L1_UNRESERVED_BASE
  • #6593: Add resharding to Llama2 model when possible.
  • #6572: Fix ttnn.repeat_interleave example in documentation (see the sketch after this list)
  • #5780: Re-enable 100K enqueue program stress test on grayskull
  • Enable basic width sharding support in all-gather
  • Alex/metal/remove cb wait markers
  • #6657: Use sysmem manager cq size instead of recomputing it each time…
  • #0: (MINOR) Add Grayskull purchase link and update version to 0.46.0
  • #5063: add TopK API to metal
  • #5480: FD2.0 Test - Fix test_prefetcher for dram paged read test (-t 3) on whb0
  • Fix logit low pcc
  • Backward op - Fixed ldexp, hardsigmoid and asin
  • #6598: Fix softplus
  • Add support for BFP4_B tensor serialization
  • Eltwise mul for different batch size
  • #6575: Split docs into separate Metalium and nn docs
  • #0: Add two separate links for documentation (tt-metalium/ttnn) on README
  • #6361: Update ttnn repeat to use correct shapes when formatting output
  • #0: Sayonaraaaaaaa
  • FD2.0 Test fix test_prefetcher add_paged_dram_data_to_worker_data dropping start_page
  • #5785: Watcher ringbuffer implementation
  • Add FD 2.0 WriteHost Command
  • #0: Put back frequent api tests because I'm an idiot
  • Optimize All Gather Interleaved Worker send/receive
  • #0: changing all #include common/* to #include tt_metal/common/*
  • #6676: Fix issues related to unary lte and gte
  • #5817: Fix lerp
  • #6589: Fix for relu_bw
  • #6633: Backward test update
  • #0: Skip logit, logiteps test
  • #0: Testing CI fix
  • #5480: Update test_prefetcher to pass added hugepage args to dispatch kernel
  • Fix l1 acc, add whb0 optimized conv tests
  • Alignment fix for eth core kernels
  • Add data parallel (multi-chip) for Falcon7b (prefill/decode) model and corresponding tests
  • CQ_DISPATCH_CMD_WRITE_PAGED support in test_dispatcher and passing tests
  • #6647: disable failing ci cpp tests and reenable cpp pipeline on CI
  • Backward test updates
  • Ngrujic/check bugs
  • Add Llama matmul perf tests to main
  • TTLIB: removing working tests from broken
  • #6443: Update backward asin and addcdiv logic
  • #0: Fix output cb size calculation in reshard op for bfp8b
  • #0: use smart ptrs in allocator
  • Jvasilje docs 0322
  • DRAM based device profiler with Tracy support
  • #6553: Fix ttnn.reshape(..) handling for bfloat16, TILE_LAYOUT
  • PR: #6746
  • Add Llama2 demo to tt-metal docs
  • Mistral-7B WH demo
  • Revert "#0: Put back frequent api tests because I'm an idiot"
  • FP32 support
  • #0: Add back frequent api tests to run.sh
  • Bteng/watcher ci3
  • Remove cpuprof
  • logo update
  • #6184: sharded row major silu support.
  • #6443: Update div_bw and backward ops test file
  • #6705: Relax forcing of keyword argument in ttnn.open_device
  • Forward op tests
  • #6691: Allow blocking of inner dim within a core for sharded in0 for 2d and 1d systolic matmuls
  • #6662: Width Sharding support for eltwise OP
  • Stable diffusion python API level perf improvements
  • Add get_compute_kernel_config_args function
  • #0: Add fd-2/main triggers for pull_request and push for post-commit
  • #5480: FD2 refactor for pre/dis patch variants
  • #6654: Add perf tests for ttnn ResNet50
  • #5480: Fix fd gtest unit test test_write_host
  • #0: Set myself as setup.py owner
  • #6780: Add mistral7b to demos list in getting started
  • #4003: re-added TTNN_ENABLE_LOGGING as runtime flag
  • #0: Fix semaphore address gen bug
  • #6769: Disable program caching for failing Llama tests.
  • #5480: Fix zero sized write transaction request that could occur in write_linear_host
  • #6077: Fix unet pcc issues
  • Remove DstSync from llk api templates
  • FP32 Support
  • #6680: Reverting move op change
  • #6443: Update asinh and softsign backward
  • Backward tests with updated test modules
  • Ngrujic/check bugs 1
  • #6654: Moving init for self.compute_kernel_config
  • #6805: reproduce the bug with sharded split_query_key_value_and_split_heads
  • #6832: Account for tile-padding in softmax for mistral 7B
  • Enable support for uint32 format to be consumed by SFPU (issue #4624)
  • #4252: fix clang build error since std::log2 only constexpr in gcc
  • #4003: log, debug and add pre- and post- hooks only for top-level ttnn ops
  • #6823: Fix core count to not include dispatch cores in op report
  • #6197: Align pages for interleaved <-> sharded.
  • METALIUM_GUIDE
  • Bteng/watcher post commit
  • #6443: update backward test file for relational ops and concat op
  • Revert "Bteng/watcher post commit"
  • #6443: Update backward ops
  • Backward test updates
  • #0: Add the dim 0 support repeat backward
  • Update hard related test ops
  • #6757: Remove set_profiler_location
  • #6443: Update backward ops erfinv elu hypot cos sin
  • #6861: Enable Watcher/dprint tests on T3000 CI
  • Update Mistral perf regression for CI, until issue is resolved
  • Mamba/perf v1
  • #0: remove data movement ops related to silu in SD
  • #4003: added proper fallback for getitem of ttnn.Tensor. Slice the tensor only on the tile boundary but set the shape based on whatever the user provided
  • #4003: added proper fallbacks for every op that falls back to torch
  • #6731: add fix to LN width sharding
  • #5797: add back sweep test for ln
  • Integrate GroupNorm V2 to SD model
  • METALIUM_GUIDE.md updates
  • [Falcon7b] Fix bugs with inference throughput measurements in demo
  • #0: shallow unet add perf_mode
  • #6154: 2d matmul in0 height, in1 width sharding
  • #5249: Various Falcon40b test and demo cleanup
  • #0: fix incremental build
  • #0: remove upsample spill to DRAM
  • [Llama2 Prefill] Model Functionality completed
  • Watcher alignment checking for PCIe/DRAM <-> L1
  • #6920: fixed the error in whisper
  • Update METALIUM_GUIDE.md
  • #6644: save l1 buffers to data base
  • Update usage.rst
  • #6804: fix ttnn falcon7b demo regression + add to CI regressions
  • #6285: Add backward support for floor round and div_no_nan
  • [skip ci] Update INSTALLING.md
  • #6873: Add more test combinations to tt_lib sweeps add, add_unary, su…
  • Ngrujic/check bugs 3
  • #6882: Updated Mistral-7b perf estimate
  • #6850: Update install links in Sphinx docs to point directly to INSTALLING.md
  • #6619: Fix per op profiler sum
  • #6644: sync before calling print l1 buffers
  • Barsic/ttlib ops check
  • Barsic/ttlib params fix
  • #6962: Move cd tt-metal earlier in the command list of INSTALLING.md
  • #6819: Add support for CreateKernel absolute file paths
  • #6356: Remove half-half grid logic for bmms
  • #4003: added a flag to disable ttnn fallbacks. Don't throw an error w…
  • #0: Correct FW versions, tt-smi versions, and add note about tt-topology
  • #0: Capitalize tt to TT consistently for marketing
  • #0: Add myself as CODEOWNER for INSTALLING.md
  • #6644: ttnn visualizer
  • #6847: Allow disabling individual watcher features
  • #6889: Support printing/padding/tilizing multi-device tensors
  • #4003: removed ttnn.print_l1_buffers and consolidated all ttnn flags into a CONFIG class
  • #6217: tt_lib async mode support (single chip tensors supported)
  • Reshard With Ranges
  • #4003: updated buffer report to show...
…
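
The sketch referenced in the #6572 entry above: a minimal example of ttnn.repeat_interleave, mirroring torch.repeat_interleave semantics. It assumes a working ttnn build with one available device.

```python
# Minimal sketch of ttnn.repeat_interleave (#6572 fixed its docs example).
import torch
import ttnn

device = ttnn.open_device(device_id=0)

t = ttnn.from_torch(torch.rand(1, 1, 32, 32, dtype=torch.bfloat16),
                    layout=ttnn.TILE_LAYOUT, device=device)

# Repeat each row block twice along dim 2: (1, 1, 32, 32) -> (1, 1, 64, 32).
out = ttnn.repeat_interleave(t, repeats=2, dim=2)
print(ttnn.to_torch(out).shape)  # expected: torch.Size([1, 1, 64, 32])

ttnn.close_device(device)
```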

v0.45.0

22 Mar 18:03

🚀 Features

  • #6204: added support for num_users < 32 for update cache op.
  • #6247: Llama2 Galaxy MLP implementation

📦 Uncategorized

  • #4736: Add support for moreh_norm op
  • Fix moreh_layernorm rstd
  • #5508: Change test_moreh_layernorm.py for debugging
  • #4686: add infra for sharing global struct among ops
  • #5592: Fix pcc on Falcon 7b prefill by turning on l1 packer on MLP 4h-to-h matmul
  • Fix layernorm beta data format reconfig
  • Add linked support for in0 in1 mcast in matmul
  • #4957: optimizing construct_2d_padded_tensor_list
  • #4003: added ttnn.as_tensor and enabled support for caching torch tensor
  • Revert "#0: Fix for fail in asinh backward"
  • #5829: Use moreh_common.hpp for data movement kernels across moreh OPs
  • Barsic/ttnn ops
  • #6030: Update resnet performance metrics
  • #5876: pytest & c++ test logging cleanup
  • #0: Use both 2x2 and 2x4 machines on every scheduled run
  • Add single core matmul benchmark
  • #6079: Update FORCE_INLINE to be nop when watcher is enabled
  • #5980: Fix a hard-coded bounds check in dprint
  • #5389: merged ttl and ttnn tensor classes into one
  • Initial Performance Model
  • fix ci
  • TTNN RN50 :: on the road to match perf with TTLIB version
  • #4438: Optimized single-core fold op
  • #5589: Add repeat-interleave and addcmul sweeps
  • #6055: Add square backward support
  • #6057: Add backward support for lgamma
  • #6056: Add backward support for frac and trunc
  • #6066: Add support for backward log sigmoid
  • #6002: Add backward support for binary maximum
  • Ngrujic/improve conversion to bfloat8b in sweeps
  • #5829: Use moreh_common.hpp for compute kernels across moreh OPs
  • #0: Remove post-commit label from multi device pipeline because it's not actually post commit
  • Add pack l1 acc to resnet conv
  • #6144: Skip 512x512 cross attn 2d upblock for now in nightly because it hangs
  • #6061: Add tanhshrink, threshold, Unary EQ backward ops support
  • Width Sharded Concat for Unet
  • #5184: uncommenting various moreh test cases.
  • Fix compute kernel config arg for resnet50
  • Nsmith/untilize unit test
  • Revert "Revert "#5389: merged ttl and tensor classes into one""
  • #4438: Do not use the new fold op in Resnet tests
  • Remove corerangeset that does not work on wormhole
  • #6129: Expose kernel config attrs and use 4 dst tiles for fp32 configs
  • #5391: Add device perf
  • #0: Use multiplier for wormhole b0 mulsi3
  • #4003: removed ttnn.Tensor autoclass from tensor.rst
  • TTNN MultiDevice Support
  • build artifacts
  • #4947: Add noc alignment checks to watcher
  • Add ttnn multi-chip unit test for checking device shards
  • Nsmith/fix unet
  • #6043: Random program stress test of command queues
  • Logit and logiteps backward support
  • Backward support for log2
  • Add missing ttnn tests and disable broken tests until issues are fixed
  • Fix Events feature for FD1.3 (out-of-order event ids, events feature missing) #6093
  • #5873: make top-level post commit workflow re-useable
  • #5589: add groupnorm for ttnn sweeps
  • Ngrujic/ttnn sweeps 4
  • Add ethernet datamover (EDM) - a foundational ethernet transfer engine
  • #6116: Add backward support for softshrink
  • #0: Add verbose make logs to artifact and make nicer name on metal
  • #0: Only use 2x4 setup for multi-card WH CI as 2x2 does not provide us good feedback
  • #4809 dprint tensix regs
  • #4003: fixed bloom perf test
  • #6187: Conv bugfix
  • #0: concat RM support variable stick widths across inputs
  • TTNN RN50 on WHB0
  • #6084: Lower thresholds slightly after using proper configs for device resnet
  • Fast dispatch 2.0 proof of concept
  • #6218: add pytest for matmul 1d 2d
  • #6177: use is_tensor_storage_on_device so it works for MultiDeviceStorage
  • #6082: support workers + eth cores in one program
  • #6215: Rename TensorToMeshMapper/MeshToTensorComposer
  • #6164: Update test_noc_unicast_vs_multicast_to_single_core_latency to not use same cores for producer and consumer on WH
  • #6117: Add backward support for softplus
  • #6223: remove redundant call to context switch
  • Integrate EDM with all-gather.
  • #6136: Add backward support for unary LE and GE
  • #5398: fix unicast binaries
  • Barsic/ttnn ops 2
  • #5380: Add wormhole_b0 model perf tests, only falcon7b in ttlib for now
  • #5372: Updated README.md file for demo
  • #4003: updated ttnn.concat to have a registered fallback
  • Llama2 functional bringup
  • #5589: Add working BFLOAT8_B sweeps to working folder
  • FD2.0 rename HostQ->PrefetchQ, add multi-core capability, fix NOC coords
  • #0: bugfix in ttnn resnet caught by nightly
  • #0: fix tt_bisect build bug
  • Watcher Asserts
  • #6183: add unit test for sd matmul ops
  • #6254: Make program cache per device:
  • #5394: Add functional version of Mamba architecture
  • #6257: Add temporary convenience script for 800MHz / new eth reset dependent CI
  • #5661: Enable gtests for fast dispatch + R chip
  • Alex/metal/bmm large block untilize out
  • #5389: made tensor attributes public and use ttnn::Shape instead of tt::tt_metal::Shape for storing shape
  • Revert "#6183: add unit test for sd matmul ops"
  • #4003: print all of the L1 buffers using ttnn.print_l1_buffer_state
  • #4003: print all of the L1 buffers using ttnn.print_l1_buffers
  • #4438: Implement sharded multi-core fold op for Resnet50
  • #6149: disabled the check for comparing generated report with GOLDEN_L1_BUFFER_REPORT because on pipelines it looks different than when running locally
  • FD2.0 fixes+mcast support for write and packed_write
  • Shwetank tt/config
  • #0: Change order of device and use_program_cache fixture in remaining pytests
  • Softplus with beta and threshold param (see the sketch after this list)
  • Build tests during artifact creation
  • #6149: disabled test_print_l1_buffers_of_add_operation
  • #4003: updated ttnn.to_torch to work with bfloat8_b tensors that are not multiple of tile size without tile padding
  • #0: add to/from L1 reshard test
  • #0: Add back deleted shape assertions for interleaved concat
  • test errors flagged by watcher
  • #0: fix incremental build
  • Merge xuncai/llama-attention-galaxy to main: First version of llama-attention galaxy on emulated chips
  • #6329: Fixing a bug causing mismatch on indices
  • #6321: Test which sweeps read/write buffer and just checks that the e…
  • Support moreh_getitem forward
  • #6125: Update in0_block_w to be full shard width for sharded 2D systolic matmul
  • #6107: Add softsign, sign, unary ceil backward support
  • #6226: Add backward support for div
  • #6234: Add backward support for rdiv
  • #6236: Add backward support for fmod and remainder
  • #4003: added positional embeddings to bert and updated ttnn_sharded_optimized_bert to run with batch size of 12
  • Indexed Fill
  • #5589: remove dtype in gen function sweep tests where needed
  • #6347: Print built-in defines once only
  • #0: Add Mo as code owner on profiler code
  • #0: Simplify tt_lib.scripts package by adding a specific tt_eager/scripts directory and putting the production scripts in there, whereas development scripts will stay in /scripts
  • #0: Fixture reorder changes reverted for falcon_7b perf test
  • #5424: remove metal_ckernel_sfpu
  • #0: Update remaining tt_lib.program_cache calls to use device APIs
  • #6183: add unit test for sd matmul ops
  • #6289: fix dispatcher page calculation
  • #5924: Enable unet on wormhole_b0 changes
  • #6325: skip test_multi_device.py for grayskull arch
  • Alex/metal/pack untilize no repack
  • #6144: Not hanging on GS or WH with or without Watcher
  • Agrebenisan/swq hwq cardinality cleanup
  • #6146: Add backward support for conj
  • #0: bug fix UTWH div_up instead of div trunc for calculating CB sizes
  • Fix To/From Sharded Bug
  • #6206: Fix resharding page mapp...
…
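
The sketch referenced in the softplus entry above. softplus(x) = (1/beta) * log(1 + exp(beta * x)), with the identity used once beta * x exceeds threshold. The ttnn.softplus spelling is an assumption; at the time, this entry may have been exposed through tt_lib instead.

```python
# Minimal sketch of softplus with beta and threshold parameters.
# The ttnn.softplus entry point is an assumption here.
import torch
import ttnn

device = ttnn.open_device(device_id=0)

x = ttnn.from_torch(torch.randn(1, 1, 32, 32, dtype=torch.bfloat16),
                    layout=ttnn.TILE_LAYOUT, device=device)

y = ttnn.softplus(x, beta=1.0, threshold=20.0)
print(ttnn.to_torch(y)[0, 0, 0, :4])  # smooth approximation of relu(x)

ttnn.close_device(device)
```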